
B.E. Computer Science and Engineering - V Semester - Instructional System Design (2023-2024)

COURSE DETAILS – NLP TOOLS AND APPLICATIONS

Subject Code : 19CSH03

Subject Name : NLP Tools and Applications

Semester : B.E. (CSE, BME, ECE, EEE, IT, MECH) V Semester

Prepared By : Mrs.S.Santhi

Reviewed By : Mr.N.Umakanth

Approved By : Dr.J.Raja Sekar

Effective Date : 03-07-2023

Revision No. : 0.3

Course Objectives:

1. To explore the fundamental concepts of Natural Language Processing
2. To learn the different data pre-processing steps in lexical analysis
3. To understand the working of syntactic and semantic analysis using NLTK
4. To become familiar with text classification and topic modeling methods
5. To work with sentiment analysis and machine translation using Python

Course Outcomes:

1. Become familiar with the concepts of Natural Language Processing
2. Pre-process the data from the collected dataset using NLTK
3. Extract the features and perform syntactic and semantic analysis using NLTK
4. Classify text using text classification algorithms and find the topics using LSA and LDA
5. Find the different emotions and sentiments using sentiment analysis and translate from one natural language to another using machine translation

Course Prerequisite:

Python Programming

Mode of Delivery:

1. Oral presentation
2. Tutorial
3. Hands on/Demonstration

Assessment Methods:

1. Internal Test
2. Assignment
Mepco Schlenk Engineering College (Autonomous), Sivakasi

Course Outcomes–Programme Outcomes mapping


(3- Substantially, 2-Moderately, 1-Slightly)
For COMPUTER SCIENCE AND ENGINEERING:

CO No.  Highest Cognitive Level  Mode of Delivery  Assessment Components (AC) with Weightage  PO (1-12) and PSO (1-3) ratings
1.  A  Oral presentation  Internal Test 1.0  2 2
2.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 3 2 2 2 2
3.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 2 2 2
4.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 1 2 2 3 2
5.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  1 2 3 3 2 2 2

For BIO MEDICAL ENGINEERING:

CO No.  Highest Cognitive Level  Mode of Delivery  Assessment Components (AC) with Weightage  PO (1-12) and PSO (1-3) ratings
1.  A  Oral presentation  Internal Test 1.0  2 2
2.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 3 2 2 1
3.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 2 1
4.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 1 2 1
5.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7  1 2 3 3 1

For ELECTRONICS AND COMMUNICATION ENGINEERING:

CO No.  Highest Cognitive Level  Mode of Delivery  Assessment Components (AC) with Weightage  PO (1-12) and PSO (1-3) ratings
1.  A  Oral presentation  Internal Test 1.0  2 2
2.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 3 2 2 1
3.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 2 1
4.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 1 2 2
5.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  1 2 3 3 2

For ELECTRICAL AND ELECTRONICS ENGINEERING:

CO No.  Highest Cognitive Level  Mode of Delivery  Assessment Components (AC) with Weightage  PO (1-12) and PSO (1-3) ratings
1.  A  Oral presentation  Internal Test 1.0  2 2
2.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 3 2 2
3.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 2
4.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 1 2
5.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7  1 2 3 3

For INFORMATION TECHNOLOGY:

CO No.  Highest Cognitive Level  Mode of Delivery  Assessment Components (AC) with Weightage  PO (1-12) and PSO (1-3) ratings
1.  A  Oral presentation  Internal Test 1.0  2 2
2.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 3 2 2 2
3.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 2 2
4.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 1 2 2 2
5.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  1 2 3 3 2 2

For MECHANICAL ENGINEERING:

CO No.  Highest Cognitive Level  Mode of Delivery  Assessment Components (AC) with Weightage  PO (1-12) and PSO (1-3) ratings
1.  A  Oral presentation  Internal Test 1.0  2 2
2.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 3 2 2 1
3.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 2 1
4.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 1 2 1
5.  A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  1 2 3 3 1

Concept Map

19CSH03 – SYLLABUS (As per MEPCO Autonomous Syllabus)

19CSH03 NLP Tools and Applications  L T P C: 2 2 0 3
UNIT-I INTRODUCTION TO NLP 9

Natural language processing – History of NLP – Early NLP systems – Phases of natural language
processing – Evaluation of NLP systems - Origins and challenges of NLP – Basic English concepts –
Language and Grammar - Processing Indian Languages.

UNIT-II LEXICAL ANALYSIS USING NLTK 9

Introduction and installation of NLTK – Data Pre-processing: Tokenization – Part of Speech (PoS) Tagging -
Word Frequency Counting – Stop Words Removal – Text Normalization – Spelling Correction - Stemming –
Lemmatization – Named Entity Recognition.

UNIT-III SYNTACTIC AND SEMANTIC ANALYSIS USING NLTK 9

Feature Extraction: Building Bag of Words (BoW) Model – Building TF-IDF Model – Word Embeddings using
word2vec - Sentence Boundary Detection – Parsing - Lexical Resources: WordNet – FrameNet - Word
Synonyms and Antonyms using NLTK – Word Negation Tracking - Word Sense Disambiguation

UNIT-IV TEXT CLASSIFICATION AND TOPIC MODELING 9

Introduction to Text Classification – Machine Learning Overview – Classification Metrics – Confusion Matrix
– Developing a Text Classifier – Saving and Loading Models - Introduction to Topic Modelling – Topic
Discovery – Topic Modelling Algorithms: Latent Semantic Analysis – Latent Dirichlet Allocation.

UNIT-V SENTIMENT ANALYSIS AND MACHINE TRANSLATION 9

Introduction to Sentiment Analysis – Need and Growth of Sentiment Analysis – TextBlob – Understanding
Data for Sentiment Analysis – Training Sentiment Models – Introduction to Machine Translation - Problems
in Machine Translation - Machine Translation Approaches - Translation involving Indian Languages using
Python

TOTAL: 45 PERIODS

TEXTBOOK:

1. Ela Kumar, “Natural Language Processing”, I.K International, New Delhi, 2011.
2. Sohom Ghosh, Dwight Gunning, “Natural Language Processing Fundamentals”, Packt Publishing Limited, 2019.
3. Steven Bird, Ewan Klein and Edward Loper, “Natural Language Processing with Python”, O'Reilly Media, 1st Edition, 2009.

REFERENCE BOOKS:

1. Tanveer Siddiqui, U.S. Tiwary, “Natural Language Processing and Information Retrieval”, Oxford University Press, 2008.
2. Daniel Jurafsky and James H Martin, “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition”, Prentice Hall, 2nd Edition, 2008.
3. Christopher D. Manning and Hinrich Schutze, “Foundations of Statistical Natural Language Processing”, MIT Press, 2003.

WEB REFERENCES:

1. http://nptel.ac.in/courses/106101007/
2. http://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm
3. http://nlp.stanford.edu/
4. http://ocw.mit.edu/courses/electrical-engineering-and-computer-science
5. https://www.udemy.com/course/nlp-natural-language-processing-with-python/
6. https://www.packtpub.com/in/big-data-and-business-intelligence/natural-language-processing-fundamentals

Cognitive Level for Test Questions:

Cognitive Level  Test 1  Test 2  Test 3
Remember  Up to 10%  Up to 10%  Up to 10%
Understand  Up to 60%  Up to 60%  Up to 60%
Apply  Min 30%  Min 30%  Min 30%

COURSE SCHEDULE

S.No  Title  No. of periods  Reference  Mode of delivery  Date of Coverage  CO No.
UNIT-I : INTRODUCTION TO NLP
1. Natural language processing 1 Ch 1.1 of R1 1 1
2. History of NLP 1 Ch 1.1 of T1 1 1
3. Early NLP systems 1 Ch 1.4 of T1 1 1
4. Phases of natural language processing 1 Ch 1.6 of T1 1 1
5. Evaluation of NLP systems 1 Ch 1.7 of T1 1 1
6. Origins and challenges of NLP 1 Ch 1.2, 1.4 of R1 1 1
7. Basic English concepts 1 Ch 2 of T1 1 1
8. Language and Grammar 1 Ch 1.3 of R1 1 1
9. Processing Indian Languages 1 Ch 1.6 of R1 1 1
Total 9
UNIT-II : LEXICAL ANALYSIS USING NLTK
10. Introduction and installation of NLTK 1 Ch 1 of T2 1, 3 2
11. Data Pre-processing: Tokenization 1 Ch 1 of T2 1, 3 2
12. Part of Speech (PoS) Tagging 1 Ch 1 of T2 1, 3 2
13. Word Frequency Counting 1 Ch 1 of T2 1, 3 2
14. Stop Words Removal & Text Normalization 1 Ch 1 of T2 1, 3 2
15. Spelling Correction 1 Ch 1 of T2 1, 3 2
16. Stemming – Lemmatization 1 Ch 1 of T2 1, 3 2
17. Named Entity Recognition 1 Ch 1 of T2 1, 3 2
18. Data pre-processing – Tutorial 2 Ch 1 of T2 2 2
Total 10
UNIT-III : SYNTACTIC AND SEMANTIC ANALYSIS USING NLTK
19. Feature Extraction: Building Bag of Words (BoW) Model 1 Ch 2 of T2 1, 3 3
20. Building TF-IDF Model 1 Ch 2 of T2 1, 3 3
21. Word Embeddings using word2vec 1 Ch 2 of T2 1, 3 3
22. Sentence Boundary Detection 1 Ch 1 of T2 1, 3 3
23. Parsing 1 Ch 4.4 of R1 1 3
24. Syntactic Analysis – Tutorial 1 Ch 1, 2 of T2 2 3
25. Lexical Resources: WordNet – FrameNet 1 Ch 12.2, 12.3 of R1 1 3
26. Word Synonyms and Antonyms using NLTK 1 Ch 1 of T2 1, 3 3
27. Word Negation Tracking 1 Ch 1 of T2 1, 3 3
28. Word Sense Disambiguation 1 Ch 1 of T2 1 3
29. Semantic Analysis – Tutorial 1 Ch 1 of T2 2 3
Total 11
UNIT-IV : TEXT CLASSIFICATION AND TOPIC MODELING
30. Introduction to Text Classification 1 Ch 3 of T2 1 4
31. Machine Learning Overview 1 Ch 3 of T2 1 4
32. Classification Metrics - Confusion Matrix 1 Ch 3 of T2 1 4
33. Developing a Text Classifier 1 Ch 3 of T2 1, 3 4
34. Saving and Loading Models 1 Ch 3 of T2 1, 3 4
35. Text Classification – Tutorial 1 Ch 3 of T2 2 4
36. Introduction to Topic Modelling: Topic Discovery 1 Ch 5 of T2 1 4
37. Topic Modelling Algorithms: Latent Semantic Analysis 2 Ch 5 of T2 1, 3 4
38. Latent Dirichlet Allocation 1 Ch 5 of T2 1, 3 4
39. Topic Modeling – Tutorial 1 Ch 5 of T2 2 4
Total 11
UNIT-V : SENTIMENT ANALYSIS AND MACHINE TRANSLATION
40. Introduction to Sentiment Analysis 1 Ch 8 of T2 1 5
41. TextBlob 1 Ch 8 of T2 1, 3 5
42. Understanding Data for Sentiment Analysis 1 Ch 8 of T2 1, 3 5
43. Training Sentiment Models 1 Ch 8 of T2 1, 3 5
44. Sentiment Analysis – Tutorial 1 Ch 8 of T2 2 5
45. Introduction to Machine Translation 1 Ch 8.1 of R1 1 5
46. Problems in Machine Translation 1 Ch 8.2 of R1 1 5
47. Machine Translation Approaches 2 Ch 8.4 of R1 1 5
48. Translation involving Indian Languages using Python 1 Ch 8.9 of R1 1, 3 5
49. Machine Translation – Tutorial 1 Ch 8 of R1 2 5
Total 11
Total Periods: 52

DELIVERY PLAN FOR THE LEARNING UNITS (LU)

Unit-1: INTRODUCTION TO NLP

This module explains the history of natural language processing and the early NLP systems. It
details the phases of natural language processing, including syntactic analysis, and covers the
evaluation of NLP systems and basic English concepts. It also explains the origins and challenges
of natural language processing and how to process Indian languages.
LU-1 Natural Language Processing Period: 1
LU Outcomes Level: U CO Number: 1
1. Use NLP for human computer interaction
2. Concerned with the development of computational models
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI *
1. Define NLP. R 1.3.1
2. Give the examples of natural language sentences for complexity analysis. U 4.1.1
3. List the approaches of Natural Language Processing. Explain in detail. U 4.1.2
4. Define knowledge based technique. R 1.3.1
5. Draw the block diagram of an NLP system. U 4.1.1
6. What are the major tasks performed in NLP applications? U 1.3.1
7. Differentiate between the rationalist and empiricist approaches to natural language processing. U 4.1.2
* - Only for CSE students
LU-2 History of NLP Period: 1
LU Outcomes Level: U CO Number: 1
1. Use the different versions of NLP systems
2. Compare the various systems using its features
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What are the different programming languages used in the first era? R 1.3.1
2. What is meant by symbolic and stochastic? R 1.3.1
3. Write short notes on third era. U 1.3.1
4. Which NLP version is used to generate the web applications? U 4.1.1
5. What are the tasks performed in language understanding problems? U 4.1.2

LU-3 Early NLP systems Period: 1


LU Outcomes Level: U CO Number: 1
1. Develop the isolated NLP systems
2. Compare the various NLP systems using its features
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Give some examples of interaction with ELIZA. U 1.3.1
2. What are the two actions performed in the ELIZA system? U 4.1.1
3. Which system is supported for question answering systems using AI techniques? U 4.1.1
4. How is dialogue carried out in the SHRDLU system? U 4.1.3
5. Write the features of SHRDLU. U 4.1.1
6. What are the characteristics of practical front ends? U 1.3.1
7. Draw the diagram of general NLP system. U 4.1.2
8. What are the linguistic structures supported for the NLP system? U 1.3.1

LU-4 Phases of natural language processing Period: 1


LU Outcomes Level: U CO Number: 1
1. Analyze the natural language using different phases
2. Compare the word and sentence level features using lexical and syntactic analysis
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by phonology? How is phonological analysis performed? U 4.1.1
2. Define morphemes. What are the types of morphemes? R 4.1.1
3. What is morphological parsing? R 4.1.1
4. What are the different parts of a transducer? U 4.1.2
5. What is a lexicon? R 1.3.1
6. Specify the languages that are supported for syntactic analysis. U 4.1.2
7. Write short notes on semantic, pragmatic and discourse analysis. U 4.1.3

LU-5 Evaluation of NLP systems Period: 1


LU Outcomes Level: U CO Number: 1
1. Measure the qualities of the NLP algorithms or systems
2. Apply various methods for evaluating NLP system
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is language understanding or language generation? R 1.3.1
2. Differentiate intrinsic and extrinsic evaluation system. U 4.1.2
3. Differentiate black box and glass box evaluation. U 4.1.2
4. Which evaluation is used to check whether the NLP system meets the parameters and qualities? U 4.1.1
5. Differentiate automatic and manual evaluation. U 4.1.2
6. What is objective and subjective evaluation? U 1.3.1
LU-6 Origins and challenges of NLP Period: 1
LU Outcomes Level: U CO Number: 1
1. Compare Natural Language Generation and Natural Language Understanding
2. Solve the NLP challenges in speech processing
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Differentiate between NLG and NLU. U 1.3.1
2. Define computational linguistics. R 1.3.1
3. Compare knowledge driven and data driven categories in computational models. U 1.3.1
4. How is information retrieval used in NLP? U 4.1.3
5. Write the problems and difficulties in natural language processing. U 4.1.2
6. What is meant by quantifier scoping problem? U 4.1.1
7. What is ambiguity? Explain different types of ambiguity. U 4.1.1
8. What makes natural language processing difficult? U 4.1.3

LU-7 Basic English concepts Period: 1


LU Outcomes Level: A CO Number: 1
1. Analyze the language using basic concepts of English
2. Identify different parts of speech using English concepts
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Write short notes on fundamental terminology of English grammar. U 1.3.1
2. What is a sentence? What are the types of sentences? U 1.3.1
3. Consider the following sentences and identify the parts of the sentence: A 4.2.1
a. The black and white dog was barking fiercely at the stranger.
b. The cobra saw the dog coming closer and raised itself into striking position.
c. Annie is an English teacher.
4. Write short notes on the structure of a dictionary. U 4.1.1
5. Explain in detail about WordNet. U 4.1.2
6. Specify the dictionary features in English language and explain each feature. U 4.1.2

LU-8 Language and Grammar Period: 1


LU Outcomes Level: A CO Number: 1
1. Separate the languages from its content
2. Analyze the natural language using different phases
3. Compare the surface and deep structures of sentence.
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by phonology? How is phonological analysis performed? U 4.1.1
2. Define morphemes. What are the types of morphemes? U 4.1.1
3. Explain in detail the different analyses in natural language processing. U 4.1.2
4. Specify the languages that are supported for syntactic analysis. U 4.1.2
5. What are the types of grammar in NLP? U 4.1.1
6. Differentiate surface structure and deep structure in syntactic structure. U 4.1.3
7. Explain in detail the components of transformational grammars. U 4.1.3
8. Construct the parse structure and passive transformations for the sentence “The police will catch the snatcher”. A 4.2.1
9. List the motivation behind the development of computational models of languages. U 4.1.2
10. What is the role of transformational rules in transformational grammar? Explain with the help of an example. U 4.1.2

LU-9 Processing Indian Languages Period: 1


LU Outcomes Level: U CO Number: 1
1. Find the difference between English language and other Indian languages
2. Apply different formal grammars for Indian Languages
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What differences are identified in English language and other languages? U 1.3.1
2. What is Paninian Grammar? R 1.3.1
3. Explain in detail about applications of natural language processing. U 4.1.3
4. What is machine translation? R 1.3.1
5. Differentiate information retrieval and information extraction. U 4.1.2
6. Write short notes on question answering system. U 4.1.1

Unit-2: LEXICAL ANALYSIS USING NLTK

This module introduces NLTK and its installation. It details the data pre-processing steps of
lexical analysis: tokenization, Part of Speech (PoS) tagging, word frequency counting, stop words
removal and text normalization. It also covers spelling correction, stemming, lemmatization and
named entity recognition.

LU-10 Introduction and installation of NLTK Period: 1


LU Outcomes Level: U CO Number: 1
1. Use NLTK for performing lexical analysis
2. Install Python-based tools such as NLTK
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is NLTK? What operations are supported in NLTK? U 1.4.1
2. How do you install NLTK on Windows? U 1.3.1
3. What is the purpose of the pip command in Python? U 1.4.1
4. How do you install NLTK on macOS/Linux? U 1.3.1
5. Write the steps for installing NLTK in Anaconda. U 1.3.1

LU-11 Data Pre-processing: Tokenization Period: 1


LU Outcomes Level: A CO Number: 2
1. Preprocess the data using tokenization
2. Tokenize the sentences into words using various tokenizers
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by pre-processing? U 4.1.2
2. Explain in detail about various pre-processing steps with example. R 4.1.2
3. What is meant by noise removal in text data? U 2.4.2
4. What is tokenization? What common operations are performed in tokenization? U 4.1.2
5. What is normalization and its operations? U 4.1.2
6. What is stop word removal? Give an example. U 4.1.2
7. What is stemming? List the different stemmers with example. U 4.1.2
8. Differentiate stemming and lemmatization. U 2.2.4
9. What is lowercasing? Give an example. U 2.2.3
10. Write the different steps in PoS tagging. U 4.1.2
11. Explain in detail about various types of tokenizers with example. U 5.1.1
12. Tokenize the following sentences using sentence tokenization and word tokenization in Python: A 5.1.1, 14.2.2
a. The sky is blue
b. I would like to visit the United States
c. I went to temple and meet Rani
13. Tokenize the following sentences using the Treebank and regular expression tokenizers in Python: A 5.1.1, 14.2.2
a. The sky is blue
b. I would like to visit the United States
c. I went to temple and meet Rani
14. How do you import the whitespace tokenizer in NLTK? U 4.1.2
15. What are the issues in tokenization? U 4.1.3
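
The tokenizer questions above can be illustrated with a minimal sketch, assuming NLTK is installed. It uses only tokenizers that need no additional NLTK data downloads (`TreebankWordTokenizer`, `RegexpTokenizer`, `WhitespaceTokenizer`). The pattern `\w+` and the layout of the `tokenized` dictionary are illustrative assumptions, not part of the course material.

```python
from nltk.tokenize import (
    RegexpTokenizer,
    TreebankWordTokenizer,
    WhitespaceTokenizer,
)

sentences = [
    "The sky is blue",
    "I would like to visit the United States",
    "I went to temple and meet Rani",
]

# Three tokenizers that work without downloading NLTK data packages.
treebank = TreebankWordTokenizer()
regexp = RegexpTokenizer(r"\w+")  # assumption: keep runs of word characters
whitespace = WhitespaceTokenizer()

# Map each sentence to the token list produced by each tokenizer.
tokenized = {
    sentence: {
        "treebank": treebank.tokenize(sentence),
        "regexp": regexp.tokenize(sentence),
        "whitespace": whitespace.tokenize(sentence),
    }
    for sentence in sentences
}

for sentence, result in tokenized.items():
    print(sentence)
    for name, tokens in result.items():
        print(f"  {name}: {tokens}")
```

On these punctuation-free sentences the three tokenizers agree; differences typically appear with contractions, hyphens and punctuation, which is one of the tokenization issues raised in question 15.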

LU-12 Part of Speech (PoS) tagging Period: 1


LU Outcomes Level: A CO Number: 2
1. Identify the PoS tags for tokens
2. Resolve the ambiguities in PoS tagging
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by PoS tagging? How is it performed in the English language? U 2.2.3
2. Write the Penn Treebank tag set for the English language. U 2.2.3
3. Differentiate closed and open class words with example. U 2.2.4
4. What is meant by ambiguity in PoS tagging? Give an example. U 1.4.1
5. Write a Python program with NLTK to find the Part of Speech tags for the following sentences: A 5.1.1, 14.2.2
a. The sky is blue
b. I would like to visit the United States
c. I went to temple and meet Rani
6. How do you import PoS tagging in NLTK? U
LU-13 Word Frequency Counting Period: 1
LU Outcomes Level: A CO Number: 2
1. Count the number of times each token occurs
2. Count the PoS tags and word phrases using n-grams
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by word frequency counting? U 1.3.1
2. What is meant by frequency distribution? What are the methods used for frequency distribution? U 1.4.1
3. Write the steps for counting PoS tags. U 2.2.3
4. What are bigrams and trigrams? Give an example. U 2.2.4
5. Find the word frequency for the following paragraph using NLTK: A 5.1.1, 13.2.5
Sivakasi is a municipal corporation in Virudhunagar District in the Indian state of
Tamil Nadu. This town is known for its firecracker, matchbox and printing
industries. The industries in Sivakasi employ over 2,50,000 people with an
estimated turnover of ₹20 billion (US$280 million). Sivakasi was established in
the 15th century during the reign of the Pandya king Harikesari Parakkirama
Pandian. The town was a part of Madurai empire and has been ruled at various
times by the Later Pandyas, Vijayanagara Empire, Madurai Nayaks, Chanda
Sahib, Carnatic kingdom and the British. A major riot during the British Raj took
place in 1899.
6. Count the number of PoS tags for question no. 5. A 5.1.1, 14.3.2
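
Word and bigram frequency counting can be sketched as follows, assuming NLTK is installed. Only the first two sentences of the Sivakasi paragraph are used to keep the example short. Lowercasing and the `\w+` tokenization pattern are illustrative assumptions.

```python
from nltk.probability import FreqDist
from nltk.tokenize import RegexpTokenizer
from nltk.util import bigrams

text = (
    "Sivakasi is a municipal corporation in Virudhunagar District in the "
    "Indian state of Tamil Nadu. This town is known for its firecracker, "
    "matchbox and printing industries."
)

# Assumption: lowercase the text and keep only runs of word characters.
tokens = [tok.lower() for tok in RegexpTokenizer(r"\w+").tokenize(text)]

# FreqDist counts how many times each item occurs.
word_freq = FreqDist(tokens)
bigram_freq = FreqDist(bigrams(tokens))

print(word_freq.most_common(5))
print(bigram_freq[("tamil", "nadu")])
```

`FreqDist` behaves like a counter, so `word_freq["in"]` returns the count for a single token and `most_common(n)` lists the `n` most frequent items.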

LU-14 Stop Words Removal & Text Normalization Period: 1


LU Outcomes Level: A CO Number: 2
1. Remove the stop words using stop word removal technique
2. Perform text normalization for the text data
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What are stop words? Give an example. R 2.2.3
2. Why do we remove stop words in natural language? U 4.1.2
3. When should we remove stop words? U 4.1.3
4. Remove the stop words from the following paragraph using NLTK: A 5.1.1, 14.3.2
Sivakasi is a municipal corporation in Virudhunagar District in the Indian state of
Tamil Nadu. This town is known for its firecracker, matchbox and printing
industries. The industries in Sivakasi employ over 2,50,000 people with an
estimated turnover of ₹20 billion (US$280 million). Sivakasi was established in
the 15th century during the reign of the Pandya king Harikesari Parakkirama
Pandian. The town was a part of Madurai empire and has been ruled at various
times by the Later Pandyas, Vijayanagara Empire, Madurai Nayaks, Chanda
Sahib, Carnatic kingdom and the British. A major riot during the British Raj took
place in 1899.
5. What is meant by text normalization? Give an example. R 1.3.1
6. What are the various ways for text normalization? U 1.4.1
7. Write a Python program for text normalization. A 13.2.5
8. What is the purpose of the replace() function in text normalization? U 4.1.3
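
The two ideas in this unit, normalization with `replace()` and stop word removal, can be sketched with the Python standard library alone. The stop word list, the abbreviation table and the sample sentence below are hypothetical illustrations; in practice NLTK's stopwords corpus supplies a much fuller list.

```python
import re

# Assumption: a tiny hand-written stop list for illustration only.
STOP_WORDS = {"a", "an", "and", "from", "i", "in", "is", "of", "the"}

# Assumption: hypothetical abbreviation table for the replace() step.
REPLACEMENTS = {"US": "United States", "UK": "United Kingdom"}


def normalize(text):
    """Expand known abbreviations with str.replace(), then lowercase."""
    for short, full in REPLACEMENTS.items():
        # str.replace() matches substrings, so "US" inside a longer word
        # would also be rewritten; real data may need word boundaries.
        text = text.replace(short, full)
    return text.lower()


def remove_stop_words(text):
    """Split into word tokens and drop those found in STOP_WORDS."""
    return [tok for tok in re.findall(r"\w+", text) if tok not in STOP_WORDS]


sentence = "I visited the US from the UK in 2023"
normalized = normalize(sentence)
filtered = remove_stop_words(normalized)
print(normalized)
print(filtered)
```

Normalizing before filtering matters here: the stop list is lowercase, so tokens must be lowercased first or words such as "The" would slip through.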

LU-15 Spelling Correction Period: 1


LU Outcomes Level: A CO Number: 2
1. Use the spell checker and auto corrector for spelling correction
2. Identify different types of spelling errors
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What do you mean by spelling correction? R 1.3.1
2. What are the tools supported for spelling correction? U 1.4.1
3. Explain in detail about types of spelling errors with an example. U 2.2.3
4. Write the different forms of spelling errors. U 4.1.2
5. Write a Python program to perform spelling correction for the following sentence: A 5.1.1, 13.2.5
Ntural Luanguage Processin deals with the art of extracting insightes from
Natural Languaes
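
One simple way to approach the sentence in question 5 is dictionary-based correction: each word is replaced by its closest match in a known vocabulary. The sketch below uses `difflib` from the Python standard library as a stand-in for a dedicated spell checker. The vocabulary, the `correct_word` helper and the `0.7` cutoff are assumptions made for illustration.

```python
import difflib

# Assumption: a small hand-picked vocabulary; a real spell checker
# would use a full dictionary with word frequencies.
VOCABULARY = [
    "natural", "language", "languages", "processing", "deals", "with",
    "the", "art", "of", "extracting", "insights", "from",
]


def correct_word(word, cutoff=0.7):
    """Return the closest vocabulary word, or the word itself if none is close."""
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word


sentence = (
    "Ntural Luanguage Processin deals with the art of extracting "
    "insightes from Natural Languaes"
)
corrected = " ".join(correct_word(word) for word in sentence.split())
print(corrected)
```

This approach can only correct non-word errors whose intended word is in the vocabulary; real-word errors (a correctly spelled but wrong word) need context-aware methods.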

LU-16 Stemming - Lemmatization Period: 1


LU Outcomes Level: A CO Number: 2
1. Preprocess the natural language text using stemming and lemmatization
2. Compare stemming and lemmatization
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by stemming? Give an example. U 1.3.1
2. Differentiate over-stemming and under-stemming. U 2.2.4
3. Explain in detail about various stemming algorithms with an example. U 4.1.2
4. Write a Python program for various stemming techniques and extract the stem words for the following paragraph: A 5.1.1, 14.3.2
A book is a medium for recording information in the form of writing or images,
typically composed of many pages bound together and protected by a cover.
The technical term for this physical arrangement is codex (plural, codices). In
the history of hand-held physical supports for extended written compositions or
records, the codex replaces its predecessor, the scroll. A single sheet in a codex
is a leaf and each side of a leaf is a page.
5. What is meant by lemmatization? Give an example. U 1.4.1
6. Write a Python program for lemmatization and extract the lemmas for the paragraph in question number 4. A 5.1.1, 14.3.2
7. Compare stemming and lemmatization. U 2.2.4
8. List the applications of stemming and lemmatization. U 5.1.2
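
A minimal sketch comparing three NLTK stemmers on the first sentence of the paragraph in question 4, assuming NLTK is installed. None of these stemmers needs extra NLTK data downloads. The `stemmers` dictionary and the `\w+` tokenization are illustrative choices. Lemmatization is not shown because NLTK's `WordNetLemmatizer` requires the WordNet corpus to be downloaded first.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer
from nltk.tokenize import RegexpTokenizer

text = (
    "A book is a medium for recording information in the form of writing "
    "or images, typically composed of many pages bound together and "
    "protected by a cover."
)

# Three rule-based stemmers of increasing aggressiveness (Lancaster is
# generally the most aggressive of the three).
stemmers = {
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
    "snowball": SnowballStemmer("english"),
}

# Assumption: lowercase the text and keep only runs of word characters.
tokens = RegexpTokenizer(r"\w+").tokenize(text.lower())

# Apply every stemmer to every token.
stems = {
    name: [stemmer.stem(token) for token in tokens]
    for name, stemmer in stemmers.items()
}

for name, result in stems.items():
    print(name, result)
```

Comparing the three output lists side by side shows where stemmers disagree, which is a practical way to discuss over-stemming and under-stemming.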

LU-17 Named Entity Recognition Period: 1


LU Outcomes Level: A CO Number: 2
1. Identify the named entity for the sentences
2. Perform IOB tagging for named entity recognition
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by named entity recognition? U 1.4.1
2. What is the purpose of ne_chunk in NLTK? U 2.2.3
3. Explain in detail about IOB tagging with an example. U 4.3.1
4. How do you resolve the ambiguities in NER? U 4.1.2
5. What are the complex problems in Named Entity Recognition? U 4.2.2
6. List the entity types in Named Entity Recognition. U 2.2.3
7. List the various approaches in Named Entity Recognition. U 4.1.2
8. Explain in detail about challenges of named entity recognition in Indian languages. U 4.1.3
9. Write a Python program for IOB-tagged Named Entity Recognition using the conlltags2tree function. A 5.1.2, 13.2.5
10. Identify the named entities in the following sentences: A 5.1.2, 13.2.5
a. Joe and John are working at Mepco
b. We are reading a book published by Packt which is based on Python.
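
The relationship between IOB tags and chunk trees can be sketched with `conlltags2tree` and `tree2conlltags` from `nltk.chunk`, assuming NLTK is installed. The PoS tags and IOB labels below are written by hand for the sentence "Joe and John are working at Mepco" as an assumption; they are not the output of a trained tagger or of `ne_chunk`.

```python
from nltk.chunk import conlltags2tree, tree2conlltags
from nltk.tree import Tree

# Assumption: hand-annotated (word, PoS tag, IOB tag) triples.
# B- marks the beginning of an entity, I- its continuation, O is outside.
iob_tagged = [
    ("Joe", "NNP", "B-PERSON"),
    ("and", "CC", "O"),
    ("John", "NNP", "B-PERSON"),
    ("are", "VBP", "O"),
    ("working", "VBG", "O"),
    ("at", "IN", "O"),
    ("Mepco", "NNP", "B-ORGANIZATION"),
]

# Build a chunk tree in which every entity becomes a labelled subtree.
tree = conlltags2tree(iob_tagged)

# Collect (entity text, entity type) pairs from the labelled subtrees.
entities = [
    (" ".join(word for word, _ in subtree.leaves()), subtree.label())
    for subtree in tree
    if isinstance(subtree, Tree)
]

print(tree)
print(entities)
print(tree2conlltags(tree) == iob_tagged)  # converting back recovers the triples
```

In a full pipeline the triples would come from a tagger and a named entity chunker rather than being typed in, but the tree structure is the same.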

LU-18 Data pre-processing – Tutorial Period: 1


LU Outcomes Level: A CO Number: 2
Perform various data preprocessing techniques using python
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Tokenize the following sentences using sentence tokenization and word tokenization in Python: A 5.1.2, 13.2.5
a. The sky is blue
b. I would like to visit the United States
c. I went to temple and meet Rani
2. Write a Python program with NLTK to find the Part of Speech tags for the sentences in Question 1. A 5.1.2, 13.2.5
3. Write a Python program to take a paragraph of text as the input and perform spelling correction, text normalization and stop word removal. A 5.1.2, 14.3.2
4. Write a Python program for various stemming techniques and extract the stem words for the input paragraph. A 5.1.2, 14.3.2

Unit-3: SYNTACTIC AND SEMANTIC ANALYSIS USING NLTK

This module describes general and specific feature extraction from text data. It deals
with the Bag of Words and TF-IDF models for extracting specific features from text data. It explains
word embeddings and sentence boundary detection. It deals with various parsing algorithms such as
top-down and bottom-up parsing. This module also explains lexical resources and covers word synonyms
and antonyms, word negation tracking and word sense disambiguation.

LU-19 Feature Extraction: Building Bag of Words (BoW) Model Period: 1


LU Outcomes Level: A CO Number: 3
1. Extract the general features of the text using NLTK
2. Extract the specific features of the text using Bag of Words
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by feature extraction? U 1.3.1
2. Why do we perform feature extraction in natural languages? U 1.4.1
3. Explain in detail about categorization of data with an example. U 2.2.3
4. What is the need for cleaning the text data? Write the steps for cleaning the text U 2.3.1
data.
5. Explain in detail about general feature extraction using NLTK. U 2.3.2
6. You have to download csv from the web and do the following activities: A 13.2.5
a. Import pandas, nltk and other necessary libraries
b. Create the dataframe from the downloaded csv file
c. Find the number of occurrences of each Part of Speech. You can see the
PoS that nltk provides by loading it from
help/tagset/upenn_tagset.pickle.
d. Find the amount of punctuation marks.
e. Find the amount of uppercase and lowercase words.
f. Find the number of letters, digits, words and whitespaces.
7. What is meant by Bag of Words (BoW) models? U 2.2.3
8. Create the BoW for the following sentences using sklearn and python: Data A 4.3.4,
Science is an overlap between Arts and Science. Generally, Arts graduates are 13.2.4
right brained and science graduates are left-brained. Excelling in both Arts and
Science at a time becomes difficult. Natural language processing is a part of
Data Science.
9. Explain in detail about Zipf’s law with an example. U 4.2.2
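
The Bag of Words question above can be sketched without any external library. The following is a minimal pure-Python sketch, not the sklearn solution the question asks for; the helper names `tokenize` and `build_bow` are assumptions, and the simple regular-expression tokenizer keeps one-letter words and splits hyphenated words, so its vocabulary may differ from what sklearn's CountVectorizer produces.

```python
import re
from collections import Counter

corpus = [
    "Data Science is an overlap between Arts and Science.",
    "Generally, Arts graduates are right brained and science graduates are left-brained.",
    "Excelling in both Arts and Science at a time becomes difficult.",
    "Natural language processing is a part of Data Science.",
]


def tokenize(text):
    # Lowercase the text and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def build_bow(docs):
    """Return the sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


vocabulary, bow = build_bow(corpus)
print(vocabulary)
for row in bow:
    print(row)
```

Each row is the count vector of one sentence over the shared vocabulary; word order is discarded, which is what "bag" refers to.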

LU-20 Building TF-IDF Model Period: 1


LU Outcomes Level: A CO Number: 3
1. Find the TF and IDF value for the natural text
2. Extract the specific features using TF-IDF model
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)


Sl.No Test Questions BTL PI
1. What is term frequency and inverse document frequency? U 2.2.3
2. Create the TF-IDF model for the following sentences using sklearn and python: A 4.2.2,
Data Science is an overlap between Arts and Science. Generally, Arts graduates 13.2.5
are right brained and science graduates are left-brained. Excelling in both Arts
and Science at a time becomes difficult. Natural language processing is a part of
Data Science.
3. You will extract the specific features from the texts present in the dataset. The A 4.2.2,
dataset that will be using here is fetch_20newsgroups, provided by sklearn 13.2.5
library. Follow these steps to implement this activity:
a. Import the necessary packages
b. Fetch the dataset provided by sklearn and store the data in a
Dataframe.
c. Clean the data in the Dataframe.
d. Create a BoW model
e. Create a TF_IDF model
f. Compare both models on the basis of the 20 most frequently occurring
words.
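
As a hedged sketch of the TF and IDF computation asked about above, the following pure-Python example uses the textbook definitions tf = count / document length and idf = ln(N / df). This is an assumption chosen for clarity: sklearn's TfidfVectorizer by default applies a smoothed IDF and L2 normalization, so its numbers will differ. The helper names `tokenize` and `tf_idf` are hypothetical.

```python
import math
import re
from collections import Counter

corpus = [
    "Data Science is an overlap between Arts and Science.",
    "Generally, Arts graduates are right brained and science graduates are left-brained.",
    "Excelling in both Arts and Science at a time becomes difficult.",
    "Natural language processing is a part of Data Science.",
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(corpus)
# Show the three highest-weighted terms of the first sentence.
print(sorted(weights[0].items(), key=lambda item: item[1], reverse=True)[:3])
```

With this unsmoothed definition, a term that occurs in every document (here "science") gets weight zero, which shows how TF-IDF down-weights terms that do not discriminate between documents.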

LU-21 Word Embeddings using word2vec Period: 1


LU Outcomes Level: A CO Number: 3
1. Map the words to vectors of real numbers
2. Apply word2vec method for word embeddings
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by word embeddings? Why are they needed? U 1.3.1
2. Explain in detail about types of word embeddings with an example. U 2.2.3
3. What does a count vector model do? U 4.2.2
4. Explain in detail about TF-IDF vectorization in word embeddings. U 4.3.1
5. Construct the co-occurrence matrix for the sentence “He is not lazy. He is A 13.2.5
intelligent. He is smart”
6. Discuss in detail about prediction vector-based embeddings with an example. U 4.2.2
7. Explain in detail about CBOW and skip gram methods with an example. U 2.3.1
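
The co-occurrence matrix question above depends on how the context window is defined. The following sketch assumes a symmetric window of two words on each side, applied over the whole token sequence and ignoring sentence boundaries; both choices, and the helper name `cooccurrence`, are assumptions for illustration.

```python
import re
from collections import defaultdict


def cooccurrence(text, window=2):
    """Count how often two words appear within `window` positions of each other."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = defaultdict(lambda: defaultdict(int))
    for i, left in enumerate(tokens):
        # Pair each token with the next `window` tokens, counting both directions.
        for right in tokens[i + 1 : i + 1 + window]:
            counts[left][right] += 1
            counts[right][left] += 1
    return tokens, counts


tokens, counts = cooccurrence("He is not lazy. He is intelligent. He is smart")
vocabulary = sorted(set(tokens))
print(vocabulary)
for word in vocabulary:
    print(word, [counts[word][other] for other in vocabulary])
```

A different window size, or a window that stops at sentence boundaries, yields a different matrix, so the window definition should be stated alongside any answer.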

LU-22 Sentence Boundary Detection Period: 1


LU Outcomes Level: A CO Number: 3
1. Detect where one sentence ends and another sentence begins
2. Apply tokenization for sentence boundary detection
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by sentence boundary detection? Give an example. U 1.4.1
2. Which tokenization method will be helpful for doing sentence boundary U 4.1.2
detection?
3. Write a python program for sentence boundary detection and apply the following A 4.3.4
sentence: “we are reading a book. Do you know who is the publisher? It is
Packt. Packt is based on python”
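
A rule-based approach to the sentence boundary question above can be sketched with the standard library. This is a minimal sketch that assumes a sentence ends at ".", "?" or "!" followed by whitespace; the helper name `split_sentences` is hypothetical, and the rule will wrongly split after abbreviations such as "Dr.", which is one reason a trained tokenizer such as NLTK's sent_tokenize is normally preferred.

```python
import re


def split_sentences(text):
    # Split at whitespace that follows a sentence-ending punctuation mark.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]


text = ("we are reading a book. Do you know who is the publisher? "
        "It is Packt. Packt is based on python")
for sentence in split_sentences(text):
    print(sentence)
```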

LU-23 Parsing Period: 1


LU Outcomes Level: A CO Number: 3
1. Apply top down and bottom-up parsing for syntactic analysis
2. Solve the ambiguity problem using parsing
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What are the two analyses performed in parsing? U 2.2.3
2. Write short notes on process of natural language understanding. U 4.1.2
3. Explain in detail about parsing algorithms. U 2.2.3
4. Write the top down parses for the sentence “Ram ate the apple” and construct A 14.2.2
the top down parsing tree for the same.
5. Write the bottom up parses for the sentence “Ram ate the apple” and construct A 14.2.2
the bottom up parsing tree for the same.
6. Compare top down parsing and bottom up parsing. U 2.2.4
7. Develop a parse tree structure for the sentence “I heard the story listening to A 13.2.5
the radio”.
8. How to implement the parser? U 2.3.1
9. What are the different levels of syntax identified for the parsing technique? U 4.2.2
10. Explain in detail about various types of ambiguity. U 4.3.4
11. What is an Earley parser? Write an algorithm for Earley parsing. U 2.3.1
12. What is CYK parser? Write an algorithm for CYK parsing. U 2.3.1
13. Construct the CYK algorithm and parse the sentence “Ram wrote an essay”. A 2.3.1,
13.2.5
14. Discuss the disadvantages of the basic top-down parser with the help of an U 5.3.2
appropriate example.
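
The CYK question above can be illustrated with a small table-filling sketch. The grammar below is a toy grammar in Chomsky Normal Form written only for the sentence "Ram wrote an essay"; the rule set, the dictionary names and the function name `cyk_parse` are assumptions, not part of the course material.

```python
# Toy grammar in Chomsky Normal Form (an assumption for this example).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}
LEXICAL_RULES = {
    "ram": {"NP"},
    "wrote": {"V"},
    "an": {"Det"},
    "essay": {"N"},
}


def cyk_parse(words):
    """Fill the CYK table; table[i][j] holds non-terminals deriving words[i..j]."""
    n = len(words)
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Spans of length 1 come from the lexical rules.
    for i, word in enumerate(words):
        table[i][i] = set(LEXICAL_RULES.get(word.lower(), set()))
    # Longer spans combine two shorter adjacent spans.
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span - 1
            for split in range(start, end):
                for left in table[start][split]:
                    for right in table[split + 1][end]:
                        table[start][end] |= BINARY_RULES.get((left, right), set())
    return table


table = cyk_parse("Ram wrote an essay".split())
print("Sentence accepted:", "S" in table[0][3])
```

The sentence is accepted when the start symbol S appears in the cell covering the whole input. Because every longer span is built from already-filled shorter spans, CYK is a bottom-up, dynamic-programming parser.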

LU-24 Syntactic Analysis – Tutorial Period: 1


LU Outcome Level: A CO Number: 3
Analyse the text content syntactically using Python
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Create the BoW for the following sentences using sklearn and python: Data A 4.2.2,
Science is an overlap between Arts and Science. Generally, Arts graduates are 13.2.5
right brained and science graduates are left-brained. Excelling in both Arts and
Science at a time becomes difficult. Natural language processing is a part of
Data Science.
2. Create the TF-IDF model for the following sentences using sklearn and python: A 4.2.2,
Data Science is an overlap between Arts and Science. Generally, Arts graduates 13.2.5
are right brained and science graduates are left-brained. Excelling in both Arts
and Science at a time becomes difficult. Natural language processing is a part of
Data Science.
3. Write a python program for sentence boundary detection and apply the A 4.3.4,
following sentence: “we are reading a book. Do you know who is the publisher? 13.2.5
It is Packt. Packt is based on python”


4. Construct the CYK algorithm and parse the sentence “Ram wrote an essay”. A 2.3.1,
13.2.5
LU-25 Lexical Resources: WordNet – FrameNet Period: 1
LU Outcomes Level: A CO Number: 3
1. Use the lexical resources such as WordNet and FrameNet in NLP
2. Find the different relations in WordNet and FrameNet
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Explain in detail about WordNet 2.0 in lexical resources. U 1.3.1
2. Discuss noun, verb, adjective and adverb relations in WordNet. U 1.4.1
3. Build the hypernym chain for the word ‘river’ and give the explanation for each A 2.3.2
sense.
4. Construct the troponym relations for the word ‘laugh’ and give the explanation A 2.3.2
for each sense.
5. Explain in detail about applications of WordNet in natural language processing. U 4.1.3
6. Differentiate between WordNet and FrameNet. U 2.2.4
7. Give the frame elements of the communication frame in FrameNet. U 2.3.1

LU-26 Word Synonyms and Antonyms using NLTK Period: 1


LU Outcomes Level: A CO Number: 3
1. Find the synonyms and antonyms of the words using NLTK
2. Apply wordnet package for finding synonyms and antonyms of the words
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is wordnet? R 1.3.1
2. What is the purpose of wordnet and synset? U 1.4.1
3. How to install wordnet in nltk? U 2.3.1
4. Write a Python NLTK program to find the sets of synonyms and antonyms of a A 13.2.5
given word
5. How to import wordnet in nltk? U 2.4.2

LU-27 Word Negation Tracking Period: 1


LU Outcomes Level: A CO Number: 3
1. Negate words in English natural language text
2. Use negation handling in sentiment analysis
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by word negation? U 1.4.1
2. How to handle word negation in sentiment analysis? U 4.1.3
3. Write a python program for word negation tracking using NLTK A 14.3.2
4. Write an algorithm for finding word negation in sentiment analysis. A 13.2.5
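
One common way to track negation for sentiment analysis is to mark every word between a negation cue and the next punctuation mark. The following is a pure-Python sketch of that idea under stated assumptions: the cue list, the `_NEG` suffix and the helper name `mark_negation_scope` are choices made for this example. It is written without NLTK so that it runs on its own.

```python
import re

NEGATION_WORDS = {"not", "no", "never", "cannot"}
CLAUSE_END = {".", ",", ";", "!", "?"}


def mark_negation_scope(text):
    """Append _NEG to tokens that follow a negation cue, up to the next punctuation."""
    tokens = re.findall(r"[A-Za-z]+(?:'[a-z]+)?|[.,;!?]", text)
    marked, in_scope = [], False
    for token in tokens:
        lower = token.lower()
        if token in CLAUSE_END:
            # Punctuation ends the scope of the negation.
            in_scope = False
            marked.append(token)
        elif lower in NEGATION_WORDS or lower.endswith("n't"):
            # A negation cue opens a new scope; the cue itself is not marked.
            in_scope = True
            marked.append(token)
        else:
            marked.append(token + "_NEG" if in_scope else token)
    return marked


print(mark_negation_scope("I did not like the movie, but the music was good."))
```

After marking, "like_NEG" and "like" become distinct features, so a classifier can learn opposite polarities for them.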

LU-28 Word Sense Disambiguation Period: 1


LU Outcomes Level: A CO Number: 3
1. Apply the different methods for eliminating ambiguity
2. Compare the supervised and unsupervised learning process
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by ambiguity and disambiguation? Give an example. U 2.2.4
2. What are the different artificial methods applied for disambiguation? U 4.2.2
3. Compare supervised and unsupervised learning. U 2.2.4
4. Define pseudo words. R 1.3.1
5. What is the use of lower bound and upper bound baseline? U 2.2.4
6. Describe Bayes decision rule. R 1.4.1
7. Explain in detail about Bayesian disambiguation algorithm U 4.1.3
8. Explain in detail about information theoretic approach. U 4.1.3
9. The lower bound of disambiguation accuracy depends on how much information A 4.3.4,
is available. Describe a situation in which the lower bound could be lower than 12.1.2
the performance that results from classifying all occurrences of a word as
instances of its most frequent sense. (Hint: What knowledge is needed to
calculate that lower bound?)
10. Create an artificial training and test set using pseudo words. Evaluate one of the A 5.3.2
supervised algorithms on it.
11. Explain in detail about Disambiguation based on sense definitions. U 4.2.2
12. Explain in detail about thesaurus-based disambiguation. U 4.2.2
13. How to perform disambiguation based on translations in a second-language U 5.1.2
corpus
14. Explain in detail about One sense per discourse, one sense per collocation. U 2.2.4
15. How to find the log likelihood value using EM algorithm in unsupervised U 4.1.3
disambiguation?
16. Download a version of Roget’s Thesaurus from the web (see the website), and A 5.3.2,
implement and evaluate a thesaurus-based algorithm. 13.2.5
17. Discuss the validity of the “one sense per discourse” constraint for different A 5.3.2,
types of ambiguity (types of usages, homonyms etc.). Construct examples 14.3.2
where the constraint is expected to do well and examples where it is expected to
do poorly.
18. Explain in detail about Lesk’s and Walker’s algorithms for sense disambiguation. U 4.2.2
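
Disambiguation based on sense definitions can be sketched as a simplified Lesk procedure: choose the sense whose dictionary gloss shares the most words with the context. The sense inventory below is hypothetical and written only for this example (it is not taken from WordNet), as are the stop-word list and the helper names.

```python
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "to", "and", "that", "or",
              "i", "my", "at", "for"}

# Hypothetical glosses written for this example (not taken from WordNet).
SENSES = {
    "bank": {
        "financial_institution":
            "an institution that accepts deposits of money and gives loans",
        "river_side":
            "the sloping land beside a river or a body of water",
    }
}


def content_words(text):
    return {word for word in text.lower().split() if word not in STOP_WORDS}


def simplified_lesk(word, sentence):
    """Pick the sense whose gloss overlaps most with the sentence context."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I went to the bank to deposit money"))
print(simplified_lesk("bank", "We sat on the bank of the river and watched the water"))
```

In this sketch a tie is resolved in favour of the sense listed first, which loosely mirrors falling back to a most frequent sense baseline.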

LU-29 Semantic Analysis – Tutorial Period: 1


LU Outcome Level: A CO Number: 3
Perform semantic analysis of text data using Python
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Write a Python NLTK program to find the sets of synonyms and antonyms of a A 13.2.5
given word
2. Write a python program for word negation tracking using NLTK A 13.2.5
3. Write a python program for word sense disambiguation of an ambiguous A 13.2.5
word using nltk.

Unit-4: TEXT CLASSIFICATION AND TOPIC MODELING

This module introduces text classification and its algorithms. It deals with machine
learning algorithms for text classification and describes some metrics for text classification. It also
covers developing a text classifier and saving and loading text classification models. This module also
explains topic modeling and its algorithms.

LU-30 Introduction to Text Classification Period: 1


LU Outcomes Level: A CO Number: 4
1. Apply different supervised and unsupervised algorithms for text classification
2. Develop the text classifiers
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by text classification? U 1.4.1
2. What are the different NLP libraries supported for text classification? U 2.2.3
3. Explain in detail about various text classification algorithms with an example. U 4.2.2
4. How do you build the model for text classification? U 5.1.2

LU-31 Machine Learning Overview Period: 1


LU Outcomes Level: A CO Number: 4
1. Apply different machine learning algorithms for text classification
2. Compare supervised and unsupervised algorithms for text classification
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by machine learning? What is the categorization of machine U 2.2.3
learning algorithms?
2. Explain in detail about unsupervised learning algorithm with examples. U 4.2.2
3. Explain in detail about various hierarchical clustering with example. U 4.2.2
4. Consider the exercise, you will create four clusters from text documents of A 13.2.5
sklearn “fetch_20newsgroups” dataset. You will make use of hierarchical
clustering. Once the clusters are created, you will compare them with their
actual categories. Follow these steps to implement this exercise:
a. Import necessary libraries
b. Download the list of stop words from nltk
c. Specify the categories of news articles
d. Fetch the dataset
e. View the fetched data
f. Check the categories of news articles
g. Store the news data
h. Count the number of occurrences of each category
i. Create a TF-IDF matrix and transform it to a Data frame
j. Find the cosine similarities of TF-IDF
k. Truncate the dendrogram to keep the last four clusters
l. Obtain the cluster labels
5. Consider the question number 4 and do the same operations using K-means A 13.2.5
clustering.
6. What is meant by supervised learning? Explain in detail about Logistic U 4.2.2
regression, Naïve Bayes Classifiers and K-nearest classifiers with an example.
7. In this exercise, you will classify reviews of musical instruments on Amazon with A 2.3.2
the help of logistic regression, Naïve Bayes and KNN and do the following steps:
a. Import necessary libraries
b. Read the data file in JSON format using Pandas
c. Use a Lambda function to extract tokens from each reviewtext
d. Create a data frame from the TF-IDF matrix
e. Create a TF-IDF matrix and transform it into a Dataframe
f. Fit the logistic regression, Gaussian Naïve Bayes and KNN model
g. Compare the results of classification model
8. What is meant by regression? Explain in detail about Linear Regression with an U 4.2.2
example.
9. You will use regression to predict the overall scores of reviews of musical A 14.3.2
instruments on Amazon. Follow these steps to implement this exercise:
a. Import the necessary libraries
b. Read the data file in JSON format using Pandas
c. Use a Lambda function to extract tokens from each reviewtext
d. Create a data frame from the TF-IDF matrix
e. Fit the linear regression model
f. Find the intercept of the linear regression
g. Do the prediction using tf-idf Data frame
10. You take the question number 9 options and do the tree based methods like A 14.3.2
Decision Tree, Random Forest, GBM and XGBoost and write the code in python.

LU-32 Classification Metrics - Confusion Matrix Period: 1


LU Outcomes Level: A CO Number: 4
1. Analyze the metrics for classification algorithms
2. Build the confusion matrix for the given problem
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. How do you evaluate the performance of the classification algorithm? U 5.3.2
2. What is meant by confusion matrix? U 1.3.1
3. Define the term accuracy. U 1.3.1
4. What is meant by precision and recall? Give an example. U 1.3.1
5. What do you mean by F1 score, RoC curve? U 1.3.1
6. What is root mean square error (RMSE)? How do you evaluate the accuracy of U 1.3.1
the regression model using RMSE?
7. What is meant by Mean Absolute Percentage Error (MAPE)? U 1.3.1
8. Write a python program for calculating RMSE and MAPE using sklearn package. A 13.2.5
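
The metrics listed above follow directly from their definitions. The following sketch computes them in pure Python rather than with the sklearn package named in the last question; the function names and the small label and score lists are made up for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def rmse(actual, predicted):
    # Square root of the mean squared difference.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    # Mean absolute percentage error; undefined when an actual value is zero.
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
print(rmse([3, 5, 2, 7], [2, 5, 4, 7]), mape([100, 200], [110, 180]))
```

Precision, recall and F1 apply to classifiers, while RMSE and MAPE apply to regression models, so the two groups are not interchangeable.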

LU-33 Developing a Text Classifier Period: 1


LU Outcomes Level: A CO Number: 4
1. Develop text classifier using machine learning algorithms
2. Do the feature extraction in text classification
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI


1. How do you do the feature engineering in text classification? U 4.2.2
2. Write a python program for removing correlated features for the sklearn A 13.2.5
fetch_20newsgroups dataset and do the following:
a. Import necessary libraries
b. Remove the stop words
c. Specify the categories of news articles
d. Extract the tokens for the dataset using lambda function
e. Create TF-IDF matrix and transform into Dataframe
f. Calculate the correlation matrix for TF-IDF representation
g. Identify a pair of terms with high correlation
h. Generate the data frame after removing correlated tokens
3. What is meant by dimensionality reduction? Explain in detail about Principal U 2.3.2
Component Analysis (PCA) for dimensionality reduction.
4. Write a python program for calculating PCA for the sklearn fetch_20newsgroups A 13.2.5
dataset and do the following:
a. Import necessary libraries
b. Remove the stop words
c. Specify the categories of news articles
d. Extract the tokens for the dataset using lambda function
e. Create TF-IDF matrix and transform into Dataframe
f. Calculate the correlation matrix for TF-IDF representation
g. Use the PCA function for extracting two principal components from the
earlier data
5. How do you decide the best model for prediction? U 13.2.4

LU-34 Saving and Loading Models Period: 1


LU Outcomes Level: A CO Number: 4
1. Save the models for future use
2. Load the models for doing prediction
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. How do you load and save the models for prediction? U 14.3.2
2. How do you select the training and test data for doing prediction? U 5.3.2
3. Consider the work, first you will create tf-idf representation of sentences. Then A 13.2.5
you will save this model on disk. Later you will load it from the disk. Follow
these steps and implement this work:
a. Import the necessary package
b. Define the corpus for various sentences
c. Fit a tf-idf model
d. Save the tf-idf on disk
e. You will load this model from the disk to the memory and use it.
f. Save this model on disk using pickle.
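
The save and load steps above can be sketched with the standard pickle module. In this minimal sketch the "model" is a plain dictionary of IDF weights fitted by a hypothetical helper `fit_idf`; the corpus, the file name and the temporary directory are also assumptions. The same dump and load pattern can be applied to other fitted Python objects.

```python
import math
import os
import pickle
import tempfile
from collections import Counter

corpus = [
    "data science is an overlap between arts and science",
    "natural language processing is a part of data science",
]


def fit_idf(docs):
    """Fit a tiny IDF model: ln(N / df) + 1 for every term in the corpus."""
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term: math.log(len(docs) / count) + 1.0 for term, count in df.items()}


model = fit_idf(corpus)

# Save the fitted model to disk.
path = os.path.join(tempfile.mkdtemp(), "tfidf_model.pkl")
with open(path, "wb") as handle:
    pickle.dump(model, handle)

# Later, load it from disk back into memory and use it.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == model, restored["overlap"])
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.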

LU-35 Text Classification – Tutorial Period: 1


LU Outcome Level: A CO Number: 4
Perform the classification of text data using python
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. In this exercise, you will classify reviews of musical instruments on Amazon with A 14.3.2
the help of logistic regression, Naïve Bayes and KNN and do the following steps:
a. Import necessary libraries
b. Read the data file in JSON format using Pandas
c. Use a Lambda function to extract tokens from each reviewtext
d. Create a data frame from the TF-IDF matrix
e. Create a TF-IDF matrix and transform it into a Dataframe
f. Fit the logistic regression, Gaussian Naïve Bayes and KNN model
g. Compare the results of classification model
2. You will use regression to predict the overall scores of reviews of musical A 14.3.2
instruments on Amazon. Follow these steps to implement this exercise:
a. Import the necessary libraries
b. Read the data file in JSON format using Pandas
c. Use a Lambda function to extract tokens from each reviewtext
d. Create a data frame from the TF-IDF matrix
e. Fit the linear regression model
f. Intercept the linear regression
g. Do the prediction using tf-idf Data frame
3. Write a python program for removing correlated features for the sklearn A 13.2.5
fetch_20newsgroups dataset.
4. Write a python program for calculating PCA for the sklearn fetch_20newsgroups A 13.2.5
dataset

LU-36 Introduction to Topic Modelling: Topic Discovery Period: 1


LU Outcomes Level: U CO Number: 4
1. Use supervised and unsupervised algorithms for topic modeling
2. Find the set of topics for doing classification
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by topic modeling? U 1.4.1
2. What are the algorithms supported for topic modeling? U 2.2.3
3. Explain in detail about topic discovery with example U 4.2.2
4. What is meant by discovering themes? U 1.3.1
5. Write short notes on exploratory data analysis in topic modeling. U 2.2.3
6. What is document clustering? What are the types of document clustering? U 4.1.2
7. What is dimensionality reduction? U 1.4.1
8. Define the term Bag of Words. U 1.3.1

LU-37 Topic Modelling Algorithms: Latent Semantic Analysis Period: 2


LU Outcomes Level: A CO Number: 4
1. Do the topic modeling using LSA and LDA
2. Identify the recent topics using LSA
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI


1. What is meant by latent semantic indexing? U 1.4.1
2. How does LSA work? U 4.2.2
3. Explain in detail about Singular Value Decomposition and its matrix U 4.3.1
representation.
4. Analyze a dataset of Reuters news articles. You will perform topic modeling using A 5.1.2,
LSI. For this you will make use of the LsiModel class provided by the gensim 13.2.5
library. Follow these steps to implement this activity:
a. Import the necessary libraries
b. Clean the text
c. Read the articles using the BeautifulSoup package
d. Create a BeautifulSoup instance
e. Load all documents into a list
f. Prepare the documents for the model.
g. Create a model that uses LsiModel
h. Load the documents into the model and run the LSA process
i. Separate the word tokens into the number of topics
j. Use the print_topics function to see the model’s information
k. Decide how many topics you want
l. Calculate the coherence scores for a range between 20 and 25
m. Plot the coherence scores and see how many topics would be best.

LU-38 Latent Dirichlet Allocation Period: 1


LU Outcomes Level: A CO Number: 4
1. Do the topic modeling using LDA
2. Identify the recent topics using LSA
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by LDA? How does it work? U 4.1.3
2. What is the Dirichlet distribution? How is it used in topic modeling? U 4.1.3
3. You will perform topic modeling on airline tweets in order to discover what topics A 5.1.2,
are being discussed in them. Use the LDA algorithm provided by the gensim library. Follow 13.2.5
these steps to implement this activity:
a. Use Pandas library to read the tweets into a Dataframe
b. Read the Tweets.csv file into a Dataframe
c. Clean the text
d. Display the results using head function
e. Use the gensim library to build the model
f. Use the LDA model to create a dictionary of each token in the dataset.
g. Create an LdaModel instance that will learn on 10 topics
h. Use the print_topics function to see the model’s information
i. Calculate the coherence scores for a range between 10 and 30
j. Plot the coherence scores

LU-39 Topic Modelling – Tutorial Period: 1


LU Outcomes Level: A CO Number: 4
Identify the topic of the text by building a model in python


Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Analyze a dataset of Reuters news articles and perform topic modeling using LSI. A 5.1.2,
Make use of the LsiModel class provided by the gensim library to clean the text, 13.2.5
create the LsiModel and calculate the coherence scores.
2. Perform topic modeling on airline tweets in order to discover what topics are A 5.1.2,
being discussed in them. Build the model using the LDA algorithm provided by gensim 13.2.5
and calculate the coherence scores. Also plot the coherence scores.

Unit-5: SENTIMENT ANALYSIS AND MACHINE TRANSLATION

This module introduces sentiment analysis and its algorithms. It describes the need for and growth of
sentiment analysis. It explains Python-based tools such as TextBlob for doing sentiment analysis. It
explains the training of sentiment models and understanding the data in sentiment analysis. It also
covers different machine translation methods such as direct and statistical translation. This
module also explains translation involving Indian languages.

LU-40 Introduction to Sentiment Analysis Period: 1


LU Outcomes Level: U CO Number: 5
1. Identify different sentiments in sentiment analysis
2. Apply sentiment analysis in social media applications
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by sentiment analysis? U 1.3.1
2. What are the types of sentiments in sentiment analysis? U 1.4.1
3. Why is sentiment analysis required? U 4.1.2
4. Explain in detail about different steps for doing sentiment analysis. U 2.2.3
5. Write short notes on need and growth of sentiment analysis. U 2.2.3
6. What are the types of sentiments in sentiment analysis? U 4.2.2
7. What is meant by emotion? U 1.3.1
8. Differentiate action and passivity. U 2.2.4
9. Differentiate subjectivity and objectivity. U 2.2.4
10. Explain in detail about various algorithms for sentiment analysis. U 4.2.2
11. What is meant by polarity and intensity? U 1.3.1
12. Discuss in detail about various applications of sentiment analysis. U 4.2.2
13. What are the NLP tools supported for sentiment analysis? U 4.3.1
14. Write short notes on python and deep learning NLP libraries for sentiment U 5.1.2
analysis.

LU-41 TextBlob Period: 1


LU Outcomes Level: A CO Number: 5
1. Use the TextBlob for sentiment analysis
2. Compare the NLTK and TextBlob
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI


1. What is meant by Textblob? U 2.2.3
2. How to import TextBlob library in python? U 4.3.1
3. Find the sentiment polarity for the following sentence using TextBlob: A 14.3.2
“but you are Late Flight again!! Again and Again! Where are the crew?”
4. Perform sentiment analysis of airline tweets using the TextBlob library and A 13.2.5
implement the following activity:
a. Import necessary libraries
b. Load the CSV file
c. Fetch the text column from the DataFrame
d. Extract and remove the handles from the fetched data.
e. Perform sentiment analysis and get the new Dataframe.
f. Join both Dataframes.
g. Apply appropriate conditions and view positive, negative and neutral tweets.

LU-42 Understanding Data for Sentiment Analysis Period: 1


LU Outcomes Level: A CO Number: 5
1. Use the TextBlob for sentiment analysis
2. Do the sentiment analysis for supervised datasets
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by supervised datasets? U 4.1.3
2. How to load the data for sentiment analysis in python? U 4.1.3
3. Load the data for sentiment analysis and do the following activity using python: A 13.2.5
a. Import necessary libraries
b. Specify where the sentiment data is located
c. Load the IMDb reviews
d. Display top 10 records in the DataFrame
e. Show negative reviews with a score of 0 and positive reviews with a
score of 1
f. Check the total records of IMDB review file
g. Format the data
h. Load any other reviews also

LU-43 Training Sentiment Models Period: 1


LU Outcomes Level: A CO Number: 5
1. Construct the sentiment model
2. Train the sentiment model for achieving high accuracy
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What is meant by sentiment model? U 2.3.1
2. Explain in detail about steps for training the sentiment models. U 4.1.3
3. Train a sentiment model using TF-IDF and Logistic A 13.2.5
Regression and do the following activity:
a. Import necessary libraries
b. Load any three datasets
c. Concatenate different datasets into one dataset
d. Do the random selection from the dataset
e. Remove the unnecessary characters
f. Take the copies of the data
g. Develop a model for TF-IDF and Logistic Regression


h. Split the training and testing data sets
i. Fit the model
j. Do the prediction
k. Find the accuracy of both models

LU-44 Sentiment Analysis – Tutorial Period: 1


LU Outcome Level: A CO Number: 5
Do sentiment analysis for the social media data using python
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Find the sentiment polarity for the following sentence using TextBlob: A 13.2.5
“but you are Late Flight again!! Again and Again! Where are the crew?”
2. Perform Sentiment Analysis of a movie dataset “IMDB” using python. A 14.3.2
3. Train a sentiment model using TF-IDF and Logistic Regression for any dataset. A 14.3.2

LU-45 Introduction to Machine Translation Period: 1


LU Outcomes Level: A CO Number: 5
1. Perform machine translation for converting text or speech from one language to another
language.
2. Perform the syntactic transformations from one language order into other language
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Why is machine translation hard? U 4.1.2
2. What is typology? Give example. U 1.4.1
3. Explain in detail about lexical divergences in machine translation. U 4.2.2
4. Construct the Vauquois triangle for machine translation. U 4.3.1
5. Differentiate between syntactic transfer and lexical transfer. U 2.2.4
6. Construct the syntactic transformations from English Order to Japanese Order A 13.2.5
for the sentence “He adores listening to music”
7. Give some informal description of syntactic transformations. U 4.2.2

LU-46 Problems in Machine Translation Period: 1


LU Outcomes Level: A CO Number: 5
1. Perform machine translation for converting text or speech from one language to another
language.
2. Identify the problems in machine translation
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Why is machine translation hard? U 4.1.3
2. Convert the English text to French text in word string using machine translation. A 13.2.5
3. What do you mean by word order and word sense? R 1.4.1
4. How do you solve the ambiguity problem in machine translation? U 4.2.2
5. Compare the direct machine translation system with the transfer system. U 2.2.4
6. Develop a small Hindi to English dictionary consisting of 50 words. Use it to A 14.2.2
develop a direct machine translation system for simple sentences involving
words in the dictionary.
7. Use the MT system available on the Web to translate a set of sentences and A 14.2.2
compare their output. List the problems you noticed.

LU-47 Machine Translation Approaches Period: 2


LU Outcomes Level: A CO Number: 5
1. Use the different machine translation techniques
2. Perform direct translation using a bilingual dictionary.
3. Apply the noisy channel model in statistical machine translation
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Translate the following sentence from English into Spanish using direct A 14.2.2
translation: “Mary didn’t slap the green witch”.
2. Write the procedure for direct translation with word sense disambiguation. U 5.1.2
3. Using the formula for P(f|e), compute P(Jean aime Marie | John loves Mary). A 4.2.2
4. Explain in detail the problems of the statistical machine translation model. U 4.1.3
5. Explain in detail about noisy channel model of statistical alignment with neat U 4.1.2
diagram.
6. Consider the following English sentence: A 14.2.2
Ram slept in the garden
Translate this sentence into Hindi using direct translation.
7. Draw and explain the schematic diagram for the transfer-based model. U 2.3.2
8. Discuss in detail interlingua-based machine translation. U 4.2.2
9. Apply example-based machine translation to the following sentences: A 14.2.2
a. Rohit sings a song
b. Sheela is playing
c. Sheela is singing a song
d. Sheela sings a Hindi song
10. Explain in detail semantic or knowledge-based machine translation systems. U 4.2.2
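
For question 3, a word-based translation model scores a French sentence f given an English sentence e by multiplying word-translation probabilities P(f_j | e_{a_j}) along an alignment and summing over alignments. The sketch below illustrates that computation only. The probability table t is invented for illustration (it is not estimated from any corpus), the normalisation constant and the empty (null) word are left out, and the names t and alignment_prob are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical word-translation probabilities t[(f, e)] = P(f | e).
# These numbers are invented for illustration, not learned from data.
t = {
    ("Jean", "John"): 0.9, ("aime", "John"): 0.05, ("Marie", "John"): 0.05,
    ("Jean", "loves"): 0.05, ("aime", "loves"): 0.9, ("Marie", "loves"): 0.05,
    ("Jean", "Mary"): 0.05, ("aime", "Mary"): 0.05, ("Marie", "Mary"): 0.9,
}

e = ["John", "loves", "Mary"]
f = ["Jean", "aime", "Marie"]


def alignment_prob(f_words, e_words, alignment):
    """Product of P(f_j | e_{a_j}) for one alignment; a_j indexes into e_words."""
    return prod(t.get((fj, e_words[aj]), 0.0)
                for fj, aj in zip(f_words, alignment))


# Score of the word-for-word alignment Jean-John, aime-loves, Marie-Mary.
monotone = alignment_prob(f, e, (0, 1, 2))

# Unnormalised sum over every alignment: each French word may align
# to any of the English words.
total = sum(alignment_prob(f, e, a)
            for a in product(range(len(e)), repeat=len(f)))

print(monotone, total)
```

With a table this sharply peaked, the word-for-word alignment carries most of the total probability mass, which is why it dominates the sum.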

LU-48 Translation involving Indian Languages using Python Period: 1


LU Outcomes Level: A CO Number: 5
1. Focus on translation for the English to Hindi language pair
2. Translate Indian languages into English based on language features
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Discuss in detail ANGLABHARTI and SHAKTI translation of Indian languages. U 4.2.2
2. What are the stages performed in SHAKTI implementation? U 4.1.3
3. Write short notes on MaTra language translation. U 4.1.2
4. Explain in detail the MANTRA and Anusaaraka language translation systems. U 4.1.3
5. Write a Python program that takes an English sentence and reorders it to A 14.2.2
match word order in Hindi.
6. What are four broad families of Indian languages? U 2.3.1
7. How are Indian languages related to morphological variants? U 4.1.2
8. What do you mean by language divergence problem? Explain with the help of U 2.2.3
appropriate examples.
Mepco Schlenk Engineering College (Autonomous), Sivakasi

9. List characteristics that are common among Indian languages. U 2.2.3
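
One possible starting point for question 5 is sketched below. It assumes that English is subject-verb-object with prepositions and that Hindi is subject-object-verb with postpositions, and it applies just two rules: move the verb to the end of the sentence and place each preposition after its noun. The tag lexicon TAGS and the function reorder_svo_to_sov are hypothetical; a real program would use a POS tagger (for example from NLTK) instead of a hand-written table.

```python
# Tiny hand-written tag lexicon (an assumption of this sketch;
# a real system would obtain tags from a POS tagger).
TAGS = {
    "ram": "NOUN", "rohit": "NOUN", "garden": "NOUN", "song": "NOUN",
    "slept": "VERB", "sings": "VERB",
    "in": "PREP", "the": "DET", "a": "DET",
}


def reorder_svo_to_sov(sentence):
    """Reorder a simple English sentence into verb-final, postpositional order."""
    words = sentence.split()
    tags = [TAGS.get(w.lower(), "NOUN") for w in words]  # unknown words: NOUN

    # Split into subject (before the verb), verbs, and the rest of the clause.
    subject, verbs, rest = [], [], []
    seen_verb = False
    for word, tag in zip(words, tags):
        if tag == "VERB":
            verbs.append(word)
            seen_verb = True
        elif seen_verb:
            rest.append((word, tag))
        else:
            subject.append(word)

    # Turn prepositions into postpositions: hold each preposition
    # until the noun it governs has been emitted.
    reordered, pending = [], None
    for word, tag in rest:
        if tag == "PREP":
            pending = word
        else:
            reordered.append(word)
            if tag == "NOUN" and pending is not None:
                reordered.append(pending)
                pending = None
    if pending is not None:
        reordered.append(pending)

    return " ".join(subject + reordered + verbs)


print(reorder_svo_to_sov("Ram slept in the garden"))
print(reorder_svo_to_sov("Rohit sings a song"))
```

The sketch only handles single-clause sentences and keeps English articles, which have no direct counterpart in Hindi; handling auxiliaries and multi-clause sentences would need further rules.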

LU-49 Machine Translation – Tutorial Period: 1


LU Outcome Level: A CO Number: 5
Translate between languages using Python libraries
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. Develop a small Hindi to English dictionary consisting of 50 words. Use it to A 14.2.2
develop a direct machine translation system for simple sentences involving
words in the dictionary.
2. Translate the following sentence from English into Spanish using direct A 14.2.2
translation: “Mary didn’t slap the green witch”.
3. Apply example-based machine translation to the following sentences: A 13.2.5
a. Rohit sings a song
b. Sheela is playing
c. Sheela is singing a song
d. Sheela sings a Hindi song
4. Write a Python program that takes an English sentence and reorders it to A 14.2.2
match word order in Hindi.
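
As a starting point for question 1, a direct machine translation system can be sketched as a dictionary lookup followed by local reordering. The version below is a minimal illustration under stated assumptions: LEXICON holds only five romanised Hindi entries chosen for the example (the exercise asks for 50), the coarse tags are hand-assigned, and translate_direct is a hypothetical helper that handles only simple subject-first, verb-final sentences.

```python
# Toy romanised Hindi-to-English lexicon with coarse tags.
# Five illustrative entries only; the exercise calls for about 50.
LEXICON = {
    "raam": ("Ram", "NOUN"),
    "baag": ("garden", "NOUN"),
    "ghar": ("house", "NOUN"),
    "mein": ("in", "POSTP"),
    "soya": ("slept", "VERB"),
}


def translate_direct(hindi_sentence):
    """Word-for-word lookup followed by two local reordering rules."""
    # Step 1: dictionary lookup; unknown words pass through tagged as NOUN.
    glossed = [LEXICON.get(w.lower(), (w, "NOUN"))
               for w in hindi_sentence.split()]

    # Step 2: a postposition follows its noun in Hindi; swap the pair so the
    # English preposition comes before the noun.
    for i in range(1, len(glossed)):
        if glossed[i][1] == "POSTP" and glossed[i - 1][1] == "NOUN":
            glossed[i - 1], glossed[i] = glossed[i], glossed[i - 1]

    # Step 3: move a sentence-final verb to just after the first word,
    # which this sketch assumes is the subject.
    if len(glossed) > 2 and glossed[-1][1] == "VERB":
        glossed.insert(1, glossed.pop())

    return " ".join(word for word, _ in glossed)


print(translate_direct("raam baag mein soya"))
```

The output lacks articles because Hindi has none to look up; inserting them, along with morphological generation, is one of the limitations of direct translation that the earlier learning units discuss.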
