NLP Tools and Applications
Prepared By : Mrs.S.Santhi
Reviewed By : Mr.N.Umakanth
Course Objectives:
Course Outcomes:
Course Prerequisite:
Python Programming
Mode of Delivery:
1. Oral presentation
2. Tutorial
3. Hands on/Demonstration
Assessment Methods:
1. Internal Test
2. Assignment
Mepco Schlenk Engineering College (Autonomous), Sivakasi
5. A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7  1 2 3 3 1
3. A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7, Assignment 0.3  2 3 2 3 2 1
5. A  Oral presentation, Hands-on / Demonstration  Internal Test 0.7  1 2 3 3
Concept Map
B.E. Computer Science and Engineering - V Semester - Instructional System Design (2023-2024)
19CSH03 – SYLLABUS (As per MEPCO Autonomous Syllabus)
19CSH03 NLP Tools and Applications (L T P C: 2 2 0 3)
UNIT-I INTRODUCTION TO NLP 9
Natural language processing – History of NLP – Early NLP systems – Phases of natural language
processing – Evaluation of NLP systems - Origins and challenges of NLP – Basic English concepts –
Language and Grammar - Processing Indian Languages.
Introduction and installation of NLTK – Data Pre-processing: Tokenization – Part of Speech (PoS) Tagging -
Word Frequency Counting – Stop Words Removal – Text Normalization – Spelling Correction - Stemming –
Lemmatization – Named Entity Recognition.
Feature Extraction: Building Bag of Words (BoW) Model – Building TF-IDF Model – Word Embeddings using
word2vec - Sentence Boundary Detection – Parsing - Lexical Resources: WordNet – FrameNet - Word
Synonyms and Antonyms using NLTK – Word Negation Tracking - Word Sense Disambiguation
Introduction to Text Classification – Machine Learning Overview – Classification Metrics – Confusion Matrix
– Developing a Text Classifier – Saving and Loading Models - Introduction to Topic Modelling – Topic
Discovery – Topic Modelling Algorithms: Latent Semantic Analysis – Latent Dirichlet Allocation.
Introduction to Sentiment Analysis – Need and Growth of Sentiment Analysis – TextBlob – Understanding
Data for Sentiment Analysis – Training Sentiment Models – Introduction to Machine Translation - Problems
in Machine Translation - Machine Translation Approaches - Translation involving Indian Languages using
Python
TOTAL: 45 PERIODS
TEXTBOOK:
1. Ela Kumar, “Natural Language Processing”, I.K International, New Delhi 2011.
2. Sohom Ghosh, Dwight Gunning, “Natural Language Processing Fundamentals”, Packt Publishing
Limited, 2019.
3. Steven Bird, Ewan Klein and Edward Loper, “Natural Language Processing with Python”, O'Reilly
Media, 1st Edition, 2009.
REFERENCE BOOKS:
1. Tanveer Siddiqui, U.S. Tiwary, “Natural Language Processing and Information Retrieval”, Oxford
University Press, 2008.
2. Daniel Jurafsky and James H Martin, “Speech and Language Processing: An introduction to Natural
Language Processing, Computational Linguistics and Speech Recognition”, Prentice Hall, 2nd Edition,
2008.
WEB REFERENCES:
1. http://nptel.ac.in/courses/106101007/
2. http://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm
3. http://nlp.stanford.edu/
4. http://ocw.mit.edu/courses/electrical-engineering-and-computer-science
5. https://www.udemy.com/course/nlp-natural-language-processing-with-python/
6. https://www.packtpub.com/in/big-data-and-business-intelligence/natural-language-processing-fundamentals
COURSE SCHEDULE
This module explains the history of natural language processing and describes early NLP
systems. It details the phases of natural language processing, including syntactic analysis. It covers
the evaluation of NLP systems and basic English concepts. It also explains the origins and challenges
of natural language processing and how to process Indian languages.
LU-1 Natural Language Processing Period: 1
LU Outcomes Level: U CO Number: 1
1. Use NLP for human-computer interaction
2. Explain how NLP is concerned with the development of computational models
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI *
1. Define NLP. R 1.3.1
2. Give examples of natural language sentences for complexity analysis. U 4.1.1
3. List the approaches to Natural Language Processing and explain them in detail. U 4.1.2
4. Define knowledge-based technique. R 1.3.1
5. Draw the block diagram of an NLP system. U 4.1.1
6. What are the major tasks performed in NLP applications? U 1.3.1
7. Differentiate between the rationalist and empiricist approaches to natural language processing. U 4.1.2
* - Only for CSE students
LU-2 History of NLP Period: 1
LU Outcomes Level: U CO Number: 1
1. Use the different versions of NLP systems
2. Compare the various systems using their features
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions BTL PI
1. What are the different programming languages used in the first era? R 1.3.1
2. What is meant by symbolic and stochastic? R 1.3.1
3. Write short notes on the third era. U 1.3.1
4. Which NLP version is used to generate web applications? U 4.1.1
5. What are the tasks performed in language understanding problems? U 4.1.2
3. Consider the following sentences and identify the parts of the sentence: A 4.2.1
1. The black and white dog was barking fiercely at the stranger.
2. The cobra saw the dog coming closer and raised itself into striking
position.
3. Annie is an English teacher.
4. Write short notes on the structure of a dictionary. U 4.1.1
5. Explain WordNet in detail. U 4.1.2
6. Specify the dictionary features in the English language and explain each feature. U 4.1.2
2. Write a Python program with NLTK to perform Part of Speech tagging on the sentences in Question 1. A 5.1.2, 13.2.5
3. Write a Python program that takes a paragraph of text as input and performs spelling correction, text normalization and stop word removal. A 5.1.2, 14.3.2
4. Write a Python program that applies various stemming techniques and extracts the stem words from an input paragraph. A 5.1.2, 14.3.2
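The pre-processing steps named in these questions can be sketched as follows. This is a minimal illustration that uses only the Python standard library rather than NLTK, so that it runs without downloading any corpora. The stop word list, the vocabulary and the similarity cutoff of 0.8 are small hypothetical stand-ins; a full solution would likely use NLTK's stop word corpus and a complete dictionary.

```python
import difflib
import re

# Hypothetical, deliberately tiny resources; a real program would load
# NLTK's stop word list and a full dictionary instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}
VOCABULARY = ["language", "processing", "natural", "computer", "study", "text"]


def normalize(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def correct_spelling(token):
    """Replace a token with its closest vocabulary word, if one is close enough."""
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else token


def preprocess(text):
    """Normalize, drop stop words, then spell-correct the remaining tokens."""
    tokens = [t for t in normalize(text) if t not in STOP_WORDS]
    return [correct_spelling(t) for t in tokens]


result = preprocess("Natural Langauge Procesing is the study of TEXT.")
print(result)
```

Stop words are removed before spelling correction so that short function words are never "corrected" into vocabulary entries.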
This module describes general and specific feature extraction from text data. It covers the
Bag of Words and TF-IDF models for extracting specific features from text data. It explains word
embeddings and sentence boundary detection. It covers parsing algorithms such as top-down and
bottom-up parsing. This module also explains lexical resources and shows how to find word synonyms and
antonyms, track word negation and perform word sense disambiguation.
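The Bag of Words and TF-IDF models mentioned above can be illustrated with a short sketch. It assumes a toy three-document corpus, whitespace tokenization, term frequency normalised by document length, and the unsmoothed IDF formula log(N / df). Library implementations such as scikit-learn's typically apply smoothing and normalisation, so their numbers will differ.

```python
import math
from collections import Counter

# Toy corpus; the documents are illustrative assumptions.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [doc.split() for doc in corpus]

# Bag of Words: one term-count mapping per document.
bow = [Counter(tokens) for tokens in tokenized]

# Document frequency: number of documents that contain each term.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))


def tf_idf(term, doc_index):
    """TF-IDF with length-normalised TF and IDF = log(N / df)."""
    if doc_freq[term] == 0:
        return 0.0  # term never seen in the corpus
    tf = bow[doc_index][term] / len(tokenized[doc_index])
    idf = math.log(len(corpus) / doc_freq[term])
    return tf * idf


print(bow[0]["the"])               # count of "the" in the first document
print(round(tf_idf("cat", 0), 3))  # "cat" occurs in only one document
print(tf_idf("cats", 1))           # term absent from this document
```

A term that appears in every document gets an IDF of zero under this formula, which is why very common words contribute little to the TF-IDF representation.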
2. Which tokenization method is helpful for sentence boundary detection? U 4.1.2
3. Write a Python program for sentence boundary detection and apply it to the following text: A 4.3.4
“we are reading a book. Do you know who is the publisher? It is
Packt. Packt is based on python”
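One possible approach to this question is a rule-based splitter, sketched below with the standard library's re module. It is a simplification and not the NLTK solution the question may have in mind: NLTK's sent_tokenize uses a trained model and handles abbreviations, which this regular expression does not.

```python
import re

text = ("we are reading a book. Do you know who is the publisher? "
        "It is Packt. Packt is based on python")

# Split after ., ! or ? when followed by whitespace. This rule would
# wrongly split on abbreviations such as "Dr. Rao".
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```

The lookbehind keeps the terminating punctuation attached to each sentence instead of discarding it.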
3. Write a Python program for word negation tracking using NLTK. A 14.3.2
4. Write an algorithm for finding word negation in sentiment analysis. A 13.2.5
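A common algorithm for negation tracking marks every word between a negation cue and the next punctuation mark. The sketch below implements that idea with the standard library only. The cue list and the _NEG suffix convention are assumptions chosen for illustration; NLTK offers a comparable helper, mark_negation, in nltk.sentiment.util.

```python
import re

NEGATION_WORDS = {"not", "no", "never", "cannot"}  # assumed, non-exhaustive list
CLAUSE_END = {".", ",", ";", "!", "?"}


def mark_negated(text):
    """Append _NEG to every word between a negation cue and the next punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,;!?]", text.lower())
    marked, in_scope = [], False
    for token in tokens:
        if token in CLAUSE_END:
            in_scope = False          # punctuation closes the negation scope
            marked.append(token)
        elif token in NEGATION_WORDS or token.endswith("n't"):
            in_scope = True           # a cue opens a new negation scope
            marked.append(token)
        else:
            marked.append(token + "_NEG" if in_scope else token)
    return marked


print(mark_negated("I did not like the movie, but the songs were good."))
```

A sentiment model trained on the marked tokens can then treat like and like_NEG as different features.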
3. Write a Python program that performs word sense disambiguation for an ambiguous word using NLTK. A 13.2.5
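Word sense disambiguation is often introduced through the simplified Lesk algorithm, which picks the sense whose dictionary gloss overlaps most with the context. The sketch below shows the idea with a hypothetical two-sense inventory for the word "bank". The glosses and stop word list are invented for illustration, whereas NLTK's lesk function in nltk.wsd draws senses and glosses from WordNet.

```python
# Hypothetical two-sense inventory for "bank"; a real program would take
# senses and glosses from WordNet.
SENSES = {
    "bank_financial": "an institution that accepts deposits and lends money",
    "bank_river": "sloping land beside a body of water such as a river",
}
STOP_WORDS = {"a", "an", "the", "of", "and", "that", "as", "such", "on", "to", "i", "my"}


def content_words(text):
    """Lowercased words of the text with stop words removed."""
    return {word for word in text.lower().split() if word not in STOP_WORDS}


def simplified_lesk(context):
    """Pick the sense whose gloss shares the most content words with the context."""
    context_words = content_words(context)
    return max(SENSES, key=lambda sense: len(content_words(SENSES[sense]) & context_words))


print(simplified_lesk("I went to the bank to deposit my money"))
print(simplified_lesk("We sat on the bank of the river near the water"))
```

When two senses tie on overlap, max returns the first one listed, so a practical version would need a better tie-breaking rule such as sense frequency.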
This module introduces text classification and its algorithms. It covers machine
learning algorithms for text classification and describes metrics for evaluating them. It also
covers developing a text classifier and saving and loading the trained models. This module also
explains topic modelling and its algorithms.
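The classification metrics and the confusion matrix mentioned above can be worked through on a small example. The labels and predictions below are hypothetical, and "spam" is assumed to be the positive class.

```python
from collections import Counter

# Hypothetical gold labels and classifier predictions for eight documents.
actual    = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "ham", "spam", "spam", "ham", "spam"]

# Confusion matrix stored as counts of (actual, predicted) pairs.
matrix = Counter(zip(actual, predicted))

tp = matrix[("spam", "spam")]  # spam correctly flagged
fn = matrix[("spam", "ham")]   # spam missed
fp = matrix[("ham", "spam")]   # ham wrongly flagged
tn = matrix[("ham", "ham")]    # ham correctly passed

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision asks how many flagged documents were truly spam, while recall asks how much of the spam was caught; the two differ whenever the false positive and false negative counts differ.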
5. Consider question number 4 and perform the same operations using K-means clustering. A 13.2.5
6. What is meant by supervised learning? Explain in detail logistic regression, Naïve Bayes classifiers and K-nearest neighbours classifiers with an example. U 4.2.2
7. In this exercise, you will classify reviews of musical instruments on Amazon with the help of logistic regression, Naïve Bayes and KNN. Do the following steps: A 2.3.2
a. Import the necessary libraries
b. Read the data file in JSON format using Pandas
c. Use a lambda function to extract tokens from each reviewtext
d. Create a TF-IDF matrix from the tokens
e. Transform the TF-IDF matrix into a DataFrame
f. Fit the logistic regression, Gaussian Naïve Bayes and KNN models
g. Compare the results of the classification models
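The fit and predict pattern behind this exercise can be sketched without any external libraries. Note the substitutions: the sketch uses a multinomial Naïve Bayes classifier on raw word counts instead of the Gaussian Naïve Bayes, logistic regression and KNN models on TF-IDF features that the exercise asks for, and a four-review toy list invented for illustration instead of the Amazon JSON file. It also shows an in-memory pickle round trip as one way of saving and loading a model.

```python
import math
import pickle
from collections import Counter, defaultdict

# Toy stand-in for the review data; texts and labels are invented for illustration.
train = [
    ("great sound and great value", "positive"),
    ("works well and sounds great", "positive"),
    ("poor quality and broke quickly", "negative"),
    ("terrible sound poor value", "negative"),
]


def fit(samples):
    """Collect per-class document counts and word counts."""
    class_docs, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in samples:
        class_docs[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return {"class_docs": class_docs, "word_counts": dict(word_counts), "vocab": vocab}


def predict(model, text):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    total_docs = sum(model["class_docs"].values())
    scores = {}
    for label, n_docs in model["class_docs"].items():
        counts = model["word_counts"][label]
        denominator = sum(counts.values()) + len(model["vocab"])
        score = math.log(n_docs / total_docs)
        for word in text.split():
            score += math.log((counts[word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


model = fit(train)
restored = pickle.loads(pickle.dumps(model))  # save and load round trip, in memory

print(predict(restored, "great value"))
print(predict(restored, "poor sound quality"))
```

To persist the model between runs, pickle.dump and pickle.load with a file opened in binary mode would replace the in-memory round trip.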
8. What is meant by regression? Explain linear regression in detail with an example. U 4.2.2
9. You will use regression to predict the overall scores of reviews of musical instruments on Amazon. Follow these steps to implement this exercise: A 14.3.2
a. Import the necessary libraries
b. Read the data file in JSON format using Pandas
c. Use a lambda function to extract tokens from each reviewtext
d. Create a DataFrame from the TF-IDF matrix
e. Fit the linear regression model
f. Obtain the intercept of the linear regression model
g. Make predictions using the TF-IDF DataFrame
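The fit, intercept and predict steps can be illustrated with ordinary least squares on a single feature. The data below is hypothetical: each review is assumed to be reduced to one numeric feature, whereas the exercise itself uses the full TF-IDF DataFrame and would normally rely on a library model.

```python
# Hypothetical data: one numeric feature per review (for example the TF-IDF
# weight of the word "great") and the review's overall score.
feature = [0.0, 0.1, 0.2, 0.3, 0.4]
score = [1.0, 2.0, 3.0, 4.0, 5.0]

n = len(feature)
mean_x = sum(feature) / n
mean_y = sum(score) / n

# Ordinary least squares for one feature: slope = cov(x, y) / var(x).
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, score))
         / sum((x - mean_x) ** 2 for x in feature))
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted overall score for a feature value x."""
    return intercept + slope * x


print(round(slope, 3), round(intercept, 3))
print(round(predict(0.25), 3))
```

The intercept is the predicted score when the feature value is zero, which is what step f asks for.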
10. Using the setup from question number 9, apply tree-based methods such as Decision Tree, Random Forest, GBM and XGBoost, and write the code in Python. A 14.3.2
1. In this exercise, you will classify reviews of musical instruments on Amazon with the help of logistic regression, Naïve Bayes and KNN. Do the following steps: A 14.3.2
a. Import the necessary libraries
b. Read the data file in JSON format using Pandas
c. Use a lambda function to extract tokens from each reviewtext
d. Create a TF-IDF matrix from the tokens
e. Transform the TF-IDF matrix into a DataFrame
f. Fit the logistic regression, Gaussian Naïve Bayes and KNN models
g. Compare the results of the classification models
2. You will use regression to predict the overall scores of reviews of musical instruments on Amazon. Follow these steps to implement this exercise: A 14.3.2
a. Import the necessary libraries
b. Read the data file in JSON format using Pandas
c. Use a lambda function to extract tokens from each reviewtext
d. Create a DataFrame from the TF-IDF matrix
e. Fit the linear regression model
f. Obtain the intercept of the linear regression model
g. Make predictions using the TF-IDF DataFrame
3. Write a Python program for removing correlated features from the sklearn fetch_20newsgroups dataset. A 13.2.5
4. Write a Python program for calculating PCA on the sklearn fetch_20newsgroups dataset. A 13.2.5
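The idea of removing correlated features can be sketched independently of any dataset. The example below uses three hypothetical feature columns rather than features extracted from the 20 newsgroups data, computes Pearson correlation by hand, and uses an assumed threshold of 0.9. A full solution would build the feature matrix first and would likely use Pandas or NumPy.

```python
import math

# Hypothetical feature columns; "length_chars" is nearly a multiple of "length_words".
features = {
    "length_words": [10, 20, 30, 40, 50],
    "length_chars": [52, 98, 151, 203, 249],
    "exclamations": [3, 0, 2, 0, 1],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[other])) <= threshold for other in kept):
            kept.append(name)
    return kept


print(drop_correlated(features))
```

Which column of a correlated pair survives depends on iteration order, so the earlier column is always the one retained.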
This module introduces sentiment analysis and its algorithms. It describes the need for and growth of
sentiment analysis. It explains Python-based tools such as TextBlob for sentiment analysis. It
explains how to understand the data used in sentiment analysis and how to train sentiment models. It
also covers machine translation approaches such as direct and statistical translation, and
translation involving Indian languages.
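A lexicon-based polarity score is one simple way to see what a sentiment tool computes. The sketch below uses a tiny hypothetical lexicon and plain averaging; it is not how TextBlob is implemented, although TextBlob similarly reports a polarity value between -1.0 and 1.0 through its sentiment property.

```python
# Hypothetical polarity lexicon; real tools ship far larger, weighted lexicons.
POLARITY = {"good": 1.0, "great": 1.0, "love": 1.0, "bad": -1.0, "poor": -1.0, "hate": -1.0}


def polarity(text):
    """Average polarity of the lexicon words found in the text, in [-1.0, 1.0]."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    hits = [POLARITY[word] for word in words if word in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0


def label(text):
    """Map the polarity score to a coarse sentiment label."""
    score = polarity(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


for review in ["Great guitar, I love it.", "Poor build and bad sound.", "It arrived on Monday."]:
    print(label(review), review)
```

This sketch ignores negation, so "not good" would still score as positive; combining it with negation tracking is a natural next step.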