Bai601 NLP

The document outlines a Natural Language Processing course (BAI601) for semester 6, detailing course objectives, teaching strategies, modules, practical components, assessment methods, and suggested learning resources. Students will learn about natural language modeling, applications of NLP, error detection, information retrieval, and machine translation. The course includes both theoretical and practical evaluations, with a focus on hands-on programming and analysis using Python.

Uploaded by

krishna mehar

NATURAL LANGUAGE PROCESSING (Semester 6)
Course Code: BAI601
CIE Marks: 50
Teaching Hours/Week (L:T:P:S): 3:0:2:0
SEE Marks: 50
Total Hours of Pedagogy: 40 hours Theory + 8-10 Lab slots
Total Marks: 100
Credits: 04
Exam Hours: 03
Examination nature (SEE): Theory
Course objectives:
This course will enable students to:
• Learn the importance of natural language modelling
• Understand the applications of natural language processing
• Study spelling error detection and correction methods and parsing techniques in NLP
• Illustrate the information retrieval models in natural language processing
Teaching-Learning Process (General Instructions)
These are sample Strategies that teachers can use to accelerate the attainment of the various course outcomes.
1. The lecture method (L) need not be limited to traditional lecturing; alternative effective teaching
methods may be adopted to attain the outcomes.
2. Use videos/animations to explain how various concepts work.
3. Encourage collaborative (group) learning in the class.
4. Ask at least three HOT (Higher Order Thinking) questions in the class to promote critical thinking.
5. Adopt Problem Based Learning (PBL), which fosters students' analytical skills and develops design
thinking skills such as the ability to design, evaluate, generalize, and analyze information rather than
simply recall it.
MODULE-1
Introduction: What is Natural Language Processing? Origins of NLP, Language and Knowledge,
The Challenges of NLP, Language and Grammar, Processing Indian Languages, NLP Applications.
Language Modeling: Statistical Language Model - N-gram model (unigram, bigram), Paninian
Framework, Karaka theory.
Textbook 1: Ch. 1, Ch. 2.
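As an illustration of the statistical language model listed in this module, the following is a minimal Python sketch that estimates bigram probabilities by maximum likelihood. The toy corpus, the `<s>` and `</s>` sentence boundary markers, whitespace tokenization, and all function names are assumptions made for this example; they are not prescribed by the syllabus.

```python
from collections import Counter

def tokenize(sentence):
    """Lowercase, split on whitespace, and add boundary markers (an assumption of this sketch)."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigram_model(sentences):
    """Count unigrams and bigrams over the training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(unigrams, bigrams, sentence):
    """Probability of a sentence as the product of its bigram probabilities."""
    tokens = tokenize(sentence)
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(unigrams, bigrams, prev, word)
    return prob

# Hypothetical toy corpus, used only for illustration.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigrams, bigrams = train_bigram_model(corpus)
p_i_given_start = bigram_prob(unigrams, bigrams, "<s>", "i")
p_sam_i_am = sentence_prob(unigrams, bigrams, "Sam I am")
print(p_i_given_start, p_sam_i_am)
```

Without smoothing, any sentence containing an unseen bigram receives probability zero, which is one reason smoothed estimates are usually preferred in practice.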
MODULE-2
Word Level Analysis: Regular Expressions, Finite-State Automata, Morphological Parsing, Spelling
Error Detection and Correction, Words and Word Classes, Part-of-Speech Tagging.
Syntactic Analysis: Context-Free Grammar, Constituency, Top-down and Bottom-up Parsing, CYK
Parsing.
Textbook 1: Ch. 3, Ch. 4.
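Spelling error correction in this module typically relies on a string distance measure. The sketch below computes the minimum edit distance with the standard dynamic-programming recurrence. The function name and the `sub_cost` parameter are assumptions of this example; the default gives insertions, deletions, and substitutions a cost of 1 each.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum number of weighted edits needed to turn source into target.

    Insertions and deletions cost 1; substitutions cost sub_cost
    (1 gives the usual Levenshtein distance).
    """
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] holds the distance between source[:i] and target[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of the source prefix
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[-1][-1]

print(min_edit_distance("kitten", "sitting"))
print(min_edit_distance("intention", "execution", sub_cost=2))
```

The table has (len(source) + 1) × (len(target) + 1) cells and each cell is filled in constant time, so the running time is proportional to the product of the two string lengths.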
MODULE-3
Naive Bayes, Text Classification and Sentiment: Naive Bayes Classifiers, Training the Naive
Bayes Classifier, Worked Example, Optimizing for Sentiment Analysis, Naive Bayes for Other Text
Classification Tasks, Naive Bayes as a Language Model.
Textbook 2: Ch. 4.
MODULE-4
Information Retrieval: Design Features of Information Retrieval Systems, Information Retrieval
Models - Classical, Non-classical, Alternative Models of Information Retrieval - Cluster model, Fuzzy
model, LSTM model, Major Issues in Information Retrieval.
Lexical Resources: WordNet, FrameNet, Stemmers, Parts-of-Speech Tagger, Research Corpora.
Textbook 1: Ch. 9, Ch. 12.
MODULE-5
Machine Translation: Language Divergences and Typology, Machine Translation using Encoder-
Decoder, Details of the Encoder-Decoder Model, Translating in Low-Resource Situations, MT
Evaluation, Bias and Ethical Issues.
Textbook 2: Ch. 13.

PRACTICAL COMPONENT OF IPCC

Sl.NO Experiments
1 Write a Python program for the following preprocessing of text in NLP:
● Tokenization
● Filtration
● Script Validation
● Stop Word Removal
● Stemming
2 Demonstrate the N-gram modeling to analyze and establish the probability distribution across
sentences and explore the utilization of unigrams, bigrams, and trigrams in diverse English
sentences to illustrate the impact of varying n-gram orders on the calculated probabilities.
3 Investigate the Minimum Edit Distance (MED) algorithm and its application in string
comparison. The goal is to understand how the algorithm efficiently computes the minimum
number of edit operations required to transform one string into another.
● Test the algorithm on strings with different types of variations (e.g., typos, substitutions,
insertions, deletions)
● Evaluate its adaptability to different types of input variations
4 Write a program to implement top-down and bottom-up parsers using an appropriate context-free
grammar.
5 Given the following short movie reviews, each labeled with a genre, either comedy or action:
● fun, couple, love, love (comedy)
● fast, furious, shoot (action)
● couple, fly, fast, fun, fun (comedy)
● furious, shoot, shoot, fun (action)
● fly, fast, shoot, love (action)
and a new document D: fast, couple, shoot, fly
Compute the most likely class for D. Assume a Naive Bayes classifier and use add-1 smoothing
for the likelihoods.
6 Demonstrate the following using appropriate programming tool which illustrates the use of
information retrieval in NLP:
● Study the various corpora (Brown, Inaugural, Reuters, udhr) with methods such as
fileids, raw, words, sents, categories
● Create and use your own corpora (plaintext, categorical)
● Study Conditional frequency distributions
● Study of tagged corpora with methods like tagged_sents, tagged_words
● Write a program to find the most frequent noun tags
● Map Words to Properties Using Python Dictionaries
● Study Rule based tagger, Unigram Tagger
Find different words from a given plain text without any space by comparing this text with a
given corpus of words. Also find the score of words.
7 Write a Python program to find synonyms and antonyms of the word "active" using WordNet.
8 Implement the machine translation application of NLP where it needs to train a machine
translation model for a language with limited parallel corpora. Investigate and incorporate
techniques to improve performance in low-resource scenarios.
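For Experiment 5 above, the computation can be sketched as follows, using the five labeled reviews and the document D exactly as given. Whitespace tokenization, log-space scoring, and the function names are assumptions of this sketch, not requirements of the syllabus.

```python
import math
from collections import Counter

# Training reviews and their genres, as listed in Experiment 5.
training = [
    ("fun couple love love", "comedy"),
    ("fast furious shoot", "action"),
    ("couple fly fast fun fun", "comedy"),
    ("furious shoot shoot fun", "action"),
    ("fly fast shoot love", "action"),
]

def train_naive_bayes(docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs, word_counts, vocab = Counter(), {}, set()
    for text, label in docs:
        tokens = text.split()
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def log_posteriors(tokens, class_docs, word_counts, vocab):
    """Unnormalized log P(class) + sum of log P(word | class), with add-1 smoothing."""
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_tokens = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training are skipped
            score += math.log((word_counts[label][tok] + 1) / (total_tokens + len(vocab)))
        scores[label] = score
    return scores

class_docs, word_counts, vocab = train_naive_bayes(training)
scores = log_posteriors("fast couple shoot fly".split(), class_docs, word_counts, vocab)
predicted = max(scores, key=scores.get)
print(predicted, {label: math.exp(s) for label, s in scores.items()})
```

With a vocabulary of 7 words, 9 comedy tokens, and 11 action tokens, the unnormalized scores are (2/5)(2·3·1·2)/16^4 ≈ 7.3e-5 for comedy and (3/5)(3·1·5·2)/18^4 ≈ 1.7e-4 for action, so D is assigned to action.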
Course outcomes (Course Skill Set):
At the end of the course, the student will be able to:
● Apply the fundamental concept of NLP, grammar-based language model and statistical-based
language model.
● Model morphological analysis using Finite State Transducers and parsing using context-free
grammar and different parsing approaches.
● Develop the Naïve Bayes classifier and sentiment analysis for Natural language problems and text
classifications.
● Apply the concepts of information retrieval, lexical semantics, lexical dictionaries such as
WordNet, lexical computational semantics, distributional word similarity.
● Identify the Machine Translation applications of NLP using Encoder and Decoder.
Assessment Details (both CIE and SEE)
The weightage of Continuous Internal Evaluation (CIE) is 50% and for Semester End Exam (SEE) is 50%.
The minimum passing mark for the CIE is 40% of the maximum marks (20 marks out of 50) and for the
SEE minimum passing mark is 35% of the maximum marks (18 out of 50 marks). A student shall be
deemed to have satisfied the academic requirements and earned the credits allotted to each subject/
course if the student secures a minimum of 40% (40 marks out of 100) in the sum total of the CIE
(Continuous Internal Evaluation) and SEE (Semester End Examination) taken together.
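The three pass conditions stated above can be checked with a few lines of arithmetic. The sketch below assumes that fractional minimums are rounded up to the next whole mark, which matches the stated 18 out of 50 for 35%; the function names are hypothetical.

```python
import math

def min_marks(percent, max_marks):
    """Smallest whole mark that meets the given percentage (rounding up is an assumption)."""
    return math.ceil(percent * max_marks / 100)

def has_passed(cie, see, cie_max=50, see_max=50):
    """Apply the CIE, SEE, and combined minimums described in the assessment details."""
    cie_ok = cie >= min_marks(40, cie_max)                    # 20 out of 50
    see_ok = see >= min_marks(35, see_max)                    # 18 out of 50
    total_ok = cie + see >= min_marks(40, cie_max + see_max)  # 40 out of 100
    return cie_ok and see_ok and total_ok

print(has_passed(22, 18))  # meets all three minimums
print(has_passed(20, 18))  # meets both component minimums but totals only 38
```

The second call shows why the combined condition matters: the two component minimums add up to 38, which is below the required total of 40.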

CIE for the theory component of the IPCC (maximum marks 50)
● IPCC means practical portion integrated with the theory of the course.
● CIE marks for the theory component are 25 marks and that for the practical component is 25
marks.
● 25 marks for the theory component are split into 15 marks for two Internal Assessment Tests (Two
Tests, each of 15 Marks with 01-hour duration, are to be conducted) and 10 marks for other
assessment methods mentioned in 22OB4.2. The first test is conducted after 40-50% of the
syllabus is covered and the second test after 85-90% of the syllabus is covered.
● Scaled-down marks of the sum of two tests and other assessment methods will be CIE marks for the
theory component of IPCC (that is for 25 marks).
● The student has to secure 40% of 25 marks to qualify in the CIE of the theory component of IPCC.

CIE for the practical component of the IPCC

● 15 marks for the conduction of the experiment and preparation of laboratory record, and 10 marks
for the test to be conducted after the completion of all the laboratory sessions.

● On completion of every experiment/program in the laboratory, the students shall be evaluated
including viva-voce and marks shall be awarded on the same day.
● The CIE marks awarded in the case of the Practical component shall be based on the continuous
evaluation of the laboratory report. Each experiment report can be evaluated for 10 marks. Marks of
all experiments’ write-ups are added and scaled down to 15 marks.
● The laboratory test (duration 02/03 hours) after completion of all the experiments shall be
conducted for 50 marks and scaled down to 10 marks.
● Scaled-down marks of write-up evaluations and tests added will be CIE marks for the laboratory
component of IPCC for 25 marks.
● The student has to secure 40% of 25 marks to qualify in the CIE of the practical component of the IPCC.
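The scaling described in the points above can be worked through as a short calculation. This sketch assumes that scaled marks are not rounded (the text does not say how rounding is handled), and the function names and sample marks are hypothetical.

```python
def practical_cie(record_marks, lab_test_marks, marks_per_record=10, lab_test_max=50):
    """Scale laboratory record marks to 15 and lab test marks to 10, then add them.

    record_marks holds one mark per experiment, each out of marks_per_record.
    No rounding is applied; that is an assumption of this sketch.
    """
    record_max = len(record_marks) * marks_per_record
    record_scaled = sum(record_marks) * 15 / record_max
    test_scaled = lab_test_marks * 10 / lab_test_max
    return record_scaled + test_scaled

def qualifies(cie_practical):
    """40% of 25 marks, that is 10 marks, is needed to qualify."""
    return cie_practical >= 0.4 * 25

# Hypothetical student: 8 experiments at 8/10 each and 35/50 in the lab test.
score = practical_cie([8] * 8, 35)
print(score, qualifies(score))
```

For this hypothetical student the record contributes 64 × 15 / 80 = 12 marks and the test contributes 35 × 10 / 50 = 7 marks, giving 19 out of 25.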

SEE for IPCC

Theory SEE will be conducted by the University as per the scheduled timetable, with common question
papers for the course (duration 03 hours).
1. The question paper will have ten questions. Each question is set for 20 marks.
2. There will be 2 questions from each module. Each of the two questions under a module (with a
maximum of 3 sub-questions), should have a mix of topics under that module.
3. The students have to answer 5 full questions, selecting one full question from each module.
4. Marks scored by the student shall be proportionally scaled down to 50 marks.

The theory portion of the IPCC shall be for both CIE and SEE, whereas the practical portion will
have a CIE component only. Questions mentioned in the SEE paper may include questions from the
practical component.
Suggested Learning Resources:
Textbook:
1. Tanveer Siddiqui, U.S. Tiwary, “Natural Language Processing and Information Retrieval”,
Oxford University Press.
2. Daniel Jurafsky, James H. Martin, “Speech and Language Processing, An Introduction to
Natural Language Processing, Computational Linguistics, and Speech Recognition”, Pearson
Education, 2023.
Reference Books:
1. Akshay Kulkarni, Adarsha Shivananda, “Natural Language Processing Recipes - Unlocking
Text Data with Machine Learning and Deep Learning using Python”, Apress, 2019.
2. T V Geetha, “Understanding Natural Language Processing – Machine Learning and Deep
Learning Perspectives”, Pearson, 2024.
3. Gerald J. Kowalski and Mark T. Maybury, “Information Storage and Retrieval Systems”,
Kluwer Academic Publishers.
Web links and Video Lectures (e-Resources):
1. https://www.youtube.com/watch?v=M7SWr5xObkA
2. https://youtu.be/02QWRAhGc7g
3. https://www.youtube.com/watch?v=CMrHM8a3hqw
4. https://onlinecourses.nptel.ac.in/noc23_cs45/preview
5. https://archive.nptel.ac.in/courses/106/106/106106211/

Activity Based Learning (Suggested Activities in Class)/ Practical Based learning

Text Classification Game (5 Marks)

● Objective: Learn supervised learning and text classification.
● Activity: Provide students with a set of documents (e.g., movie reviews) labeled as positive or
negative. Divide them into groups and have them create a simple classification model using
keywords or phrases. They can then test their model on new reviews.
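A group's keyword-based model for this activity could look like the minimal sketch below. The keyword lists, the tie-breaking label "undecided", and the function name are hypothetical choices for illustration; students would derive their own keywords from the labeled reviews they are given.

```python
import re

# Hypothetical keyword lists; in the activity these come from the labeled reviews.
POSITIVE_KEYWORDS = {"great", "excellent", "funny", "enjoyable", "loved"}
NEGATIVE_KEYWORDS = {"boring", "terrible", "dull", "awful", "hated"}

def classify_review(text):
    """Label a review by counting positive and negative keyword hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE_KEYWORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_KEYWORDS for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "undecided"  # no hits, or an equal number on both sides

print(classify_review("A great, funny film. Loved it!"))
print(classify_review("Boring plot and terrible acting."))
```

Testing on new reviews quickly exposes the limits of fixed keyword lists (negation, sarcasm, unseen words), which is a useful discussion point before moving on to learned classifiers such as Naive Bayes.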
Grammar Check and Correction (5 Marks)
● Objective: Learn about language structure and NLP tools.
● Activity: Provide sentences with grammatical errors. Students can use grammar checking tools
(like Grammarly or LanguageTool) to identify errors and suggest corrections, discussing why
each suggestion is made.
