
Bai601 NLP

The document outlines a Natural Language Processing course (BAI601) for semester 6, detailing course objectives, teaching strategies, modules, practical components, assessment methods, and suggested learning resources. Students will learn about natural language modeling, applications of NLP, error detection, information retrieval, and machine translation. The course includes both theoretical and practical evaluations, with a focus on hands-on programming and analysis using Python.

Uploaded by

krishna mehar
Copyright
© All Rights Reserved

NATURAL LANGUAGE PROCESSING

Semester: 6
Course Code: BAI601
Teaching Hours/Week (L:T:P:S): 3:0:2:0
Total Hours of Pedagogy: 40 hours Theory + 8-10 Lab slots
Credits: 04
Examination nature (SEE): Theory
CIE Marks: 50
SEE Marks: 50
Total Marks: 100
Exam Hours: 03
Course objectives:
This course will enable students to,
• Learn the importance of natural language modelling
• Understand the applications of natural language processing
• Study spelling error detection and correction methods and parsing techniques in NLP
• Illustrate the information retrieval models in natural language processing
Teaching-Learning Process (General Instructions)
These are sample Strategies that teachers can use to accelerate the attainment of the various course outcomes.
1. The lecture method (L) need not be only the traditional lecture method; alternative effective teaching
methods could be adopted to attain the outcomes.
2. Use of Video/Animation to explain functioning of various concepts.
3. Encourage collaborative learning (group learning) in the class.
4. Ask at least three HOT (Higher Order Thinking) questions in the class, which promote critical thinking.
5. Adopt Problem Based Learning (PBL), which fosters students' analytical skills and develops design
thinking skills such as the ability to design, evaluate, generalize, and analyze information rather than
simply recall it.
MODULE-1
Introduction: What is Natural Language Processing? Origins of NLP, Language and Knowledge,
The Challenges of NLP, Language and Grammar, Processing Indian Languages, NLP Applications.
Language Modeling: Statistical Language Model - N-gram model (unigram, bigram), Paninian
Framework, Karaka theory.
Textbook 1: Ch. 1, Ch. 2.
MODULE-2
Word Level Analysis: Regular Expressions, Finite-State Automata, Morphological Parsing, Spelling
Error Detection and Correction, Words and Word Classes, Part-of-Speech Tagging.
Syntactic Analysis: Context-Free Grammar, Constituency, Top-down and Bottom-up Parsing, CYK
Parsing.
Textbook 1: Ch. 3, Ch. 4.
MODULE-3
Naive Bayes, Text Classification and Sentiment: Naive Bayes Classifiers, Training the Naive
Bayes Classifier, Worked Example, Optimizing for Sentiment Analysis, Naive Bayes for Other Text
Classification Tasks, Naive Bayes as a Language Model.
Textbook 2: Ch. 4.
MODULE-4

Information Retrieval: Design Features of Information Retrieval Systems, Information Retrieval
Models - Classical, Non-classical, Alternative Models of Information Retrieval - Cluster model, Fuzzy
model, LSTM model, Major Issues in Information Retrieval.
Lexical Resources: WordNet, FrameNet, Stemmers, Parts-of-Speech Tagger, Research Corpora.
Textbook 1: Ch. 9, Ch. 12.
MODULE-5
Machine Translation: Language Divergences and Typology, Machine Translation using Encoder-
Decoder, Details of the Encoder-Decoder Model, Translating in Low-Resource Situations, MT
Evaluation, Bias and Ethical Issues.
Textbook 2: Ch. 13.

PRACTICAL COMPONENT OF IPCC


Sl.NO Experiments
1 Write a Python program for the following preprocessing of text in NLP:
● Tokenization
● Filtration
● Script Validation
● Stop Word Removal
● Stemming
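The five preprocessing steps listed for experiment 1 could be sketched in plain Python as follows. This is only an illustrative outline: the stop-word list is a tiny hand-picked sample, "script validation" is assumed here to mean keeping Latin-script tokens only, and `naive_stem` is a toy suffix stripper, not the Porter algorithm. All function names are hypothetical.

```python
import re
import unicodedata

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "in", "to", "are"}


def tokenize(text):
    """Tokenization: split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def filter_tokens(tokens):
    """Filtration: lowercase and keep only purely alphabetic tokens."""
    return [t.lower() for t in tokens if t.isalpha()]


def is_latin(token):
    """Script validation: True if every character is a Latin-script letter."""
    return all("LATIN" in unicodedata.name(ch, "") for ch in token)


def remove_stop_words(tokens):
    """Stop word removal against the sample list above."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run all five steps in order and return the stemmed tokens."""
    tokens = tokenize(text)
    tokens = filter_tokens(tokens)
    tokens = [t for t in tokens if is_latin(t)]
    tokens = remove_stop_words(tokens)
    return [naive_stem(t) for t in tokens]


print(preprocess("The cats are playing in the garden, Привет 123!"))
# ['cat', 'play', 'garden']
```

In a lab setting a library tokenizer, a full stop-word list and a standard stemmer would normally replace the toy pieces used here.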
2 Demonstrate the N-gram modeling to analyze and establish the probability distribution across
sentences and explore the utilization of unigrams, bigrams, and trigrams in diverse English
sentences to illustrate the impact of varying n-gram orders on the calculated probabilities.
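One possible starting point for experiment 2 is a maximum-likelihood n-gram model without smoothing, sketched below. The three-sentence training corpus and the function names are assumptions made for illustration; sentences are padded with `<s>` and `</s>` markers, and an unseen n-gram makes the whole sentence probability zero.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pad(sentence, n):
    """Lowercase, split on whitespace, and add start/end markers."""
    return ["<s>"] * (n - 1) + sentence.lower().split() + ["</s>"]


def train(sentences, n):
    """Count n-grams and their (n-1)-word contexts over a corpus."""
    ngram_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        for gram in ngrams(pad(sentence, n), n):
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return ngram_counts, context_counts


def sentence_probability(sentence, n, ngram_counts, context_counts):
    """Multiply the MLE estimates count(context + word) / count(context)."""
    prob = 1.0
    for gram in ngrams(pad(sentence, n), n):
        context_total = context_counts[gram[:-1]]
        if context_total == 0:
            return 0.0  # unseen context, no smoothing applied
        prob *= ngram_counts[gram] / context_total
    return prob


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
for n in (1, 2, 3):
    counts, contexts = train(corpus, n)
    print(n, sentence_probability("I am Sam", n, counts, contexts))
```

Comparing the three printed values shows how a higher n-gram order concentrates probability on word sequences that were actually observed in the training sentences.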
3 Investigate the Minimum Edit Distance (MED) algorithm and its application in string
comparison. The goal is to understand how the algorithm efficiently computes the minimum
number of edit operations required to transform one string into another.
● Test the algorithm on strings with different types of variations (e.g., typos, substitutions,
insertions, deletions)
● Evaluate its adaptability to different types of input variations
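A minimal dynamic-programming sketch of minimum edit distance is given below as one way to approach experiment 3. Insertions and deletions cost 1; the substitution cost is a parameter (`sub_cost`, a name chosen here for illustration) so that both the cost-1 and the cost-2 conventions can be tried.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Return the minimum total cost of edits turning source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of converting source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters
    for j in range(cols):
        dist[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                               # deletion
                dist[i][j - 1] + 1,                               # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),   # substitution or match
            )
    return dist[-1][-1]


print(min_edit_distance("kitten", "sitting"))                    # 3
print(min_edit_distance("intention", "execution"))               # 5
print(min_edit_distance("intention", "execution", sub_cost=2))   # 8
```

The table has (len(source) + 1) x (len(target) + 1) cells and each cell is filled in constant time, which is where the efficiency of the algorithm comes from.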
4 Write a program to implement top-down and bottom-up parser using appropriate context free
grammar.
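For experiment 4, the sketch below shows one possible pairing: a backtracking recursive-descent recognizer (top-down) and a CYK recognizer (bottom-up) over the same small grammar. The grammar and vocabulary are invented for illustration. The grammar is written in Chomsky normal form because CYK requires it, and it has no left recursion, which the simple top-down recognizer could not handle. Both functions only accept or reject a sentence; building parse trees is left out.

```python
# Toy grammar in Chomsky normal form (an assumption for illustration).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"], ["saw"]],
}


def top_down(symbol, tokens, pos):
    """Yield every position reachable after expanding symbol from pos."""
    if symbol not in GRAMMAR:  # terminal: must match the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each production in turn (backtracking)
        positions = [pos]
        for part in rhs:
            positions = [end for start in positions
                         for end in top_down(part, tokens, start)]
        yield from positions


def top_down_accepts(tokens):
    """Top-down: start from S and try to derive the whole token list."""
    return len(tokens) in top_down("S", tokens, 0)


def cyk_accepts(tokens):
    """Bottom-up: fill a chart from single words up to the full span."""
    n = len(tokens)
    if n == 0:
        return False
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(tokens):
        for lhs, rules in GRAMMAR.items():
            if [word] in rules:
                table[i][i + 1].add(lhs)
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for lhs, rules in GRAMMAR.items():
                    for rhs in rules:
                        if (len(rhs) == 2 and rhs[0] in table[start][mid]
                                and rhs[1] in table[mid][end]):
                            table[start][end].add(lhs)
    return "S" in table[0][n]


sentence = "the dog chased a cat".split()
print(top_down_accepts(sentence), cyk_accepts(sentence))  # True True
```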
5 Given the following short movie reviews, each labeled with a genre, either comedy or action:
● fun, couple, love, love (comedy)
● fast, furious, shoot (action)
● couple, fly, fast, fun, fun (comedy)
● furious, shoot, shoot, fun (action)
● fly, fast, shoot, love (action)
and a new document D: fast, couple, shoot, fly,
compute the most likely class for D. Assume a Naive Bayes classifier and use add-1 smoothing
for the likelihoods.
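The computation in experiment 5 can be checked with a short multinomial Naive Bayes sketch such as the one below. The five reviews and the document D come from the exercise; the function names are illustrative. With a vocabulary of 7 words, 9 word tokens in comedy and 11 in action, add-1 smoothing gives P(comedy) x likelihood = 0.4 x 12/16^4 (about 7.3e-5) and P(action) x likelihood = 0.6 x 30/18^4 (about 1.7e-4), so D is assigned to action.

```python
import math
from collections import Counter, defaultdict

TRAINING = [
    ("fun couple love love", "comedy"),
    ("fast furious shoot", "action"),
    ("couple fly fast fun fun", "comedy"),
    ("furious shoot shoot fun", "action"),
    ("fly fast shoot love", "action"),
]


def train_naive_bayes(examples):
    """Collect document counts per class, word counts per class, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_docs[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_docs, word_counts, vocabulary


def log_scores(document, class_docs, word_counts, vocabulary):
    """Return log P(class) + sum of log P(word | class) with add-1 smoothing."""
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in document.split():
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        scores[label] = score
    return scores


model = train_naive_bayes(TRAINING)
scores = log_scores("fast couple shoot fly", *model)
print(max(scores, key=scores.get))  # action
```

Working in log space avoids numerical underflow when documents are longer than this four-word example.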
6 Demonstrate the following using appropriate programming tool which illustrates the use of
information retrieval in NLP:
● Study various corpora (Brown, Inaugural, Reuters, udhr) using methods such as
fileids, raw, words, sents, categories
● Create and use your own corpora (plaintext, categorical)
● Study Conditional frequency distributions
● Study of tagged corpora with methods like tagged_sents, tagged_words
● Write a program to find the most frequent noun tags
● Map Words to Properties Using Python Dictionaries
● Study Rule based tagger, Unigram Tagger
Find different words from a given plain text without any space by comparing this text with a
given corpus of words. Also find the score of words.
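The last task above (splitting text that has no spaces by comparing it with a word list) could be approached with dynamic programming, as in the sketch below. The syllabus does not define the "score of words"; here it is assumed to be the sum of log relative frequencies of the chosen words, and the word counts are made-up sample data.

```python
import math
from functools import lru_cache


def segment(text, word_counts):
    """Return (score, words) for the best split of text, or None if no split exists."""
    total = sum(word_counts.values())
    max_len = max(map(len, word_counts))

    @lru_cache(maxsize=None)
    def best(start):
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in word_counts:
                rest = best(end)
                if rest is not None:
                    # Assumed score: log relative frequency of each word, summed.
                    score = math.log(word_counts[word] / total) + rest[0]
                    candidates.append((score, (word,) + rest[1]))
        return max(candidates) if candidates else None

    return best(0)


# Hypothetical word frequencies standing in for a real corpus.
counts = {"with": 5, "without": 3, "out": 2, "any": 4,
          "space": 3, "a": 6, "spa": 1, "ce": 1}
print(segment("withoutanyspace", counts))
```

With these counts the split "without any space" scores higher than "with out any space", because one frequent word beats two words whose probabilities multiply to a smaller value.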
7 Write a Python program to find synonyms and antonyms of the word "active" using WordNet.
8 Implement the machine translation application of NLP where it needs to train a machine
translation model for a language with limited parallel corpora. Investigate and incorporate
techniques to improve performance in low-resource scenarios.
Course outcomes (Course Skill Set):
At the end of the course, the student will be able to:
● Apply the fundamental concept of NLP, grammar-based language model and statistical-based
language model.
● Model morphological analysis using Finite State Transducers and parsing using context-free
grammar and different parsing approaches.
● Develop the Naïve Bayes classifier and sentiment analysis for Natural language problems and text
classifications.
● Apply the concepts of information retrieval, lexical semantics, lexical dictionaries such as
WordNet, lexical computational semantics, distributional word similarity.
● Identify the Machine Translation applications of NLP using the Encoder-Decoder model.
Assessment Details (both CIE and SEE)
The weightage of Continuous Internal Evaluation (CIE) is 50% and for Semester End Exam (SEE) is 50%.
The minimum passing mark for the CIE is 40% of the maximum marks (20 marks out of 50) and for the
SEE minimum passing mark is 35% of the maximum marks (18 out of 50 marks). A student shall be
deemed to have satisfied the academic requirements and earned the credits allotted to each subject/
course if the student secures a minimum of 40% (40 marks out of 100) in the sum total of the CIE
(Continuous Internal Evaluation) and SEE (Semester End Examination) taken together.
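The pass criteria above can be expressed as a small check, assuming that all three conditions must hold together (CIE at least 20 out of 50, SEE at least 18 out of 50, and a combined total of at least 40 out of 100). The function name is illustrative.

```python
def has_passed(cie, see):
    """cie and see are marks out of 50 each."""
    meets_cie = cie >= 20            # 40% of 50
    meets_see = see >= 18            # 35% of 50, rounded up
    meets_total = cie + see >= 40    # 40% of 100
    return meets_cie and meets_see and meets_total


print(has_passed(20, 18))  # False: both minimums met, but the total is only 38
print(has_passed(22, 18))  # True
```

The first example shows that meeting the two individual minimums exactly is not enough; at least two further marks are needed somewhere to reach the combined 40.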

CIE for the theory component of the IPCC (maximum marks 50)
● IPCC means practical portion integrated with the theory of the course.
● CIE marks for the theory component are 25 marks and that for the practical component is 25
marks.
● The 25 marks for the theory component are split into 15 marks for two Internal Assessment Tests (two
tests, each of 15 marks with 01-hour duration, are to be conducted) and 10 marks for other
assessment methods mentioned in 22OB4.2. The first test is conducted at the end of 40-50% coverage of the
syllabus and the second test after covering 85-90% of the syllabus.
● Scaled-down marks of the sum of two tests and other assessment methods will be CIE marks for the
theory component of IPCC (that is for 25 marks).
● The student has to secure 40% of 25 marks to qualify in the CIE of the theory component of IPCC.

CIE for the practical component of the IPCC


● 15 marks for the conduct of the experiment and preparation of laboratory record, and 10 marks
for the test to be conducted after the completion of all the laboratory sessions.

● On completion of every experiment/program in the laboratory, the students shall be evaluated
including viva-voce and marks shall be awarded on the same day.
● The CIE marks awarded in the case of the Practical component shall be based on the continuous
evaluation of the laboratory report. Each experiment report can be evaluated for 10 marks. Marks of
all experiments’ write-ups are added and scaled down to 15 marks.
● The laboratory test (duration 02/03 hours) after completion of all the experiments shall be
conducted for 50 marks and scaled down to 10 marks.
● Scaled-down marks of write-up evaluations and tests added will be CIE marks for the laboratory
component of IPCC for 25 marks.
● The student has to secure 40% of 25 marks to qualify in the CIE of the practical component of the IPCC.
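The scaling described for the practical component can be illustrated with the arithmetic below. The sample marks are invented, and no rounding is applied because the text does not specify a rounding rule.

```python
def lab_cie(writeup_marks, test_marks):
    """writeup_marks: one score out of 10 per experiment; test_marks: out of 50."""
    writeup_max = 10 * len(writeup_marks)
    scaled_writeups = sum(writeup_marks) / writeup_max * 15   # scaled down to 15
    scaled_test = test_marks / 50 * 10                        # scaled down to 10
    return scaled_writeups + scaled_test                      # out of 25


marks = lab_cie([8, 9, 7, 10, 6, 8, 9, 7], 40)
print(marks, marks >= 0.4 * 25)  # 20.0 True
```

Here eight write-ups totalling 64 out of 80 scale to 12 marks, and a test score of 40 out of 50 scales to 8 marks, giving 20 out of 25.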

SEE for IPCC


Theory SEE will be conducted by the University as per the scheduled timetable, with common question
papers for the course (duration 03 hours).
1. The question paper will have ten questions. Each question is set for 20 marks.
2. There will be 2 questions from each module. Each of the two questions under a module (with a
maximum of 3 sub-questions), should have a mix of topics under that module.
3. The students have to answer 5 full questions, selecting one full question from each module.
4. Marks scored by the student shall be proportionally scaled down to 50 marks.

The theory portion of the IPCC shall be for both CIE and SEE, whereas the practical portion will
have a CIE component only. Questions mentioned in the SEE paper may include questions from the
practical component.
Suggested Learning Resources:
Textbook:
1. Tanveer Siddiqui, U.S. Tiwary, “Natural Language Processing and Information Retrieval”,
Oxford University Press.
2. Daniel Jurafsky, James H. Martin, “Speech and Language Processing, An Introduction to
Natural Language Processing, Computational Linguistics, and Speech Recognition”, Pearson
Education, 2023.
Reference Books:
1. Akshay Kulkarni, Adarsha Shivananda, “Natural Language Processing Recipes - Unlocking
Text Data with Machine Learning and Deep Learning using Python”, Apress, 2019.
2. T V Geetha, “Understanding Natural Language Processing – Machine Learning and Deep
Learning Perspectives”, Pearson, 2024.
3. Gerald J. Kowalski and Mark T. Maybury, “Information Storage and Retrieval Systems”,
Kluwer Academic Publishers.
Web links and Video Lectures (e-Resources):
1. https://www.youtube.com/watch?v=M7SWr5xObkA
2. https://youtu.be/02QWRAhGc7g
3. https://www.youtube.com/watch?v=CMrHM8a3hqw
4. https://onlinecourses.nptel.ac.in/noc23_cs45/preview
5. https://archive.nptel.ac.in/courses/106/106/106106211/

Activity Based Learning (Suggested Activities in Class)/ Practical Based learning

Text Classification Game (5 Marks)


● Objective: Learn supervised learning and text classification.
● Activity: Provide students with a set of documents (e.g., movie reviews) labeled as positive or
negative. Divide them into groups and have them create a simple classification model using
keywords or phrases. They can then test their model on new reviews.
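A keyword-based classifier of the kind students might build in this activity could look like the sketch below. The two keyword sets are invented examples; in the activity they would come from the words each group picks out of the labeled reviews.

```python
import re

# Hypothetical keyword lists chosen by a student group.
POSITIVE = {"great", "fun", "loved", "excellent", "enjoyable"}
NEGATIVE = {"boring", "bad", "awful", "dull", "hated"}


def classify(review):
    """Label a review by counting positive and negative keyword hits."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"


print(classify("A great, fun movie. Loved it!"))   # positive
print(classify("Boring plot and awful acting."))   # negative
```

Testing such a model on new reviews quickly exposes its limits (negation, sarcasm, unseen words), which is a useful lead-in to learned classifiers such as Naive Bayes.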
Grammar Check and Correction (5 Marks)
● Objective: Learn about language structure and NLP tools.
● Activity: Provide sentences with grammatical errors. Students can use grammar checking tools
(like Grammarly or LanguageTool) to identify errors and suggest corrections, discussing why
each suggestion is made.
