
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI


WORK INTEGRATED LEARNING PROGRAMMES

COURSE HANDOUT

Part A: Content Design


 
Course Title Natural Language Processing
Course No(s)
Credit Units
Course Author Vijayalakshmi
Version No 1.0
Date

Course Objectives
No Course Objective

CO1 To learn the fundamental concepts and techniques of natural language processing (NLP)

CO2 To learn computational properties of natural languages and the commonly used algorithms
for processing linguistic information

Text Book(s)
T1 Speech and Language processing: An introduction to Natural Language Processing,
Computational Linguistics and speech Recognition by Daniel Jurafsky and James H.
Martin[2​nd​ edition]

Speech and Language processing: An introduction to Natural Language Processing,


Computational Linguistics and speech Recognition by Daniel Jurafsky and James H.
Martin[3rd edition]

T2 Foundations of statistical Natural language processing by Christopher D.Manning and


Hinrich schutze
Reference Book(s) & other resources
R1 Handbook of Natural Language Processing, Second Edition by Nitin Indurkhya and Fred J. Damerau
R2 Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper

Modular Content Structure

1. Introduction

1.1 Knowledge in speech and language processing


1.2 Ambiguity
1.3 Models and Algorithms
1.4 Language, thought, and understanding
1.5 The State of the art

2. Regular Expressions, Text Normalization, Edit Distance

2.1 Regular Expressions


2.1.1 Basic Regular Expression Patterns
2.1.2 Disjunction, Grouping, and Precedence
2.1.3 More Operators
2.1.4 Regular Expression Substitution, Capture Groups
2.1.5 Lookahead assertions
2.2 Words
2.3 Corpora
2.4 Text Normalization
2.5 Minimum Edit Distance
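
To make the minimum edit distance topic above concrete, here is a minimal Python sketch of the standard dynamic-programming algorithm; the function name and the unit cost for substitution are illustrative choices (the textbook also analyzes a variant where substitutions cost 2).

# Minimal sketch: minimum edit distance (Levenshtein) by dynamic programming,
# with unit costs for insertion, deletion, and substitution.
def min_edit_distance(source: str, target: str) -> int:
    n, m = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                        # i deletions
    for j in range(1, m + 1):
        dist[0][j] = j                        # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution
    return dist[n][m]

print(min_edit_distance("intention", "execution"))  # 5 under unit costs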

3. Text Classification

3.1 Naive Bayes and Sentiment Classification

3.1.1 Naive Bayes Classifiers
3.1.2 Naive Bayes for other text classification tasks
3.1.3 Naive Bayes as a language model
3.1.4 Evaluation: Precision, Recall, F Measure
3.1.5 Statistical significance testing
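
As a sketch of the naive Bayes material above, the following trains a multinomial naive Bayes sentiment classifier with add-one (Laplace) smoothing; the five-document toy corpus, labels, and test sentence are illustrative.

import math
from collections import Counter, defaultdict

# Minimal sketch: multinomial naive Bayes with add-one (Laplace) smoothing.
train = [("fun couple love love", "pos"),
         ("fast furious shoot", "neg"),
         ("couple fly fast fun fun", "pos"),
         ("furious shoot shoot fun", "neg"),
         ("fly fast shoot love", "neg")]

class_words = defaultdict(list)
for doc, label in train:
    class_words[label].extend(doc.split())

vocab = {w for words in class_words.values() for w in words}
log_prior = {c: math.log(sum(1 for _, l in train if l == c) / len(train))
             for c in class_words}
counts = {c: Counter(words) for c, words in class_words.items()}

def log_likelihood(word, c):
    # Add-one smoothing over the shared vocabulary.
    return math.log((counts[c][word] + 1) /
                    (sum(counts[c].values()) + len(vocab)))

def classify(doc):
    scores = {c: log_prior[c] + sum(log_likelihood(w, c)
                                    for w in doc.split() if w in vocab)
              for c in log_prior}
    return max(scores, key=scores.get)

print(classify("fast couple shoot fly"))  # 'neg' with these counts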

3.2 Logistic Regression

3.2.1 Classication: the sigmoid


3.2.2 Learning in Logistic Regression
3.2.3 The cross-entropy loss function
3.2.4 Gradient Descent
3.2.5 Regularization
3.2.6 Multinomial logistic regression
3.2.7 Interpreting models
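
Similarly, a minimal sketch of binary logistic regression: the sigmoid, the cross-entropy gradient, and batch gradient descent on toy two-feature data (the features, labels, and learning rate are illustrative, not from the textbook).

import math

# Minimal sketch: binary logistic regression trained with batch gradient
# descent on the cross-entropy loss.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Each example: ([x1, x2], gold label y in {0, 1}).
data = [([3.0, 2.0], 1), ([2.5, 1.0], 1), ([0.5, 2.5], 0), ([1.0, 0.5], 0)]
w, b, lr = [0.0, 0.0], 0.0, 0.5

for _ in range(1000):
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, y in data:
        y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        # For the sigmoid plus cross-entropy pair, d(loss)/dz = y_hat - y.
        for i in range(len(w)):
            grad_w[i] += (y_hat - y) * x[i]
        grad_b += y_hat - y
    w = [wi - lr * gw / len(data) for wi, gw in zip(w, grad_w)]
    b -= lr * grad_b / len(data)

for x, y in data:
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    print(x, "gold:", y, "P(y=1):", round(p, 3))
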
4. Language models

4.1 N-gram language models

4.1.1 N-Grams
4.1.2 Evaluating Language Models
4.1.3 Generalization and Zeros
4.1.4 Unknown Words
4.1.5 Smoothing
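
As a sketch of the N-gram material above, here is a bigram language model with add-one smoothing over a three-sentence toy corpus (the corpus and the <s>/</s> boundary markers are illustrative).

from collections import Counter

# Minimal sketch: bigram language model with add-one (Laplace) smoothing.
corpus = ["<s> i am sam </s>",
          "<s> sam i am </s>",
          "<s> i do not like green eggs and ham </s>"]
unigrams = Counter(w for sent in corpus for w in sent.split())
bigrams = Counter()
for sent in corpus:
    words = sent.split()
    bigrams.update(zip(words, words[1:]))  # bigrams within each sentence
V = len(unigrams)  # vocabulary size for the add-one denominator

def p_laplace(prev, word):
    # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(p_laplace("<s>", "i"))     # frequent bigram, higher probability
print(p_laplace("am", "green"))  # unseen bigram still gets nonzero mass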

4.2 Neural Networks and Neural Language Models

4.2.1 The XOR problem


4.2.2 Feed-Forward Neural Networks
4.2.3 Training Neural Nets
4.2.4 Neural Language Models
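
A minimal sketch of the XOR point above: a fixed-weight two-layer ReLU network that computes XOR, which no single linear unit can. The hand-picked weights follow the standard textbook construction.

# Minimal sketch: a fixed-weight two-layer ReLU network computing XOR.
def relu(z):
    return max(0.0, z)

def xor_net(x1, x2):
    h1 = relu(x1 + x2)        # hidden unit 1: weights [1, 1], bias 0
    h2 = relu(x1 + x2 - 1)    # hidden unit 2: weights [1, 1], bias -1
    return h1 - 2 * h2        # output unit: weights [1, -2], bias 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))  # prints 0, 1, 1, 0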

5. Lexical Analysis

5.1 Lexical semantics
5.2 Vector semantics
5.3 Words and vectors
5.4 Cosine for measuring similarity
5.5 TF-IDF: Weighing terms in the vector
5.6 Application of the tf-idf vector model
5.7 Word2vec
5.8 Visualizing Embeddings
5.9 Semantic properties of embeddings
5.10 Bias and Embeddings
5.11 Evaluating Vector Models
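
To illustrate the vector-semantics topics above, here is a minimal sketch of tf-idf weighting and cosine similarity over a three-document toy corpus (the documents and the log10-based weighting variant are illustrative choices).

import math
from collections import Counter

# Minimal sketch: tf-idf term weighting plus cosine similarity.
docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]
tokenized = [d.split() for d in docs]
N = len(docs)
df = Counter(w for tokens in tokenized for w in set(tokens))  # document freq
vocab = sorted(df)

def tfidf_vector(tokens):
    tf = Counter(tokens)
    # One common variant: log-scaled term frequency times inverse doc freq.
    return [math.log10(1 + tf[w]) * math.log10(N / df[w]) for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vecs = [tfidf_vector(t) for t in tokenized]
print(cosine(vecs[0], vecs[1]))  # shares weighted terms like "sat", "on"
print(cosine(vecs[0], vecs[2]))  # no shared terms, similarity 0.0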

6. Word Disambiguation

6.1 Supervised Disambiguation


6.1.1 Bayesian classification
6.1.2 Flip-Flop algorithm
6.2 Dictionary-Based Disambiguation
6.2.1 Thesaurus-based disambiguation
6.2.2 Disambiguation based on translations in a second-language corpus
6.2.3 One sense per discourse, one sense per collocation
6.3 Unsupervised Disambiguation
6.4 Evaluation
6.4.1 Pseudo words
6.4.2 Upper and lower bounds on performance
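
In the spirit of the dictionary-based methods above, here is a minimal simplified-Lesk-style sketch: pick the sense whose gloss shares the most words with the context. The two-sense mini-dictionary for "bank" and the stopword list are illustrative, not from the texts.

# Minimal sketch: simplified Lesk-style dictionary-based disambiguation.
SENSES = {
    "bank#1": "a financial institution that accepts deposits and lends money",
    "bank#2": "sloping land beside a body of water such as a river",
}

STOP = {"a", "the", "of", "and", "that", "such", "as", "to", "on"}

def simplified_lesk(context: str) -> str:
    ctx = {w for w in context.lower().split() if w not in STOP}
    def overlap(sense):
        gloss = {w for w in SENSES[sense].split() if w not in STOP}
        return len(ctx & gloss)
    return max(SENSES, key=overlap)  # sense with the largest gloss overlap

print(simplified_lesk("he sat on the bank of the river and watched the water"))
print(simplified_lesk("the bank approved my loan and deposits"))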

7. Grammar
7.1 Introduction to Markov models and Hidden Markov models
7.2 Part-of-Speech Tagging
7.2.1 The Information Sources in Tagging
7.2.2 Markov Model Taggers
7.2.3 The probabilistic model
7.2.4 The Viterbi algorithm
7.2.5 Variations
7.3 Hidden Markov Model Taggers
7.3.1 Applying HMMs to POS tagging
7.3.2 The effect of initialization on HMM training
7.4 Transformation-Based Learning of Tags
7.4.1 Transformations
7.4.2 The learning algorithm
7.4.3 Relation to other models
7.4.4 Automata
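
To make the HMM tagging material above concrete, here is a minimal Viterbi decoding sketch over a two-tag toy model; the tag set and the start/transition/emission probabilities are invented toy numbers, not treebank estimates.

import math

# Minimal sketch: Viterbi decoding for a two-tag HMM POS tagger.
states = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit = {"NOUN": {"fish": 0.6, "sleep": 0.2, "dogs": 0.2},
        "VERB": {"fish": 0.3, "sleep": 0.6, "dogs": 0.1}}

def viterbi(words):
    # v[t][s]: log-probability of the best tag path ending in s at time t.
    v = [{s: math.log(start[s] * emit[s][words[0]]) for s in states}]
    back = []
    for word in words[1:]:
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: v[-1][p] + math.log(trans[p][s]))
            col[s] = v[-1][best] + math.log(trans[best][s] * emit[s][word])
            ptr[s] = best
        v.append(col)
        back.append(ptr)
    # Follow backpointers from the best final state.
    path = [max(states, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["dogs", "sleep"]))  # ['NOUN', 'VERB'] with these numbers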

8. Probabilistic Context-Free Grammars: Statistical Parsing

8.1 Some Features of PCFGs


8.2 Questions for PCFGs
8.3 The Probability of a String
8.3.1 Using inside probabilities
8.3.2 Using outside probabilities
8.3.3 Finding the most likely parse for a sentence
8.3.4 Training a PCFG
8.4 Problems with the Inside-Outside Algorithm
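
A minimal sketch of computing the probability of a string with inside probabilities, using CKY-style dynamic programming over a toy PCFG in Chomsky normal form (the grammar, rule probabilities, and example sentence are illustrative).

from collections import defaultdict

# Minimal sketch: P(string) under a toy CNF PCFG via inside probabilities.
binary = {("S", ("NP", "VP")): 1.0,     # S  -> NP VP
          ("VP", ("V", "NP")): 1.0}     # VP -> V NP
lexical = {("NP", "astronomers"): 0.4,  # NP -> astronomers
           ("NP", "stars"): 0.6,        # NP -> stars
           ("V", "saw"): 1.0}           # V  -> saw

def inside_probability(words):
    n = len(words)
    # beta[(i, j)][A] = P(A derives words[i..j])
    beta = defaultdict(dict)
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[(i, i)][a] = beta[(i, i)].get(a, 0.0) + p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                       # split point
                for (a, (b, c)), p in binary.items():
                    pb = beta[(i, k)].get(b, 0.0)
                    pc = beta[(k + 1, j)].get(c, 0.0)
                    if pb and pc:
                        beta[(i, j)][a] = beta[(i, j)].get(a, 0.0) + p * pb * pc
    return beta[(0, n - 1)].get("S", 0.0)

print(inside_probability("astronomers saw stars".split()))  # 0.24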

9. Syntactic Parsing and Dependency Parsing

9.1 CKY parsing


9.2 Partial parsing
9.3 Dependency parsing
9.3.1 Dependency Relations
9.3.2 Dependency Formalisms
9.3.3 Dependency Treebanks
9.3.4 Transition-Based Dependency Parsing
9.3.5 Graph-Based Dependency Parsing
9.3.6 Evaluation
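
To illustrate transition-based dependency parsing, here is a minimal arc-standard sketch that replays a hand-written oracle transition sequence instead of a trained classifier (the sentence and head choices are illustrative).

# Minimal sketch: arc-standard transitions producing (head, dependent) arcs.
def parse(words, transitions):
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action in transitions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFTARC":    # head = top of stack, dependent = second
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHTARC":   # head = second on stack, dependent = top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["book", "me", "the", "flight"]
gold = ["SHIFT", "SHIFT", "RIGHTARC",   # book -> me
        "SHIFT", "SHIFT", "LEFTARC",    # the <- flight
        "RIGHTARC",                     # book -> flight
        "RIGHTARC"]                     # ROOT -> book
print(parse(words, gold))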

10. Statistical Machine Translation


10.1 Introduction
10.2 Approaches
10.3 Language models
10.4 Word alignment
10.5 Translation models
10.5.1 IBM models
10.5.2 Phrase Based systems
10.5.3 Syntax based systems
10.6 Direct translation models
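
A minimal sketch of IBM Model 1 word-alignment training with EM on a two-sentence-pair toy parallel corpus (the sentence pairs, uniform initialization, and 10 EM iterations are illustrative choices).

from collections import defaultdict

# Minimal sketch: IBM Model 1 translation probabilities t(e | f) via EM.
pairs = [("das Haus".split(), "the house".split()),
         ("das Buch".split(), "the book".split())]

# Initialize t(e | f) uniformly over co-occurring word pairs.
e_vocab = {e for _, es in pairs for e in es}
t = {}
for fs, es in pairs:
    for f in fs:
        for e in es:
            t[(e, f)] = 1.0 / len(e_vocab)

for _ in range(10):
    count = defaultdict(float)  # expected co-occurrence counts c(e, f)
    total = defaultdict(float)  # expected counts c(f)
    for fs, es in pairs:
        for e in es:
            z = sum(t[(e, f)] for f in fs)   # normalizer for this e
            for f in fs:
                c = t[(e, f)] / z            # E-step: alignment posterior
                count[(e, f)] += c
                total[f] += c
    for (e, f), c in count.items():          # M-step: renormalize
        t[(e, f)] = c / total[f]

for (e, f), p in sorted(t.items()):
    print(f"t({e} | {f}) = {p:.3f}")
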
11. Applications

11.1 Question answering system


11.1.1 IR-based Factoid Question Answering
11.1.2 Knowledge-based Question answering
11.1.3 Using multiple information sources: IBM’s Watson
11.1.4 Evaluation of Factoid Answers

11.2 Dialog Systems and Chatbots


11.2.1 Chatbots
11.2.1.1 Rule based Chatbots
11.2.1.2 Corpus based Chatbots
11.2.1.3 Evaluation of Chatbots
11.2.2 Frame Based Dialog Agents
11.2.2.1 Natural language understanding for filling slots
11.2.2.2 Evaluation
11.2.2.3 VoiceXML
11.2.2.4 Evaluating Dialog Systems
11.2.2.5 Dialog System Design
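
As a sketch of the rule-based chatbot idea above, here is an ELIZA-style responder built from ranked regular-expression rules with reassembly templates (the rules below are illustrative, not ELIZA's actual script).

import re

# Minimal sketch: ELIZA-style rule-based chatbot with ranked regex rules.
RULES = [
    (re.compile(r".*\bi am (.*)", re.IGNORECASE), r"Why do you say you are \1?"),
    (re.compile(r".*\bi need (.*)", re.IGNORECASE), r"Why do you need \1?"),
    (re.compile(r".*\bmother\b.*", re.IGNORECASE), "Tell me more about your family."),
]

def respond(utterance):
    for pattern, template in RULES:
        match = pattern.match(utterance)
        if match:
            # Substitute captured groups into the reassembly template.
            return match.expand(template) if match.groups() else template
    return "Please go on."  # default response when no rule fires

print(respond("I am always tired"))     # Why do you say you are always tired?
print(respond("My mother cooks well"))  # Tell me more about your family.
print(respond("The weather is nice"))   # Please go on.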

11.3 WordNet: Word Relations, Senses, and Disambiguation


11.3.1 Word Senses
11.3.2 WordNet: A Database of Lexical Relations
11.3.3 Word Similarity: Thesaurus Methods
11.3.4 Word Sense Disambiguation: Overview
11.3.5 Supervised Word Sense Disambiguation
11.3.6 WSD: Dictionary and Thesaurus Methods
11.3.7 Semi-Supervised WSD: Bootstrapping
11.3.8 Unsupervised Word Sense Induction
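
To illustrate the WordNet topics above, here is a short sketch using NLTK's WordNet interface, assuming nltk is installed and the WordNet data has been downloaded via nltk.download("wordnet").

from nltk.corpus import wordnet as wn

# Enumerate the senses (synsets) of "bank" with their glosses.
for synset in wn.synsets("bank"):
    print(synset.name(), "-", synset.definition())

# A thesaurus-based similarity: path similarity between two noun senses.
dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
print("path similarity(dog, cat) =", dog.path_similarity(cat))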

Learning Outcomes:

No Learning Outcomes

LO1 Should have a good understanding of the field of natural language processing.

LO2 Should have a good understanding of the algorithms and techniques used in this field.

LO3 Should also understand how natural language processing is used in machine translation
and information extraction.

Part B: Contact Session Plan


Academic Term
Course Title Natural Language Processing
Course No
Lead Instructor

Course Contents

Contact Hours | Topic Title (from content structure in Part A) | Topic # (from content structure in Part A) | Text/Ref Book / external resource

1-2 | Introduction: Knowledge in speech and language processing; Ambiguity; Models and Algorithms; Language, thought and understanding; The State of the art. | Chapter 1 | T1 [2nd edition]

2-4 | Regular Expressions: Basic Regular Expression Patterns; Disjunction, Grouping, and Precedence; More Operators; Regular Expression Substitution, Capture Groups; Lookahead assertions; Words; Corpora; Text Normalization; Minimum Edit Distance. | Chapter 2 | T1 [3rd edition]

5-6 | Text Classification: Naive Bayes and Sentiment Classification; Naive Bayes Classifiers; Naive Bayes for other text classification tasks; Naive Bayes as a language model; Evaluation: Precision, Recall, F Measure; Statistical significance testing. | Chapter 4 | T1 [3rd edition]

7-8 | Logistic Regression: Classification: the sigmoid; Learning in Logistic Regression; The cross-entropy loss function; Gradient Descent; Regularization; Multinomial logistic regression; Interpreting models. | Chapter 5 | T1 [3rd edition]

9-10 | Language models: N-gram language models; N-Grams; Evaluating Language Models; Generalization and Zeros; Unknown Words; Smoothing. | Chapter 3 | T1 [3rd edition]

11-12 | Neural Networks and Neural Language Models: The XOR problem; Feed-Forward Neural Networks; Training Neural Nets; Neural Language Models. | Chapter 7 | T1 [3rd edition]

13-14 | Lexical analysis: Lexical semantics; Vector semantics; Words and vectors; Cosine for measuring similarity; TF-IDF: Weighing terms in the vector; Application of the tf-idf vector model; Word2vec; Visualizing Embeddings; Semantic properties of embeddings; Bias and Embeddings; Evaluating Vector Models. | Chapter 6 | T1 [3rd edition]

15-16 | Computational Lexical Semantics (Word Disambiguation): Supervised Disambiguation: Bayesian classification, An information-theoretic approach; Dictionary-Based Disambiguation: Thesaurus-based disambiguation, Disambiguation based on translations in a second-language corpus, One sense per discourse, one sense per collocation; Unsupervised Disambiguation; Evaluation: Pseudo words, Upper and lower bounds on performance. | Chapter 7 | T2

17-18 | Grammar (Introduction to Markov Models): Markov Models; Hidden Markov Models; Why use HMMs?; General form of an HMM; The Three Fundamental Questions for HMMs; Finding the probability of an observation; Finding the best state sequence; HMMs: Implementation, Properties, and Variants. | Chapter 9 | T2

19-20 | Part-of-Speech Tagging: The Information Sources in Tagging; Markov Model Taggers; The probabilistic model; The Viterbi algorithm; Variations; Hidden Markov Model Taggers; Applying HMMs to POS tagging; The effect of initialization on HMM training; Transformation-Based Learning of Tags: Transformations, The learning algorithm, Relation to other models, Automata. | Chapter 10 | T2 (Chapter 8 of T1 [3rd edition] may also be referred to)

21-22 | Probabilistic Context-Free Grammars: Some Features of PCFGs; Questions for PCFGs; The Probability of a String: Using inside probabilities, Using outside probabilities; Finding the most likely parse for a sentence; Training a PCFG; Problems with the Inside-Outside Algorithm. | Chapter 11 | T2

23-24 | Syntactic Parsing: CKY Parsing; Partial Parsing. | Chapter 11 | T1 [3rd edition]
23-24 | Dependency Parsing: Dependency Relations; Dependency Formalisms; Dependency Treebanks; Transition-Based Dependency Parsing; Graph-Based Dependency Parsing; Evaluation. | Chapter 13 | T1 [3rd edition]

24-25 | Statistical Machine Translation: Introduction; Approaches; Language models; Word alignment; Translation models: IBM models, Phrase Based systems, Syntax based systems; Direct translation models; Example: Chinese Machine Translation. | Chapter 17, 18 | R2

25-26 | Applications (Question answering systems): IR-based Factoid Question Answering; Knowledge-based Question answering; Using multiple information sources: IBM's Watson; Evaluation of Factoid Answers. | Chapter 23 | T1 [3rd edition], R1

27-28 | Dialog Systems and Chatbots: Chatbots: Rule based Chatbots, Corpus based Chatbots, Evaluation of Chatbots; Frame Based Dialog Agents: Natural language understanding for filling slots, Evaluation, VoiceXML, Evaluating Dialog Systems, Dialog System Design. | Chapter 24 | T1 [3rd edition]

29-30 | WordNet: Word Relations, Senses, and Disambiguation: Word Senses; WordNet: A Database of Lexical Relations; Word Similarity: Thesaurus Methods; Word Sense Disambiguation: Overview; Supervised Word Sense Disambiguation; WSD: Dictionary and Thesaurus Methods; Semi-Supervised WSD: Bootstrapping; Unsupervised Word Sense Induction. | Appendix C | T1 [3rd edition]

31-32 | Summary. | |

Evaluation Scheme
Evaluation Component | Name (Quiz, Lab, Project, Midterm exam, End semester exam, etc.) | Type (Open book, Closed book, Online, etc.) | Weight | Duration | Day, Date, Session, Time

EC-1 | Quizzes / Assignment | | 20% | | To be announced

EC-2 | Mid-term Exam | Closed book | 30% | 2 hours | To be announced

EC-3 | End Semester Exam | Open book | 50% | 3 hours | To be announced


Note - Evaluation components can be tailored depending on the proposed model.
Important Information

Syllabus for Mid-Semester Test (Closed Book): Topics in Weeks 1-8 (Contact Hours 1-18)
Syllabus for Comprehensive Exam (Open Book): All topics given in plan of study

Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Announcements regarding the
same will be made in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
3. For Open Book exams: Use of prescribed and reference text books, in original (not
photocopies), is permitted. Class notes/slides as reference material in filed or bound form are
permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted
in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not
allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the
student should follow the procedure to apply for the Make-Up Test/Exam. The genuineness of
the reason for absence in the Regular Exam shall be assessed prior to giving permission to
appear for the Make-up Exam. Make-Up Test/Exam will be conducted only at selected exam
centres on the dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study
schedule as given in the course handout, attend the lectures, and take all the prescribed evaluation
components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the
evaluation scheme provided in the handout.
