NLP Unit-5
Amit Pimpalkar
Department Vision and Mission
Our Vision
• To continually improve the educational environment in order to develop graduates with the strong academic and technical background needed to achieve distinction in the discipline. Excellence is expected in various domains such as the workforce, higher studies, or lifelong learning. To strengthen links with industry through partnerships and collaborative development work.
Our Mission
• To develop a strong foundation in the theory and practice of computer science amongst the students, enabling them to become knowledgeable, responsible professionals and lifelong learners who implement the latest computing technologies for the betterment of society.
Course Objectives & Course Outcomes
Course Objectives
I. To familiarize students with the concepts and techniques of Natural Language Processing for analyzing words based on morphology and CORPUS.
II. To relate mathematical foundations and probability theory to linguistic essentials such as syntactic and semantic analysis of text.
III. To apply statistical learning methods and cutting-edge research models to solve NLP problems.
Course Outcomes
After completing the course, students will be able to:
1. Apply the principles and processes of human languages using computers.
2. Demonstrate state-of-the-art algorithms and techniques for text-based processing of natural languages with respect to morphology.
3. Perform POS tagging for a given natural language.
4. Create a linguistic CORPUS based on the text corpus method.
5. Realize the semantics and pragmatics of natural languages for text processing.
6. Develop statistical methods for real-world NLP applications.
Scheme & Syllabus
Load | Credit | Total Marks | Continuous Assessment | ESE Marks
3 hrs (Theory) + 0 hr (Tutorial) | 3 | 100 | 40 | 60
Text Books:
Reference Books:
Text & Reference Book
UNIT 5: Probabilistic Parsing and Disambiguation
Recall Unit 3: Parsing (Top-down Parsing)
Grammar
S → NP VP
NP → N N
VP → V NP
NP → N
Lexicon
Fed : N
interest : N, V
rates : N
raises : N, V
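With this grammar, "Fed raises interest rates" has two parses: "Fed" as the subject with "raises" as the verb, or "Fed raises" as a noun compound with "interest" as the verb. The sketch below is a minimal backtracking top-down enumerator written only to expose this ambiguity; it is not the Unit 3 parsing algorithm as taught, and the names `expand`, `expand_seq` and `parse` are hypothetical.

```python
# Grammar and lexicon from the Unit 3 slide above.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N", "N"], ["N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "Fed": {"N"},
    "interest": {"N", "V"},
    "rates": {"N"},
    "raises": {"N", "V"},
}


def expand(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:  # pre-terminal: consume one word via the lexicon
        if start < len(words) and symbol in LEXICON.get(words[start], set()):
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each rule top-down, left to right
        for children, end in expand_seq(rhs, words, start):
            yield (symbol, *children), end


def expand_seq(symbols, words, start):
    """Yield (trees, end) for a sequence of symbols that starts at `start`."""
    if not symbols:
        yield [], start
        return
    for first, mid in expand(symbols[0], words, start):
        for rest, end in expand_seq(symbols[1:], words, mid):
            yield [first, *rest], end


def parse(sentence):
    """Return every S tree that covers the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


trees = parse("Fed raises interest rates")
for tree in trees:
    print(tree)
print(len(trees))  # 2
```

The parser lists both trees but gives no way to choose between them, which is the gap that probabilistic CFGs address.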
UNIT 5: Probabilistic CFGs
Handling Ambiguities
• Ambiguity-handling parsing algorithms can represent ambiguities efficiently but cannot resolve them.
• Methods available for resolving ambiguities include:
• Semantics (choose the parse that makes sense).
• Statistics (choose the parse that is most likely).
• Probabilistic context-free grammars (PCFGs) offer a solution.
UNIT 5: Probabilistic CFGs
• A context-free grammar is a tuple <N, T, S, R>
• N : the set of non-terminals
• Phrasal categories: S, NP, VP, ADJP, etc.
• Parts-of-speech (pre-terminals): NN, JJ, DT, VB
• T : the set of terminals (the words)
• S : the start symbol
• Often written as ROOT or TOP
• Not usually the sentence non-terminal S
• R : the set of rules
• Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
• Examples: S → NP VP, VP → VP CC VP
• Also called rewrites, productions, or local trees
• A PCFG adds:
• A top-down production probability per rule P(Y1 Y2 … Yk | X)
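A PCFG is proper when, for every non-terminal X, the probabilities P(Y1 … Yk | X) of all rules with left-hand side X sum to 1. A minimal sketch of that check follows; it stores rules as (lhs, rhs, probability) triples, a representation assumed here rather than prescribed by the slides, and uses the simple CNF grammar from later in this unit.

```python
import math
from collections import defaultdict

# Rules of the simple PCFG (in CNF) used later in this unit: (lhs, rhs, probability).
RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("VP", ("V", "NP"), 0.7),
    ("VP", ("VP", "PP"), 0.3),
    ("PP", ("P", "NP"), 1.0),
    ("P", ("with",), 1.0),
    ("V", ("saw",), 1.0),
    ("NP", ("NP", "PP"), 0.4),
    ("NP", ("astronomers",), 0.1),
    ("NP", ("ears",), 0.18),
    ("NP", ("saw",), 0.04),
    ("NP", ("stars",), 0.18),
    ("NP", ("telescope",), 0.1),
]


def check_proper(rules, tol=1e-9):
    """Return ({lhs: total probability}, True if every total is 1 within tol)."""
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    ok = all(math.isclose(t, 1.0, abs_tol=tol) for t in totals.values())
    return dict(totals), ok


totals, ok = check_proper(RULES)
print(ok)  # True
```

A tolerance is used because sums such as 0.4 + 0.1 + 0.18 + 0.04 + 0.18 + 0.1 are not exactly 1.0 in floating point.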
UNIT 5: Probabilistic CFGs
• The probabilistic model
• Assigning probabilities to parse trees
• Getting the probabilities for the model
• Parsing with probabilities
• A slight modification of the dynamic-programming (CYK) approach
• The task is to find the maximum-probability tree for an input sentence
• Getting the Probabilities
• From an annotated database (a treebank)
• Learned from a corpus
• Assume the PCFG is in Chomsky Normal Form (CNF)
• (every production is either A → B C or A → a)
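When the probabilities come from a treebank, the usual estimate is the relative frequency P(X → α) = count(X → α) / count(X), which is the maximum-likelihood estimate. The sketch below counts productions in two hand-built trees for "astronomers saw stars with ears"; these trees are illustrative data, not a real treebank, and `rules_of` and `estimate_pcfg` are hypothetical names. Learning from an unannotated corpus instead requires an EM procedure such as the inside-outside algorithm, which is not shown.

```python
from collections import Counter

# Hypothetical toy "treebank": trees as nested tuples (label, child, ...);
# a child that is a str is a word.
T1 = ("S", ("NP", "astronomers"),
      ("VP", ("V", "saw"),
             ("NP", ("NP", "stars"),
                    ("PP", ("P", "with"), ("NP", "ears")))))
T2 = ("S", ("NP", "astronomers"),
      ("VP", ("VP", ("V", "saw"), ("NP", "stars")),
             ("PP", ("P", "with"), ("NP", "ears"))))


def rules_of(tree):
    """Yield (lhs, rhs) for every local tree (production) in `tree`."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules_of(child)


def estimate_pcfg(treebank):
    """Relative-frequency (MLE) estimate: count(X -> alpha) / count(X)."""
    rule_counts = Counter(r for t in treebank for r in rules_of(t))
    lhs_counts = Counter()
    for (lhs, _rhs), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


probs = estimate_pcfg([T1, T2])
print(probs[("VP", ("V", "NP"))])  # 0.6666666666666666
```

VP is expanded three times across the two trees, twice as V NP and once as VP PP, so P(VP → V NP) = 2/3.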
UNIT 5: Probabilistic CFGs
Chomsky Normal Form: A → B C and A → a, where A, B, C are non-terminals and a is a terminal
UNIT 5: Probabilistic CFGs
Examples:
In CNF                 Not in CNF
S → A S                S → A S
S → a                  S → A A S
A → S A                A → S A
A → b                  A → a a
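The CNF condition can be checked rule by rule. A minimal sketch, assuming the non-terminals of the example grammars are exactly S and A and every other symbol is a terminal; `cnf_violations` is a hypothetical name.

```python
def cnf_violations(rules, nonterminals):
    """Return rules that are neither A -> B C (two non-terminals) nor A -> a (one terminal)."""
    bad = []
    for lhs, rhs in rules:
        binary = len(rhs) == 2 and all(sym in nonterminals for sym in rhs)
        lexical = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (binary or lexical):
            bad.append((lhs, rhs))
    return bad


NONTERMINALS = {"S", "A"}
LEFT = [("S", ("A", "S")), ("S", ("a",)), ("A", ("S", "A")), ("A", ("b",))]
RIGHT = [("S", ("A", "S")), ("S", ("A", "A", "S")), ("A", ("S", "A")), ("A", ("a", "a"))]

print(cnf_violations(LEFT, NONTERMINALS))   # []
print(cnf_violations(RIGHT, NONTERMINALS))  # [('S', ('A', 'A', 'S')), ('A', ('a', 'a'))]
```

The right-hand grammar fails on S → A A S (three symbols on the right-hand side) and on A → a a (two terminals).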
UNIT 5: A Simple PCFG (in CNF)
S → NP VP 1.0          NP → NP PP 0.4
VP → V NP 0.7          NP → astronomers 0.1
VP → VP PP 0.3         NP → ears 0.18
PP → P NP 1.0          NP → saw 0.04
P → with 1.0           NP → stars 0.18
V → saw 1.0            NP → telescope 0.1
UNIT 5: A Simple PCFG (in CNF)
Tree and String Probabilities (same grammar as above)
• The probability of a parse tree is the product of the probabilities of all the rules used in it.
• The probability of a string is the sum of the probabilities of all its parse trees.
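The two parses of "astronomers saw stars with ears" differ only in where "with ears" attaches: to the noun phrase "stars" (called t1 here) or to the verb phrase (t2). A minimal sketch with trees written as nested tuples (t1, t2 and `tree_prob` are assumed names) multiplies the rule probabilities in each tree and adds the two results to get the string probability.

```python
from math import prod

# Rule probabilities of the simple PCFG above, keyed by (lhs, rhs).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("P", ("with",)): 1.0,
    ("V", ("saw",)): 1.0,
    ("NP", ("NP", "PP")): 0.4,
    ("NP", ("astronomers",)): 0.1,
    ("NP", ("ears",)): 0.18,
    ("NP", ("saw",)): 0.04,
    ("NP", ("stars",)): 0.18,
    ("NP", ("telescope",)): 0.1,
}


def tree_prob(tree):
    """P(tree) = product of the probabilities of every rule (local tree) it uses."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    sub = (tree_prob(c) for c in children if not isinstance(c, str))
    return RULE_PROB[(label, rhs)] * prod(sub)


pp = ("PP", ("P", "with"), ("NP", "ears"))
# t1: the PP attaches to the object NP ("stars with ears")
t1 = ("S", ("NP", "astronomers"),
      ("VP", ("V", "saw"), ("NP", ("NP", "stars"), pp)))
# t2: the PP attaches to the VP ("saw stars", "with ears")
t2 = ("S", ("NP", "astronomers"),
      ("VP", ("VP", ("V", "saw"), ("NP", "stars")), pp))

p1, p2 = tree_prob(t1), tree_prob(t2)
print(round(p1, 7), round(p2, 7), round(p1 + p2, 7))  # 0.0009072 0.0006804 0.0015876
```

Choosing the higher-scoring tree selects t1 (NP attachment); the sum 0.0015876 is the same value the CYK chart computes for S over the whole sentence.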
UNIT 5: Example of Inside Probabilities (CYK Algorithm)
CYK algorithm: John Cocke, Daniel Younger, Tadao Kasami, and Jacob Schwartz (1961)
Grammar: the simple PCFG (in CNF) above
Sentence: astronomers saw stars with ears (words 1 2 3 4 5)
Chart cell (i, j) holds the inside probability of each non-terminal spanning words i to j.
Diagonal cells, filled from the lexical rules:
(1,1) NP = 0.1
(2,2) NP = 0.04, V = 1.0
(3,3) NP = 0.18
(4,4) P = 1.0
(5,5) NP = 0.18
(1,2) ----
(2,3) VP = 1.0 × 0.7 × 0.18 = 0.126 (V in (2,2) × P(VP → V NP) × NP in (3,3))
(1,3) S = 0.1 × 1.0 × 0.126 = 0.0126 (NP in (1,1) × P(S → NP VP) × VP in (2,3))
(1,4), (2,4), (3,4) ----
(4,5) PP = 1.0 × 1.0 × 0.18 = 0.18 (P in (4,4) × P(PP → P NP) × NP in (5,5))
(1,5) S = 0.0015876 (calculation on the next slide)
(2,5) VP = 0.015876 (calculation on the next slide)
UNIT 5: Example of Inside Probabilities (CYK Algorithm)
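The two top-right cells follow from the inside recurrence. Writing β_X(i, j) for the inside probability of non-terminal X over words i to j (notation assumed here), each cell sums over binary rules A → B C and split points k. The calculation also needs the entry NP(3,5) = 0.01296 for "stars with ears". The two terms of β_VP(2,5) are the NP-attachment and VP-attachment readings; taking their maximum instead of their sum gives the Viterbi score of the best parse.

```latex
\beta_A(i,j) = \sum_{A \to B\,C} \; \sum_{k=i}^{j-1} P(A \to B\,C)\,\beta_B(i,k)\,\beta_C(k+1,j)

\begin{aligned}
\beta_{NP}(3,5) &= P(NP \to NP\,PP)\,\beta_{NP}(3,3)\,\beta_{PP}(4,5) = 0.4 \times 0.18 \times 0.18 = 0.01296\\
\beta_{VP}(2,5) &= P(VP \to V\,NP)\,\beta_{V}(2,2)\,\beta_{NP}(3,5) + P(VP \to VP\,PP)\,\beta_{VP}(2,3)\,\beta_{PP}(4,5)\\
&= 0.7 \times 1.0 \times 0.01296 + 0.3 \times 0.126 \times 0.18 = 0.009072 + 0.006804 = 0.015876\\
\beta_{S}(1,5) &= P(S \to NP\,VP)\,\beta_{NP}(1,1)\,\beta_{VP}(2,5) = 1.0 \times 0.1 \times 0.015876 = 0.0015876
\end{aligned}
```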
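A compact sketch of the whole chart computation for "astronomers saw stars with ears". It assumes the grammar is stored as a list of binary rules plus a word-to-tag lexicon, a representation chosen here for illustration; `cyk`, `BINARY` and `LEXICAL` are hypothetical names. Passing `combine=sum` gives inside probabilities; passing `combine=max` gives the Viterbi score of the most probable tree, the "slight modification" of dynamic programming mentioned earlier. Backpointers, which are needed to recover the best tree itself, are omitted to keep it short.

```python
from collections import defaultdict

# The simple PCFG (CNF) from the slides, split into binary and lexical rules.
BINARY = [  # (A, B, C, P(A -> B C))
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
    ("PP", "P", "NP", 1.0),
    ("NP", "NP", "PP", 0.4),
]
LEXICAL = {  # word -> [(A, P(A -> word))]
    "astronomers": [("NP", 0.1)],
    "ears": [("NP", 0.18)],
    "saw": [("NP", 0.04), ("V", 1.0)],
    "stars": [("NP", 0.18)],
    "telescope": [("NP", 0.1)],
    "with": [("P", 1.0)],
}


def cyk(words, combine=sum):
    """Return chart[(i, j)] = {A: score} for 1-based spans of words i..j.

    combine=sum -> inside probabilities; combine=max -> Viterbi (best-parse) scores.
    """
    n = len(words)
    chart = defaultdict(dict)
    for i, word in enumerate(words, start=1):  # diagonal: lexical rules
        for tag, p in LEXICAL.get(word, []):
            chart[(i, i)][tag] = p
    for length in range(2, n + 1):  # longer spans, shortest first
        for i in range(1, n - length + 2):
            j = i + length - 1
            scores = defaultdict(list)
            for k in range(i, j):  # split into i..k and k+1..j
                left, right = chart[(i, k)], chart[(k + 1, j)]
                for a, b, c, p in BINARY:
                    if b in left and c in right:
                        scores[a].append(p * left[b] * right[c])
            for a, values in scores.items():
                chart[(i, j)][a] = combine(values)
    return chart


words = "astronomers saw stars with ears".split()
inside = cyk(words)
viterbi = cyk(words, combine=max)
print(round(inside[(1, 5)]["S"], 7))   # 0.0015876
print(round(viterbi[(1, 5)]["S"], 7))  # 0.0009072
```

The Viterbi score 0.0009072 equals the probability of the NP-attachment tree, while the inside score is the sum over both parses.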
UNIT 5: Example of Inside Probabilities (CYK Algorithm)
S → NP VP 0.8          V → includes 0.05
NP → DT N 0.3          DT → the | a 0.4
VP → V NP 0.2          N → meals 0.01
                       N → flight 0.02
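As a worked example with this grammar, take the sentence "the flight includes the meals" (assumed here; the slide does not give a sentence) and read DT → the | a 0.4 as P(DT → the) = P(DT → a) = 0.4. The sentence has a single parse, so its probability is the product of the nine rule probabilities used:

```latex
\begin{aligned}
P(T) &= P(S \to NP\,VP)\,P(NP \to DT\,N)\,P(DT \to \text{the})\,P(N \to \text{flight})\\
&\quad \times P(VP \to V\,NP)\,P(V \to \text{includes})\,P(NP \to DT\,N)\,P(DT \to \text{the})\,P(N \to \text{meals})\\
&= 0.8 \times 0.3 \times 0.4 \times 0.02 \times 0.2 \times 0.05 \times 0.3 \times 0.4 \times 0.01\\
&= 2.304 \times 10^{-8}
\end{aligned}
```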
UNIT 5: Example of Inside Probabilities (CYK Algorithm)
S → NP VP 1.0          Vi → sleeps 1.0
VP → Vi 0.3            Vt → saw 1.0
VP → Vt NP 0.5         NN → man | woman 0.1
VP → VP PP 0.2         NN → telescope 0.3
NP → DT NN 0.8         NN → dog 0.5
NP → NP PP 0.2         DT → the 1.0
PP → IN NP 1.0         IN → with 0.6
                       IN → in 0.4
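This grammar is not strictly in CNF because of the unary rule VP → Vi. A classic test case is the PP-attachment ambiguity in "the man saw the dog with the telescope" (a sentence assumed here). The minimal sketch below scores both parses, reading NN → man | woman 0.1 as one 0.1 rule per word; `tree_prob` and `the_np` are hypothetical helpers.

```python
from math import prod

# Rule probabilities from the grammar above, keyed by (lhs, rhs).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("Vi",)): 0.3,
    ("VP", ("Vt", "NP")): 0.5,
    ("VP", ("VP", "PP")): 0.2,
    ("NP", ("DT", "NN")): 0.8,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("IN", "NP")): 1.0,
    ("Vi", ("sleeps",)): 1.0,
    ("Vt", ("saw",)): 1.0,
    ("NN", ("man",)): 0.1,  # assumed reading of "NN -> man | woman 0.1"
    ("NN", ("woman",)): 0.1,
    ("NN", ("telescope",)): 0.3,
    ("NN", ("dog",)): 0.5,
    ("DT", ("the",)): 1.0,
    ("IN", ("with",)): 0.6,
    ("IN", ("in",)): 0.4,
}


def tree_prob(tree):
    """Product of the probabilities of every rule used in a nested-tuple tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    sub = (tree_prob(c) for c in children if not isinstance(c, str))
    return RULE_PROB[(label, rhs)] * prod(sub)


def the_np(noun):
    """Hypothetical helper: the tree for NP -> DT NN with DT = 'the'."""
    return ("NP", ("DT", "the"), ("NN", noun))


pp = ("PP", ("IN", "with"), the_np("telescope"))
vp_attach = ("S", the_np("man"), ("VP", ("VP", ("Vt", "saw"), the_np("dog")), pp))
np_attach = ("S", the_np("man"), ("VP", ("Vt", "saw"), ("NP", the_np("dog"), pp)))

print(round(tree_prob(vp_attach), 7), round(tree_prob(np_attach), 7))  # 0.0004608 0.0004608
```

Both parses score 0.0004608: they use the same rules except that the VP-attachment tree has VP → VP PP (0.2) where the NP-attachment tree has NP → NP PP (0.2). Because the rule probabilities ignore which words are being attached, this PCFG cannot prefer either reading here.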
UNIT 5: Treebanks
• English Penn Treebank: the standard corpus for testing syntactic parsing; it consists of 1.2 M words of text from the Wall Street Journal (WSJ).
• It is typical to train on about 40,000 parsed sentences and test on a standard, disjoint test set of 2,416 sentences.
• Chinese Penn Treebank: 100K words from the Xinhua news service.
• Treebanks exist for many other languages; see the Wikipedia article "Treebank".
• Treebanks are also used to improve machine translation systems.
Other Resources to study NLP
• https://onlinecourses.nptel.ac.in/noc21_cs19
• By Prof. Sourav Mukhopadhyay | IIT Kharagpur
• https://onlinecourses.nptel.ac.in/noc20_cs87
• By Prof. Ramaseshan R | Chennai Mathematical Institute (CMI)
• https://docs.microsoft.com/en-us/learn/paths/explore-natural-language-processing/
• https://www.upgrad.com/machine-learning-nlp-pgc-iiitb
• https://www.amazon.science/latest-news/machine-learning-course-free-online-from-amazon-machine-learning-university
• https://online.stanford.edu/courses/xcs224n-natural-language-processing-deep-learning
• https://courses.cs.washington.edu/courses/cse517/
Other Resources to study NLP
• https://cse.iitk.ac.in/users/cs671/2013/resources.html
• https://home.cs.colorado.edu/~martin/slp.html
• https://web.stanford.edu/~jurafsky/NLPCourseraSlides.html
• https://nlp.stanford.edu/teaching/
• https://tildesites.bowdoin.edu/~allen/nlp/
• https://nlp-iiith.vlabs.ac.in/
• www.purenlp.com
• https://www.nlpworks.com/
• www.compendiumdev.co.uk/nlp
Other Resources to study NLP
• Natural Language Processing, IIT Kharagpur
• Prof. Pawan Goyal
• https://nptel.ac.in/courses/106105158 (Enrolment Opens: 2023-11-09 to 2024-01-29)
Thank You!