Lecture01 2020 TheNLPPipeline
CS447 Lecture
Building a computer that ‘understands’ text: The traditional NLP pipeline
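รอยัลลิสต์มาร์เก็ตเพลส: เฟซบุ๊กเตรียมดำเนินทางการกฎหมายกับรัฐบาลไทย หลังบังคับบล็อกการเข้าถึงกลุ่มปิดที่พูดคุยเกี่ยวกับราชวงศ์
(Thai; roughly: "Royalist Marketplace: Facebook prepares legal action against the Thai government after being forced to block access to a private group that discusses the monarchy")
ኣብ ሳዋ ዝወሃብ መበል 12 ክፍሊ ትምህርቲ ክቋረጽ ጎስጓስ ይካየድ ኣሎ
(Tigrinya; roughly: "A campaign is under way to end the 12th-grade schooling given at Sawa")
Qabiyyeen xalayaa dhimma Obbo Lidatu Ayyaaloorratti MM Abiyyiif barraa'e maali?
(Oromo; roughly: "What is the content of the letter written to PM Abiy about the case of Obbo Lidatu Ayyaaloo?")
'Dim angen cau tafarndai a bwytai i ailagor ysgolion'
(Welsh; roughly: "'No need to close pubs and restaurants to reopen schools'")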
NLP task:
What is the most likely segmentation/tokenization?
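One way to make "most likely segmentation" concrete is a unigram model searched with dynamic programming. The sketch below is only an illustration under stated assumptions: the lexicon WORD_PROBS and its probabilities are invented for this example, and the function name segment is hypothetical, not something from the course materials.

```python
import math

# Hypothetical toy unigram probabilities, invented for illustration only.
WORD_PROBS = {
    "the": 0.2, "table": 0.05, "tab": 0.01, "led": 0.01,
    "down": 0.05, "own": 0.02, "there": 0.05, "here": 0.04,
    "he": 0.05, "re": 0.005,
}

def segment(text, probs):
    """Return the highest-probability split of `text` into lexicon words,
    or None if no split into known words exists (Viterbi-style search)."""
    n = len(text)
    max_len = max(len(w) for w in probs)
    # best[i] = (best log-probability of text[:i], start index of last word)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in probs and best[start][0] > -math.inf:
                score = best[start][0] + math.log(probs[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Follow the backpointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]

print(segment("thetabledownthere", WORD_PROBS))
```

With this toy lexicon, the string also admits splits such as "the tab led own there", but their product of word probabilities is lower, so the search prefers "the table down there".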
Referential ambiguity
“John saw Jim. He was drinking coffee.” Who was drinking coffee?
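The ambiguity can be made explicit by listing every earlier mention that agrees with the pronoun. This is a minimal sketch, assuming a hand-written feature table; MENTIONS, PRONOUNS, and candidate_antecedents are hypothetical names introduced here, and agreement filtering is only one simple heuristic, not a full coreference method.

```python
# Hypothetical toy feature tables, written by hand for this example.
MENTIONS = {
    "John": {"gender": "masc", "number": "sg"},
    "Jim": {"gender": "masc", "number": "sg"},
    "Mary": {"gender": "fem", "number": "sg"},
}
PRONOUNS = {
    "he": {"gender": "masc", "number": "sg"},
    "she": {"gender": "fem", "number": "sg"},
}

def candidate_antecedents(preceding_mentions, pronoun):
    """Return the preceding mentions whose features match the pronoun."""
    features = PRONOUNS[pronoun.lower()]
    return [m for m in preceding_mentions if MENTIONS[m] == features]

# Both names survive the agreement filter, so the pronoun stays ambiguous.
print(candidate_antecedents(["John", "Jim"], "He"))
```

Agreement alone cannot choose between the two candidates here; deciding who was drinking coffee needs further information beyond these features.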
Correct analysis
eat sushi with tuna: [VP [V eat] [NP [NP sushi] [PP [P with] [NP tuna]]]]
eat sushi with chopsticks: [VP [VP [V eat] [NP sushi]] [PP [P with] [NP chopsticks]]]
Incorrect analysis
eat sushi with tuna: [VP [VP [V eat] [NP sushi]] [PP [P with] [NP tuna]]]
eat sushi with chopsticks: [VP [V eat] [NP [NP sushi] [PP [P with] [NP chopsticks]]]]
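The two attachments in the analyses above can be enumerated mechanically. The sketch below assumes a tiny grammar read off those trees (VP -> V NP, VP -> VP PP, NP -> NP PP, PP -> P NP) and uses a CKY-style chart; the names BINARY_RULES, LEXICON, and parses are hypothetical and introduced only for this illustration.

```python
from collections import defaultdict

# Toy grammar read off the trees above (an assumption of this sketch).
BINARY_RULES = [
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "eat": ["V"], "with": ["P"],
    "sushi": ["NP"], "tuna": ["NP"], "chopsticks": ["NP"],
}

def parses(words, root="VP"):
    """Return every bracketed tree for `words` rooted in `root` (CKY-style)."""
    n = len(words)
    chart = defaultdict(list)  # (start, end, label) -> list of bracketed trees
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[i, i + 1, tag].append(f"[{tag} {word}]")
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    for left_tree in chart.get((i, k, left), []):
                        for right_tree in chart.get((k, j, right), []):
                            chart[i, j, parent].append(
                                f"[{parent} {left_tree} {right_tree}]"
                            )
    return chart.get((0, n, root), [])

for tree in parses("eat sushi with tuna".split()):
    print(tree)
```

The grammar licenses both the NP attachment and the VP attachment for either sentence, so syntax alone does not say which analysis is the correct one; that choice depends on what "tuna" and "chopsticks" mean.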
Undergeneration
CS447 Natural Language Processing (J. Hockenmaier) https://courses.grainger.illinois.edu/cs447/
NLP and automata theory
What kind of grammar/automaton
is required to analyze natural language?
Tokenizer/Segmenter
– to identify words and sentences
Morphological analyzer/POS-tagger
– to identify the part of speech and structure of words
Word sense disambiguation
– to identify the meaning of words
Syntactic/semantic parser
– to obtain the structure and meaning of sentences
Coreference resolution/discourse model
– to keep track of the various entities and events mentioned
NLP Pipeline: Assumptions
Each step in the NLP pipeline embellishes the input with
explicit information about its linguistic structure
– POS tagging: the part of speech of each word,
– Syntactic parsing: the grammatical structure of each sentence, ….
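The idea that each step adds an explicit layer of annotation to the input can be sketched as a chain of functions over a shared document object. Everything named here (Document, tokenize, pos_tag, run_pipeline, and the lookup table TOY_TAGS) is a hypothetical stand-in for illustration; a real tagger would be a trained model, not a dictionary lookup.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    layers: dict = field(default_factory=dict)  # annotation name -> values

def tokenize(doc):
    """Add a 'tokens' layer: runs of word characters or single punctuation marks."""
    doc.layers["tokens"] = re.findall(r"\w+|[^\w\s]", doc.text)
    return doc

# Hypothetical toy tag lookup, standing in for a real POS tagger.
TOY_TAGS = {"john": "PROPN", "jim": "PROPN", "saw": "VERB", ".": "PUNCT"}

def pos_tag(doc):
    """Add a 'pos' layer; relies on the 'tokens' layer from the earlier step."""
    doc.layers["pos"] = [TOY_TAGS.get(t.lower(), "X") for t in doc.layers["tokens"]]
    return doc

def run_pipeline(text, stages):
    """Apply each stage in order; every stage returns the enriched document."""
    doc = Document(text)
    for stage in stages:
        doc = stage(doc)
    return doc

doc = run_pipeline("John saw Jim.", [tokenize, pos_tag])
print(doc.layers)
```

Note that pos_tag reads the tokens layer and never revisits it, so in this design a tokenization error is passed on unchanged to every later stage.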