NLP CH1
10/19/2023 1
Outline
Introduction to NLP
What is NLP?
Aspects of Language Processing
Goal of NLP
History of NLP
Applications of NLP
Open Problems
Knowledge Sources
Computational morphology
What is Natural Language Processing?
“Natural” languages:
Geez, Amharic, Oromifa, Tigrigna, English, Mandarin, French,
Swahili, Arabic, …
Aspects of Language Processing
A finer-grained decomposition of the process is useful when we take
into account the current state of the art together with the need to
deal with real language data, as reflected in the figure.
Aspects of Language Processing
Syntax:
Sentence structure, phrase, grammar, …
Semantics:
Meaning,
Execute commands
Discourse analysis:
Meaning of a text,
Relationship between sentences (e.g. anaphora)
Aspects of Language Processing
Syntax
Lemmatization:
Lemmatization usually refers to doing things properly with the use
of a vocabulary and morphological analysis of words, normally
aiming to remove inflectional endings only and to return the base
or dictionary form of a word, which is known as the lemma.
Morphological segmentation:
Separate words into individual morphemes and identify the class of
the morphemes.
The difficulty of this task depends greatly on the complexity of the
morphology (i.e. the structure of words) of the language being
considered.
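The two tasks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real analyzer: the lexicon `LEMMA_LEXICON` and the affix tables `PREFIXES` and `SUFFIXES` are hypothetical toy data, and the length guard in `segment` is an arbitrary choice to avoid splitting very short words.

```python
# Toy lexicon mapping inflected forms to lemmas (hypothetical, illustrative only).
LEMMA_LEXICON = {
    "went": "go",
    "geese": "goose",
    "books": "book",
    "booked": "book",
    "sang": "sing",
}

# Toy affix inventories with morpheme classes (hypothetical).
PREFIXES = {"un": "PREFIX", "re": "PREFIX"}
SUFFIXES = {"ness": "SUFFIX", "ing": "SUFFIX", "ed": "SUFFIX", "s": "SUFFIX"}


def lemmatize(word: str) -> str:
    """Return the dictionary form if the lexicon knows the word, else the word itself."""
    return LEMMA_LEXICON.get(word.lower(), word.lower())


def segment(word: str) -> list[tuple[str, str]]:
    """Split a word into (morpheme, class) pairs by peeling off at most one
    known prefix and then any known suffixes, longest first."""
    word = word.lower()
    front, back = [], []
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        # The "+ 2" guard keeps at least three characters for the stem.
        if word.startswith(prefix) and len(word) > len(prefix) + 2:
            front.append((prefix, PREFIXES[prefix]))
            word = word[len(prefix):]
            break
    stripped = True
    while stripped:
        stripped = False
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                back.insert(0, (suffix, SUFFIXES[suffix]))
                word = word[: -len(suffix)]
                stripped = True
                break
    return front + [(word, "STEM")] + back
```

For example, `lemmatize("geese")` returns `"goose"`, and `segment("unlocked")` returns `[("un", "PREFIX"), ("lock", "STEM"), ("ed", "SUFFIX")]`. A language with richer morphology would need far larger tables and rules for spelling changes, which is where the difficulty mentioned above comes from.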
Aspects of Language Processing
Syntax …
Part-of-speech tagging:
Determine the part of speech of each word in a sentence.
For example, "book" can be a noun ("the book on the table") or a verb
("to book a flight").
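The "book" example can be made concrete with a tiny context-based tagger. This is only a sketch: the `LEXICON` table, the tag names, and the two disambiguation rules are assumptions chosen for illustration, and real taggers learn such preferences from annotated data.

```python
# Hypothetical toy lexicon: each word lists its possible tags.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "to": ["TO"],
    "on": ["PREP"],
    "book": ["NOUN", "VERB"],
    "table": ["NOUN", "VERB"],
    "flight": ["NOUN"],
}


def tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Tag each token, resolving NOUN/VERB ambiguity from the previous tag."""
    tagged = []
    prev = None
    for token in tokens:
        # Unknown words default to NOUN (an arbitrary fallback).
        options = LEXICON.get(token.lower(), ["NOUN"])
        if len(options) == 1:
            choice = options[0]
        elif prev == "DET" and "NOUN" in options:
            choice = "NOUN"  # after a determiner, prefer a noun
        elif prev == "TO" and "VERB" in options:
            choice = "VERB"  # after infinitival "to", prefer a verb
        else:
            choice = options[0]
        tagged.append((token, choice))
        prev = choice
    return tagged
```

With these rules, `tag("the book on the table".split())` tags "book" as `NOUN`, while `tag("to book a flight".split())` tags it as `VERB`.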
Aspects of Language Processing
Syntax …
Stemming
Stemming usually refers to a crude heuristic process that chops off the
ends of words in the hope of reducing them to a common base form correctly
most of the time, and often includes the removal of derivational affixes.
Word segmentation
Separate a chunk of continuous text into separate words. For a language
like English, this is fairly trivial, since words are usually separated by
spaces. However, some written languages like Chinese, Japanese and Thai
do not mark word boundaries in such a fashion, and in those languages
text segmentation is a significant task requiring knowledge of the
vocabulary and morphology of words in the language.
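Both ideas on this slide can be sketched briefly. The suffix list in `crude_stem` is a hypothetical stand-in for a real stemmer's rule set, and `max_match` is a simple greedy longest-match heuristic, one of several possible approaches to word segmentation; the vocabulary passed to it is assumed to be given.

```python
def crude_stem(word: str) -> str:
    """Chop off the first matching suffix; no dictionary, no morphology."""
    for suffix in ("ization", "ational", "ness", "ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def max_match(text: str, vocabulary: set[str]) -> list[str]:
    """Greedy left-to-right segmentation: always take the longest vocabulary
    word starting at the current position, or a single character if none fits."""
    longest = max(map(len, vocabulary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words
```

`crude_stem("walking")` gives `"walk"`, but `crude_stem("organization")` gives `"organ"`, which shows how crude the heuristic is. Likewise, `max_match("thecatsat", {"the", "cat", "cats", "sat", "at"})` returns `["the", "cats", "at"]` rather than "the cat sat": without spaces, the greedy choice can commit to the wrong boundary, which is why segmentation needs knowledge of the vocabulary and morphology.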
Aspects of Language Processing
Semantics (Individual Assignment One: define the terms)
Lexical semantics
Machine translation
Named entity recognition
Natural language generation
Natural language understanding
Optical character recognition
Question answering
Recognizing textual entailment
Relationship extraction
Speech recognition
Sentiment analysis
Topic segmentation
Word sense disambiguation
Aspects of Language Processing
Discourse:
Automatic summarization
Coreference resolution
Given a sentence or larger chunk of text, determine which words
("mentions") refer to the same objects ("entities"). Anaphora resolution is a
specific example of this task, and is specifically concerned with matching up
pronouns with the nouns or names to which they refer.
The more general task of coreference resolution also includes identifying so-
called "bridging relationships" involving referring expressions.
For example, in a sentence such as "He entered John's house through the
front door", "the front door" is a referring expression and the bridging
relationship to be identified is the fact that the door being referred to is the
front door of John's house (rather than of some other structure that might also
be referred to).
Discourse analysis:
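The anaphora resolution task described above can be illustrated with a deliberately naive baseline: link each pronoun to the most recent preceding name. The pronoun list and the `names` argument are assumptions for this sketch; real systems also check gender, number, and syntactic constraints.

```python
PRONOUNS = {"he", "she", "him", "her", "his", "it", "they", "them"}


def resolve_pronouns(tokens: list[str], names: set[str]) -> dict[int, str]:
    """Map the index of each pronoun to the most recent preceding name.

    Pronouns with no preceding name are left unresolved (omitted).
    """
    antecedent = None
    links = {}
    for i, token in enumerate(tokens):
        if token in names:
            antecedent = token
        elif token.lower() in PRONOUNS and antecedent is not None:
            links[i] = antecedent
    return links
```

For "John opened the door because he was late", the call `resolve_pronouns(tokens, {"John"})` returns `{5: "John"}`. For a sentence that starts with the pronoun, such as "He entered the house", the result is empty, which shows one limit of the heuristic.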
Aspects of Language Processing
Speech Processing
Speech recognition
Speech segmentation
Given a sound clip of a person or people speaking,
separate it into words.
A subtask of speech recognition and typically grouped
with it.
Goal of Natural Language Processing
Ultimate goal: Natural human-to-computer communication.
The goal of natural language processing (NLP) is to design and build
computer systems that are able to analyze natural languages like Geez,
Amharic, German or English, and that generate their outputs in a natural
language, too.
In natural language understanding, the objective is to extract the meaning
of an input sentence or an input text. Usually, the meaning is represented
in a suitable formal representation language so that it can be processed by
a computer.
The goal in text classification is to assign a text document to one of
several text classes.
Example: for newspaper articles, such classes might be sports, finance,
and politics.
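The newspaper example can be sketched as a keyword-counting classifier. The keyword sets in `CLASS_KEYWORDS` are hypothetical; a practical system would learn its features from labelled articles rather than use a hand-written list.

```python
from collections import Counter

# Hypothetical keyword lists per class; a real system would learn these from data.
CLASS_KEYWORDS = {
    "sports": {"match", "goal", "team", "league", "coach"},
    "finance": {"bank", "stock", "market", "inflation", "shares"},
    "politics": {"election", "parliament", "minister", "vote", "policy"},
}


def classify(text: str) -> str:
    """Assign the text to the class whose keywords occur most often."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    scores = Counter()
    for label, keywords in CLASS_KEYWORDS.items():
        scores[label] = sum(1 for t in tokens if t in keywords)
    label, best = scores.most_common(1)[0]
    return label if best > 0 else "unknown"
```

For instance, `classify("The team scored a late goal in the league match.")` returns `"sports"`, and a text with no keyword at all is labelled `"unknown"`.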
History of Natural Language Processing
1950s
Early MT: word translation + re-ordering
Chomsky’s generative grammar
Bar-Hillel’s argument (against fully automatic high-quality MT)
1960-80s
Applications:
BASEBALL: NL interface for searching a database of baseball games.
LUNAR: NL interface for searching a database of lunar rock samples.
ELIZA: simulation of a conversation with a psychoanalyst
SHRDLU: use NL to manipulate a blocks world
Message understanding: understand a newspaper article on terrorism
Machine translation
History of Natural Language Processing
1960-80s
Methods
ATN (augmented transition networks): extended context-free grammar
Case grammar (agent, object, etc.)
DCG – Definite Clause Grammar
Dependency grammar: an element depends on another
1990s-now
Statistical methods
Speech recognition
MT systems
Question-answering
…
History of Natural Language Processing
Discourse analysis
Pragmatic analysis
History of Natural Language Processing
Treebank Annotation
Part-of-Speech Tagging
Statistical Parsing
Etc…
NLP Applications
Speech Synthesis
Text to Speech:
Open Problems in NLP
Ambiguity
Lexical/morphological: change (V,N), training (V,N), even (ADJ,
ADV) …
Syntactic: Helicopter powered by human flies.
Semantic: He saw a man on the hill with a telescope.
Discourse: anaphora, …
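Lexical ambiguity multiplies quickly across a sentence. The sketch below enumerates every tag sequence a small lexicon allows for the helicopter example; the `LEXICON` entries are assumptions made for illustration, not a real dictionary.

```python
from itertools import product

# Hypothetical lexicon listing the possible tags of each word.
LEXICON = {
    "helicopter": ["N"],
    "powered": ["V"],
    "by": ["P"],
    "human": ["N", "ADJ"],
    "flies": ["N", "V"],
}


def tag_sequences(sentence: str) -> list[tuple[str, ...]]:
    """Enumerate every tag sequence the lexicon allows for the sentence."""
    words = sentence.lower().rstrip(".").split()
    return list(product(*(LEXICON[w] for w in words)))
```

`tag_sequences("Helicopter powered by human flies.")` yields four sequences. Two of them correspond to the competing readings: "human" as ADJ with "flies" as N (insects), and "human" as N with "flies" as V (the helicopter flies).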
Knowledge Sources
When using NLP for a new domain, one also needs to decide which
text source should be used for extracting content.
Of course, not every text source is suitable.
To qualify as a source, the text type needs to meet the
following two criteria:
Firstly, the text type needs to contain sufficient domain
knowledge.
In other words, if we choose a text type that only infrequently contains content
regarding a given domain, then we are not very likely to extract any significant
amount of knowledge.
• In the past, most research in NLP has been carried out on news corpora. The
topics that predominate in this text type typically lie outside the target domain.
Consequently, this text type would be of little value for knowledge extraction.
Secondly, the text type should not only contain knowledge about
the domain that is already widely available in structured format
(such as databases).
Otherwise, there would hardly be any point in extracting knowledge from those
texts, as it would already be available.
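The first criterion can be checked roughly before committing to a corpus. The function below is a hypothetical sketch: it measures the fraction of documents that mention at least one term from an assumed list of domain terms, which is only a crude proxy for "sufficient domain knowledge".

```python
def domain_coverage(documents: list[str], domain_terms: set[str]) -> float:
    """Fraction of documents that mention at least one domain term."""
    if not documents:
        return 0.0
    hits = 0
    for doc in documents:
        tokens = {t.strip(".,;:!?").lower() for t in doc.split()}
        if tokens & domain_terms:
            hits += 1
    return hits / len(documents)
```

A low score for a candidate source, compared with other candidates, suggests that it contains the domain only infrequently and is unlikely to yield much knowledge.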
Computational Morphology
What is it?
Morphology: the study/knowledge of structure/form
• In this case: of words,
• How words are created, structured, analyzed
• Morpheme: basic meaningful unit of language.
Computational morphology: developing/using computer
applications that involve morphology.
Computational applications:
Computational Morphology
Morphological processes:
Affixation: prefix, suffix, infix
Interleaving (KaTaB, uKTaB)
Cliticization (isn’t, s’appelle)
Internal change: (sing/sang, goose/geese)
Suppletion (irregularity): (aller/ir, be/am)
Stress placement: implant, import, contest
Tone placement: dà vs. dá (will spank vs. spanked)
Reduplication
Full: iji/ijiiji
Partial: lakad/lalakad
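Three of the processes above are easy to generate mechanically. The sketch below reproduces the slide's own examples; the partial-reduplication rule (copy up to the first vowel) and the `C`-slot pattern notation are assumptions chosen to fit those examples, not general rules for any language.

```python
VOWELS = set("aeiou")


def full_reduplication(stem: str) -> str:
    """Repeat the whole stem: iji -> ijiiji."""
    return stem + stem


def partial_reduplication(stem: str) -> str:
    """Copy the stem up to and including its first vowel: lakad -> lalakad."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[: i + 1] + stem
    return stem  # no vowel found; leave unchanged


def interleave(root: str, pattern: str) -> str:
    """Fill each 'C' slot of the pattern with the next root consonant:
    root 'KTB' + pattern 'CaCaC' -> 'KaTaB'.
    Assumes the pattern has exactly as many 'C' slots as the root has letters."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)
```

Here `interleave("KTB", "CaCaC")` gives `"KaTaB"` and `interleave("KTB", "uCCaC")` gives `"uKTaB"`, so one root yields both forms by changing only the pattern.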
Computational Morphology
Computational morphology
Processing morphological structure via computer (parsing,
generation)
Traditional approach:
Ad-hoc methods,
Cut-and-paste algorithms,
Dictionary lookup.
Inadequate for highly inflected languages.
Even statistical approaches are often of limited use.
Two-level approach with finite-state techniques.
Machine learning is making inroads:
sequence labeling, morpheme boundary detection.
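To treat morpheme boundary detection as sequence labeling, a segmented word is first turned into one label per character. The B/I scheme below (B = begins a morpheme, I = inside one) is one common convention, assumed here for illustration; a learned model would then predict these labels for unseen words.

```python
def to_labels(morphemes: list[str]) -> list[tuple[str, str]]:
    """Label each character 'B' if it begins a morpheme, else 'I'."""
    labels = []
    for morpheme in morphemes:
        for i, ch in enumerate(morpheme):
            labels.append((ch, "B" if i == 0 else "I"))
    return labels


def from_labels(labels: list[tuple[str, str]]) -> list[str]:
    """Rebuild the morpheme list from per-character B/I labels."""
    morphemes = []
    for ch, label in labels:
        if label == "B" or not morphemes:
            morphemes.append(ch)
        else:
            morphemes[-1] += ch
    return morphemes
```

For `["un", "lock", "ed"]` the label string is `BIBIIIBI`, and `from_labels(to_labels(...))` recovers the original segmentation.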
Question & Answer
Thank You !!!