
NPL CH1

The document discusses natural language processing (NLP) and provides details about its key aspects. NLP aims to develop computers that can understand and generate human languages. It covers subfields like syntax, semantics, morphology, lexicography, and discourse analysis. The document explains each of these aspects at a high level and how they relate to processing natural written and spoken texts.

Uploaded by Yohannes Bogale
Copyright © All Rights Reserved

Chapter 1: NLP – Introduction

10/19/2023 1
Outline

Introduction to NLP
What is NLP?
Aspects of Language Processing
Goal of NLP
History of NLP
Applications of NLP
Open Problems
Knowledge Sources
Computational Morphology

What is Natural Language Processing?

• Natural Language Processing is one of the subfields of Artificial Intelligence.
• Disciplines of AI:
  • Natural Language Processing (NLP)
  • Knowledge representation
  • Automated reasoning
  • Machine learning
  • Computer Vision
  • Robotics

What is Natural Language Processing?

• Develop computers that can understand human (“natural”) language and speak human language.
• Computers would be much easier to use with a natural language interface.
• Can we “teach” them to understand human language?
  • Many approaches to NLP are based on machine learning.

What is Natural Language Processing?

• NLP is a field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages and, in particular, with programming computers to fruitfully process large natural language corpora*.
• It is a sub-field of Artificial Intelligence, but a very interdisciplinary one:
  • Computer science, human-computer interaction (HCI), linguistics, cognitive psychology, speech signal processing, …

*A corpus is a collection of written or spoken texts. With the use of computers, it is possible to compile large amounts of authentic written and spoken language.

Aspects of Language Processing

• “Natural” languages:
  • Geez, Amharic, Oromifa, Tigrigna, English, Mandarin, French, Swahili, Arabic, …
  • NOT Java, C++, Perl, … (programming languages)

Aspects of Language Processing

• Traditionally, work in NLP has tended to view the process of language analysis as being decomposable into a number of stages, mirroring the theoretical linguistic distinctions drawn between SYNTAX, SEMANTICS, and PRAGMATICS.
• The simple view is that the sentences of a text are first analyzed in terms of their syntax.
• This provides an order and structure that is more amenable to an analysis in terms of semantics, or literal meaning.
• This is followed by a stage of pragmatic analysis, whereby the meaning of the utterance or text in context is determined.
• This last stage is often seen as being concerned with DISCOURSE, whereas the previous two are generally concerned with sentential matters.

Aspects of Language Processing

Figure 1. The Stages of Analysis in Processing Natural Language

• The tripartite distinction into syntax, semantics, and pragmatics serves, at best, only as a starting point when we consider the processing of real NL texts.
• A finer-grained decomposition of the process is useful when we take into account the current state of the art in combination with the need to deal with real language data, as reflected in Figure 1.

Aspects of Language Processing

• Word, lexicon: lexical analysis
  • In lexical analysis, tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining.
  • Tokenization is useful both in linguistics (where it is a form of text segmentation) and in computer science, where it forms part of lexical analysis.
  • Given a character sequence, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.
  • Example:
    Input: Friends, Romans, Countrymen, lend me your ears;
    Output: Friends | Romans | Countrymen | lend | me | your | ears
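A minimal tokenizer that reproduces the example above can be written with a single regular expression. This is a sketch under a stated assumption: a token is a run of ASCII letters or digits, optionally followed by an apostrophe suffix such as the "'t" in "isn't", and every other character (punctuation, whitespace) is discarded. Real tokenizers handle far more cases (hyphens, decimal numbers, URLs), and this ASCII-only pattern would not handle text in the Ge'ez script at all.

```python
import re

# Assumption: a token is a run of ASCII letters/digits, optionally followed by
# an apostrophe and more letters (e.g. "isn't"). Everything else, including
# punctuation and whitespace, is treated as a separator and thrown away.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize(text: str) -> list[str]:
    """Break a character stream into word tokens, discarding punctuation."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    tokens = tokenize("Friends, Romans, Countrymen, lend me your ears;")
    print(" | ".join(tokens))  # Friends | Romans | Countrymen | lend | me | your | ears
```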

Aspects of Language Processing

• Syntax:
  • Sentence structure, phrase, grammar, …
• Semantics:
  • Meaning
  • Execute commands
• Discourse analysis:
  • Meaning of a text
  • Relationship between sentences (e.g. anaphora)

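Aspects of Language Processing

• Morphology: What is a word?
  • 奧林匹克運動會(希臘語:Ολυμπιακοί Αγώνες,簡稱奧運會或奧運)是國際奧林匹克委員會主辦的包含多種體育運動項目的國際性運動會,每四年舉行一次。
    (“The Olympic Games (Greek: Ολυμπιακοί Αγώνες; abbreviated 奧運會 or 奧運) are an international multi-sport event hosted by the International Olympic Committee and held once every four years.” Written Chinese does not separate words with spaces.)
  • Arabic: a single written word can correspond to the English phrase “to her houses”.
• Lexicography: What does each word mean?
  • He plays bass guitar.
  • That bass was delicious!
• Syntax: How do the words relate to each other?
  • The dog bit the man. ≠ The man bit the dog.
  • But in Russian: человек собаку съел = человек съел собаку
    (both mean “the man ate a dog”; case endings, not word order, mark who did what)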

Aspects of Language Processing

• Semantics: How can we infer meaning from sentences?
  • I saw the man on the hill with the telescope.
  • The iPod is so small! ☺
  • The monitor is so small!
• Discourse: How about across many sentences?
  • President Obama met with President-Elect Trump today at the White House. He welcomed him and showed him around.
  • Who is “he”? Who is “him”? How would a computer figure that out?

Aspects of Language Processing

• Syntax
  • Lemmatization:
    • Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
  • Morphological segmentation:
    • Separate words into individual morphemes and identify the class of the morphemes.
    • The difficulty of this task depends greatly on the complexity of the morphology (i.e. the structure of words) of the language being considered.
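Because lemmatization relies on a vocabulary, a lookup-based sketch makes the idea concrete. The lexicon below is a hypothetical four-entry table invented for illustration; a real lemmatizer would pair a full dictionary with morphological rules. The part of speech selects between the two analyses of "saw", and the irregular form "geese" maps to "goose", which no suffix-chopping rule could find.

```python
from typing import Optional

# Hypothetical miniature lexicon (an assumption for illustration):
# each surface form maps to its possible analyses as (lemma, POS, features).
LEXICON = {
    "saw": [("see", "VERB", "PAST"), ("saw", "NOUN", "SG")],
    "geese": [("goose", "NOUN", "PL")],
    "sang": [("sing", "VERB", "PAST")],
    "meetings": [("meeting", "NOUN", "PL")],
}


def lemmatize(word: str, pos: Optional[str] = None) -> str:
    """Return the dictionary form (lemma) of `word`.

    If several analyses exist, `pos` picks one; unknown words are
    returned unchanged (lower-cased).
    """
    analyses = LEXICON.get(word.lower())
    if not analyses:
        return word.lower()
    for lemma, tag, _features in analyses:
        if pos is None or tag == pos:
            return lemma
    return analyses[0][0]  # requested POS not found: fall back to the first analysis


if __name__ == "__main__":
    print(lemmatize("geese"), lemmatize("saw", "VERB"), lemmatize("saw", "NOUN"))
```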

Aspects of Language Processing

• Syntax …
  • Part-of-speech tagging:
    • Given a sentence, determine the part of speech of each word. Many words are ambiguous; for example, "book" can be a noun ("the book on the table") or a verb ("to book a flight").
  • Sentence breaking (also known as sentence boundary disambiguation):
    • Given a chunk of text, find the sentence boundaries. Sentence boundaries are often marked by periods or other punctuation marks, but these same characters can serve other purposes (e.g. marking abbreviations).
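The abbreviation problem in sentence breaking can be illustrated with a simple rule-based splitter. This is a sketch, not a standard algorithm: it assumes a boundary is ".", "!" or "?" followed by whitespace and a capital letter, and it uses a small, hypothetical abbreviation list to reject false boundaries such as "Dr." or "Mr.". Production systems use much larger lists or trained models.

```python
import re

# Hypothetical, deliberately small abbreviation list (lower-cased, with the
# trailing period). Real systems use much larger lists or a trained model.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e.", "etc."}

# Candidate boundary: '.', '!' or '?' followed by whitespace and a capital letter.
CANDIDATE = re.compile(r"[.!?]\s+(?=[A-Z])")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, skipping periods that end a known abbreviation."""
    sentences, start = [], 0
    for match in CANDIDATE.finditer(text):
        end = match.start() + 1  # keep the punctuation mark with its sentence
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # the period marks an abbreviation, not a sentence boundary
        sentences.append(text[start:end].strip())
        start = match.end()  # the next sentence starts at the capital letter
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


if __name__ == "__main__":
    for s in split_sentences("Dr. Smith arrived. He met Mr. Jones at 5 p.m. today! Was it late?"):
        print(s)
```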

Aspects of Language Processing

• Syntax …
  • Stemming
    • Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving the same goal as lemmatization (reducing a word to a common base form) correctly most of the time, and often includes the removal of derivational affixes.
  • Word segmentation
    • Separate a chunk of continuous text into separate words. For a language like English, this is fairly trivial, since words are usually separated by spaces. However, some written languages like Chinese, Japanese and Thai do not mark word boundaries in such a fashion, and in those languages text segmentation is a significant task requiring knowledge of the vocabulary and morphology of words in the language.
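A classic baseline for word segmentation in languages written without spaces is greedy longest match (often called MaxMatch): at each position, take the longest dictionary word that starts there. The sketch below uses space-less English so the output is readable, and the two dictionaries are hypothetical toys. The second call shows why dictionary knowledge alone is not enough: greedy matching commits to "theta" and derails the rest of the segmentation.

```python
def max_match(text: str, dictionary: set[str], max_len: int = 10) -> list[str]:
    """Greedy left-to-right longest-match segmentation (MaxMatch)."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word is found.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])  # no dictionary word starts here: emit one character
            i += 1
    return words


# Hypothetical toy dictionaries.
SMALL = {"the", "table", "down", "there"}
LARGER = SMALL | {"theta", "bled", "own"}

if __name__ == "__main__":
    print(max_match("thetabledownthere", SMALL))   # ['the', 'table', 'down', 'there']
    print(max_match("thetabledownthere", LARGER))  # ['theta', 'bled', 'own', 'there']
```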

Aspects of Language Processing

• Semantics (Individual Assignment One: define the terms)
  • Lexical semantics
  • Machine translation
  • Named entity recognition
  • Natural language generation
  • Natural language understanding
  • Optical character recognition
  • Question answering
  • Recognizing textual entailment
  • Relationship extraction
  • Speech recognition
  • Sentiment analysis
  • Topic segmentation
  • Word sense disambiguation

Aspects of Language Processing

• Discourse:
  • Automatic summarization
  • Coreference resolution
    • Given a sentence or larger chunk of text, determine which words ("mentions") refer to the same objects ("entities"). Anaphora resolution is a specific example of this task, and is specifically concerned with matching up pronouns with the nouns or names to which they refer.
    • The more general task of coreference resolution also includes identifying so-called "bridging relationships" involving referring expressions.
    • For example, in a sentence such as "He entered John's house through the front door", "the front door" is a referring expression and the bridging relationship to be identified is the fact that the door being referred to is the front door of John's house (rather than of some other structure that might also be referred to).
  • Discourse analysis:
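The anaphora resolution task described above can be approximated by a naive heuristic: link each pronoun to the most recent preceding noun phrase whose gender and number agree with it. This is a sketch under assumptions: mentions arrive already extracted and annotated with gender and number (a hypothetical input format), and only a handful of English pronouns are covered. The Obama/Trump example from the earlier slide shows the heuristic's limits: both pronouns get linked to the most recent male mention, whereas a reader would most likely take "He" to be Obama, the host.

```python
from typing import Optional

# Assumption: English third-person pronouns only; None means "any gender".
PRONOUNS = {
    "he": ("m", "sg"), "him": ("m", "sg"),
    "she": ("f", "sg"), "her": ("f", "sg"),
    "it": ("n", "sg"),
    "they": (None, "pl"), "them": (None, "pl"),
}

Mention = tuple[str, Optional[str], Optional[str]]  # (text, gender, number)


def resolve_pronouns(mentions: list[Mention]) -> list[tuple[str, Optional[str]]]:
    """Link each pronoun to the nearest earlier agreeing non-pronoun mention."""
    links = []
    for i, (text, _gender, _number) in enumerate(mentions):
        features = PRONOUNS.get(text.lower())
        if features is None:
            continue  # not a pronoun
        gender, number = features
        antecedent = None
        for prev_text, prev_gender, prev_number in reversed(mentions[:i]):
            if prev_text.lower() in PRONOUNS:
                continue  # only link to full noun phrases
            if prev_number == number and (gender is None or prev_gender == gender):
                antecedent = prev_text
                break
        links.append((text, antecedent))
    return links


if __name__ == "__main__":
    story = [("President Obama", "m", "sg"), ("President-Elect Trump", "m", "sg"),
             ("He", None, None), ("him", None, None)]
    print(resolve_pronouns(story))  # both pronouns linked to "President-Elect Trump"
```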

Aspects of Language Processing

• Speech processing
  • Speech recognition
  • Speech segmentation
    • Given a sound clip of a person or people speaking, separate it into words.
    • A subtask of speech recognition and typically grouped with it.
  • Text-to-speech (speech synthesis)

Goal of Natural Language Processing

• Ultimate goal: natural human-to-computer communication.
• The goal of natural language processing (NLP) is to design and build computer systems that are able to analyze natural languages like Geez, Amharic, German or English, and that generate their outputs in a natural language, too.
• In natural language understanding, the objective is to extract the meaning of an input sentence or an input text. Usually, the meaning is represented in a suitable formal representation language so that it can be processed by a computer.
• The goal in text classification is to assign a text document to one out of several text classes.
  • Example: for newspaper articles, such classes are sports reports, finances, and politics.
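The slides do not prescribe a classification method; one common choice for assigning a document to one of several classes is multinomial Naive Bayes with add-one (Laplace) smoothing, sketched below. The six training "articles" are invented toy data for the sports/finance/politics example, and whitespace splitting stands in for a real tokenizer. Each class is scored as log P(class) plus the sum of log P(word | class) over the document's known words, and the highest score wins.

```python
import math
from collections import Counter, defaultdict


def train(docs: list[tuple[str, str]]):
    """Estimate class log-priors and per-class word counts from (text, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_priors = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    return log_priors, word_counts, vocab


def classify(text: str, model) -> str:
    """Return the label with the highest log posterior (add-one smoothing)."""
    log_priors, word_counts, vocab = model
    scores = {}
    for label, score in log_priors.items():
        total = sum(word_counts[label].values())
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data (invented for illustration).
TRAINING = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("shares fell as the bank raised interest rates", "finance"),
    ("the stock market rallied", "finance"),
    ("parliament passed the new election law", "politics"),
    ("the minister resigned after the vote", "politics"),
]
model = train(TRAINING)

if __name__ == "__main__":
    print(classify("the team scored in the final match", model))  # sports
```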

History of Natural Language Processing

• 1950s
  • Early MT: word translation + re-ordering
  • Chomsky’s generative grammar
  • Bar-Hillel’s argument
• 1960s–80s
  • Applications:
    • BASEBALL: an NL interface for searching a database of baseball games
    • LUNAR: an NL interface for searching a database of lunar rock samples
    • ELIZA: simulation of a conversation with a psychotherapist
    • SHRDLU: using NL to manipulate a blocks world
    • Message understanding: understanding newspaper articles on terrorism
    • Machine translation

History of Natural Language Processing

• 1960s–80s
  • Methods:
    • ATN (augmented transition networks): extended context-free grammar
    • Case grammar (agent, object, etc.)
    • DCG (Definite Clause Grammar)
    • Dependency grammar: an element depends on another
• 1990s–now
  • Statistical methods
  • Speech recognition
  • MT systems
  • Question answering
  • …

History of Natural Language Processing

• Traditional NLP approaches: symbolic, grammar-based, …
• More recent approaches: statistical
  • For some applications, statistical approaches are better (tagging, speech recognition, …)
  • For others, traditional approaches are better (MT)
• Trend: combine statistics with rules (grammar)
  • E.g. Probabilistic Context-Free Grammar (PCFG)
  • Consider some grammatical connections in statistical approaches.
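A PCFG combines rules with statistics by attaching a probability to each rule A → β, with the probabilities of all expansions of the same A summing to 1. The probability of a parse tree T is then the product of the probabilities of every rule used in it: P(T) = ∏ P(A → β). The grammar and its probabilities below are hypothetical toy values chosen only to show the computation; in practice they would be estimated from a treebank.

```python
# Toy PCFG (hypothetical rule probabilities). For each left-hand side,
# the probabilities of its expansions sum to 1.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("Name",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 0.5,
    ("N", ("man",)): 0.5,
    ("Name", ("Abebe",)): 1.0,
    ("V", ("bit",)): 0.4,
    ("V", ("saw",)): 0.6,
}


def tree_probability(tree) -> float:
    """P(tree) = product of the probabilities of all rules used in the tree.

    A tree is a tuple (label, child1, child2, ...); a leaf is a plain word string.
    """
    if isinstance(tree, str):
        return 1.0  # a word contributes through its pre-terminal rule, e.g. N -> dog
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = PCFG[(label, rhs)]
    for child in children:
        prob *= tree_probability(child)
    return prob


# "the dog bit the man": 1.0 * (0.6*1.0*0.5) * (0.7*0.4*(0.6*1.0*0.5)) = 0.0252
TREE = ("S", ("NP", ("Det", "the"), ("N", "dog")),
             ("VP", ("V", "bit"), ("NP", ("Det", "the"), ("N", "man"))))

if __name__ == "__main__":
    print(tree_probability(TREE))
```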

History of Natural Language Processing

• Classical symbolic methods:
  • Morphological analyzer
  • Parser (syntactic analysis)
  • Semantic analysis (transform into a logical form, semantic network, etc.)
  • Discourse analysis
  • Pragmatic analysis

History of Natural Language Processing

• Empirical and statistical approaches:
  • Corpus creation
  • Treebank annotation
  • Fundamental statistical techniques
  • Part-of-speech tagging
  • Statistical parsing
  • Etc.

NLP Applications

• Intelligent computer systems
• NLU interfaces to databases
• Computer-aided instruction
• Intelligent Web searching
• Query answering
• Data mining
• Text categorization
• Machine translation
• Summarization
• Speech recognition
• Data extraction
• Natural language generation
• Question answering
• Text classification
• Information retrieval
  • Information retrieval (IR) deals with the representation, storage, organization of, and access to information items.
• etc.
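Since information retrieval deals with organizing and accessing information items, a core data structure behind it is the inverted index, which maps each term to the documents that contain it. The sketch below assumes whitespace tokenization, lower-casing, and Boolean AND queries; the three-document collection is hypothetical. Ranking, stemming, and stop-word removal are left out.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: map each lower-cased term to the IDs of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Boolean AND query: IDs of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical three-document collection.
DOCS = {
    1: "machine translation of Amharic text",
    2: "speech recognition for Amharic",
    3: "machine learning for speech",
}
INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(search(INDEX, "Amharic speech"))  # {2}
```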

NLP Applications

• Speech synthesis
  • Text-to-speech

Open Problems in NLP

• Challenges in natural language processing frequently involve:
  • Natural language understanding
  • Natural language generation (frequently from formal, machine-readable logical forms)
  • Connecting language and machine perception
  • Managing human-computer dialog systems, or
  • Some combination thereof.

Open Problems in NLP

• Ambiguity
  • Lexical/morphological: change (V, N), training (V, N), even (ADJ, ADV), …
  • Syntactic: Helicopter powered by human flies.
  • Semantic: He saw a man on the hill with a telescope.
  • Discourse: anaphora, …

Knowledge Sources

• When using NLP for a new domain, one also needs to decide which text source should be used for extracting content.
• Of course, not every text source is applicable.
• In order to qualify as a source, the text type needs to meet the following two criteria:
  • Firstly, the text type needs to contain sufficient domain knowledge.
    • In other words, if we choose a text type that only infrequently contains content regarding a given domain, then we are not very likely to extract any significant amount of knowledge.
    • In the past, most research in NLP has been carried out on news corpora. The topics that predominate in this text type mostly lie outside the target domain; consequently, this text type would be of little value for knowledge extraction.
  • Secondly, the text type should not merely contain domain knowledge that is already widely available in a structured format (such as databases).
    • Otherwise, there would hardly be any point in extracting knowledge from those texts, as it would already be available.

Computational Morphology

• What is it?
  • Morphology: the study/knowledge of structure/form
    • In this case, of words:
    • How words are created, structured, analyzed
    • Morpheme: the basic meaningful unit of language
  • Computational morphology: developing/using computer applications that involve morphology
• Computational applications:
  • Analysis: parse/break a word into its constituent morphemes.
  • Generation: create/generate a word from its constituent morphemes.
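Analysis and generation are inverse directions over the same knowledge. The sketch below assumes purely concatenative (suffixing) morphology, a hypothetical three-stem lexicon taken from the examples on a later slide (dog+s, sneez+ed, teach+er), and a single spelling rule: a stem-final "e" is dropped before a vowel-initial suffix. That rule is why the slide writes "sneez+ed" while the lexicon stores the stem "sneeze". Analysis works by proposing splits and keeping only those that generation confirms.

```python
# Hypothetical toy lexicon of stems and suffixes (from the slide examples).
STEMS = {"dog", "sneeze", "teach"}
SUFFIXES = ("s", "ed", "er")


def generate(stem: str, suffix: str) -> str:
    """Generation: join stem and suffix, applying one spelling rule:
    a stem-final 'e' is dropped before a vowel-initial suffix (sneeze+ed -> sneezed)."""
    if stem.endswith("e") and suffix[0] in "aeiou":
        stem = stem[:-1]
    return stem + suffix


def analyze(word: str) -> list[tuple[str, str]]:
    """Analysis: return every (stem, suffix) pair that generates `word`."""
    analyses = []
    for suffix in SUFFIXES:
        if not word.endswith(suffix):
            continue
        base = word[: -len(suffix)]
        for stem in (base, base + "e"):  # also try undoing the e-deletion rule
            if stem in STEMS and generate(stem, suffix) == word:
                analyses.append((stem, suffix))
    return analyses


if __name__ == "__main__":
    print(generate("sneeze", "ed"), analyze("sneezed"), analyze("teacher"))
```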

Computational Morphology

• Morphological processes:
  • Affixation: prefix, suffix, infix
  • Interleaving (KaTaB, uKTaB)
  • Cliticization (isn’t, s’appelle)
  • Internal change (sing/sang, goose/geese)
  • Suppletion (irregularity): aller/ir, be/am
  • Stress placement: implant, import, contest (e.g. the noun IMport vs. the verb imPORT)
  • Tone placement: dà vs. dá (will spank vs. spanked)
  • Reduplication
    • Full: iji/ijiiji
    • Partial: lakad/lalakad
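Two of these non-concatenative processes are easy to state as string operations, which is what makes them a challenge for simple suffix-stripping approaches. In the sketch, interleaving fills the "C" slots of a vowel pattern with root consonants (the "C"-slot notation is an assumption of this sketch, chosen to reproduce the slide's KaTaB/uKTaB), and partial reduplication copies the first consonant-vowel sequence, assuming the word begins with one. Real analyzers would also need the reverse direction.

```python
def interleave(root: str, pattern: str) -> str:
    """Root-and-pattern (templatic) morphology: each 'C' in the pattern is
    filled with the next root consonant; other pattern letters are kept.
    Assumes the pattern has exactly as many 'C' slots as the root has consonants."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


def full_reduplication(word: str) -> str:
    """Copy the whole word."""
    return word + word


def partial_reduplication(word: str, vowels: str = "aeiou") -> str:
    """Prefix a copy of the initial consonant(s) plus first vowel (assumes a CV... word)."""
    i = 0
    while i < len(word) and word[i] not in vowels:
        i += 1
    return word[: i + 1] + word


if __name__ == "__main__":
    print(interleave("KTB", "CaCaC"), interleave("KTB", "uCCaC"))  # KaTaB uKTaB
    print(full_reduplication("iji"), partial_reduplication("lakad"))  # ijiiji lalakad
```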

Computational Morphology

• Areas of focus in morphology:
  • Derivational
    • do+able, adjourn+ment, depos+ition, un+lock, teach+er
  • Inflectional
    • dog+s, sneez+ed
  • Compounding
    • overkill, BYU intramural track star
  • Cliticization
    • I’m, she’ll, they’ve, o’clock

Computational Morphology

• Computational morphology
  • Processing morphological structure via computer (parsing, generation)
• Traditional approach:
  • Ad hoc methods
  • Cut-and-paste algorithms
  • Dictionary lookup
  • Inadequate for highly inflected languages
  • Even statistical approaches are often of limited use
• Two-level approach with finite-state techniques
• Machine learning is making inroads:
  • Sequence labeling, morpheme boundary detection
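To treat morpheme boundary detection as sequence labeling, each character of a word receives a label; one common scheme (assumed here) is B for a character that begins a morpheme and I for any other. A trained sequence model, not shown, would predict these labels for unseen words. The sketch covers only the encoding used to build training data and the decoding that turns predicted labels back into morphemes.

```python
def to_labels(morphemes: list[str]) -> list[tuple[str, str]]:
    """Encode a segmented word as (character, label) pairs:
    'B' marks the first character of a morpheme, 'I' any later character."""
    pairs = []
    for morph in morphemes:
        for k, ch in enumerate(morph):
            pairs.append((ch, "B" if k == 0 else "I"))
    return pairs


def from_labels(chars: str, labels: list[str]) -> list[str]:
    """Decode per-character labels back into morphemes (a new one starts at each 'B')."""
    morphemes: list[str] = []
    for ch, label in zip(chars, labels):
        if label == "B" or not morphemes:
            morphemes.append(ch)
        else:
            morphemes[-1] += ch
    return morphemes


if __name__ == "__main__":
    pairs = to_labels(["un", "lock", "ed"])
    print(pairs)
    print(from_labels("unlocked", [label for _, label in pairs]))  # ['un', 'lock', 'ed']
```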

Question & Answer

Thank You!

