
Module 2 - Natural Language Processing: Paulo Gomes DEI - FCTUC, 2006/2007

This document provides an overview of a module on natural language processing. It introduces NLP and discusses various topics covered in the module, including morphological analysis, syntactic analysis, semantic analysis, and applications of NLP. It provides examples to illustrate different NLP tasks like part-of-speech tagging, parsing, and semantic analysis.

Module 2 – Natural Language Processing
Paulo Gomes
DEI – FCTUC, 2006/2007

Paulo Gomes ATAI 06/07 1


Module Overview
• Introduction to Natural Language Processing (NLP)

• Morphological Analysis

• Syntactic Analysis

• Semantic Analysis

• Applications
Introduction to NLP
• Example:
– “But I have promises to keep, and miles to go before I sleep.”
[Miller, 2001]
– Counting the dictionary definitions of each word, the number of possible sense combinations is the running product of the counts:
Word        Definitions   Combinations
But         11            11
I           3             33
have        16            528
promises    7             3696
to          21            77616
keep        17            1319472
and         5             6597360
miles       5             32986800
to          21            692722800
go          29            20088961200
before      10            200889612000
I           3             602668836000
sleep       6             3616013016000
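
The Combinations column can be reproduced as a running product of the Definitions column. The following is a minimal sketch of that arithmetic; the names `SENSE_COUNTS` and `running_combinations` are illustrative and not part of the original slides.

```python
from itertools import accumulate
from operator import mul

# (word, number of dictionary definitions), copied from the slide's table.
SENSE_COUNTS = [
    ("But", 11), ("I", 3), ("have", 16), ("promises", 7), ("to", 21),
    ("keep", 17), ("and", 5), ("miles", 5), ("to", 21), ("go", 29),
    ("before", 10), ("I", 3), ("sleep", 6),
]


def running_combinations(sense_counts):
    """Return the cumulative product of the definition counts."""
    counts = [count for _, count in sense_counts]
    return list(accumulate(counts, mul))


combinations = running_combinations(SENSE_COUNTS)
for (word, count), total in zip(SENSE_COUNTS, combinations):
    print(f"{word:<10}{count:>4}{total:>16}")
```

Each new word multiplies the number of candidate readings by its own number of definitions, which is why thirteen short words already give more than 3.6 trillion combinations.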



Introduction to NLP
• How do humans manage this exponential number of possible meanings?
• Answer – filtering by:
– Lexical Knowledge
– Syntactic Knowledge
– Semantic Knowledge
– Pragmatic Knowledge
–…



Introduction to NLP
• Another Example:
– Dave Bowman: “Open the pod bay doors, HAL”.
– HAL: “I'm sorry Dave, I'm afraid I can't do that”.
• from “2001: A Space Odyssey”

– What HAL must do to say this:


• Phonetics → signal analysis:
– Sound → Symbols → Reasoning → Symbols → Sound



Introduction to NLP
• What HAL must do to say this:
– Morphology → understand words:
• “pod bay doors” – meaning of words.
• “I’m” – contraction of “I am”, a morphological phenomenon.
– Syntax → understand the combination of words:
• “Open the pod bay doors, HAL” – is a valid sentence, with clear roles for words.



Introduction to NLP
• What HAL must do to say this:
– Semantics → understand the meaning of sentences:
• “Open the pod bay doors, HAL” – identifying that this is a request, and what is being requested.
– Pragmatics/Discourse → understand the request and act accordingly:
• “Open the pod bay doors, HAL” – a request for an action that HAL does not want to perform.



Introduction to NLP
• Knowledge Categories:
– Phonetics and Phonology → language sounds.
– Morphology → words.
– Syntax → structural relations between words.
– Semantics → meaning of words and sentences.
– Pragmatics → how the language is used.
– Discourse → linguistic components above sentences
(anaphora, metaphor …).



Introduction to NLP
• Main difference between Natural Languages (NL) and Formal Languages (FL):

Symbols in FL are unambiguous, whereas symbols in NL can be ambiguous.

Morphological Analysis

“The University of Coimbra is 700 years old.”

• Tokenization:
– The | University | of | Coimbra | is | 700 | years | old | .

– Problem: “University | of | Coimbra” is split into three tokens, but they refer to the same entity.
• Identification of Compound Names (Named Entity Recognition).
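
The tokenization step, and the compound-name problem it raises, can be sketched as follows. This is a minimal illustration, assuming a regular-expression tokenizer and a hand-written list of known names; `tokenize`, `merge_names` and the name list are hypothetical helpers, not the method used in the module.

```python
import re


def tokenize(text):
    """Split text into word/number tokens and single punctuation marks."""
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def merge_names(tokens, known_names):
    """Merge token sequences that spell out a known (non-empty) compound name."""
    merged, i = [], 0
    while i < len(tokens):
        for name in known_names:
            parts = name.split()
            if tokens[i:i + len(parts)] == parts:
                merged.append(name)
                i += len(parts)
                break
        else:
            # No known name starts at position i: keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = tokenize("The University of Coimbra is 700 years old.")
entities = merge_names(tokens, ["University of Coimbra"])
print(tokens)
print(entities)
```

A fixed list of names is only a stand-in: real Named Entity Recognition has to find names it has never seen before.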



Morphological Analysis

“The University of Coimbra is 700 years old.”

• Elimination of stop words:


– The | University | of | Coimbra | is | 700 | years | old | .

– University of Coimbra | is | 700 | years | old
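
Stop-word elimination can be sketched as a simple filter. The stop-word list below is an assumption chosen so that the result matches the slide (only “The” and the final period are dropped); real lists are longer and application-specific.

```python
# Assumed, deliberately tiny stop-word list (hypothetical, for illustration only).
STOP_WORDS = {"the", "a", "an"}


def remove_stop_words(tokens):
    """Drop stop words and tokens that contain no letter or digit."""
    return [
        token for token in tokens
        if token.lower() not in STOP_WORDS
        and any(ch.isalnum() for ch in token)
    ]


tokens = ["The", "University of Coimbra", "is", "700", "years", "old", "."]
content_tokens = remove_stop_words(tokens)
print(content_tokens)
```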



Morphological Analysis

“The University of Coimbra is 700 years old.”

• Stemming or Morphological Analysis :


– University of Coimbra | is | 700 | years | old

– University of Coimbra | be | 700 | year | old



Morphological Analysis
• Stemming or Morphological Analysis:
– Plurals:
• Years → year
– Verb forms:
• Has → have
– Compounding:
• Bookkeeper → book + keeper
– Word derivation:
• Prefixes:
– Unbuckle → buckle
• Suffixes:
– Shortness → short
• Circumfixes (very rare in English):
– enlighten
• Infixes (very rare in English):
– Piperidine
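
A few of the cases above can be sketched with a lookup table and two affix rules. This is a toy illustration under strong assumptions: the irregular-form table, the plural rule and the affixes handled (“un-” and “-ness”) are hand-picked to cover the slide's examples and are far from a real stemmer or morphological analyzer.

```python
# Hypothetical lookup for irregular verb forms used in the slides.
IRREGULAR = {"is": "be", "has": "have"}


def lemmatize(token):
    """Return a base form for a token, or the token unchanged if no rule applies."""
    word = token.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    # Naive plural rule: strip a final "s" (but not "ss") from longer words.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return token


def strip_affixes(word):
    """Split off the prefix "un-" and the suffix "-ness" when present."""
    stem = word.lower()
    prefix = suffix = None
    if stem.startswith("un") and len(stem) > 4:
        prefix, stem = "un", stem[2:]
    if stem.endswith("ness") and len(stem) > 6:
        suffix, stem = "ness", stem[:-4]
    return prefix, stem, suffix


lemmas = [lemmatize(t) for t in ["University of Coimbra", "is", "700", "years", "old"]]
print(lemmas)
print(strip_affixes("Unbuckle"))
print(strip_affixes("Shortness"))
```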



Morphological Analysis

• Word/term identification.

• Finding the morphemes (constituents) of a word.

Syntactic Analysis
The | University of Coimbra | is | 700 | years | old | .

• Part-Of-Speech (POS) tagging:
– Identification of word lexical classes.
– The → determiner (DT)
– University of Coimbra → singular proper noun (NNP)
– is → verb 3rd person singular present (VBZ)
– 700 → cardinal number (CD)
– years → plural noun (NNS)
– old → adjective (JJ)
– . → sentence-final punctuation (.)
[Slide annotation: ReBuilder TextToDiagram (OpenNLP)]
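
The tagging shown above can be imitated with a small lexicon plus fallback rules. This is only a sketch: the lexicon and the fallback heuristics are assumptions made for this one sentence, whereas real taggers (such as the one in OpenNLP) are trained statistically on annotated text.

```python
# Hypothetical word-to-tag lexicon using the Penn Treebank tags from the slide.
LEXICON = {"the": "DT", "is": "VBZ", "years": "NNS", "old": "JJ", ".": "."}


def pos_tag(tokens):
    """Assign a tag to each non-empty token by lookup, then by simple fallbacks."""
    tagged = []
    for token in tokens:
        if token.lower() in LEXICON:
            tag = LEXICON[token.lower()]
        elif token.isdigit():
            tag = "CD"   # cardinal number
        elif token[0].isupper():
            tag = "NNP"  # capitalized unknown word: guess proper noun
        else:
            tag = "NN"   # default: common noun
        tagged.append((token, tag))
    return tagged


tagged = pos_tag(["The", "University of Coimbra", "is", "700", "years", "old", "."])
print(tagged)
```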
Syntactic Analysis
• Penn Treebank project Tags:



Syntactic Analysis
The [DT] | University of Coimbra [NNP] | is [VBZ] | 700 [CD]
| years [NNS] | old [JJ] | .

• Parsing:
– Full Parsing
• define the sentence structure using a parsing tree.
– Shallow Parsing
• define the sentence structure using parsing chunks.



Syntactic Analysis
The [DT] | University of Coimbra [NNP] | is [VBZ] | 700 [CD]
| years [NNS] | old [JJ] | .

• Full Parsing:
[S
  [NP The/DT University of Coimbra/NNP ]
  [VP is/VBZ
    [NP 700/CD years/NNS ]
    [ADJP old/JJ ] ] ]
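
A parse tree like this one can be held in a nested data structure and traversed recursively. The sketch below assumes nested tuples of the form `(label, child, ...)` and assumes that the inner NP and ADJP attach under the VP, which is one reading of the slide's diagram; `TREE`, `leaves` and `to_brackets` are illustrative names.

```python
# Each node is (label, child, ...); a pre-terminal node is (tag, word).
TREE = (
    "S",
    ("NP", ("DT", "The"), ("NNP", "University of Coimbra")),
    ("VP",
     ("VBZ", "is"),
     ("NP", ("CD", "700"), ("NNS", "years")),
     ("ADJP", ("JJ", "old"))),
)


def is_preterminal(node):
    """True when the node is a (tag, word) pair."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if is_preterminal(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]


def to_brackets(node):
    """Render the tree in bracketed notation with word/TAG leaves."""
    if is_preterminal(node):
        return f"{node[1]}/{node[0]}"
    inner = " ".join(to_brackets(child) for child in node[1:])
    return f"[{node[0]} {inner}]"


print(leaves(TREE))
print(to_brackets(TREE))
```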



Syntactic Analysis
The [DT] | University of Coimbra [NNP] | is [VBZ] | 700 [CD]
| years [NNS] | old [JJ] | .

• Shallow Parsing:
• [NP The/DT University of Coimbra/NNP ]
• [VP is/VBZ ]
• [NP 700/CD years/NNS ]
• [ADJP old/JJ ]
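
The chunks above can be produced from tagged tokens by grouping neighbours whose tags map to the same chunk type. This is a minimal sketch; the tag-to-chunk table is an assumption that happens to reproduce the slide's output, not a general chunking grammar.

```python
# Hypothetical mapping from POS tag to chunk type; unlisted tags (e.g. ".") get no chunk.
CHUNK_OF_TAG = {
    "DT": "NP", "NNP": "NP", "NN": "NP", "NNS": "NP", "CD": "NP",
    "VBZ": "VP", "JJ": "ADJP",
}


def chunk(tagged):
    """Group consecutive (token, tag) pairs that share a chunk type."""
    chunks = []
    previous = None
    for token, tag in tagged:
        label = CHUNK_OF_TAG.get(tag)
        if label is not None:
            if label == previous:
                chunks[-1][1].append((token, tag))
            else:
                chunks.append((label, [(token, tag)]))
        # An unchunked token (label None) also ends the current chunk.
        previous = label
    return chunks


def format_chunk(label, items):
    """Render one chunk in the slide's notation."""
    body = " ".join(f"{token}/{tag}" for token, tag in items)
    return f"[{label} {body} ]"


tagged = [
    ("The", "DT"), ("University of Coimbra", "NNP"), ("is", "VBZ"),
    ("700", "CD"), ("years", "NNS"), ("old", "JJ"), (".", "."),
]
chunk_lines = [format_chunk(label, items) for label, items in chunk(tagged)]
print("\n".join(chunk_lines))
```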



Syntactic Analysis

• Identification of word/term POS class (POS tagging).

• Identification of sentence structure (parsing):
– Full Parsing
– Shallow Parsing

Semantic Analysis
• Syntax-Driven Semantic Analysis:
– Example:
• “Vegetarians eat fruit.”

• Parse tree: [S [NP Vegetarians/NN ] [VP eat/VB fruit/NN ] ]
• Meaning representation: ∀x,y : vegetarian(x) ∧ fruit(y) ⇒ eats(x,y)



Semantic Analysis
• Syntax-Driven Semantic Analysis:
– Principle of compositionality:
• The meaning of a sentence can be composed from the
meaning of its parts.
• Pipeline: Natural Language Text → Parser → Parse Tree → Semantic Analysis → Semantic Representation
• Parse Tree: [S [NP Vegetarians/NN ] [VP eat/VB fruit/NN ] ]
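
The principle of compositionality can be sketched by attaching a small meaning function to each word and combining them by following the parse tree. The tree encoding, the word-to-predicate lexicon and the single sentence rule below are assumptions made for the “Vegetarians eat fruit” example only.

```python
# Hypothetical lexicon: word -> predicate name used in the logical form.
PREDICATES = {"Vegetarians": "vegetarian", "eat": "eats", "fruit": "fruit"}

# Parse tree as nested tuples: (label, child, ...); leaves are (tag, word).
TREE = ("S", ("NP", ("NN", "Vegetarians")), ("VP", ("VB", "eat"), ("NN", "fruit")))


def noun_meaning(word):
    """A noun denotes a one-place predicate over a variable."""
    return lambda var: f"{PREDICATES[word]}({var})"


def verb_meaning(word):
    """A transitive verb denotes a two-place predicate."""
    return lambda subj, obj: f"{PREDICATES[word]}({subj},{obj})"


def interpret(tree):
    """Compose the sentence meaning from the meanings of its parts."""
    _, noun_phrase, verb_phrase = tree
    _, (_, subject_word) = noun_phrase
    _, (_, verb_word), (_, object_word) = verb_phrase
    subject = noun_meaning(subject_word)
    verb = verb_meaning(verb_word)
    obj = noun_meaning(object_word)
    return f"∀x,y : {subject('x')} ∧ {obj('y')} ⇒ {verb('x', 'y')}"


formula = interpret(TREE)
print(formula)
```

The sentence-level rule only decides how the parts are combined; the predicates themselves come from the individual words, which is the point of the principle.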



Semantic Analysis
• Lexical Semantics

“Bank”
What does it mean?

– Difference between symbol (lexeme) and meaning of the symbol.

– Study of linguistic phenomena.


Semantic Analysis
• Lexical Semantics
– Homonymy:
• Same lexeme, different meaning (e.g. Bank).
• Word Sense Disambiguation
– Synonymy:
• Different lexemes, same meaning (e.g. Price &
Cost).
– Hyponymy:
• One lexeme is a subclass of another lexeme (e.g.
Car & Vehicle).
Semantic Analysis
• Lexical Semantics
– Meronymy:
• One lexeme is part of another lexeme (e.g. Car &
Wheel).
– Other linguistic phenomena:
• Metaphor
• Anaphora
• Metonymy
• …
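
The relations between lexemes can be sketched as small lookup tables. The entries below reuse the slides' examples (Price/Cost, Car/Vehicle, Car/Wheel); the extra entry `taxi → car` and all function names are hypothetical additions for illustration.

```python
# Toy relation tables; "taxi" is an assumed extra entry to show transitivity.
SYNONYMS = [{"price", "cost"}]
HYPERNYM = {"car": "vehicle", "taxi": "car"}   # hyponym -> direct hypernym
PARTS = {"car": {"wheel"}}                     # whole -> set of parts


def are_synonyms(a, b):
    """True if both lexemes belong to the same synonym set."""
    return any(a in group and b in group for group in SYNONYMS)


def is_hyponym(word, ancestor):
    """True if `ancestor` is reachable by following hypernym links upward."""
    current = HYPERNYM.get(word)
    while current is not None:
        if current == ancestor:
            return True
        current = HYPERNYM.get(current)
    return False


def is_meronym(part, whole):
    """True if `part` is listed as a part of `whole`."""
    return part in PARTS.get(whole, set())


print(are_synonyms("price", "cost"))
print(is_hyponym("taxi", "vehicle"))
print(is_meronym("wheel", "car"))
```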



Semantic Analysis
• Lexical Semantics:
– to analyze the meaning of words and
semantic relations between them.
• Lexeme:
– an individual entity in the lexicon.
• Lexicon:
– the finite list of expressions used in the
language to express meaning.



Semantic Analysis
• Thesaurus:
– Organizes lexemes and their meanings
(senses or synsets), along with the semantic
relations between senses.
• WordNet
– Online thesaurus:
http://wordnet.princeton.edu/perl/webwn
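
A thesaurus that lists several senses per lexeme also supports word sense disambiguation. The sketch below picks the sense whose gloss shares the most words with the surrounding context, in the spirit of the simplified Lesk approach. The sense identifiers and glosses are invented for this example and are not taken from WordNet.

```python
# Hypothetical mini-thesaurus: lexeme -> {sense id: gloss}.
SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a river or lake",
    }
}


def disambiguate(lexeme, context):
    """Return the sense id whose gloss overlaps most with the context words."""
    context_words = set(context.lower().split())

    def overlap(item):
        _, gloss = item
        return len(context_words & set(gloss.split()))

    # On a tie, max() keeps the first sense listed for the lexeme.
    return max(SENSES[lexeme].items(), key=overlap)[0]


print(disambiguate("bank", "she sat on the bank of the river"))
print(disambiguate("bank", "he went to the bank to deposit money"))
```

Counting shared words is a crude signal: it ignores word forms (“deposit” versus “deposits”) and says nothing when no word is shared.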



Semantic Analysis
• Definition:
– the process whereby meaning representations are
composed and assigned to linguistic input.

Natural Language Text + Parsing Structure → Semantic Analysis → Formal Representation (text meaning)

NLP Applications
• Question Answering (Q&A)
– Brain boost:
http://www.brainboost.com/

Natural Language Question → Web Search (Google, ...) → Web → Resulting Web Pages → Answer extraction



NLP Applications
• Machine Translation

– Google language tools:

http://www.google.com/language_tools?hl=en



NLP Applications
• Conversation Systems:

– Eliza

http://www-ai.ijs.si/eliza-cgi-bin/eliza_script

– A page about Chatterbots

http://www.simonlaven.com/
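
Conversation systems in the style of Eliza work by matching the user's sentence against patterns and filling a response template. The following is a minimal sketch of that idea; the two rules and the fallback reply are invented here and are not the original Eliza script.

```python
import re

# Hypothetical pattern -> response-template rules.
RULES = [
    (re.compile(r"i am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"i feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
]


def respond(utterance):
    """Answer with the first matching rule, or a generic prompt."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return "Please tell me more."


print(respond("I am tired."))
print(respond("I feel sad"))
print(respond("Hello"))
```

No morphological, syntactic or semantic analysis is involved: the apparent understanding comes entirely from reflecting the user's own words back.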



NLP Applications
• Text Mining
[Screenshot of a text mining tool. Labelled areas: Query; Concept distribution in the selected documents; Selected Documents; Temporal distribution of the selected documents.]



NLP Applications
• Information retrieval from Databases

Natural Language Question → SQL (SELECT) → Database → Resulting Table(s) → Answer Extraction and Formatting
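
The question-to-SQL pipeline can be sketched end to end with Python's built-in `sqlite3` module. Everything specific here is an assumption: the `universities` table, the single question pattern and the answer wording are illustrative, and a real system would need proper parsing rather than one regular expression.

```python
import re
import sqlite3


def build_db():
    """Create an in-memory database with one illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE universities (name TEXT, founded INTEGER)")
    conn.execute(
        "INSERT INTO universities VALUES (?, ?)",
        ("University of Coimbra", 1290),
    )
    return conn


# Hypothetical question pattern: "When was the <name> founded?"
PATTERN = re.compile(r"when was the (.+) founded\?", re.IGNORECASE)


def answer(conn, question):
    """Translate the question to a SELECT, run it, and format the answer."""
    match = PATTERN.fullmatch(question.strip())
    if match is None:
        return "Sorry, I cannot translate that question."
    name = match.group(1)
    # Parameterized query: the extracted name is never pasted into the SQL text.
    row = conn.execute(
        "SELECT founded FROM universities WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return f"I have no record of {name}."
    return f"The {name} was founded in {row[0]}."


conn = build_db()
print(answer(conn, "When was the University of Coimbra founded?"))
```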



NLP Applications
• Document Management:
– Categorization, Clustering, Summarization

[Screenshot of a document management tool. Labelled areas: Document List; Ontology; Concept Selection; Information about the selected document.]



NLP Applications
• Document/Email Routing and Filtering

Email1, Email2, ..., EmailN and News1, News2, ..., NewsN → Information Routing System (Ontology + User profiles) → User 1, User 2, ..., User N
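
Routing can be sketched as matching each incoming document against per-user keyword profiles. This is a deliberately simple stand-in: the profiles and the overlap threshold are invented for illustration, and the slide's system additionally relies on an ontology, which is not modelled here.

```python
import re

# Hypothetical user profiles: user -> set of interest keywords.
PROFILES = {
    "User 1": {"parsing", "semantics"},
    "User 2": {"football", "league"},
}


def route(document, profiles, threshold=1):
    """Return the users whose profile shares at least `threshold` words with the document."""
    words = set(re.findall(r"\w+", document.lower()))
    return [
        user for user, keywords in profiles.items()
        if len(words & keywords) >= threshold
    ]


print(route("New results on parsing and semantics", PROFILES))
print(route("League football results", PROFILES))
print(route("Weather forecast", PROFILES))
```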



NLP Applications
• Other Applications:
– Query Expansion
– Web Mining
– Named Entity Recognition
– Word Sense Disambiguation
– Human Computer Interfaces
– Natural Language Generation
–…



Next Modules
• Case-Based Reasoning (CBR)
• Planning
• Knowledge Discovery
• Intelligent Systems for Knowledge Management
• Ontologies
• Semantic Web
• Affective Computing
• Exploration of Unknown Environments and Map Constructions
The End

• Questions?
