
Natural Language Processing

Applications
Fabienne Venant

Université Nancy2 / Loria


2008/2009
Introduction to NLP
NLP aims at:
• making computers talk
• endowing computers with the linguistic abilities of humans
Dialog system
Fiction
• Conversational agent – dialog system
– HAL 9000 (2001: A Space Odyssey)
– C3PO (Star Wars)


Dialog system
reality
• E-commerce: AINI
– a chatterbot integrated with a 3D animated agent character
– improves customer service
– reduces customer reliance on human operators
Dialog system
reality
• E-teaching: AutoTutor
(http://www.autotutor.org/what/what.htm)
– Intelligent tutoring system that helps students learn by holding a conversation in natural language
– Animated agent: synthesized speech, intonation, facial expressions, and gestures
– demo (from 2002)
Machine translation
• Automatically translate a document from one language to another
• Very useful on the web
• Far from a solved problem
Question answering
• Generalization of simple Web search
• Ask complete questions
– What does divergent mean?
– How many states were in Europe in 2007?
– What is the occupation of Bill Clinton’s wife?
– What do scientists think about global warming?
Linguistic knowledge in NLP
Linguistic knowledge in NLP
What would HAL need to engage in this dialog?

• Dave Bowman: Hello, HAL do you read me, HAL?


• HAL: Affirmative, Dave, I read you.
• Dave Bowman: Open the pod bay doors, HAL.
• HAL: I'm sorry Dave, I'm afraid I can't do that.
• Dave Bowman: What's the problem?
• HAL: I think you know what the problem is just as well as I do.
• Dave Bowman: What are you talking about, HAL?
• HAL: This mission is too important for me to allow you to
jeopardize it.
• Dave Bowman: I don't know what you're talking about, HAL?
• HAL: I know you and Frank were planning to disconnect me, and
I'm afraid that's something I cannot allow to happen.
• Dave Bowman: Where the hell'd you get that idea, HAL?
• HAL: Dave, although you took thorough precautions in the pod
against my hearing you, I could see your lips move.
Speech recognition / speech synthesis
• Phonetics, phonology:
– how words are pronounced in terms of sequences of sounds
– how each of these sounds is realized acoustically
• Morphology: can’t, I’m, were, lips...
– producing and recognizing variations of individual words
– the way words break down into component parts that carry meaning (like singular / plural)
Phonetics
Study of the physical sounds of human speech
• Transcription of sounds (IPA)
– /i:/, /ɜ:/, /ɔ:/, /ɑ:/ and /u:/
– 'there' => /ðeə/
– 'there on the table' => /ðeər ɒn ðə teɪbl/
– Exercises
Phonetics 2
• Articulatory phonetics: production
• Acoustic phonetics: properties of sound waves (frequency and harmonics)
• Auditory phonetics: speech perception
– McGurk effect
Phonology
• Describes the way sounds function to encode meaning
– Phoneme: a speech sound that helps us construct meaning
• /r/ : rubble → double, Hubble, fubble, wubble
• /u/ : rubble → rabble, rebel, Ribble, robble...
– A phoneme can be realized in different forms depending on context (allophones)
• /l/ : lick [l] / ball [ɫ]
– Speech synthesis uses allophones
• SpeakJet
Morphology
• Studies the structure of words
– Inflected forms → lemma
• walks, walking, walked → walk
– Lemma + part of speech = lexeme
• walk, walking, walked → walk
• walker, walkers → walker
• Inflectional morphology: decomposes a word into a lemma and one or more affixes giving information about tense, gender, number
– cats → lemma: cat + affix s (plural)
• Derivational morphology: decomposes a word into a lemma and one or more affixes giving information about meaning and category
– unfair → prefix (un, semantic: non) + lemma: fair
• Exceptions and irregularities?
– women → woman, pl
– aren’t → are not
Morphology
Methods
• Lemmatisation: the process of grouping together the different inflected forms of a word so they can be analysed as a single item
– needs to determine the part of speech of a word in a sentence (requiring grammar knowledge)
• Stemming: operates on a single word without knowledge of the context
– cannot discriminate between words which have different meanings depending on part of speech
– easier to implement and runs faster; the reduced accuracy may not matter for some applications
• Examples
– better → lemma: good, missed by stemming
– walking → lemma: walk, matched by both stemming and lemmatization
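The contrast between the two methods can be sketched in a few lines. The suffix rules and the lemma dictionary below are deliberately tiny illustrations, not a real stemmer or lemmatizer:

```python
# Minimal sketch: stemming vs lemmatization.
# The suffix list and lemma dictionary are toy data for illustration only.

SUFFIXES = ["ing", "ed", "s"]          # crude, ordered suffix-strip rules
LEMMAS = {"better": "good", "walking": "walk", "walked": "walk", "walks": "walk"}

def stem(word):
    """Strip the first matching suffix; no context, no dictionary."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def lemmatize(word):
    """Look the word up in a (tiny) lemma table; fall back to stemming."""
    return LEMMAS.get(word, stem(word))

print(stem("walking"))      # walk   -- found by both methods
print(stem("better"))       # better -- stemming misses the lemma 'good'
print(lemmatize("better"))  # good
```

This mirrors the slide's examples: "better" is only resolved by lemmatization, while "walking" is handled by both.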
Morphology
Method and applications
• Method
– Finite-state transducers
• Applications
– to resolve anaphora:
Sarah met the women in the street. She did not like them.
[She (sg) = Sarah (sg); them (pl) = the women (pl)]
– for spell checking and for generation
• * The women (pl) is (sg)
– for information retrieval
• Google search
– ...
Syntax

I’m sorry Dave, I can’t do that

Syntax
structure of language
I’m I do, sorry that afraid Dave I’m can’t

• Languages have structure:
– not all sequences of words over the given alphabet are valid
– when a sequence of words is valid (grammatical), a natural structure can be induced on it.
Syntax
• Describes the constituent structure of NL expressions
– (I (am sorry)), Dave, (I ((can’t do) that))
• Grammars are used to describe the syntax of a language
• Syntactic analysers and surface realisers assign a syntactic structure to a string/semantic representation on the basis of a grammar
Syntax
• It is useful to think of this structure as a tree:
– represents the syntactic structure of a string according to some formal grammar.
– the interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals of the grammar.
Syntax
tree example

(S (NP John)
   (VP (Adv often)
       (V gives)
       (NP (Det a) (N book))
       (PP (Prep to) (NP Mary))))

"John often gives a book to Mary"
Methods in syntax
Words → syntactic tree
– Algorithm: parser
• A parser checks for correct syntax and builds a data structure.
– Resources used: lexicon + grammar
– Symbolic: hand-written grammar and lexicon
– Statistical: grammar acquired from a treebank
• Treebank: a text corpus in which each sentence has been annotated with its syntactic structure.
• Syntactic structure is commonly represented as a tree structure, hence the name treebank.
– Difficulty: coverage and ambiguity
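The core idea of checking a word sequence against a grammar can be sketched with the CYK algorithm over a hand-written toy grammar in Chomsky normal form. Every rule and lexicon entry below is an illustrative assumption, not a real grammar of English:

```python
# CYK recognizer over a toy CNF grammar (the "symbolic" route: hand-written
# grammar and lexicon). Assumes every input word is in the lexicon.
from itertools import product

grammar = {            # (B, C) -> A  means the rule A -> B C
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
lexicon = {"John": "NP", "loves": "V", "Mary": "NP", "the": "Det", "cat": "N"}

def cyk(words):
    """table[i][j] = set of non-terminals covering words[i : i+j+1]."""
    n = len(words)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][0].add(lexicon[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for k in range(1, span):           # split point inside the span
                for a, b in product(table[i][k - 1], table[i + k][span - k - 1]):
                    if (a, b) in grammar:
                        table[i][span - 1].add(grammar[(a, b)])
    return "S" in table[0][n - 1]

print(cyk("John loves Mary".split()))     # True: grammatical
print(cyk("John loves the cat".split()))  # True
print(cyk("loves John Mary".split()))     # False: no valid structure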
Syntax
applications
• For spell checking
– *its a fair exchange → no syntactic tree
– It’s a fair exchange → ok, syntactic tree
• To construct the meaning of a sentence
• To generate a grammatical sentence
Syntax → meaning

John loves Mary → love(j,m)
Agent = Subject
≠ Mary loves John → love(m,j)
Agent = Subject
= Mary is loved by John → love(j,m)
Agent = By-Object
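The Agent↔Subject mapping above can be sketched as a toy function that builds the same logical form from an active or a passive sentence. The pattern matching is hard-wired to these three examples and is in no way a general semantic analyzer:

```python
# Toy sketch of the syntax -> meaning mapping: active and passive surface
# strings yield the same predicate love(j, m). Hard-coded for this pattern.

def meaning(sentence):
    words = sentence.split()
    if "by" in words:
        # passive: Patient is the subject, Agent sits in the by-phrase
        patient, agent = words[0], words[-1]
        verb = words[2].replace("loved", "love")
    else:
        # active: Agent is the subject
        agent, verb, patient = words[0], words[1].rstrip("s"), words[2]
    return f"{verb}({agent[0].lower()},{patient[0].lower()})"

print(meaning("John loves Mary"))        # love(j,m)
print(meaning("Mary is loved by John"))  # love(j,m) -- same meaning
print(meaning("Mary loves John"))        # love(m,j) -- different meaning
```

The point of the sketch: the grammatical subject changes between the three sentences, but the thematic role (Agent) is what fixes the argument order in the logical form.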
Semantics

– Where the hell’d you get that idea, HAL?

– Dave, although you took thorough precautions in the pod against my hearing you, I could see your lips move
Lexical semantics
Meaning of words

To get
1. come to have or hold; receive.
2. succeed in attaining, achieving, or experiencing; obtain.
3. experience, suffer, or be afflicted with.
4. move in order to pick up, deal with, or bring.
5. bring or come into a specified state or condition.
6. catch, apprehend, or thwart.
7. come or go eventually or with some difficulty.
8. move or come into a specified position or state
...

An idea
1. a thought or suggestion about a possible course of action.
2. a mental impression.
3. a belief.
4. (the idea) the aim or purpose.

The hell
1. a place regarded in various religions as a spiritual realm of evil and suffering, often depicted as a place of perpetual fire beneath the earth to which the wicked are sent after death.
2. a state or place of great suffering.
3. a swear word that some people use when they are annoyed or surprised
Lexical semantics

Who is the master?

Context?
Semantic relations?

Lewis Carroll, Through the Looking-Glass
Compositional semantics

• Where the hell did you get that idea?
– "the hell" → a swear word that some people use when they are annoyed or surprised, or to emphasize something
– "get that idea" → have this belief
Semantics issues in NLP

• Definition and representation of meaning
• Meaning construction
• Semantic relations
• Interaction between semantics and syntax
Semantic relations
• Paradigmatic relation (substitution): word2 can replace word1 in the same context
(Blablabla word1 bla bla bla / Blablabla word2 bla bla bla)

“How are you doing?” I would ask.
“Ask me how I am feeling?” he answered.
“Okay, how are you feeling?” [. . .]
“I am very happy and very sad.”
“How can you be both at the same time?” I asked in all seriousness, a girl of nine or ten.
“Because both require each other’s company. They live in the same house. Didn’t you know?”
Terry Tempest Williams, “The village watchman” (1994)

• synonymy: sofa=couch=divan=davenport
• antonymy: good/bad, life/death, come/go
• contrast: sweet/sour/bitter/salty, solid/liquid/gas
• hyponymy, or class inclusion: cat<mammal<animal
• meronymy, or the part-whole relation: line<stanza<poem
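Hyponymy (class inclusion) is transitive: cat < mammal < animal licenses the inference cat < animal. A sketch with a hand-made hypernym table (toy data, not WordNet):

```python
# Toy hyponymy hierarchy: walking up the hypernym chain implements the
# transitivity of class inclusion. The table is invented for illustration.

HYPERNYM = {"cat": "mammal", "mammal": "animal",
            "line": "stanza", "stanza": "poem"}

def is_a(word, category):
    """Return True if `word` is a (transitive) hyponym of `category`."""
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word == category:
            return True
    return False

print(is_a("cat", "animal"))  # True, via mammal
print(is_a("poem", "cat"))    # False
```

The same walk works for the meronymy chain on the slide (line < stanza < poem) if the table encodes part-whole links instead of class inclusion.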
Semantic relations
• Syntagmatic relations: relations between words that go together in a syntactic structure.
– Collocation: heavy rain, to have breakfast, to deeply regret...
• Useful for generation
– Argument structure
• Someone breaks something with something → 3 arguments
– Difficulty: how many arguments? Can an argument be optional?
John broke the window
John broke the window with a hammer
The window broke → semantic arguments ≠ syntactic arguments
– Thematic roles: agent, patient, goal, experiencer, theme...
Semantics / syntax
lexicon

• Subcategorisation frames
– to run: NP1
– to eat: NP1, NP2
– to give: NP1, NP2, PP3 (to)
– envious: NP1, PP2 (of)
Semantics / syntax
lexicon
• Argument structure
– Logic representation: eat(x, y), give(x, y, z)
– Thematic roles: to give [agent, theme, goal], to buy [agent, theme, source], to love [experiencer, patient]
• Link with syntax: break (Agent, Instrument, Patient)
– Agent <=> subj
– Instrument <=> subj, with-pp
– Patient <=> obj, subj

• Selectional restrictions: semantic features on arguments
– to eat [agent: animate, theme: edible, solid]
– John eats bread → theme [+solid] [+edible]
– *The banana eats → filtered out
– *John eats wine
– But: ? John eats soup
Semantics in NLP
• For machine translation
– Le robinet fuit / Le voleur fuit → leak / run away (the same French verb fuir: "the tap leaks" / "the thief flees")
• For information retrieval (and cross-language information retrieval)
– Search on word meaning rather than word form
• Keyword disambiguation
• Query expansion (synonyms)
→ more relevance
Semantics in NLP
• QA: Who assassinated President McKinley?
– Keywords: assassinated, President McKinley / Answer named entity: Person / Answer thematic role: Agent of a target synonymous with "assassinated"
– False positive (1): In [ne=date 1904], [ne=person description President] [ne=person Theodore Roosevelt], who had succeeded the [target assassinated] [role=patient [ne=person William McKinley]], was elected to a term in his own right as he defeated [ne=person description Democrat] [ne=person Alton B. Parker]
– Correct answer (8): [role=temporal In [ne=date 1901]], [role=patient [ne=person description President] [ne=person William McKinley]] was [target shot] [role=agent by [ne=person description anarchist] [ne=person Leon Czolgosz]] [role=location at the [ne=event Pan-American Exposition] in [ne=us city Buffalo], [ne=us state N.Y.]]

Using Semantic Representations in Question Answering, Sameer S. et al., 2003


Pragmatics

Dave Bowman: Open the pod bay doors, HAL.


HAL: I'm sorry Dave, I'm afraid I can't do that.
Pragmatics
• Knowledge about the kind of actions that
speakers intend by their use of sentences
– REQUEST: HAL, open the pod bay door.
– STATEMENT: HAL, the pod bay door is open.
– INFORMATION QUESTION: HAL, is the pod
bay door open?
• Speech act analysis (politeness, irony,
greeting, apologizing...)
Discourse
Where the hell'd you get that idea, HAL?

Dave and Frank were planning to disconnect me

→ Much of language interpretation is dependent on the preceding discourse/dialogue
Linguistic knowledge in NLP
summary
• Phonetics and phonology — knowledge about linguistic sounds
• Morphology — knowledge of the meaningful components of words
• Syntax — knowledge of the structural relationships between words
• Semantics — knowledge of meaning
• Pragmatics — knowledge of the relationship of meaning to the goals and intentions of the speaker
• Discourse — knowledge about linguistic units larger than a single utterance
Ambiguity
• Most tasks in speech and language processing can be viewed as resolving ambiguity at one of these levels

• An item is ambiguous if multiple, alternative linguistic structures can be built for it.
Ambiguity
I made her duck

• I cooked waterfowl for her.


• I cooked waterfowl belonging to her.
• I created the (plaster?) duck she owns.
• I caused her to quickly lower her head or
body.
Ambiguity
I made her duck

• Morphological ambiguity → part-of-speech tagging
– duck: verb / noun
– her: dative pronoun / possessive pronoun
• Semantic ambiguity → word sense disambiguation
– make: create / cook
• Syntactic ambiguity → syntactic disambiguation / parsing
– make: transitive / ditransitive
– [her duck] / [her][duck]
Ambiguity
• Sound-to-text issues:
– Recognise speech.
– Wreck a nice peach.
• Speech act interpretation
– Can you switch on the computer?
• Question or request?

→ Combinatorial problem
Ambiguity vs paraphrase
• Ambiguity: the same sentence can mean different things
• Paraphrase: there are many ways of saying the same thing.
– Beer, please.
– Can I have a beer?
– Give me a beer, please.
– I would like beer.
– I’d like a beer, please.
In generation (Meaning → Text), this implies making choices
→ Combinatorial problem
Models and algorithms
Models and algorithms
• The various kinds of knowledge can be captured through the use of a small number of formal models or theories

• These models and theories are all drawn from the standard toolkit of computer science, mathematics and linguistics
Models and algorithms
• Models
– State machines
– Rule systems
– Logic
– Probabilistic models
– Vector-space models
– Learning algorithms
• Algorithms
– Dynamic programming
– Machine learning
• Classifiers / sequence models
• Expectation-maximization (EM)
Models
• State machines: the simplest formulation
– States, transitions among states, input representation
– Finite-state automata
• deterministic
• non-deterministic
– Finite-state transducers
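A minimal sketch of the simplest formulation, a deterministic finite-state automaton. The alphabet and the accepted language (strings of a's and b's ending in b) are arbitrary illustrative choices:

```python
# Deterministic finite-state automaton: states, transitions, input.
# This toy machine accepts strings over {a, b} that end in 'b'.

TRANSITIONS = {("q0", "a"): "q0", ("q0", "b"): "q1",
               ("q1", "a"): "q0", ("q1", "b"): "q1"}
START, ACCEPT = "q0", {"q1"}

def accepts(string):
    """Run the input through the machine; accept iff we end in an accept state."""
    state = START
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPT

print(accepts("aab"))  # True
print(accepts("aba"))  # False
```

A finite-state transducer has the same shape, except each transition also emits an output symbol, which is what makes it useful for morphology.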
Models
• Formal rule systems
– Regular grammars
– Context-free grammars
– Feature-augmented grammars
Models
State machines and formal rule systems are the main tools used when dealing with knowledge of phonology, morphology, and syntax.
Models
• Models based on logic
– First-order logic / predicate calculus
– Lambda-calculus, feature structures, semantic primitives
These logical representations have traditionally been used for modeling semantics and pragmatics, although more recent work has tended to focus on potentially more robust techniques drawn from non-logical lexical semantics.
Models
• Probabilistic models
– crucial for capturing every kind of linguistic knowledge
– Each of the other models can be augmented with probabilities
– For example, the state machine augmented with probabilities becomes:
• a weighted automaton, or Markov model
• hidden Markov models (HMMs): part-of-speech tagging, speech recognition, dialogue understanding, text-to-speech, machine translation...
• Key advantage of probabilistic models: the ability to solve the many kinds of ambiguity problems
– almost any speech and language processing problem can be recast as “given N choices for some ambiguous input, choose the most probable one”.
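The recipe in the last bullet can be sketched directly: from the candidate analyses of an ambiguous word, pick the one most probable in its context. The counts below are invented for illustration, not taken from a real corpus:

```python
# "Given N choices for some ambiguous input, choose the most probable one."
# Toy context counts for the POS ambiguity of "duck" (invented numbers).

counts = {("her", "duck/NOUN"): 30, ("her", "duck/VERB"): 12,
          ("to", "duck/NOUN"): 2,   ("to", "duck/VERB"): 40}

def disambiguate(prev_word, candidates):
    """Return the candidate analysis with the highest count in this context."""
    return max(candidates, key=lambda c: counts.get((prev_word, c), 0))

print(disambiguate("her", ["duck/NOUN", "duck/VERB"]))  # duck/NOUN
print(disambiguate("to", ["duck/NOUN", "duck/VERB"]))   # duck/VERB
```

An HMM tagger is this idea scaled up: transition and emission probabilities replace the raw counts, and dynamic programming finds the most probable tag sequence for the whole sentence.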
Models
• Vector-space models
– based on linear algebra
– information retrieval
– word meanings
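A small sketch of the vector-space idea as used in information retrieval: documents become term-count vectors over a shared vocabulary and are compared by cosine similarity. The vocabulary and counts are toy data:

```python
# Vector-space model sketch: term-count vectors compared by cosine similarity.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# counts over the toy vocabulary ["cat", "dog", "train"]
doc1 = [3, 1, 0]   # mostly about cats
doc2 = [2, 2, 0]   # cats and dogs
doc3 = [0, 0, 5]   # about trains

print(cosine(doc1, doc2) > cosine(doc1, doc3))  # True: doc2 is closer to doc1
```

The same geometry underlies distributional word meaning: replace document vectors with co-occurrence vectors and nearby vectors correspond to similar words.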
Models
Language processing: search through a space of states representing hypotheses about an input
– Speech recognition: search through a space of phone sequences for the correct word.
– Parsing: search through a space of trees for the syntactic parse of an input sentence.
– Machine translation: search through a space of translation hypotheses for the correct translation of a sentence into another language.
Models
• Machine learning models: classifiers, sequence models
– based on attributes describing each object
– Classifier: attempts to assign a single object to a single class
– Sequence model: attempts to jointly classify a sequence of objects into a sequence of classes.

– Example: deciding whether a word is spelled correctly
• Classifiers: decision trees, support vector machines, Gaussian mixture models, logistic regression → make a binary decision (correct or incorrect) for one word at a time.
• Sequence models: hidden Markov models, maximum entropy Markov models, conditional random fields → assign correct/incorrect labels to all the words in a sentence at once.
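The contrast can be sketched as follows, assuming a toy lexicon and a hand-made list of acceptable bigrams (not a trained model): a per-word classifier accepts any in-vocabulary word, so a real word used in the wrong place slips through, while even a crude joint decision over the sequence can catch it:

```python
# Per-word classifier vs (crude) sequence decision for the spelling example.
# Lexicon and bigram list are toy data standing in for a trained model.

LEXICON = {"the", "cat", "sat", "their", "there", "is", "a"}
GOOD_BIGRAMS = {("there", "is"), ("is", "a"), ("a", "cat"),
                ("the", "cat"), ("cat", "sat")}

def classify_word(word):
    """Per-word decision: correct iff the word is in the lexicon."""
    return word in LEXICON

def classify_sequence(words):
    """Joint decision: every word known AND every adjacent pair attested."""
    if not all(classify_word(w) for w in words):
        return False
    return all(b in GOOD_BIGRAMS for b in zip(words, words[1:]))

print([classify_word(w) for w in ["their", "is", "a", "cat"]])  # all True in isolation
print(classify_sequence(["their", "is", "a", "cat"]))           # False: "their is" is bad
print(classify_sequence(["there", "is", "a", "cat"]))           # True
```

Real sequence models (HMMs, MEMMs, CRFs) replace the bigram whitelist with learned probabilities, but the structural point is the same: the labels are assigned jointly, not word by word.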
Brief history
Brief history
• 1940s–1950s: foundational insights
• 1950–1970: symbolic / statistical
• 1970–1983: four paradigms
• 1983–1993: empiricism and finite-state models
• 1994–1999: field unification
• 2000–2008: empiricist trends
1940’s  1950’s
• Automaton
• Probabilistic / information – theoretic
models
1940’s  1950’s
Automaton
• Turing’s (1936) : model of algorithmic computation

• McCulloch-Pitts neuron (McCulloch and Pitts, 1943) : a simplified


model of the neuron as a kind of computing element (propositional
logic)

• Kleene (1951) and (1956) : finite automata and regular expressions.

• Shannon (1948) : probabilistic models of discrete Markov processes to


automata for language.

• Chomsky (1956) : finite state machines as a way to characterize a


grammar
– Formal language theory (algebra and set theory):
– Context-free grammar for natural languages
• Chomsky (1956)
• Backus (1959) and Naur et al. (1960) : ALGOL programming language.
1940’s  1950’s
Probalistic algorithms
Speech and language processing,
• Shannon
– metaphor of the noisy channel
– entropy as a way of measuring the information capacity of a
channel, or the information content of a language,
– first measure of the entropy of English by using probabilistic
techniques.
• Sound spectrograph (Koenig et al., 1946),
• Foundational research in instrumental phonetics
• First machine speech recognizers (early 1950s).
– 1952, Bell Lab, statistical system that could recognize any of the
10 digits from a single speaker (Davis et al., 1952).
1940’s  1950’s
Machine translation
One of the earliest applications of computers
• Major attempts in US and USSR
– Russian to English and reverse

• George Town University, Washington system:


– Translated sample texts in 1954

• The ALPAC report (1964)


– Assessed research results of groups working on MTs
• Concluded: MT not possible in near future.
• Funding should cease for MT !
• Basic research should be supported.
• Word to word translation does not work
– Linguistic Knowledge is needed
1950’s  1970’s
Two camps

• Symbolic paradigm

• Statistical paradigm
1950’s  1970’s
Symbolic paradigm 1
Formal language theory and generative syntax
• 1957 Noam Chomsky's Syntactic Structures
– A formal definition of grammars and languages
– Provides the basis for an automatic syntactic
processing of NL expressions
• Montague's PTQ
– Formal semantics for NL.
– Basis for logical treatment of NL meaning
• 1967 : Woods procedural semantics
– A procedural approach to the meaning of a sentence
– Provides the basis for a automatic semantic
processing of NL expressions
1950’s  1970’s
Symbolic paradigm 2

Parsing algorithms
– top-down and bottom-up
– dynamic programming.
– Transformations and Discourse Analysis
Project (TDAP)
• Harris, 1962
• Joshi and Hopely (1999) and Karttunen (1999),
• cascade of finite-state transducers.
1950’s  1970’s
Symbolic paradigm 3
AI
• Summer of 1956 :John McCarthy, Marvin Minsky,
Claude Shannon, and Nathaniel Rochester
– work on reasoning and logic
• Newell and Simon  the Logic Theorist and the General
Problem Solver Early natural language understanding
systems
– Domains
– Combination of pattern matching and keyword search
– Simple heuristics for reasoning and question-answering

• Late 1960s  more formal logical systems


1950’s  1970’s
Statistical paradigm 1
• Bayesian method to the problem of optical character recognition.
– Bledsoe and Browning (1959) : Bayesian text-recognition
• a large dictionary
• compute the likelihood of each observed letter sequence given each word in
the dictionary
• Joshi and Hopely (1999) and Karttunen (1999)
– cascade of finite-state transducers likelihoods for each letter.

• Bayesian methods to the problem of authorship attribution on The


Federalist papers
– Mosteller and Wallace (1964)
• Testable psychological models of human language processing
based on transformational grammar
• Ressources
– First online corpora: the Brown corpus of American Englis
– DOC (Dictionary on Computer)
– an on-line Chinese dialect dictionary.
Symbolic vs statistical approaches
Symbolic
• Based on hand-written rules
• Requires linguistic expertise
• No frequency information
• More brittle and slower than statistical approaches
• Often more precise than statistical approaches
• Error analysis is usually easier than for statistical approaches

Statistical
• Supervised or unsupervised
• Rules acquired from large corpora
• Not much linguistic expertise required
• Robust and quick
• Requires large (annotated) corpora
• Error analysis is often difficult
Four paradigms: 1970-1983
• Statistical
• Logic-based paradigms
• Natural language understanding
• Discourse modeling
1970–1983
Statistical paradigm
Speech recognition algorithms
• Hidden Markov models (HMMs) and the metaphors of the noisy channel and decoding
– Jelinek, Bahl, Mercer, and colleagues at IBM's Thomas J. Watson Research Center
– Baker at Carnegie Mellon University
• Baum and colleagues at the Institute for Defense Analyses in Princeton

• AT&T's Bell Laboratories
Rabiner and Juang (1993) → descriptions of the wide range of this work.
1970–1983
Logic-based paradigm
• Q-systems and metamorphosis grammars (Colmerauer, 1970, 1975)
• Definite Clause Grammars (Pereira and Warren, 1980)
• Functional grammar (Kay, 1979)
• Lexical Functional Grammar (LFG) (Bresnan and Kaplan, 1982)
→ importance of feature-structure unification
1970–1983
Natural language understanding 1

• SHRDLU system: simulated a robot embedded in a world of toy blocks (Winograd, 1972a)
– natural-language text commands
• "Move the red block on top of the smaller green one"
• complexity and sophistication
– first to attempt to build an extensive (for the time) grammar of English (based on Halliday's systemic grammar)
– OK for parsing
– Semantics and discourse?
1970–1983
Natural language understanding 2
• Yale School: a series of language understanding programs
– conceptual knowledge (scripts, plans, goals...)
– human memory organization
– network-based semantics (Quillian, 1968)
– case roles (Fillmore, 1968)
– representations of case roles (Simmons, 1973)
1970–1983
• Unification of the logic-based and natural-language-understanding paradigms in systems such as the LUNAR question-answering system (Woods, 1967, 1973)

→ uses predicate logic as a semantic representation
1970–1983
Discourse modelling
Four key areas in discourse:
• Substructure in discourse (Grosz, 1977a)
• Discourse focus (Sidner, 1983)
• Automatic reference resolution (Hobbs, 1978)
• BDI (Belief-Desire-Intention)
– framework for logic-based work on speech acts (Perrault and Allen, 1980; Cohen and Perrault, 1979)
1983–1993
• Return of state models
– Finite-state phonology and morphology (Kaplan and Kay, 1981)
– Finite-state models of syntax (Church, 1980)
• Return of empiricism
– Probabilistic models throughout speech and language processing
• IBM Thomas J. Watson Research Center: probabilistic models of speech recognition
• Data-driven approaches
– spread from speech to part-of-speech tagging, parsing, attachment ambiguities, semantics
• New focus on model evaluation
– held-out data
– quantitative metrics for evaluation
– comparison of performance on these metrics with previously published research
• Considerable work on natural language generation
1994–1999
Major changes
• Probabilistic and data-driven models had become quite standard
• Parsing, part-of-speech tagging, reference resolution, and discourse processing
– algorithms incorporate probabilities
– evaluation methodologies from speech recognition and information retrieval
• Increases in the speed and memory of computers
– commercial exploitation (speech recognition, spelling and grammar correction)
• Augmentative and Alternative Communication (AAC)
• Rise of the Web
– need for language-based information retrieval and information extraction
1994–1999
Resources and corpora
• Disk space becomes cheap
• Machine-readable text becomes ubiquitous
• US funding emphasises large-scale evaluation on "real data"
• 1994: the British National Corpus is made available
– a balanced corpus of British English
• Mid-1990s: WordNet (Fellbaum & Miller)
– a computational thesaurus developed by psycholinguists
• The World Wide Web used as a corpus
2000–2008
Empiricist trends 1
• Spoken and written material widely available
– Linguistic Data Consortium (LDC)...
– Annotated collections (standard text sources with various forms of syntactic, semantic, and pragmatic annotations)
• Penn Treebank (Marcus et al., 1993)
• PropBank (Palmer et al., 2005)
• TimeBank (Pustejovsky et al., 2003b)
• ...
– More complex traditional problems castable as supervised machine learning
• parsing and semantic analysis
– Competitive evaluations
• parsing (Dejean and Tjong Kim Sang, 2001)
• information extraction (NIST, 2007a; Tjong Kim Sang, 2002; Tjong Kim Sang and De Meulder, 2003)
• word sense disambiguation (Palmer et al., 2001; Kilgarriff and Palmer, 2000)
• question answering (Voorhees and Tice, 1999) and summarization (Dang, 2006)
2000-2008
Empiricist trends 2
• More serious interplay with the statistical
machine learning community
– Support vector machines (Boser et al., 1992;
Vapnik, 1995)
– Maximum entropy techniques (multinomial
logistic regression) (Berger et al., 1996)
– Graphical Bayesian models (Pearl, 1988)
2000–2008
Empiricist trends 3

Largely unsupervised statistical approaches
– statistical approaches to machine translation (Brown et al., 1990; Och and Ney, 2003)
– topic modeling (Blei et al., 2003)
→ Effective applications could be constructed from systems trained on unannotated data alone
→ Use of unsupervised techniques
Language and intelligence
Language and intelligence

• Is a talking machine a thinking machine?
– Use of language is intertwined with our general cognitive abilities.
– Turing test (1950), "Computing Machinery and Intelligence"
– What does it mean that a machine thinks?
• Imprecision in the terms "machine" and "think"
Turing test
• The computer's use of language forms the basis for determining if the machine could think.
• If the machine could win the game, it would be judged intelligent.
• Three participants
– two people and a computer
– a human judge engages in a natural language conversation with one human and one machine
– each of which tries to appear human
• If the judge cannot reliably tell which is which, then the machine is said to pass the test.
Eliza
• ELIZA, the first chatterbot: a computer program that mimics human conversation
– Joseph Weizenbaum, Massachusetts Institute of Technology
• The user types in natural language
• ELIZA then analyzes the user's statement and generates some response, which it types out.
• http://www-ai.ijs.si/eliza/eliza.html
ELIZA
Sample conversation
• Men are all alike.
• IN WHAT WAY?
• They're always bugging us about something or other.
• CAN YOU THINK OF A SPECIFIC EXAMPLE?
• Well, my boyfriend made me come here.
• YOUR BOYFRIEND MADE YOU COME HERE
• He says I'm depressed much of the time.
• I AM SORRY TO HEAR YOU ARE DEPRESSED
• It's true. I am unhappy
• DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE
• UNHAPPY
• I need some help, that much seems certain.
• WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
• Perhaps I could learn to get along with my mother.
• TELL ME MORE ABOUT YOUR FAMILY
Eliza's strategy
Keywords
• The input is read and inspected for the presence of a keyword
• When a keyword is found, the sentence is transformed according to a rule associated with the keyword
• Otherwise a content-free remark or, under certain conditions, an earlier transformation is retrieved
• The text so computed or retrieved is then printed out
ELIZA
Example transformations

"I am X" → "How long have you been X?"
I am very unhappy.
HOW LONG HAVE YOU BEEN VERY UNHAPPY?

"X YOU Y ME" → "What makes you think I Y you?"
It seems that you hate me.
WHAT MAKES YOU THINK THAT I HATE YOU?
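The keyword-and-transform strategy, restricted to the two rules above plus the content-free fallback, can be sketched with regular expressions (the historical program used its own pattern notation, not Python):

```python
# ELIZA-style keyword matching and transformation, limited to two rules.
import re

RULES = [
    (re.compile(r"^I am (?P<x>.+)$", re.I), "HOW LONG HAVE YOU BEEN {x}?"),
    (re.compile(r"^(.*) you (?P<y>.+?) me\b.*$", re.I),
     "WHAT MAKES YOU THINK I {y} YOU?"),
]
DEFAULT = "PLEASE GO ON."   # content-free remark when no keyword matches

def eliza(utterance):
    text = utterance.rstrip(". ")
    for pattern, template in RULES:
        m = pattern.match(text)
        if m:
            # fill the template with the captured (upper-cased) fragments
            return template.format(**{k: v.upper()
                                      for k, v in m.groupdict().items() if v})
    return DEFAULT

print(eliza("I am very unhappy."))          # HOW LONG HAVE YOU BEEN VERY UNHAPPY?
print(eliza("It seems that you hate me."))  # WHAT MAKES YOU THINK I HATE YOU?
print(eliza("The train was late."))         # PLEASE GO ON.
```

Note that no rule fires on the third input, which is exactly the behaviour behind the repetitive "Please go on" exchange shown later on the "Does it really work?" slide.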
ELIZA
Why does it work?

• Rogerian psychotherapist → can pretend to know almost nothing of the real world
"I went for a long boat ride"
"Tell me about boats"
• Not a real dialog (no real communication)
– ELIZA generates a reaction (often a rather non-committal one) to the user's statement
– The user cannot ask for information
– The user cannot issue an order
ELIZA
Does it really work?
• Eliza: Hello. I am ELIZA. How can I help you?
• You: I went to Paris yesterday
• Eliza: Can you elaborate on that?
• You: I took the train
• Eliza: Can you elaborate on that?
• You: The train was late
• Eliza: Please go on.
• You: All the seats were full
• Eliza: Please go on.
• You: A little boy was having a tantrum
• Eliza: Please go on.
ELIZA
Does it really work?
Indexicals (I, you, here, now)
User: I am having fun right now.
Eliza: Does it please you to believe that I am having
fun right now.

World Knowledge
User: I am a human.
Eliza: How long have you been a human?
When Eliza fails
• Yes/No Questions
User: Are you a good therapist?
Eliza: Would you prefer it if I were a good therapist?
User: I'm wondering about your skills as a therapist;
Eliza: Does it please you to believe I am wondering about my skills of
therapist?
• Wh-Questions
User: Who are you?
Eliza: Would you prefer it if I were not?
User: What do you want me to tell?
Eliza: Does that question interest you?
• Complex Assertions
User: I don't have a problem. I'm quite happy, I have a girlfriend, I
study. Everything is OK!
Eliza: Why do you not have any problem, I'm quite happy, you have a
girlfriend, you study. Everything is OK!
Eliza is not intelligent but...
• Regardless of what people believe or know about the inner workings of
computers, they talk about them and interact with them as social entities.
• Weizenbaum (1976) notes that many people continued to believe in
ELIZA’s abilities even after the program’s operation was explained to
them.
• People act toward computers as if they were people and expect that
computers should be able to understand their needs and be capable of
interacting with them naturally.
• Given these predispositions, speech- and language-based systems need not be intelligent
• But they may provide users with the most natural
interface for many applications
• So what about the Turing test?
NLP applications
Three main types of applications:
• Language input technologies
• Language processing technologies
• Language output technologies
Language input technologies
• Speech recognition
• Optical character recognition
• Handwriting recognition
• Retroconversion
Language input technologies
• Speech recognition
– Two main types of Applications
• Desktop control: dictation, voice control, navigation
• Telephony-based transaction: travel reservation,
remote banking, pizza ordering, voice control
– 60-90% accuracy.
– Speech recognition is not understanding!
– Based on statistical techniques and very large
corpora
Cf. the Parole team (Yves Laprie)
Language input technologies
• Speech recognition
– Desktop control
• Philips FreeSpeech (www.speech.philips.com)
• IBM ViaVoice (www.software.ibm.com/speech)
• Scansoft's DragonNaturallySpeaking
(www.lhsl.com/naturallyspeaking)
– demo
– See also google category:
http://directory.google.com/Top/Computers/SpeechTechnology/
Language input technologies
Dictation
• Dictation systems can do more than just transcribe what
was said:
– leave out the 'um's and 'eh's
– implement corrections that are dictated
– fill the information into forms
– rephrase sentences (add missing articles, verbs and
punctuation; remove redundant or repeated words and self
corrections)
→ Communicate what is meant, not what is said
• Speech can be used both to dictate content or to issue
commands to the word processing applications (speech
macros eg to insert frequently used blocks of text or to
navigate through form)
Language input technologies
Dictation and speech recognition
• Telephony-based fielded products
– Nuance (www.nuance.com)
– ScanSoft (www.scansoft.com)
– Philips (www.speech.philips.com)
– Telstra directory enquiry (tel. 12455)
• See also google category :
– http://directory.google.com/Top/Computers/SpeechTechnology/Telephony/
Language input technologies
Optical character recognition
• Key focus
– Printed material → computer-readable representation
• Applications
– Scanning (text → digitized format)
– Business card readers (to scan the printed information from business cards into the correct fields of an electronic address book): www.cardscan.com
• Website construction from printed documents
• Fielded products
– Caere's OmniPage (www.scansoft.com)
– Xerox' TextBridge (www.scansoft.com)
– ExperVision's TypeReader (www.expervision.com)
Language input technologies
Handwriting recognition
• Key focus
– Human handwriting → computer-readable representation
• Applications
– Forms processing
– Mail routing
– Personal digital agenda (PDA)
Language input technologies
Handwriting recognition
• Isolated letters
– Palm's Graffiti (www.palm.com)
– Computer Intelligence Corporation's Jot
(www.cic.com)
• Cursive scripts
– Motorola's Lexicus
– ParaGraph's CalliGrapher (www.paragraph.com)
• cf. the READ team (Abdel Belaid)
Language input technologies
Retroconversion
• Key focus: identify the logical and physical
structure of the input text
• Applications
– Recognising tables of contents
– Recognising bibliographical references
– Locating and recognising mathematical
formulae
– Document classification
Language processing technologies
• Spelling and grammar checking
• Spoken Language Dialog System
• Machine Translation
• Text Summarisation
• Search and Information Retrieval
• Question answering systems
Spoken Language Dialog Systems
• Goal
– a system that you can talk to in order to carry out some task.
• Key focus
– Speech recognition
– Speech synthesis
– Dialogue Management
• Applications
– Information provision systems: provides information in response
to query (request for timetable information, weather information)
– Transaction-based systems: to undertake transactions such as buying/selling stocks or reserving a seat on a plane.
SLDSs - Some problems
• No training period possible in phone-based systems
• Error handling remains difficult
• User initiative remains limited (or likely to
result in errors)
SLDS
state of the art
• Commercial systems operational for
limited transaction and information
services
– Stock broking system
– Betting service
– American Airlines information system
• Limited (finite-state) dialogue management
• NL Understanding is poor
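A finite-state dialogue manager of the kind mentioned above can be sketched as follows. This is a toy illustration with hypothetical slot and prompt names; real systems add confirmation and error-recovery states.

```python
# Toy finite-state dialogue manager: the system keeps the initiative and
# walks through a fixed sequence of slots (hypothetical flight-booking task).
STATES = ["origin", "destination", "date"]
PROMPTS = {"origin": "Where are you flying from?",
           "destination": "Where are you flying to?",
           "date": "On what date?"}

def run_dialogue(answers):
    """Prompt for each slot in turn; the user can only answer the
    current question (no user initiative)."""
    form = {}
    replies = iter(answers)
    for state in STATES:
        print(PROMPTS[state])
        form[state] = next(replies)
    return form

booking = run_dialogue(["Nancy", "Paris", "Monday"])
print(booking)  # {'origin': 'Nancy', 'destination': 'Paris', 'date': 'Monday'}
```

The rigid state sequence is exactly why user initiative remains limited: an answer that does not fit the current slot either fails or is silently misfiled.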
SLDS commercial systems
• Nuance (www.nuance.com)
• SpeechWorks (www.scansoft.com)
• Philips (www.speech.philips.com)
• See also google category :
– http://directory.google.com/Top/Computers/SpeechTechnology/
Machine translation
• Key focus
– Translating a text written/spoken in one
language into another language
• Applications
– Web based translation services
– Spoken language translation services
Existing MT system
• Bowne's iTranslator (www.itranslator.com)
• Taum-Meteo (1979): (English/French)
– Domain of weather reports
– Highly successful
• Systran: (among several European languages)
– Human assisted translation
– Rough translation
– Used over the internet through AltaVista
– http://babelfish.altavista.com
MT state of the art
• Broad coverage systems already available
on the web (Systran)
• Reasonable accuracy for specific domains
(TAUM Meteo) or controlled languages
• Machine aided translation is mostly used
Text summarisation
• Key issue
– Text  Shorter version of text
• Applications
– To decide whether it's worth reading the
original text
– To read summary instead of full text
– to automatically produce abstract
Text summarisation
Three main steps
1. Extract "important sentences" (compute
document keywords and score document
sentences wrt these keywords)
2. Cohesion check: Spot anaphoric references
and modify text accordingly (eg add sentence
containing pronoun antecedent; remove difficult
sentences; remove pronoun)
3. Balance and coverage: modify summary to
have an appropriate text structure (delete
redundant sentences; harmonize tense of
verbs; ensure balance and proper coverage)
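Step 1 can be sketched with a simple keyword-frequency scorer. This is a toy illustration (the stopword list and scoring are simplified assumptions); real extractors also use sentence position and linguistic cues, as noted on the next slide.

```python
from collections import Counter

# Toy sketch of step 1: score sentences against document keywords.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it"}

def summarise(sentences, n=1):
    """Score each sentence by the document frequency of its content words
    and return the n highest-scoring sentences in document order."""
    words = [w.strip(".,").lower() for s in sentences for w in s.split()]
    freq = Counter(w for w in words if w not in STOPWORDS)
    def score(s):
        return sum(freq[w.strip(".,").lower()] for w in s.split()
                   if w.strip(".,").lower() not in STOPWORDS)
    ranked = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in ranked]  # keep original order

doc = ["The train was late.",
       "The train to Paris carried many passengers.",
       "It rained."]
print(summarise(doc, n=1))  # ['The train to Paris carried many passengers.']
```

Steps 2 and 3 (cohesion and balance) are much harder to automate, which is why extracted summaries often show low discourse coherence.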
Text summarisation
• State of the Art
– Sentences extracted on the basis of: location,
linguistic cues, statistical information
– Low discourse coherence
• Commercial systems
– British Telecom's ProSum (transend.labs.bt.com)
– Copernic (www.copernic.com)
– MS Word's Summarisation tool
– See also
http://www.ics.mq.edu.au/~swan/summarization/projects.htm
Information Extraction / Retrieval
and QA
• Given an NL query and a document (e.g., web
pages),
– Retrieve document containing answer (retrieval)
– Fill in template with relevant information (extraction)
– Produce answer to query (Q/A)
• Limited to factoid questions
• Excludes: how-to questions, yes-no questions,
questions that require complex reasoning
• Highest possible accuracy estimated at around
70%
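The retrieval case can be sketched with a toy term-overlap ranker. This is an assumption-laden illustration: real engines use TF-IDF weighting, link analysis, and much larger indexes.

```python
# Toy keyword-based retrieval: rank documents by how many query terms
# they share with the query (no weighting, no stemming).
def retrieve(query, documents):
    """Return the document sharing the most terms with the query."""
    q_terms = set(query.lower().split())
    def overlap(doc):
        return len(q_terms & set(doc.lower().split()))
    return max(documents, key=overlap)

docs = ["flight timetable for paris",
        "weather forecast for nancy",
        "pizza ordering by phone"]
print(retrieve("what is the weather in nancy", docs))  # weather forecast for nancy
```

Extraction and QA build on the same retrieval step but then have to locate or generate the answer itself, which is where the factoid-only limitation comes in.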
Information Extraction / Retrieval
and QA
• IR systems : google, yahoo, etc.
• QA systems
– AskJeeves (www.askjeeves.com)
– Artificial Life's Alife Sales Rep (www.artificial-life.com)
– NativeMinds' vReps (www.nativeminds.com)
– Soliloquy (www.soliloquy.com)
Language output technologies
• Text-to-Speech
• Tailored document generation
Language output technologies
Text to speech
• Key focus
– Text  Natural sounding speech
• Applications
– Spoken rendering of email via desktop and
telephone
– Document proofreading
– Voice portals
– Computer assisted language learning
Language output technologies
Text to speech
• Requires appropriate use of intonation and
phrasing
• Existing systems
– Scansoft's RealSpeak
(www.lhsl.com/realspeak)
– British Telecom's Laureate
– AT&T Natural Voices
(http://www.naturalvoices.att.com)
Language output technologies
• Tailored document generation
• Key focus
– Document structure + parameters → Individually tailored documents
• Applications
– Personalised advice giving
– Customised policy manuals
– Web delivered dynamic documents
Language output technologies
• KnowledgePoint (www.knowledgepoint.com)
– Tailored job descriptions
• CoGenTex (www.cogentex.com)
– Project status reports
– Weather reports
NLP application
summary
• NLP application process language using knowledge about language
• All levels of linguistic knowledge are relevant
• Two main problems: ambiguity and paraphrase
• NLP applications use a mix of symbolic and statistical methods
• Current applications are not perfect as
– Symbolic processing is not robust/portable enough
– Statistical processing is not accurate enough
• Applications should be classified into two main types: aids to human users (e.g., spell
checkers, machine aided translations) and agents in their own right (e.g., NL interfaces
to DB, dialogue systems)
• Useful applications have been built since the late 70s
• Commercial success is harder to achieve
Sources
• http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
• Speech and Language Processing:
An Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition,
by Daniel Jurafsky and James H. Martin