Informal method:
Example:
Computers milk drinks.
Computer drinks milk.
Computers use data.
Formal method:
1. Lexical analysis (words).
2. Syntactic analysis (grammar).
3. Semantic analysis (meaning).
Intelligent robot:
The robot would have to know:
1. The meaning of the words.
2. The relationship of one word to another.
3. Knowledge of grammar.
4. How to associate descriptions and objects.
5. How to analyze a sentence in relation to other sentences.
e.g.
- John drank milk.
- He then put on his coat.
Understanding
What is understanding?
To understand something is to transform it from one representation into another, where this second representation has been chosen to correspond to a set of available actions that could be performed, and where the mapping has been designed so that for each event an appropriate action will be performed.
Syntactic parsing
Syntactic parsing is the step in which a flat input sentence is converted into a hierarchical structure that corresponds to the units of meaning in the sentence; this process is called parsing.
Parsing is the problem of constructing a derivation or a parse tree for an input string from a formal definition of a grammar. Parsing algorithms fall into two classes: top-down parsers, which begin with the top-level sentence symbol and attempt to build a tree whose leaves match the target sentence, and bottom-up parsers, which start with the words in the sentence (the terminals) and attempt to find a series of reductions that yield the sentence symbol.
This step plays an important role in many natural language understanding systems, for two reasons:
1. Semantic processing must operate on sentence components. If there is no parsing step, then the semantics system must decide on its own components.
2. If parsing is done, it constrains the number of components that semantics can consider; additionally, syntactic parsing is computationally less expensive than semantic processing.
Example:
The president has a cat
CFG:
Sentence → NP VP
NP → article noun
VP → V NP
noun → president | cat
article → the | a
V → has
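As a minimal sketch, the same CFG can be written in standard Prolog DCG notation (DCG notation is not used elsewhere in these notes; the later examples use explicit list arguments instead):

sentence --> np, vp.
np --> article, noun.
vp --> v, np.
article --> [the].
article --> [a].
noun --> [president].
noun --> [cat].
v --> [has].

% Prolog parses top-down, depth-first:
% ?- phrase(sentence, [the, president, has, a, cat]).
% true.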
Parse tree
The parsing process takes the rules of the grammar and compares them against the input sentence. Each rule that matches the sentence adds something to a complete structure that is being built for the sentence; the simplest structure built is the parse tree.
Sentence
  NP
    article: the
    noun: president
  VP
    V: has
    NP
      article: a
      noun: cat
Top-down derivation
Sentence
NP VP
article noun VP
the noun VP
the president VP
the president V NP
the president has NP
the president has article noun
the president has a noun
the president has a cat
Bottom-up derivation
the president has a cat
article president has a cat
article noun has a cat
article noun V a cat
article noun V article cat
article noun V article noun
NP V article noun
NP V NP
NP VP
Sentence
Advantages of CFG
1. Its ability to describe, build, and design structured languages (programming languages) like Pascal and FORTRAN.
2. It is easy to program and to combine into automatic processing systems.
Disadvantages of CFG
1. Limited capability where NLP is concerned: natural languages have so many complicated rules that defining them would require too many CFG rules to represent them all.
2. A CFG is unable to describe discontinuous structures, such as:
Arthur, Barry, and David are husbands of Jane, Joan, and Jill.
Grammar ambiguity
Ambiguity is the case in which a grammar produces more than one parse tree for some given sentence; such a grammar is called an ambiguous grammar. In this case we cannot determine which parse tree to select for a given sentence.
Example:
John saw the boy with a telescope
Parse tree 1 (the PP "with a telescope" modifies the verb phrase: John used the telescope to see):
Sentence
  NP: John
  VP
    V: saw
    NP
      art: the
      N: boy
    PP
      with
      NP
        art: a
        N: telescope

Parse tree 2 (the PP modifies the noun phrase: the boy has the telescope):
Sentence
  NP: John
  VP
    V: saw
    NP
      art: the
      N: boy
      PP
        with
        NP
          art: a
          N: telescope
NP → determiner, noun
  The broker
NP → determiner, adjective, noun
  A spicy pizza
NP → determiner, adjective, adjective, noun
  The big red apple
Clauses
determiner([the]).
noun([man]).
adjective([big]).
verb([cried]).
Example:
sentence(S) :- append(S1, S2, S), np(S1), vp(S2).
We can write this rule with another argument that carries number information:
sentence(S) :- append(S1, S2, S), np(S1, X), vp(S2, X).
The variable X could be bound to any symbol, but the rule actually will be used in only two ways: with X bound to singular or to plural.
Verbs and nouns must generally be cataloged in both a singular and a plural form, for example:
verb([thinks], singular).
verb([think], plural).
noun([doctor], singular).
noun([doctors], plural).
For determiners:
determiner([that], singular).
determiner([those], plural).
determiner([the], _).
determiner([a], _).
A noun phrase containing an adjective can be recognized by splitting the list twice:
np(S) :- append(S1, S2, S), append(S3, S4, S1), determiner(S3), adjective(S4), noun(S2).
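To actually run the number-checking sentence rule, we still need np and vp predicates that carry the number argument. The two rules below are a minimal sketch (an assumption, since the notes do not define them):

np(S, X) :- append(S1, S2, S), determiner(S1, X), noun(S2, X).
vp(S, X) :- verb(S, X).   % assumed: a verb phrase that is just a verb

% Example queries:
% ?- sentence([the, doctor, thinks]).    succeeds
% ?- sentence([those, doctors, think]).  succeeds
% ?- sentence([that, doctors, think]).   fails: number disagreement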
At the highest level, there is a rule that will look like this:
sentence(S0, S) :- np(S0, S1), backsen1(S1, S).
The goal predicate for any sentence must carry an empty list as the second argument, since all the words in the first list will be stripped off.
For the sentence "the young engineer smiled", the goal will be as follows:
?- sentence([the, young, engineer, smiled], []).
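A self-contained, runnable sketch of this difference-list style (the body of backsen1 and the word facts are assumptions consistent with the example sentence; treat it as a separate program from the append-based sketches above):

sentence(S0, S) :- np(S0, S1), backsen1(S1, S).
backsen1(S0, S) :- verb(S0, S).
np(S0, S) :- determiner(S0, S1), adjective(S1, S2), noun(S2, S).
determiner([the|S], S).
adjective([young|S], S).
noun([engineer|S], S).
verb([smiled|S], S).

% ?- sentence([the, young, engineer, smiled], []).
% true.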
Introduction
One of the key requirements for a program to appear intelligent is that it be able to interact with the user in something like everyday speech, rather than in an artificial computer-like language. It is very hard, however, to make a computer understand normal English.
Sentences in everyday language are frequently somewhat ambiguous, and they often imply things rather than state them.
In this lecture we will study an informal strategy for constructing a natural language interface. It works by finding the keyword, in those limited cases where we can guarantee that one word in the sentence will bear all the important information.
The system's data begins with a set of index facts, ind, recording the reference and the pages where the topic can be found. The first two arguments of ind are strings, and the third is a list of strings:
ind("BOOK1", "LOGOFF", ["99"]).
ind("BOOK1", "EDLIN", ["41-46", "57"]).
Frequently, when the user asks about a particular term, it will be talked about in a way that is different from the exact index citation. It is useful to maintain a stock of synonyms that the system will recognize. For instance, there is only one index entry for LOGOFF, but a user might try to access this information under LOGOUT:
dsyn("LOGOUT", "LOGOFF").
dsyn("LOGIN", "LOGON").
The last data item used by the system is a group of words that commonly occur in questions but have nothing to do with the keyword strategy. They should be ignored:
reject("HOW").
reject("GO").
All the work of this system takes place between the repeat and fail predicates.
The getquery predicate prompts the user for his question and brings it back as a list of symbols bound to the predicate's arguments: the first argument is the string typed, and the second argument is the equivalent list of symbols.
After getquery does its work, the next thing is to isolate the keyword that the query is really about, by using findref:
findref(X, Y) :- memberof(Y, X), not(reject(Y)), !.
produceans(X) :- ind(X1, X, Y), putflag,
    write(X, " can be found in the reference ", X1, " on pages ", Y, "."), nl.
produceans(X) :- syn(X, Z), ind(X1, Z, Y), putflag,
    write(X, " can be found in the reference ", X1, " on pages ", Y, "."), nl.
The first two clauses set up answers to queries, and the third clause writes a message that says no answer was found.
The putflag predicate is used to put a flag in the database when either of the first two rules succeeds; this means that an answer was found. The third clause tests the flag to decide if it should print the "nothing found" message. The last clause, remflag, removes the flag from the database so that it does not interfere with the operation of the next query.
putflag :- not(flag), assert(flag), !.
putflag.
remflag :- flag, retract(flag), !.
remflag.
Finally, the following rules treat the synonyms. One rule states that if A is a synonym of B, then B is a synonym of A. There are also rules that force the system to recognize predictable singular and plural forms of words as being synonyms of each other:
syn(X, Y) :- dsyn(X, Y).
syn(X, Y) :- dsyn(Y, X).
dsyn(Y, X) :- concat(X, "S", Y).
dsyn(Y, X) :- concat(X, "ES", Y).
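For completeness, here is one hedged way the pieces might fit together. The top-level rule, the third produceans clause, and getquery are described but not shown in these notes, so their exact form below is an assumption:

produceans(X) :- not(flag),
    write("Nothing was found for ", X, "."), nl.

go :- repeat,
      getquery(_String, List),   % prompt the user; List is the question as a list of symbols
      findref(List, Keyword),    % isolate the first non-rejected word
      answers(Keyword),
      fail.                      % loop back to repeat for the next query

answers(K) :- produceans(K), fail.  % exhaust all produceans clauses, printing every answer
answers(_) :- remflag.              % then clear the flag, ready for the next query

An exit test (for example, the user typing quit) is omitted from this sketch.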
Action sentences require much more work to analyze. They are the subject with which case grammar deals.
1. Object case: the entity that the action is applied to.
2. Agent case: the entity that applies the action to the object:
The realtor and his assistant inspected a house for their client. (agent: the realtor)
3. Co-agent case: identifies another agent who shares in applying the action; it is a possessive noun followed by a noun:
The realtor and his assistant inspected a house for their client. (co-agent: his assistant)
4. Beneficiary case: the entity on whose behalf the action in the sentence was performed:
The realtor and his assistant inspected a house for their client. (beneficiary: their client)
5. Location case: expresses where the action takes place: in the school.
6. Time case: expresses when the action took place: at 5 o'clock.
7. Instrument case: identifies something used by the agent to apply the action.
We can easily write down the grammar (a grammar is just a set of rules that describes something complex in terms of a catalog of more basic parts) that describes what is happening in these sentences, using thematic analysis:
Sentence → agent, verb1, object
Sentence → agent, verb1, instrument
Sentence → agent, verb1, object, trajectory
Agent → pronoun
Agent → proper-noun
Agent → noun-phrase
Object → pronoun
Object → proper-noun
Object → noun-phrase
Example:
I hit
He hit
We hit
All the rules for the sentences can be categorized using the difference-pair idea:
sentence(S, S0) :- agent(S, S1), backparta(S1, S0).
backparta(S, S0) :- verb(S, S1), object(S1, S0).
sentence(S, S0) :- agent(S, S1), backpartb(S1, S0).
sentence(S, S0) :- verb(S, S1), backpartc(S1, S0).
backpartc(S, S0) :- object(S, S1), instrument(S1, S0).
sentence(S, S0) :- agent(S, S1), backpartd(S1, S0).
backpartd(S, S0) :- verb(S, S1), backparte(S1, S0).
backparte(S, S0) :- object(S, S1), backpartf(S1, S0).
backpartf(S, S0) :- trajectory(S, S1), time(S1, S0).
The lower-level rules would all be set up in a similar manner.
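To try the first pair of rules, here is a toy lexicon in the same difference-pair style (the particular words are illustrative assumptions, echoing the "I hit" examples above):

agent([i|S], S).
agent([he|S], S).
agent([john|S], S).
verb([hit|S], S).
object([the, ball|S], S).

% ?- sentence([john, hit, the, ball], []).
% true.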
This model of MT usage is especially effective for high-volume jobs and those requiring quick turn-around:
1. The translation of S/W manuals for localization, to reach new markets.
2. The translation of market-moving financial news, for example from Japanese to English for use by stock traders.
3. Tasks limited to small sublanguage domains in which fully automatic high-quality translation is achievable.
The following domains have a limited vocabulary and only a few basic phrase types:
1. Weather forecasting is an example of a sublanguage domain that can be modeled completely enough to use raw MT output even without post-editing. Weather forecasts consist of phrases like:
- Cloudy today and Thursday.
- Low tonight 4.
- Outlook for Friday: Sunny.
2. Equipment maintenance manuals.
3. Air travel queries.
4. Appointment scheduling.
5. Restaurant recommendations.
Language Similarities & Differences
Even when languages differ, these differences often have systematic structure. The study of systematic cross-linguistic similarities and differences is called typology. Typology matters for Machine Translation in that the difficulty of translating from one language to another depends a great deal on how similar the languages are in their vocabulary, grammar, and conceptual structure.
1. Syntactically, languages differ in the basic word order of verb, subject, and object in simple declarative clauses.
- English, French, German: all SVO languages, meaning that the verb tends to come between the subject and object.
- Hindi and Japanese: SOV languages, meaning that the verb tends to come at the end of basic clauses.
- Arabic: a VSO language.
Similarities: SVO languages have prepositions; SOV languages have postpositions.
2. Sometimes, rather than a single word, there is a fixed phrase in the target language:
(Arabic) — grow up (English)
informatique (French) — computer science (English)
3. A single word in one language may correspond to several more specific words in another:
English "brother" corresponds to Japanese otooto (younger brother) or oniisan (older brother).
English "wall" corresponds to German Wand (inside wall) or Mauer (outside wall).
4. One language may have a lexical gap, where no word or phrase can express the meaning of a word in the other language.
1. Direct Translation
In direct translation, we proceed word-by-word through the source
language text translating each word as we go. We make use of no
intermediate structures, except for shallow morphological analysis; each
source word is directly mapped onto some target word. Direct translation
is thus based on a large bilingual dictionary; each entry in the dictionary
can be viewed as a small program whose job is to translate one word.
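Each dictionary entry "as a small program" can be pictured with a toy word-for-word translator (the predicate names and entries are illustrative assumptions, using the Spanish example below):

translate([], []).
translate([Word|Rest], [Target|Targets]) :-
    dict(Word, Target),        % look the word up in the bilingual dictionary
    translate(Rest, Targets).

dict(the, la).
dict(green, verde).
dict(witch, bruja).

% ?- translate([the, green, witch], T).
% T = [la, verde, bruja]
% Local reordering rules must still turn this into "la bruja verde".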
After the words are translated, simple reordering rules can apply, for example moving adjectives after nouns when translating from English to Spanish. Consider:
English: Mary didn't slap the green witch.
Spanish: Maria no daba una bofetada a la bruja verde.
Gloss: Mary not gave a slap to the green witch.
Step 2 presumes that the bilingual dictionary has the phrase "dar una bofetada a" as the Spanish translation of the English verb "slap". The local reordering of step 3 would need to switch the adjective-noun ordering from "green witch" to "bruja verde". And some combination of ordering rules and the dictionary would deal with the negation and past tense in the English "didn't". The stages look like this:
After 1: Morphology: Mary DO-PAST not slap the green witch
After 2: Lexical transfer: Maria PAST no dar una bofetada a la verde bruja
After 3: Local reordering: Maria no dar PAST una bofetada a la bruja verde
After 4: Morphology: Maria no daba una bofetada a la bruja verde
While the direct approach can deal with our simple Spanish example, it is much weaker when larger structural changes are needed.
Finally, even more complex reorderings occur when we translate from SVO to SOV languages, as we see in the English-Japanese example:
English: He adores listening to music
Japanese: kare ha ongaku wo kiku no ga daisuki desu
Gloss: he / music / to listening / adores
These examples suggest that the direct approach is too focused on individual
words, and that in order to deal with real examples we'll need to add
phrasal and structural knowledge into our MT models.
2. Transfer
Languages differ systematically in structural ways. One strategy for machine translation is to translate by overcoming these differences: altering the structure of the input to make it conform to the rules of the target language. For the English-to-Spanish adjective ordering difference, the required syntactic transformation is:
Nominal → Adj Noun  ⇒  Nominal → Noun Adj
These syntactic transformations are operations that map from one tree structure to another. Under the transfer approach, this rule can be applied to our example "Mary did not slap the green witch". Besides this transformation rule, we'll need to assume that the morphological processing figures out that "didn't" is composed of do-PAST plus not, and that the parser attaches the PAST feature onto the VP. Lexical transfer, via lookup in the bilingual dictionary, will then remove "do", change "not" to "no", and turn "slap" into the phrase "dar una bofetada a", with a slight rearrangement of the parse tree.
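A toy sketch of such a tree-to-tree transfer operation, with parse trees written as Prolog terms (the term representation and the two lexical facts are assumptions for illustration):

transfer(nominal(adj(A), noun(N)), nominal(noun(NT), adj(AT))) :-
    lex(A, AT),    % translate the adjective
    lex(N, NT).    % translate the noun, and swap their order in the output tree

lex(green, verde).
lex(witch, bruja).

% ?- transfer(nominal(adj(green), noun(witch)), T).
% T = nominal(noun(bruja), adj(verde))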
Speech recognition
1. Understanding Spoken Language
The stages of understanding spoken language are:
1. Sampling.
2. Detecting the start and end of the signal.
3. Calculation of speech spectra.
4. Pitch contour evaluation.
5. Segmentation.
6. Word recognition.
7. Responding to a message.
2.1 Sampling
The speech signal is first sampled and then digitized using an analogue-to-digital converter. The result from this stage is then stored in the computer.
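As a worked example of what sampling implies for storage (typical telephone-quality figures, not taken from these notes): speech sampled 8,000 times per second at 2 bytes per sample produces 8,000 × 2 = 16,000 bytes for each second of speech.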
The resulting spectrogram displays certain characteristics of the speech signal. Its vertical striations correspond to the vibration of the vocal cords. The striations themselves are of different duration, and these differences arise from the variations of pitch. Thus, the changes in pitch (i.e. the intonation) can be tracked through the spectrogram.
Several dark bands can be seen on the spectrogram, corresponding to the lower few formants which are produced during vowel production. For example, non-nasalized vowels give rise to spectral peaks in the spectrum. These peaks can reliably be correlated to the resonant frequency of the vocal tract at the time of production.
2.4 Pitch contour evaluation
The fundamental frequency of voicing is of use for determining stress and intonation. The contours of the fundamental frequency (as it rises and falls) can also be used as an indication of major syntactic boundaries. Additionally, stress patterns, rhythm, and intonation carry some clues as to the phonetic identity of some speech sounds.
In addition, several different templates may have to be stored for each word in order to take into account the differences which are a direct result of the varying speech rates of individual speakers, channel noise, etc. Thus, some form of speech processing is desirable to eliminate redundant information.
Some of the techniques to eliminate redundant information include:
(a) Warping the incoming signal to match certain features of the stored templates. For example, the word can be chopped up into segments, each being processed in the normal way. The individual segments for each unit can then be compared with the corresponding units in the stored template. Each segment will be compressed or stretched, as appropriate, until it matches. The amount of distortion required for each segment is then recorded. Finally, the template requiring the least distortion will be selected.
(b) Detecting certain features of the incoming signal, so that they can be
used to adjust the signal to a closer match with the stored template. For
example, the fundamental frequency of the incoming signal can be determined
and the difference this makes to the stored template can be taken into
account. Therefore, irrelevant differences between the stored template and the
incoming signal can be reduced.
Allophone
Certain allophones can provide word or syllable boundary information which can be very useful in recognition systems. For example, some allophonic variations occur only at the end or beginning of words/syllables. In addition, segmentation at the allophonic level eliminates the need for applying co-articulation rules at a lower level. The disadvantages are that there are many thousands of allophones for any given language, depending on how narrow the phonetic transcription is. Furthermore, the successful identification of the allophone is very much dependent on the context.
Phoneme
The phoneme represents the smallest number of distinctive phonological classes for recognition and is substantially less than the number of allophones, syllables, words, etc. (there are approximately 43 phonemes in the English language). Although the number of phonemes is small, their automatic recognition by a computer system is still a major problem, since there are no acoustic patterns that uniquely identify each phoneme; a phoneme's acoustic realization varies with its context.
Diphone
The term diphone is used to represent a vowel-consonant sequence, such that the segment is taken from the centre of the vowel to the centre of the consonant; the segment is often referred to as a transeme. The diphone is employed because a great deal of the acoustic information that is used to identify the consonants lies in the transitions between the consonants and the vowels. One of the advantages of considering the diphone is that it includes transitional information which is necessary for recognition in many cases. Furthermore, since it is taken across two sounds, it contains within itself some of the co-articulation information which is not present in other units such as the phoneme. One of the disadvantages with diphones is their large number (running into thousands). Furthermore, the phonological rules as they are written at present are not easily applied to diphones.
Syllable
saved. The main disadvantage is that when dealing with a very large lexicon, the scanning of the templates to find the best match can be very time-consuming. There are many algorithms to reduce the number of templates scanned; an example here is the use of partial phonetic information in lexical (word) access. There is also the problem of recognizing word boundaries, because when we speak we tend to anticipate the next word. This results in the beginnings, and in particular the ends, of words being distorted relative to the stored templates of words spoken in isolation. The problem then becomes one of determining how a template may be distorted in these situations.
Syntactic analysis can also be used to restrict the recognition of the next word on the basis of previously recognized words. This is desirable, because with large lexicons the task of searching for the best match can become computationally very expensive. However, due to the vagaries of the English language, this syntactic approach has certain limitations, including the difficulty of distinguishing between well-formed and poorly formed sentences. A statistical approach can be adopted at all levels of decision making, where a score is assigned to each of the alternatives on the basis of past history, the alternative with the highest score being the one selected for further processing. There are many ways of obtaining the highest score. The 'breadth first' search computes the score at each alternative and selects the route with the highest score. The 'depth first' search selects the highest score at the initial level and then pursues this initial choice in subsequent levels, in a 'depth first' manner. The problem with the 'depth first' technique is that the system is committed to the consequences of the first choice. There are also searching techniques which are a hybrid of the 'breadth first' and 'depth first' techniques.
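A small worked contrast (the scores are invented for illustration): suppose the first-level alternatives score 0.6 and 0.4, the 0.6 branch's best continuation scores 0.2, and the 0.4 branch's best continuation scores 0.9. 'Depth first' commits to the 0.6 branch and ends with 0.6 × 0.2 = 0.12, while 'breadth first' keeps both branches alive and can find the better overall path, 0.4 × 0.9 = 0.36.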
The goal of automatic speech recognition (ASR) research is to address this problem. In the noisy-channel view, we imagine running every possible source sentence through our channel model and seeing if it matches the output. We then select the best matching source sentence as our desired source sentence.
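Although these notes do not spell out the formula, "select the best matching source sentence" is conventionally written as the noisy-channel equation (here O denotes the acoustic observations and W a candidate word sequence; this notation is the textbook convention, not given above):

\hat{W} = \operatorname*{argmax}_{W} P(O \mid W)\, P(W)

where P(O | W) is the acoustic (channel) model and P(W) is the language model.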
Fig. 3 shows further details of this operationalization: the components of an HMM speech recognizer as it processes a single utterance. The figure shows the recognition process in three stages. In the feature extraction or signal processing stage, the acoustic waveform is sampled into frames which are transformed into spectral features.
1. Speech Synthesis
The modern task of speech synthesis, also called text-to-speech or TTS, is to produce speech (acoustic waveforms) from text input. Modern speech synthesis has a wide variety of applications. Synthesizers are used, together with speech recognizers, in telephone-based conversational agents that conduct dialogues with people. Synthesizers are also important in non-conversational applications that speak to people, such as in devices that read out loud for the blind, or in video games or children's toys. Finally, speech synthesis can be used to speak for sufferers of neurological disorders, such as the astrophysicist Stephen Hawking who, having lost the use of his voice due to ALS, speaks by typing to a speech synthesizer and having the synthesizer speak out the words.
The task of speech synthesis is to map a text like the following:
PG&E will file schedules on April 20.
to a waveform like the following:
Speech synthesis systems perform this mapping in two steps, first converting
the input text into a phonemic internal representation and then converting
this internal representation into a waveform. We will call the first step text
analysis and the second step waveform synthesis (although other names are
also used for these steps).
In order to produce an alignment for each word in the training set, we take this allowables list for all the letters, and for each word in the training set we find all alignments between the pronunciation and the spelling that conform to the allowables list. From this large list of alignments we compute, by summing over all alignments for all words, the total count for each letter being aligned to each phone (or multiphone, or the empty phone ε). From these counts we can normalize to get, for each phone p_i and letter l_j, a probability P(p_i | l_j):
P(p_i | l_j) = count(p_i, l_j) / count(l_j)
We can now take these probabilities and realign the letters to the phones,
using the Viterbi algorithm to produce the best (Viterbi) alignment for each
word, where the probability of each alignment is just the product of all the
individual phone/letter alignments.
In this way we can produce a single good alignment A for each particular
pair (P,L) in our training set.
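A small worked instance of this normalization (the counts are invented for illustration): if the letter c occurs 1000 times across the aligned training words and is aligned to the phone /k/ in 800 of them, then
P(/k/ | c) = count(/k/, c) / count(c) = 800 / 1000 = 0.8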
1.3.3 Tune
Two utterances with the same prominence and phrasing patterns can still differ prosodically by having different tunes. The tune of an utterance is the rise and fall of its F0 over time. A very obvious example of tune is the difference between statements and yes-no questions in English. The same sentence can be said with a final rise in F0 to indicate a yes-no question, or with a final fall in F0 to indicate a declarative intonation.
Fig. 4 shows the F0 track of the same words spoken as a question or a statement. Note that the question rises at the end; this is often called a question rise. The falling intonation of the statement is called a final fall.
Figure 4: The same text read as the statement "You know what I mean." (on the left) and as a question "You know what I mean?" (on the right). Notice that yes-no question intonation in English has a sharp final rise in F0.
It turns out that English makes very wide use of tune to express meaning.
Besides this well known rise for yes-no questions, an English phrase
containing a list of nouns separated by commas often has a short rise called a
continuation rise after each noun. English also has characteristic contours
to express contradiction, to express surprise, and many more.
The mapping between meaning and tune in English is extremely complex,
and linguistic theories of intonation like ToBI have only begun to develop
sophisticated models of this mapping. In practice, therefore, most synthesis
systems just distinguish two or three tunes, such as the continuation rise (at
commas), the question rise (at question mark if the question is a yes-no
question), and a final fall otherwise.
Written Text vs. Spoken Language
1. Written: Input to the computer is done by keyboard.
   Spoken: Input to the computer is done by a source of voice, such as human speech, using a microphone connected to the computer.
2. Written: People can create more complex sentences than in the spoken form, because they have more time to think and plan.
   Spoken: Since speech is done interactively, more errors are created.
3. Written: A writer can pause for days or months before continuing a thought.
   Spoken: Every pause, restart, or revision of speech makes the listener hesitate.
4. Written: Formal and informal methods are used.
   Spoken: The spoken mode must have speech recognition, which depends on: text-dependent vs. text-independent SR systems, and isolated vs. continuous speech.
5. Written: The informal method has three stages to perform:
   - Lexical analysis.
   - Syntax analysis.
   - Semantic analysis.
   Spoken: An SR system has many stages to convert speech to text stored in the computer:
   - Sampling.
   - Detecting the start and end of the signal.
   - Calculation of speech spectra.
   - Pitch contour evaluation.
   - Segmentation.
   - Word recognition.
   - Message response.