NLP Part I Unit I Notes
UNIT- I
Introduction to Natural language
The Study of Language, Applications of NLP, Evaluating Language Understanding Systems,
Different Levels of Language Analysis, Representations and Understanding, Organization of
Natural language Understanding Systems, Linguistic Background: An outline of English
Syntax.
Semantic Ambiguity
• Philosophers study how words can mean anything at all and how they identify objects in
the world. Typical questions: What is meaning, and how do words and sentences acquire it?
How do words identify objects in the world?
[2]Applications of NLU:
• The applications can be divided into two major classes:
• Text-based applications
• Dialogue-based applications.
• Text-based applications involve the processing of written text, such as books,
newspapers, reports, manuals, email messages, and so on. These are all reading-
based tasks.
Text-based natural language applications include:
• Finding appropriate documents on certain topics from a database of texts (for example,
finding relevant books in a library)
• Extracting information from messages or articles on certain topics (for example,
building a database of all stock transactions described in the news on a given day)
• Translating documents from one language to another (for example, producing
automobile repair manuals in many different languages)
• Summarizing texts for certain purposes (for example, producing a 3-page summary of
a 1000-page government report)
Dialogue-based applications
• Dialogue-based applications involve human-machine communication. Most naturally
this involves spoken language, but it also includes interaction using keyboards.
Typical potential applications include:
• Question-answering systems, where natural language is used to query a database (for
example, a query system to a personnel database)
• Automated customer service over the telephone (for example, to perform banking
transactions or order items from a catalogue)
• Tutoring systems, where the machine interacts with a student (for example, an
automated mathematics tutoring system)
• Spoken language control of a machine (for example, voice control of a VCR or
computer)
• General cooperative problem-solving systems (for example, a system that helps a
person plan and schedule freight shipments)
[4]Evaluating Language Understanding Systems:
• To evaluate a system is to run the program and see how well it performs the task it was
designed to do.
• There are two types of evaluation: 1) black box evaluation and 2) glass box evaluation.
Black box evaluation
• If the program is meant to answer questions about a database of facts, you might ask it
questions to see how good it is at producing the correct answers.
• If the system is designed to participate in simple conversations on a certain topic, you
might try conversing with it.
• This is called black box evaluation because it evaluates system performance without
looking inside to see how it works.
• Only when the success rates become high, making a practical application feasible, can
much significance be given to overall system performance measures.
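Black box evaluation can be sketched as measuring a success rate over question/answer pairs while treating the system as an opaque function. The sketch below is only an illustration: the function `black_box_accuracy`, the toy fact table, and the test questions are all hypothetical and do not come from any particular system.

```python
from typing import Callable, Iterable, Tuple


def black_box_accuracy(system: Callable[[str], str],
                       test_cases: Iterable[Tuple[str, str]]) -> float:
    """Return the fraction of questions answered correctly.

    The system is only called, never inspected, which is what makes
    this a black box measurement.
    """
    cases = list(test_cases)
    if not cases:
        raise ValueError("need at least one test case")
    correct = sum(
        1 for question, expected in cases
        if system(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(cases)


# Hypothetical toy system backed by a tiny fact table (illustration only).
FACTS = {"who manages sales?": "Asha", "who manages payroll?": "Ravi"}


def toy_system(question: str) -> str:
    return FACTS.get(question.strip().lower(), "I do not know")


# Hypothetical test set: the third question is not covered by the fact table.
TEST_CASES = [
    ("Who manages sales?", "Asha"),
    ("Who manages payroll?", "Ravi"),
    ("Who manages shipping?", "Meera"),
]

score = black_box_accuracy(toy_system, TEST_CASES)
print(f"success rate: {score:.2f}")
```

The measurement says nothing about why the third question fails; locating the faulty component is the job of glass box evaluation.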
Glass box evaluation
• This method of evaluation is to identify various subcomponents of a system and then
evaluate each one with appropriate tests.
• This is called glass box evaluation because you look inside at the structure of the
system.
• The problem with glass box evaluation is that it requires some agreement on what the
various components of a natural language system should be.
• Achieving such an agreement is an area of considerable activity at present.
• A simple keyword-based conversational program can generate its replies as follows:
• Given a sentence S, find a keyword in S whose pattern matches S.
• If there is more than one keyword, pick the one with the highest integer value.
• Use the output specification that is associated with this keyword to generate the
next sentence.
• If there are no keywords, generate an innocuous continuation statement, such as
"Tell me more" or "Go on".
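The keyword-matching steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the rule table (keywords, integer values, patterns, and output specifications) is hypothetical, and the `Rule` and `respond` names are invented for this example.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    keyword: str          # word that must occur in the sentence
    rank: int             # integer value used to choose between keywords
    pattern: re.Pattern   # pattern the whole sentence must match
    output: str           # output specification; {0} is the first captured group


# Hypothetical rule table (illustration only).
RULES = [
    Rule("always", 5, re.compile(r".*\balways\b.*", re.I),
         "Can you think of a specific example?"),
    Rule("are", 3, re.compile(r".*\bare you (.+?)[.?!]*$", re.I),
         "Would you prefer it if I weren't {0}?"),
    Rule("what", 2, re.compile(r".*\bwhat\b.*", re.I),
         "Why do you ask?"),
]

FALLBACKS = ("Tell me more", "Go on")


def respond(sentence: str) -> str:
    """Pick the highest-ranked keyword whose pattern matches the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    best = None
    for rule in RULES:
        if rule.keyword not in words:
            continue
        match = rule.pattern.match(sentence)
        if match and (best is None or rule.rank > best[0].rank):
            best = (rule, match)
    if best is None:
        # No keyword applies: produce an innocuous continuation.
        return FALLBACKS[0]
    rule, match = best
    return rule.output.format(*match.groups())


print(respond("Men are always bugging us."))
print(respond("Why are you angry?"))
print(respond("It is raining."))
```

Nothing in the sketch models meaning: a reply is chosen purely from surface keywords, which is why such a program can appear to converse without understanding the sentence.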
• The following are some of the different forms of knowledge relevant for natural language
understanding:
• Phonetic and phonological knowledge - concerns how words are related to the sounds
that realize them. Such knowledge is crucial for speech-based systems.
• Morphological knowledge - concerns how words are constructed from more basic
meaning units called morphemes.
• A morpheme is the primitive unit of meaning in a language. For example, the meaning of
the word "friendly" is derivable from the meaning of the noun "friend" and the suffix "-ly",
which transforms a noun into an adjective.
• Syntactic knowledge - concerns how words can be put together to form correct sentences
and determines what structural role each word plays in the sentence and what phrases are
subparts of what other phrases.
• Semantic knowledge - concerns what words mean and how these meanings combine in
sentences to form sentence meanings.
• Pragmatic knowledge - concerns how sentences are used in different situations and how use
affects the interpretation of the sentence.
• World knowledge - includes the general knowledge about the structure of the world that
language users must have in order to, for example, maintain a conversation.
• It includes what each language user must know about the other user’s beliefs and goals.
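As a small illustration of morphological knowledge, the sketch below splits a word into a stem and a suffix and derives the category of the result, following the "friend" + "-ly" example above. The mini-lexicon, the suffix rules, and the `analyze` function are hypothetical and far smaller than anything a real system would need.

```python
from typing import Optional, Tuple

# Hypothetical mini-lexicon: stem -> category.
STEMS = {"friend": "noun", "quick": "adjective", "teach": "verb"}

# Hypothetical suffix rules: (suffix, category of stem) -> category of result.
SUFFIX_RULES = {
    ("ly", "noun"): "adjective",    # friend + -ly -> friendly
    ("ly", "adjective"): "adverb",  # quick + -ly -> quickly
    ("er", "verb"): "noun",         # teach + -er -> teacher
}


def analyze(word: str) -> Optional[Tuple[str, str, str]]:
    """Return (stem, suffix, category), or None if the word cannot be analyzed."""
    word = word.lower()
    if word in STEMS:
        return (word, "", STEMS[word])
    for (suffix, stem_category), result_category in SUFFIX_RULES.items():
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            if STEMS.get(stem) == stem_category:
                return (stem, "-" + suffix, result_category)
    return None


print(analyze("friendly"))
print(analyze("quickly"))
print(analyze("teacher"))
```

The same suffix can yield different categories depending on the stem it attaches to, which is why each rule is keyed on both the suffix and the category of the stem.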
Syntax, Semantics, and Pragmatics
• The following examples may help you understand the distinction between syntax,
semantics, and pragmatics.
1. Language is one of the fundamental aspects of human behavior and is a crucial
component of our lives.
2. Green frogs have large noses.
3. Green ideas have large noses.
4. Large have green ideas nose.
Sentence 1 appears to be a reasonable start to a text about language. It agrees with all
that is known about syntax, semantics, and pragmatics.
Each of the other sentences violates one or more of these levels.
Sentence 2 is well-formed syntactically and semantically, but not pragmatically: as an
opening sentence it gives the reader no reason for its use.
Sentence 3 is much worse. Not only is it obviously pragmatically ill-formed, it is also
semantically ill-formed.
SVR ENGINEERINGCOLLEGE NANDYAL 6
Sentence 3 does, however, have enough structure for us to say what is wrong with it: ideas
cannot be green and, even if they could, they certainly cannot have large noses.
Sentence 4 is even worse. In fact, it is unintelligible, even though it contains the same
words as sentence 3. It does not even have enough structure to allow you to say what is wrong
with it. Thus it is syntactically ill-formed.
A sentence can be pragmatically well-formed even when it is syntactically ill-formed. For
example, if I ask you where you are going and you reply "I go store", the response would
be understandable even though it is syntactically ill-formed. Thus it is at least
pragmatically well-formed.
• Sentence 5 is ill-formed because the subject and the verb do not agree in number (the
subject is singular and the verb is plural).
• Sentence 6 is ill-formed because the verb put requires some modifier that describes where
John put the object.
• The sentence "Rice flies like sand" has two possible syntactic structures.
• In the first reading, the sentence is formed from a noun phrase (NP) describing a type of
fly, rice flies, and a verb phrase (VP) that asserts that these flies like sand.
• In the second structure, the sentence is formed from a noun phrase describing a type of
substance, rice, and a verb phrase stating that this substance flies like sand (say, if you throw
it).
• The two structures also give further details on the structure of the noun phrase and verb
phrase and identify the part of speech for each word.
• In particular, the word "like" is a verb (V) in the first reading and a preposition (P) in the
second.
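The two readings can be written down as nested tuples of the form (label, children...), which makes the difference in structure and in part of speech explicit. This is only a sketch: the tuple encoding and the helper names are assumptions made for illustration, and so are the tags for words other than "like" (for instance, treating "Rice" as a noun in the first reading).

```python
# Reading 1: [Rice flies] [like sand] -> a type of fly likes sand.
READING_1 = ("S",
             ("NP", ("N", "Rice"), ("N", "flies")),
             ("VP", ("V", "like"), ("NP", ("N", "sand"))))

# Reading 2: [Rice] [flies like sand] -> the substance flies the way sand does.
READING_2 = ("S",
             ("NP", ("N", "Rice")),
             ("VP", ("V", "flies"),
              ("PP", ("P", "like"), ("NP", ("N", "sand")))))


def is_leaf(children):
    """A node is a leaf when its only child is the word itself."""
    return len(children) == 1 and isinstance(children[0], str)


def tagged_words(tree):
    """Return the (word, part of speech) pairs at the leaves, left to right."""
    label, *children = tree
    if is_leaf(children):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


def bracketed(tree):
    """Render the tree in bracketed notation."""
    label, *children = tree
    if is_leaf(children):
        return f"({label} {children[0]})"
    return f"({label} " + " ".join(bracketed(c) for c in children) + ")"


for reading in (READING_1, READING_2):
    print(bracketed(reading))
    print(tagged_words(reading))
```

Both trees cover the same four words in the same order; only the grouping and the tags for "flies" and "like" differ.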
The Logical Form
• The syntactic structure of a sentence does not by itself determine its meaning.
• For example, the NP "the catch" can have different meanings depending on whether
the speaker is talking about a baseball game or a fishing expedition.
• Both these interpretations have the same syntactic structure, and the different
meanings arise from an ambiguity concerning the sense of the word "catch".
• Once the correct sense is identified, say the fishing sense, there still is a problem in
determining what fish are being referred to.
• The intended meaning of a sentence depends on the situation in which the sentence is
produced.
• Interpreting a sentence in context uses knowledge of the discourse context (determined by
the sentences that preceded the current one) and knowledge of the application to produce a
final representation.