Table of Contents

A HISTORY OF NLP
INTRODUCTION TO NATURAL LANGUAGE PROCESSING
ORIGIN OF NATURAL LANGUAGE
GRADUATE PROGRAM IN AI AND NLP
ASPECTS OF NATURAL LANGUAGE PROCESSING
NATURAL LANGUAGE UNDERSTANDING
PROBLEMS OF NATURAL LANGUAGE
Semantics and Pragmatics
AUTOMATIC SPEECH RECOGNITION
SYNTAX
Generation
Overview
Discussion and Conclusion
PREFACE
Nepal has finally entered the computer age. Today computer technology is among the
most powerful and useful technologies in the world, and there has been a remarkable swing
towards the computer as the modern tool in every field. The only criteria for a
successful career in computers are a logical bent of mind and the willingness to learn
continuously. With the change in time and new discoveries in computer science, human
life is no longer a static existence. Today, machines can recognize spoken words and
simple sentences.
We have tried to make this book precise and illustrative, and its language as simple
as possible. Since this is our first attempt, unexpected technical mistakes may occur.
We therefore expect and welcome all constructive comments and suggestions for the
improvement of the book.
It is a big challenge to be open and accepting of what is new, as well as of the
inevitable changes that happen; being able to utilize, shape, and even digest what we have
been given can be difficult, but is often rewarding. This report is fortunate that many
people contributed, and gave of themselves, to this book. The task of creating any book
requires the talents of many hardworking people pulling together to meet demands.
Lastly, we would like to extend our heartfelt gratitude to our teachers of the Electronics
and Computer Department for their praiseworthy suggestions and guidelines, and for
providing us with materials in preparing this report. We are grateful to our parents and
friends for their support, cooperation, and valued suggestions for the development of
this report.
Thanking you,
Ganga Budathoki
Samu Shrestha
Rekha Shrestha
ARTIFICIAL INTELLIGENCE (AI)
Artificial Intelligence (AI) is a broad field that means different things to different people. It
is concerned with getting computers to perform tasks that require human intelligence. However,
there are many tasks which we might reasonably think require intelligence, such as complex
arithmetic, that computers can do very easily. Conversely, there are many tasks that people do
without even thinking, such as recognizing a face, which are extremely complex to automate.
AI is concerned with these difficult tasks, which seem to require complex and sophisticated
reasoning processes and knowledge.
People might want to automate human intelligence for a number of different reasons.
One reason is simply to understand human intelligence better. For example, we may be able to
test and refine psychological and linguistic theories by writing programs which attempt to
simulate aspects of human behavior. Another reason is simply to build smarter programs.
We may not care whether the programs accurately simulate human reasoning, but by
studying human reasoning we may develop useful techniques for solving difficult problems.
AI is a very large and vast topic containing many subfields. Among them, natural
language processing (NLP) is one of the most important. The concept of NLP was put forward
because computers did not understand human language; NLP is being developed to overcome
this limitation of the computer.
A HISTORY OF NLP
NLP started when the computer was demobbed after the Second World War. Since then,
it has been used in a variety of roles, from machine translation to help fight the Cold War to
present-day tools that aid multilingual communication in international communities.
The first NLP system to solve an actual task was probably the BASEBALL question-answering
system, which handled questions about a database of baseball statistics. Roger
Schank and his students built a series of programs that all had the task of understanding
language. Natural language generation was considered from the earliest days of machine
translation in the 1950s, but it did not emerge as a monolingual concern until the 1970s.
DEFINITION
Natural language (NL) is a language spoken or written by humans, as opposed to a
language used to program or communicate with computers. Natural language processing
generally refers to language that is typed, printed, or displayed, rather than spoken.
At present, the use of computers is limited by communication difficulties; more effective
use of computers will be possible if people can communicate with them through natural
language. NLP is the formulation and investigation of computationally effective mechanisms for
communication through natural language. This involves natural language generation and
understanding; an architecture that contains either one will be considered as containing NLP.
If the user can communicate with an architecture using natural language, then it clearly has
NLP. It is true that some architectures can theoretically be programmed so as to provide NLP,
but potential capability is not enough for our criteria: we must have an actual implementation
of the architecture showing how it does NLP.
The field of NLP is divided into two parts. First, computers are trained to understand a
natural language such as ordinary English, which enables users to communicate with the
machine in a language with which they are already familiar. Second, machines can be trained
to produce output in English, which enables users to understand clearly what the computer
has to say. Some natural language interfaces are already available as part of business software
such as spreadsheets and databases.
Analyzing a natural language sentence involves three stages:
a) Syntactic Analysis
Where we use grammatical rules describing the legal structure of the language to
obtain one or more parses of the sentence.
b) Semantic Analysis
Where we try to obtain an initial representation of the meaning of the sentence,
given the possible parses; this extraction of meaning is known as semantic
analysis.
c) Pragmatic Analysis
Where we use additional contextual information to fill in gaps in the meaning
representation and to work out what the speaker was really getting at.
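The three stages just listed can be sketched as a toy pipeline. Everything below, the function names, the fixed word order, and the context table, is a hypothetical illustration of the idea, not part of any real NLP library.

```python
# A toy three-stage pipeline: syntactic, then semantic, then pragmatic
# analysis. Every name and rule here is an illustrative assumption.

def syntactic_analysis(sentence):
    """Crude parse: assume a fixed Subject-Verb-Object word order."""
    words = sentence.rstrip(".").split()
    return {"subject": words[0], "verb": words[1], "object": " ".join(words[2:])}

def semantic_analysis(parse):
    """Turn the parse into a bare meaning representation: a predicate."""
    return (parse["verb"], parse["subject"], parse["object"])

def pragmatic_analysis(meaning, context):
    """Fill gaps with contextual knowledge, e.g. resolve 'he' to a name."""
    verb, subject, obj = meaning
    subject = context.get(subject.lower(), subject)
    return (verb, subject, obj)

context = {"he": "Fred"}   # knowledge carried over from earlier sentences
parse = syntactic_analysis("He asked the boss.")
print(pragmatic_analysis(semantic_analysis(parse), context))
# ('asked', 'Fred', 'the boss')
```

Note how only the last stage can replace "He" with "Fred": that resolution depends on context, not on the words of the sentence itself.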
From knowledge of the meaning of the words and the structure of the sentence, we can
work out that someone (who is male) asked for someone who is a boss. But we cannot say who
these people are or why the first person wanted the second. If we know something about the
context (including the last few sentences spoken or written), we may be able to work things out.
Maybe the last sentence was "Fred had just been sacked." We know from general knowledge
that bosses sack people, and that if people want to speak to those who sacked them, it is
generally to complain about it. We could then really start to get at the meaning of the
sentence: Fred wants to complain to his boss about getting sacked.
Anyway, this second stage, getting at the real contextual meaning, is referred to as
pragmatics. The first stage, based on the meanings of the words and the structure of the
sentence, is semantics.
Semantics:
Determining syntax only provides a framework for understanding. "Producing a
syntactic parse of a sentence is only the first step toward understanding that sentence," notes
Elaine Rich. "At some point," she adds, "a semantic interpretation of the sentence must be
produced." A semantic analysis is one that interprets a sentence according to meaning rather
than form.
Some methods of semantic analysis make use of various types of grammars: formal
systems of rules that attempt to describe the ways sentences can be constructed. A
semantic grammar, for example, applies knowledge about classifications of concepts in a
specific domain in order to parse a sentence according to its meaning.
One system of semantic analysis is conceptual dependency, developed by Roger
Schank around 1970. This system attempts to classify situations in terms of a limited number
of "primitive" (elemental) concepts. Conceptual dependency provides a useful representation
for conceptually equivalent sentences such as "John sold Mary a book" and "Mary bought a
book from John." Schank uses conceptual dependency in conjunction with his system of
scripts to determine meaning from an understanding of plans and goals.
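As a rough sketch, both sentences can be mapped onto one frame built around Schank's ATRANS primitive (an abstract transfer of possession). The dictionary layout below is our own simplification, and it ignores the reciprocal transfer of money that a full conceptual dependency analysis of "buy" and "sell" would include.

```python
# Two conceptually equivalent sentences mapped onto one frame.
# ATRANS denotes an abstract transfer of possession; the frame
# layout is a simplified assumption, not Schank's full notation.

def atrans(actor, obj, source, recipient):
    """Build a conceptual-dependency style frame for a possession transfer."""
    return {"primitive": "ATRANS", "actor": actor, "object": obj,
            "from": source, "to": recipient}

# "John sold Mary a book": John transfers the book to Mary.
sold = atrans(actor="John", obj="book", source="John", recipient="Mary")

# "Mary bought a book from John": the same underlying transfer.
bought = atrans(actor="John", obj="book", source="John", recipient="Mary")

print(sold == bought)   # True: both sentences share one representation
```

The surface forms differ, but a program that reasons over the frame sees only one situation, which is the whole point of the representation.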
Pragmatics
This is the last stage of analysis, where the meaning is elaborated based on contextual
and world knowledge. Contextual knowledge includes knowledge of the previous sentences
(spoken or written), general knowledge about the world, and knowledge of the speaker.
Possibly the most difficult task facing researchers in understanding natural language is
pragmatics, the study of what people really mean. If we ask, "Why didn't the company show a
profit last month?", the answer "Because expenses were higher than income" is not acceptable;
what we probably mean is something like "What mistake did the company make to cause it to
lose money?"
AUTOMATIC SPEECH RECOGNITION
While speech signals carry linguistic information regarding the message to be conveyed,
they also possess extra-linguistic information about aspects such as the speaker's identity,
dialect, and psychological and physiological states, and about the prevailing environmental
conditions such as noise and room acoustics. To develop high-grade speech recognition
systems, one has to learn how to extract the message-bearing components from the signal
while discarding the rest. Considerable fundamental research in speech science is needed
before automatic speech recognition systems can approach anywhere near human performance.
SYNTAX
The stage of syntactic analysis is the best understood stage of NLP. Syntax helps us
understand how words are grouped together to make complex sentences and gives us a starting
point for working out the meaning of whole sentences. For example, consider the following
two sentences:
1) The dog ate the bone.
2) The bone was eaten by the dog.
The rules of syntax help us work out that it's the bone that gets eaten and not the dog. A
simple rule like "it's the second noun that gets eaten" just won't work.
[Figure: parse of "The dog ate the bone", showing that the correct meaning is that the bone was eaten by the dog.]
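The point that both word orders yield the same underlying relation can be sketched in code. The pattern rules below are deliberately crude, toy assumptions that handle only these two sentence shapes.

```python
# Both the active and the passive sentence map to the same predicate:
# eat(dog, bone). These tiny pattern rules are illustrative only.

def extract_predicate(sentence):
    """Return (verb, agent, patient) for a toy active or passive sentence."""
    words = sentence.rstrip(".").lower().replace("the ", "").split()
    if "by" in words:                    # passive: patient was VERBed by agent
        patient, _, verb, _, agent = words
        verb = verb.replace("eaten", "ate")   # crude de-passivization
    else:                                # active: agent VERBed patient
        agent, verb, patient = words
    return (verb, agent, patient)

print(extract_predicate("The dog ate the bone."))           # ('ate', 'dog', 'bone')
print(extract_predicate("The bone was eaten by the dog."))  # ('ate', 'dog', 'bone')
```

Either way, the same triple comes out: syntax, not word position alone, tells us who ate whom.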
In other cases there may be many possible groupings of words. For example:
1) (a) John saw (Mary with the telescope), i.e. Mary has the telescope.
   (b) John (saw Mary with the telescope), i.e. John used the telescope to see her.
[Figure: the two readings of "John saw Mary with the telescope".]
Anyway, rules of syntax specify the possible organizations of words in sentences. They
are normally specified by writing a grammar for the language. Of course, just having a grammar
isn't enough to analyze a sentence: we need a parser to use the grammar to analyze the
sentence. The parser should return the possible parse trees for the sentence. The next section
describes how simple grammars and parsers may be written.
WRITING A GRAMMAR
A natural language grammar specifies allowable sentence structures in terms of basic
syntactic categories such as nouns and verbs, and allows us to determine the structure of a
sentence. It is defined in a similar way to a grammar for a programming language, though it
tends to be more complex, and given the complexity of natural language, any particular
grammar is unlikely to cover all syntactically acceptable sentences.
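As an illustration, a tiny grammar of the kind described can be written down as plain data. The rule and lexicon format below is our own choice, not a standard notation.

```python
# A tiny context-free grammar: each category maps to its possible
# expansions, and a lexicon maps categories to words. Format is ours.

GRAMMAR = {
    "S":  [["NP", "VP"]],            # a sentence is a noun phrase + verb phrase
    "NP": [["Art", "N"], ["PN"]],    # "the dog", or a proper noun like "john"
    "VP": [["V", "NP"]],             # "ate the bone"
}
LEXICON = {
    "Art": {"the", "a"},
    "N":   {"dog", "bone", "banana"},
    "PN":  {"john", "mary"},
    "V":   {"ate", "saw"},
}

def categories(word):
    """Find the syntactic categories of a single word by lexicon lookup."""
    return [cat for cat, words in LEXICON.items() if word in words]

print(categories("dog"))   # ['N']
```

A parser would then search these rules to decide whether a whole word sequence forms an S.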
NOTE:
In natural language we don't usually parse language in order to check that it is correct;
we parse it in order to determine its structure and to help work out its meaning. But most
grammars are concerned first with the structure of 'correct' English, as parsing gets much more
complex if you allow bad English.
PARSING / PARSERS
Having a grammar isn't enough to parse natural language; we need a parser. The parser
searches for possible ways the rules of the grammar can be used to parse the sentence, so
parsing can be viewed as a kind of search problem. In general there may be many different rules
that can be used to 'expand' or rewrite a given syntactic category, and the parser must check
through them all to see whether the sentence can be parsed using them.
The core of NLP is the parser. It reads each sentence and then proceeds to analyze it.
This process can be divided into three tasks, covering the acoustic-phonetic,
morphological-syntactic, and semantic aspects of the input.
The first task comes into operation only if the input consists of spoken words: it
divides the signal into its acoustic and phonetic components.
The second establishes the syntactic form (grammatical arrangement) of the
sentence.
The third tries to extract meaning from these analyzed patterns, taking into
account the common usages of words in the language; semantic interpretation is the
process of extracting the meaning of an utterance as an expression in some representation
language. Pragmatic interpretation additionally takes into account the fact that the same
words can have different meanings in different situations.
Parsing had been investigated by linguists quite independently of, and prior to, AI
scientists. They developed tree diagrams for parsing sentences, so parsing is the process of
building a parse tree for an input string. In "John ate a banana," 'John' constitutes a proper
noun and is the subject, and 'ate a banana' constitutes the predicate. A linguist would parse
the sentence with the help of a tree diagram.
Parse tree for "You give me the gold":

S
├── NP ── Pronoun: You
└── VP
    ├── Verb: Give
    ├── NP ── Pronoun: Me
    └── NP ── Article: the ── Noun: gold
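Such a tree can be built mechanically. The sketch below is a minimal recursive-descent parser that treats parsing as search, trying every rule that can expand a category; the grammar, lexicon, and tuple-based tree format are toy assumptions of ours, not a standard library.

```python
# A minimal recursive-descent parser. parse() yields every parse tree
# (as nested tuples) that a category can produce over a word list.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Art", "N"], ["PN"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"Art": {"a", "the"}, "N": {"banana", "bone"},
           "PN": {"john"}, "V": {"ate"}}

def parse(cat, words, i):
    """Yield (tree, next_index) for every way cat can cover words[i:]."""
    if cat in LEXICON:                       # terminal: match one word
        if i < len(words) and words[i] in LEXICON[cat]:
            yield ((cat, words[i]), i + 1)
        return
    for rule in GRAMMAR[cat]:                # nonterminal: try each expansion
        for children, j in expand(rule, words, i):
            yield ((cat, children), j)

def expand(rule, words, i):
    """Yield (child_trees, next_index) for each way rule covers words[i:]."""
    if not rule:
        yield ([], i)
        return
    for child, j in parse(rule[0], words, i):
        for rest, k in expand(rule[1:], words, j):
            yield ([child] + rest, k)

words = "john ate a banana".split()
trees = [t for t, j in parse("S", words, 0) if j == len(words)]
print(trees[0])
# ('S', [('NP', [('PN', 'john')]), ('VP', [('V', 'ate'),
#        ('NP', [('Art', 'a'), ('N', 'banana')])])])
```

Because every alternative rule is tried, the same machinery returns several trees when the sentence is ambiguous.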
Multiple Parses
In general, as discussed earlier, there may be many different parses for a complex
sentence, as the grammar rules and dictionary allow the same list of words to be parsed in
several different ways. A commonly cited example is the pair of sentences:
Time flies like an arrow.
Fruit flies like a banana.
"Flies" can be either a verb or a noun, while "like" can be either a verb or a preposition. So, in
the first sentence "time" should be the noun phrase and "flies like an arrow" the verb phrase
(with "like an arrow" modifying "flies"). In the second sentence "fruit flies" should be the noun
phrase and "like a banana" the verb phrase. Now, we know that there is no such thing as a
"time fly" and that it would be a bit strange to "fly like a banana". But without such general
knowledge about word meaning we could not tell which parse is correct, so a parser with no
semantic component should return both parses and leave it to the semantic stage of
analysis to throw out the bogus one.
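The ambiguity comes from words that admit more than one part of speech. The toy lexicon below (our own assumption) gives "flies" and "like" two tags each, so a purely syntactic analyzer must consider every combination of tags.

```python
# Enumerate every part-of-speech assignment for an ambiguous sentence.
# The lexicon is a toy assumption for this example only.
from itertools import product

LEXICON = {"time": ["N"], "fruit": ["N"], "flies": ["N", "V"],
           "like": ["V", "P"], "an": ["Art"], "a": ["Art"],
           "arrow": ["N"], "banana": ["N"]}

def tag_sequences(sentence):
    """Every way of assigning one tag per word."""
    return list(product(*(LEXICON[w] for w in sentence.split())))

seqs = tag_sequences("time flies like an arrow")
print(len(seqs))   # 4: 'flies' (N or V) x 'like' (V or P)
```

A syntax-only parser keeps whichever sequences form a grammatical sentence; only semantics can decide between "time passes quickly" and "insects called time-flies enjoy an arrow".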
Generation
In these lectures we have discussed natural language understanding. However, we
should also be aware of the problems in generating natural language. If we have something we
want to express (e.g. eats(john, chocolate)), or some goal we want to achieve (e.g. getting
Fred to close the door), then there are many ways we can achieve that through language:
He eats chocolate.
It's chocolate that John eats.
John eats chocolate.
Chocolate is eaten by John.
Close the door.
It's cold in here.
Can you close the door?
A generation system must be able to choose appropriately from among the different possible
constructions, based on knowledge of the context. If a complex text is to be written, it must
further know how to make that text coherent.
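Choosing among constructions can be sketched with templates. The template set and the "focus" parameter below are illustrative assumptions about what a simple generator might condition on, not a real generation system.

```python
# Template-based generation: one meaning, eats(john, chocolate),
# realized as several surface forms depending on a "focus" choice.

TEMPLATES = {
    "neutral": "{agent} eats {patient}.",
    "patient-focus": "{patient} is eaten by {agent}.",
    "cleft": "It's {patient} that {agent} eats.",
}

def generate(agent, patient, focus="neutral"):
    """Realize the predicate eats(agent, patient) as an English sentence."""
    s = TEMPLATES[focus].format(agent=agent, patient=patient)
    return s[0].upper() + s[1:]   # capitalize the sentence-initial word

print(generate("John", "chocolate"))                    # John eats chocolate.
print(generate("John", "chocolate", "patient-focus"))   # Chocolate is eaten by John.
print(generate("John", "chocolate", "cleft"))           # It's chocolate that John eats.
```

A real generator would pick the focus from discourse context (what the previous sentences were about) rather than from an explicit argument.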
Anyway, that's enough on natural language. The main points to understand are roughly
what happens at each stage of analysis (for language understanding), what the problems are
and why, and how to write simple grammars in Prolog's DCG formalism.
Overview
The goal of the NLP group is to design and build software that will analyze, understand,
and generate languages that humans use naturally, so that eventually you will be able to
address your computer as though you were addressing another person.
This goal is not easy to reach. "Understanding" language means, among other things,
knowing what concepts a word or phrase stands for and knowing how to link those concepts
together in a meaningful way. It's ironic that natural language, the symbol system that is
easiest for humans to learn and use, is hardest for a computer to master. Long after machines
have proven capable of inverting large matrices with speed and grace, they still fail to master
the basics of our spoken and written languages.
The challenges we face stem from the highly ambiguous nature of natural language. As
an English speaker you effortlessly understand a sentence like "Flying planes can be
dangerous." Yet this sentence presents difficulties to a software program that lacks both your
knowledge of the world and your experience with linguistic structures. Is the more plausible
interpretation that the pilot is at risk, or that the danger is to the people on the ground? Should
"can" be analyzed as a verb or as a noun? Which of the many possible meanings of "plane" is
relevant? "Plane" could refer to, among other things, an airplane, a geometric object, or a
woodworking tool. How much and what sort of context needs to be brought to bear on these
questions in order to adequately disambiguate the sentence?
We address these problems using a mix of knowledge-engineered and
statistical/machine-learning techniques to disambiguate and respond to natural language input.
Our work has implications for applications like text critiquing, information retrieval, question
answering, summarization, gaming, and translation.