
Table Of Content

A HISTORY OF NLP
INTRODUCTION TO NATURAL LANGUAGE PROCESSING
ORIGIN OF NATURAL LANGUAGE
GRADUATE PROGRAM IN AI AND NLP
ASPECTS OF NATURAL LANGUAGE PROCESSING
NATURAL LANGUAGE UNDERSTANDING
PROBLEMS OF NATURAL LANGUAGE
Semantics and Pragmatics
AUTOMATIC SPEECH RECOGNITION
SYNTAX
Generation
Overview
Discussion And Conclusion
REFERENCES
PREFACE
Nepal has finally entered the computer age. Today computer technology is among the most powerful and useful technologies in the world, and there has been a remarkable swing towards the computer as the modern tool in every field. The only criteria for a successful career in computers are a logical bent of mind and the willingness to learn continuously. Human life is no longer a static existence, given the change in time and new discoveries in computer science. Today, machines can recognize spoken words and simple sentences.

We have tried to make the book precise and illustrative. The language of the book has been kept as simple as possible. Since this is our first attempt, unexpected technical mistakes may occur. We therefore expect and welcome all constructive comments and suggestions for the improvement of the book.

It is a big challenge to be open and accepting of what is new, as well as of the inevitable changes that happen; being able to utilize, shape, and even digest what we have been given can be difficult, but is often rewarding. This report is fortunate that many people contributed their time and talents to it; the task of creating any book requires many hardworking people pulling together to meet its demands. Lastly, we would like to extend our heartfelt gratitude to our teachers of the Electronics and Computer Department for their praiseworthy suggestions and guidelines, and for providing us with materials in preparing this report. We are also grateful to our parents and friends for their support, cooperation, and valued suggestions in developing this report to such an extent.

We feel this report will convey a sense of Natural Language Processing.

Thanking you,

Ganga Budathoki
Samu Shrestha
Rekha Shrestha
ARTIFICIAL INTELLIGENCE (AI)
Artificial Intelligence (AI) is a broad field and means different things to different people. It is concerned with getting computers to do tasks that require human intelligence. Having said that, there are many tasks which we might reasonably think require intelligence, such as complex arithmetic, that computers can do very easily. Conversely, there are many tasks that people do without even thinking, such as recognizing a face, which are extremely complex to automate. AI is concerned with these difficult tasks, which seem to require complex and sophisticated reasoning processes and knowledge.

People might want to automate human intelligence for a number of different reasons. One reason is simply to understand human intelligence better. For example, we may be able to test and refine psychological and linguistic theories by writing programs which attempt to simulate aspects of human behavior. Another reason is simply to have smarter programs. We may not care whether the programs accurately simulate human reasoning, but by studying human reasoning we may develop useful techniques for solving difficult problems.

AI is a very large and vast topic with many subfields. Among them, natural language processing (NLP) is one of the most important. The concept of NLP grew out of AI because computers did not understand human language; NLP is being developed in order to overcome this limitation of the computer.

A HISTORY OF NLP
NLP started when the computer was demobbed after the Second World War. Since then it has been used in a variety of roles, starting with machine translation to help fight the Cold War, up to the present, when it provides tools to aid multilingual communication in international communities.

The first NLP system to solve an actual task was probably the BASEBALL question-answering system, which handled questions about a database of baseball statistics. Roger Schank and his students built a series of programs that all had the task of understanding language. Natural language generation was considered from the earliest days of machine translation in the 1950s, but it did not appear as a monolingual concern until the 1970s.

INTRODUCTION TO NATURAL LANGUAGE PROCESSING

In NLP we are primarily concerned with understanding spoken or typed language. We are trying to get from some input sense data to some representation of what that data really means. Of course, "what it really means" is rather vague; we are not generally concerned with the deep philosophical implications of the data (e.g. that a beautiful sunrise means there is sense to the universe). We are just concerned with obtaining a sufficient interpretation for our purposes. For example, we might want to answer a user's natural-language questions given some database of information; the meaning of a sentence might then look something like a formal database query.
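To make the idea concrete, here is a minimal sketch of a natural-language question being mapped onto a formal lookup over a database, in the spirit of the BASEBALL system mentioned earlier. The question pattern, the player records, and the field names are all invented for illustration.

```python
import re

# A toy "database" of baseball statistics (invented data).
players = [
    {"name": "ruth", "team": "Yankees", "home_runs": 54},
    {"name": "gehrig", "team": "Yankees", "home_runs": 47},
]

def answer(question):
    """Translate one fixed question pattern into a formal lookup."""
    m = re.match(r"how many home runs did (\w+) hit", question.lower())
    if not m:
        return None  # the pattern did not match; no interpretation found
    for p in players:
        if p["name"] == m.group(1):
            return p["home_runs"]
    return None

print(answer("How many home runs did Ruth hit?"))  # -> 54
```

The "meaning" recovered here is exactly a database query: a predicate over records plus a field projection, which is all this purpose requires.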

DEFINITION
Natural language is a language spoken or written by humans, as opposed to a language used to program or communicate with computers. Natural language processing generally refers to language that is typed, printed, or displayed, rather than spoken.

At present, the use of computers is limited by communication difficulties; more effective use of computers would be possible if people could communicate with them through natural language. NLP is the formulation and investigation of computationally effective mechanisms for communication through natural language. This involves both natural language generation and understanding; an architecture that contains either one will be considered as containing NLP. If the user can communicate with a system using natural language, then it is clear that the architecture has NLP. It is true that some architectures can theoretically be programmed in such a way as to provide NLP, but potential capability is not enough for our criteria: we must have an actual implementation of the architecture showing how it does NLP.

The field of NLP is divided into two parts. Firstly, computers are trained to understand a natural language such as ordinary English. This enables users to communicate with the machine in a language with which they are already familiar. Secondly, machines can be trained to produce output in English, which enables the user to understand very clearly what the computer has to say. Some natural language interfaces are already available as part of business software such as spreadsheets and databases.

ORIGIN OF NATURAL LANGUAGE


NLP processors are designed so that even 'computer illiterate' users can use such advanced machines. This in turn has stimulated academic research in subjects such as natural language processing, speech recognition, synthetic speech generation, and vision systems.

Common languages such as COBOL and BASIC appear similar to spoken English; however, the user has to learn them and become familiar with their rules and restrictions. To overcome this drawback, both human user and computer can now initiate interaction with one another by asking questions or setting goals, generally known as mixed-initiative processing. It is therefore even more important that the dialogue between the two should be in a language the user is already familiar with: natural language.

GRADUATE PROGRAM IN AI AND NLP


Artificial Intelligence is a major area of research and graduate study in the Department of Computer and Information Sciences at the University of Delaware. Current research interests center on natural language generation and understanding, multi-agent systems, neural networks, planning and plan recognition, user modeling, cooperative distributed problem solving, and rehabilitation engineering. A cognitive science program brings in well-known speakers from throughout North America and serves as a forum for joint research efforts with faculty from the departments of Educational Studies, Linguistics, and Psychology. Close interaction with the Center for Applied Science and Engineering in Rehabilitation at the University of Delaware and the A.I. duPont Institute provides opportunities for research in NLP, speech synthesis, and robotics.

ASPECTS OF NATURAL LANGUAGE PROCESSING


One aspect is processing English words and sentences typed into the computer through the keyboard; the machine's response, also in plain English, appears on the VDU or through a printer.

Another is speech: the machine can recognize words and sentences spoken by the user, and speech synthesizers can be used for the machine to reply or to warn the user about a mistake he is about to make.

A third aspect is computer vision, where the machine understands its environment with the help of tactile sensors or TV cameras.

NATURAL LANGUAGE UNDERSTANDING


Natural language understanding is the study of the different processes by which a computer tries to comprehend instructions given in ordinary English. In building expert systems, scientists are attempting to have machines perform intelligent activities at a higher level than most people; after all, few humans are experts. On the other hand, in trying to create programs that allow computers to understand natural language, scientists are trying to teach computers to emulate a skill that nearly all of us perform without any trouble. Although expert systems have already been built that perform at the level of human experts, computers still cannot understand natural language as well as a typical four-year-old child. Compared to people, computers require a great deal of precision in communication.

Computers do not have linguistic flexibility. For example, the common operating system MS-DOS requires a command like "c:\>diskcopy A: B:" to instruct a computer to copy all files from diskette A to diskette B. If we instead type "please copy all the files on diskette A to diskette B", the computer does not understand what we are trying to tell it to do. The computer only accepts instructions entered in the form it has been programmed to understand; if we misspell a word, misplace a colon, or omit an asterisk, the computer cannot execute the instruction properly.
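The rigidity described above can be illustrated with a short sketch: a strict command parser accepts only the exact syntax it was programmed for and rejects natural phrasing outright, however clear the intent. The command pattern here is hypothetical, loosely modeled on the diskcopy example.

```python
import re

def parse_command(line):
    """Accept only the rigid form 'diskcopy X: Y:'; reject everything else."""
    m = re.fullmatch(r"diskcopy ([A-Z]):\s*([A-Z]):", line.strip())
    if m:
        return ("DISKCOPY", m.group(1), m.group(2))
    return None  # anything else is rejected, however clear its meaning

print(parse_command("diskcopy A: B:"))
print(parse_command("please copy all the files on diskette A to diskette B"))
```

The first call succeeds; the second, despite expressing the same request, yields no interpretation at all, which is precisely the inflexibility natural language understanding aims to remove.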
The goal of natural language understanding is to allow computers to understand people; understanding allows an appropriate action to be performed. It does not mean having computers understand everything we say; after all, even people misunderstand each other occasionally. Consider an example: if you want someone to bring you something to drink, you might say any of the following sentences:
o Please get me something to drink.
o Bring me a drink.
o Got anything to drink?
o I'm thirsty; can you bring me something?
If someone brings you something to drink, then we can claim you have been understood, i.e. an appropriate action has been performed. Similarly, if the computer performs an appropriate action when you give it an instruction, then the computer can be said to have understood your instruction. But if the computer performs some other action, then it has not understood you.

Natural language understanding is one of the hardest problems of AI, due to the complexity, irregularity and diversity of human language and the philosophical problems of meaning.

Why is it so easy for people to understand each other, yet so difficult for computers to understand people? Let us explore some of the characteristics of natural language that seem to cause no problems for people but create great difficulties for computers.

PROBLEMS OF NATURAL LANGUAGE


Natural language systems are developed both to explore general linguistic theories and to provide natural language interfaces, or front ends, to application systems. In the discussion here we will generally assume that there is some application system the user is interacting with, and that it is the job of the understanding system to interpret the user's utterances and "translate" them into a suitable form for the application. We will also assume that the natural language in question is English, though in general it might be any other language.

In general the user may communicate with the system by speaking or by typing. Consider understanding speech first. Our input is just the raw speech signal (the acoustic signal at different points in time). Before we can get to work on what the speech means, we must work out from the frequency spectrogram what words are being spoken. This is very difficult to do in general. For a start, different speakers have different voices and accents, and an individual speaker may articulate differently on different occasions, so there is no simple mapping from speech waveform to word. There may also be background noise, so we have to separate the signal resulting from wind whistling in the trees from the signal resulting from Fred saying "Hello".

Even if the speech is very clear, it may be hard to work out what words are spoken. There may be many different ways of splitting up a sentence into words. In fluent speech there are generally virtually no pauses between words, and the understanding system must guess where the word breaks are. As an example, consider the sentence "how to recognize speech"; if spoken quickly, this might be misheard as "how to wreck a nice beach". And even if we get the word breaks right, we may still not know what words are spoken: some words sound similar (e.g. bear and bare), and it may be impossible to tell which was meant without thinking about the meaning of the sentence. Because of all these problems, a speech understanding system may come up with a number of alternative sequences of words, perhaps ranked according to likelihood. Any ambiguities as to what the words in the sentence are (e.g. bear or bare) will be resolved when the system starts trying to work out the meaning of the sentence.

Whether we start with speech signals or typed input, at some stage we will have a list of words and will have to work out what they mean. There are three main stages to this analysis.

a) Syntactic Analysis
Where we use grammatical rules describing the legal structure of the language to
obtain one or more parses of the sentence.

b) Semantic Analysis
Where we try to obtain an initial representation of the meaning of the
sentence, given the possible parses. Extracting the meaning in this way is known as
semantic analysis.

c) Pragmatic Analysis
Where we use additional contextual information to fill in gaps in the meaning
representation and to work out what the speaker was really getting at.
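The three stages above can be sketched as a pipeline. Each function below is only a stub standing in for a full component; the sentence, the flat "parse", and the context dictionary are invented for illustration.

```python
def syntactic_analysis(words):
    # Grammar rules would yield one or more parses; here, one flat parse.
    return [("S", words)]

def semantic_analysis(parses):
    # From the possible parses, build an initial (partial) meaning.
    subject, predicate = parses[0][1]
    return {"predicate": predicate, "subject": subject}

def pragmatic_analysis(meaning, context):
    # Contextual knowledge fills gaps in the meaning representation.
    meaning["speaker"] = context.get("speaker", "unknown")
    return meaning

words = ["Fred", "complains"]
meaning = pragmatic_analysis(
    semantic_analysis(syntactic_analysis(words)), {"speaker": "Fred"})
print(meaning)  # {'predicate': 'complains', 'subject': 'Fred', 'speaker': 'Fred'}
```

The important point is the data flow: each stage consumes the previous stage's output and enriches it, which is exactly how the text describes the analysis.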

Semantics and Pragmatics

Semantics and pragmatics are concerned with getting at the meaning of a sentence. In the first stage (semantics), a partial representation of the meaning is obtained based on the possible syntactic structures of the sentence and on the meanings of the words in that sentence. In the second stage, the meaning is elaborated based on contextual and world knowledge. To illustrate the difference between these stages, consider the sentence:
 He asked for the boss.

From knowledge of the meanings of the words and the structure of the sentence, we can work out that someone (who is male) asked for someone who is a boss. But we cannot say who these people are or why the first wanted the second. If we know something about the context (including the last few sentences spoken or written), we may be able to work things out. Maybe the last sentence was "Fred had just been sacked". We know from world knowledge that bosses generally sack people, and that if people want to speak to the person who sacked them, it is generally to complain about it. We can then really start to get at the meaning of the sentence: Fred wants to complain to his boss about getting sacked.

Anyway, this second stage of getting at the real contextual meaning is referred to as pragmatics. The first stage, based on the meanings of the words and the structure of the sentence, is semantics.

Semantic:
Determining syntax only provides a framework for understanding. "Producing a syntactic parse of a sentence is only the first step toward understanding that sentence," notes Elaine Rich. "At some point," she adds, "a semantic interpretation of the sentence must be produced." A semantic analysis is one that interprets a sentence according to meaning rather than form.
Some methods of semantic analysis make use of various types of grammars, which are formal systems of rules that attempt to describe the ways sentences can be constructed. A semantic grammar, for example, applies knowledge about classifications of concepts in a specific domain to the interpretation of a sentence, in order to parse the sentence according to its meaning.
One system of semantic analysis is conceptual dependency, developed by Roger Schank around 1970. This system attempts to classify situations in terms of a limited number of "primitive" (elemental) concepts. Conceptual dependency provides a useful representation for conceptually equivalent sentences such as "John sold Mary a book" and "Mary bought a book from John". Schank uses conceptual dependency in conjunction with his system of scripts to determine meaning from an understanding of plans and goals.
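The idea behind conceptual dependency can be sketched as follows: two different surface sentences reduce to the same structure built from a primitive act. ATRANS (transfer of possession) is one of Schank's actual primitives, but the encoding below is our own simplification for illustration.

```python
def represent_sold(seller, buyer, obj):
    # "SELLER sold BUYER a OBJ"
    return {"primitive": "ATRANS", "object": obj, "from": seller, "to": buyer}

def represent_bought(buyer, obj, seller):
    # "BUYER bought a OBJ from SELLER"
    return {"primitive": "ATRANS", "object": obj, "from": seller, "to": buyer}

# "John sold Mary a book" and "Mary bought a book from John"
print(represent_sold("John", "Mary", "book") ==
      represent_bought("Mary", "book", "John"))  # True: same conceptual structure
```

Because both sentences map to one ATRANS structure, a system reasoning over these representations treats them as conceptually equivalent, which is the point of the scheme.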

Pragmatic
Pragmatics is the last stage of analysis, where the meaning is elaborated based on contextual and world knowledge. Contextual knowledge includes knowledge of the previous sentences (spoken or written), general knowledge about the world, and knowledge of the speaker.
Possibly the most difficult task facing researchers in understanding natural language is pragmatics, the study of what people really mean. If we ask, "Why didn't the company show a profit last month?", the answer "Because expenses were higher than income" is not acceptable; what we probably mean is something like "What mistake did the company make to cause it to lose money?"

AUTOMATIC SPEECH RECOGNITION


Automatic speech recognition means understanding the production and perception of speech sounds in human beings and then simulating the process using electronic systems. Scientists in NLP have designed machines called sonographs, which can analyze speech signals. A careful study of the machine's output, called a 'sonogram', helps to recognize the spoken words and even extract their meaning. Speaking into machines would appear to be the most natural and efficient means of man-machine communication.
Automatic speech recognition systems are electronic devices which receive spoken words from human speakers and produce an output consisting of coded digital signals ready to be fed into an application unit. These signals correspond to what the automatic speech recognizer (ASR) has recognized of the speech message. The application unit takes appropriate action on receiving these signals from the ASR; they can be used to initiate a specific machine activity. For example, a robot may be given oral instructions to walk ahead, to turn, and so on.
Linguistic studies of speech sounds suggest that there are basic discrete sound segments in any language, and that these are strung together to form words while speaking. These segments are known as 'phonemes'. Most spoken languages possess, on average, about 40 phonemes. Each of these phonemes can be characterized by a set of unique properties.
However, recognition of phonemes by computer systems is not yet a successful proposition. The contextual influence of continuous speech on individual phonemes is so considerable that it is difficult to get machines to recognize them with certainty. There are enormous variations in the styles of speaking of men and women. A word spoken by the same speaker on different occasions has minor variations, and the same word spoken by different speakers has variations depending on individual accent or dialect. In the former case, the variations arise on account of context and the position of the phoneme within a word; in the latter case, on account of factors such as the speaker's manner of speaking. The development of a completely speaker-independent speech recognition system will require an even more comprehensive understanding of the sources of variability in speech signals.

While speech signals carry linguistic information regarding the message to be conveyed, they also carry extra-linguistic information about aspects such as the speaker's identity, dialect, his psychological and physiological state, and the prevailing environmental conditions such as noise, room acoustics, etc. In order to develop high-grade speech recognition systems, one has to learn how to extract the message-bearing components from the signal while discarding the rest. Undoubtedly, considerable fundamental research in speech science is needed before automatic speech recognition systems can approach anywhere near human performance.

SYNTAX
The stage of syntactic analysis is the best understood stage of NLP. Syntax helps us understand how words are grouped together to make complex sentences, and gives us a starting point for working out the meaning of the whole sentence. For example, consider the following two sentences:
1) The dog ate the bone.
2) The bone was eaten by the dog.

The rules of syntax help us work out that it is the bone that gets eaten and not the dog. A simple rule like "it is the second noun that gets eaten" just won't work.

Fig: "The dog ate the bone" and "The bone was eaten by the dog" have the same meaning: in both, the bone is eaten by the dog.

In other cases there may be many possible groupings of words. For example:
1) (a) John saw (Mary with the telescope), i.e. Mary has the telescope.
(b) John (saw Mary with the telescope), i.e. John saw her using the telescope.
Fig: The two readings of "John saw Mary with the telescope": in one, Mary has the telescope; in the other, John sees Mary with the help of the telescope.

Anyway, the rules of syntax specify the possible organizations of words in sentences. They are normally specified by writing a grammar for the language. Of course, just having a grammar is not enough to analyze a sentence: we need a parser to use the grammar to analyze the sentence. The parser should return possible parse trees for the sentence. The next section describes how simple grammars and parsers may be written.

Thus we can study it by dividing it into four parts:

1. Writing a grammar
2. Parsing
3. Returning the parse tree
4. Multiple parses

WRITING A GRAMMAR
A natural language grammar specifies allowable sentence structures in terms of basic syntactic categories such as nouns and verbs, and allows us to determine the structure of a sentence. It is defined in a similar way to a grammar for a programming language, though it tends to be more complex, and because of the complexity of natural language a given grammar is unlikely to cover all possible syntactically acceptable sentences.
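As an illustration, a tiny grammar fragment can be written down directly as data: each syntactic category maps to its alternative expansions. This sketch is hypothetical and covers only sentences like "the dog ate the bone".

```python
# Rewrite rules: category -> list of alternative expansions.
# An expansion is a sequence of sub-categories or literal words.
grammar = {
    "S":    [["NP", "VP"]],
    "NP":   [["Det", "Noun"]],
    "VP":   [["Verb", "NP"]],
    "Det":  [["the"], ["a"]],
    "Noun": [["dog"], ["bone"]],
    "Verb": [["ate"]],
}

def expand(cat):
    """Expand a category by always taking the first alternative rule."""
    if cat not in grammar:
        return [cat]  # a literal word
    return [w for sub in grammar[cat][0] for w in expand(sub)]

# Taking the first alternative everywhere yields one licensed sentence.
print(" ".join(expand("S")))  # the dog ate the dog
```

Even this toy fragment shows the key property: the grammar licenses a family of structurally well-formed sentences (including silly ones like "the dog ate the dog") while rejecting arbitrary word strings.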

NOTE:
In natural language we do not usually parse language in order to check that it is correct; we parse it in order to determine its structure and help work out its meaning. But most grammars are concerned first with the structure of 'correct' English, as parsing gets much more complex if you allow bad English.

PARSING / PARSERS
Having a grammar is not enough to parse natural language; we need a parser. The parser should search for possible ways the rules of the grammar can be used to parse a sentence, so parsing can be viewed as a kind of search problem. In general there may be many different rules that can be used to 'expand' or rewrite a given syntactic category, and the parser must check through them all to see whether the sentence can be parsed using them.
The core of NLP is the parser. It reads each sentence and then proceeds to analyze it. This process can be divided into three tasks, dealing with the acoustic-phonetic, morphological-syntactic, and semantic aspects of the signal.

 The first part comes into operation only if the input consists of spoken words.
 The second establishes the syntactic form (grammatical arrangement) of the sentence.
 The third tries to extract meaning from these analyzed patterns, taking into account the common usage of words in the language. Semantic interpretation is the process of extracting the meaning of an utterance as an expression in some representation language, while pragmatic interpretation takes into account the fact that the same words can have different meanings in different situations.

This entire process is known as 'syntactic analysis' or parsing.

Parsing had been investigated by linguists quite independently of, and prior to, AI scientists. They developed tree diagrams for parsing sentences, so parsing is the process of building a parse tree for an input string. For example, in "John ate a banana", 'John' constitutes a proper noun and is the subject, and 'ate a banana' constitutes the predicate. A linguist would parse the sentence with the help of a tree diagram.
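Parsing-as-search can also be sketched in code: a tiny top-down parser tries every rule for a category until one consumes the input. This is an illustrative Python sketch (the Generation section later mentions Prolog's DCG formalism; Python is used here only for brevity), and the grammar fragment covers just the running example "John ate a banana".

```python
grammar = {
    "S":    [["NP", "VP"]],
    "NP":   [["Name"], ["Det", "Noun"]],
    "VP":   [["Verb", "NP"]],
    "Name": [["john"]],
    "Det":  [["a"], ["the"]],
    "Noun": [["banana"]],
    "Verb": [["ate"]],
}

def parse(cat, words, i=0):
    """Yield every input position reachable after matching cat starting at i."""
    if cat not in grammar:                       # cat is a literal word
        if i < len(words) and words[i] == cat:
            yield i + 1
        return
    for rule in grammar[cat]:                    # search over alternative rules
        ends = [i]
        for sub in rule:                         # each sub-category must match in turn
            ends = [k for j in ends for k in parse(sub, words, j)]
        yield from ends

words = "john ate a banana".split()
# The sentence parses iff some search path consumes every word.
print(any(end == len(words) for end in parse("S", words)))  # True
```

The nested loop over `grammar[cat]` is exactly the search the text describes: every alternative expansion of a category is tried, and failed alternatives simply contribute no end positions.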

Fig: Tree diagrams parsing "John ate a banana": the subject "John" (a proper noun) and the predicate "ate a banana" (verb "ate", determiner "a", noun "banana").

Returning the parse tree:


For a parser to be useful we often want it to return the parse tree once it has parsed the sentence. The tree may then be used by a semantic component to help determine the meaning of the sentence. [We sometimes do the semantic processing at the same time as the parsing, making this unnecessary; both approaches are used.]
Fig: Parse tree for "You give me the gold", showing sub-categorization of the verb and verb phrase: S expands to an NP (the pronoun "You") and a VP; the VP expands through its sub-categorization frames to the verb "give", the pronoun "me", and the NP "the gold" (article + noun).

Multiple Parses
In general, as discussed earlier, there may be many different parses for a complex sentence, as the grammar rules and dictionary allow the same list of words to be parsed in several different ways. A commonly cited example is the pair of sentences:
 Time flies like an arrow.
 Fruit flies like a banana.
"Flies" can be either a verb or a noun, while "like" can be either a verb or a preposition. So, in the first sentence "time" should be the noun phrase and "flies like an arrow" the verb phrase (with "like an arrow" modifying "flies"). In the second sentence "fruit flies" should be the noun phrase and "like a banana" the verb phrase. Now, we know that there is no such thing as a "time fly", and it would be a bit strange to "fly like a banana". But without such general knowledge about word meaning we could not tell which parse is correct, so a parser with no semantic component should return both parses and leave it to the semantic stage of analysis to throw out the bogus one.
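The ambiguity can be demonstrated mechanically: with a toy grammar in which "flies" is listed as both a noun and a verb, and "like" as both a verb and a preposition, a purely syntactic parser finds exactly two complete parses of "time flies like an arrow". The grammar fragment is invented for illustration.

```python
grammar = {
    "S":    [["NP", "VP"]],
    "NP":   [["Noun"], ["Noun", "Noun"], ["Det", "Noun"]],
    "VP":   [["Verb", "PP"], ["Verb", "NP"]],
    "PP":   [["Prep", "NP"]],
    "Noun": [["time"], ["flies"], ["arrow"]],
    "Verb": [["flies"], ["like"]],
    "Prep": [["like"]],
    "Det":  [["an"]],
}

def parses(cat, words, i=0):
    """Yield one end position per way of matching cat starting at i."""
    if cat not in grammar:
        if i < len(words) and words[i] == cat:
            yield i + 1
        return
    for rule in grammar[cat]:
        ends = [i]
        for sub in rule:
            ends = [k for j in ends for k in parses(sub, words, j)]
        yield from ends

words = "time flies like an arrow".split()
count = sum(1 for end in parses("S", words) if end == len(words))
print(count)  # 2: "time (flies like an arrow)" and "(time flies) (like an arrow)"
```

With no semantic component, the parser has no grounds to prefer either parse; it returns both, as the text prescribes, and a later stage must reject the "time fly" reading.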

Generation
In these lectures we have discussed natural language understanding. However, we should also be aware of the problems in generating natural language. If we have something we want to express (e.g. eats(john, chocolate)), or some goal we want to achieve (e.g. get Fred to close the door), then there are many ways we can achieve that through language:
He eats chocolate.
It's chocolate that John eats.
John eats chocolate.
Chocolate is eaten by John.
Close the door.
It's cold in here.
Can you close the door?

A natural language generation system must be able to choose appropriately from among the different possible constructions, based on knowledge of the context. If a complex text is to be written, it must further know how to make that text coherent.
Anyway, that is enough on natural language. The main points to understand are roughly what happens at each stage of analysis (for language understanding), what the problems are and why, and how to write simple grammars in Prolog's DCG formalism.

Overview
The goal of the NLP group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.
This goal is not easy to reach. "Understanding" language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It is ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.
The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like "Flying planes can be dangerous." Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. Is the more plausible interpretation that the pilot is at risk, or that the danger is to the people on the ground? Should "can" be analyzed as a verb or as a noun? Which of the many possible meanings of "plane" is intended? It could refer to, among other things, an airplane, a geometric object, or a woodworking tool. How much and what sort of context needs to be brought to bear on these questions in order to adequately disambiguate the sentence?
We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications like text critiquing, information retrieval, question answering, summarization, gaming, and translation.

Discussion And Conclusion


The title " Natural Language Processing" is so vast. So we don't get enough information
about it. We have gone through the surface of the topic. The summary of the report is that it is
developed, as the communication with computer is difficult in the computer language to that in
order to avoid that drawback the concept of NLP come into existence.
NLP is the branch of AI which deals with making the computer intelligent artificially NLP
has advanced more rapidly in the past decade because of its simplicity in communication
between computer and user. NLP have become more integrated and applicable. Thus, from this
we have come to the conclusion that NLP has originated and is in the cause of development
because of its advantage that computer understand the human language so that the people of
every part of the world can use computer more easier than before.

REFERENCES
1) Yahoo.com
2) Google.com
3) Artificial Intelligence
