Language Processing System
facilities and services should be well under a minute. Users should also be able to produce the output without specialised expertise or training.

2. Problem Statement

This research was primarily conducted to address two problems: time consumption and difficulty of use. It relates to the analysis and design phases of the software development process. A few years ago, data flow diagrams were used to represent the flow of data and the user's requirements. Today, the Unified Modeling Language (UML) is used to model and map user requirements. UML is a more comprehensive and reliable form of representation, and it benefits the later stages of software development. Finding the information needed for an object-oriented design is a very time-consuming job. The system analyst has to spend a great deal of time finding the objects related to a business domain and then specifying the attributes and methods of those objects. Every person involved in software development therefore needs to obtain the required output accurately and in minimum time.

3. Solution Statement

This research uses a Speech Language Processing interface based on a rule-based framework for Object-Oriented Application Design (OOAD). The designed system extracts information from the text given by the user and automatically finds the objects together with their attributes and methods. After the user requirements have been mapped, UML diagrams such as the class diagram can also be drawn. An Integrated Development Environment is also provided for user interaction and efficient input and output. The functionality of the system is domain specific, but it can easily be extended in the future according to the requirements.

4. Speech Language Processing

Speech languages are also called "natural languages" [6]. Understanding and processing natural languages in all their facets is one of the topics of greatest interest in the field of artificial intelligence [10]. Natural languages are typically irregular and asymmetrical, and they are based on informal grammars. Many geographical, psychological and sociological factors influence the behaviour of natural languages [17]. Their vocabularies are open-ended and vary from region to region and over time. These variations and inconsistencies give rise to different varieties of a language. English, for example, has more than half a dozen well-known varieties around the world, each with its own accent, vocabulary and phonological features. Such discrepancies and inconsistencies make natural languages much harder to process than formal languages [13].

Speech Language Processing (SLP) technologies help a computer read and understand the main language-oriented concepts. The input is usually given to the computer as plain text or in question-and-answer (Q&A) format. The computer can then analyze, understand and generate natural-language content. Users can communicate with it in German, English or another human language through an SLP interface. Communicating with computers in written or spoken language is expected to have a significant impact on the work environment [17]. Natural-language interfaces can also effectively improve the usability of computer applications.

Researchers usually face various problems when analyzing and understanding natural languages. Problems arising from the greater complexity of natural language include verb conjugation, inflection, lexical breadth and ambiguity [11]. Of these, ambiguity causes the most difficulty. It can, however, often be resolved at the syntactic and semantic levels by a sound and robust rule-based system.

4.1. Lexical Assessment:

Lexical assessment is the first phase in understanding and analyzing the content of a speech language. The lexical phase generates the valid tokens, also called symbols or lexicons [11]. Later phases use these tokens to process the given speech language context, as in the following example:

"Ali is typing a letter."

This string contains six lexicons: 'Ali', 'is', 'typing', 'a', 'letter' and '.'. The full stop and other symbols in the text are also treated as lexicons, because the later phases use them to understand the semantics of the sentence.

4.2. Syntactic Assessment:

Syntax analysis is performed at word level to recognize the category of each word [12]. First, the available lexicons are categorized into nouns, pronouns, prepositions, adverbs, articles, conjunctions, etc. The syntactic analysis must then be able to isolate the subject, verbs, objects, adverbs, adjectives and various
Proceedings of 4th International Conference on Computer Applications 2006 Rangoon, Myanmar
23-24 February, 2006
other complements. It is a fairly complex, multi-step procedure. Consider the following sentence:

"Ahmed is using computer in the office."

For this example, the output is as follows:

Lexicons   Phase-I (word category)   Phase-II (sentence role)
Ahmed      Noun                      Subject
is         Helping-Verb              -------
using      Verb                      Verb
computer   Noun                      Object
in         Preposition               -------
the        Article                   -------
office     Noun                      Adverb

This is the final output of the syntactic assessment phase.

4.3. Semantic Assessment:

Semantic assessment determines the meanings and implications of the given strings. To analyze a phrase from the semantic point of view, particular meanings are assigned to the sentence [21]. Semantic ambiguities are common because a computer is generally unable to distinguish between logical situations. Consider the sentence:

"The student is writing a letter with a pen."

The designed framework should interpret this sentence as "The student is using a pen to write a letter", not as "The student is writing a pen with a letter". This problem is addressed in detail in Section 6.

4.4. Pragmatic Assessment:

Pragmatic assessment comes into play when communication takes place between two persons who do not share the same context [8]. The pragmatics of a sentence are understood and analyzed on the basis of the preceding context, which helps to establish the actual meaning of the sentence. For example:

"I will arrive at the airport at 10 o'clock."

Suppose the speaker is on a different continent, and therefore possibly in a different time zone. In that case, the meaning of "10 o'clock" can change completely.

5. Object-Oriented Design

Object-oriented design helps to design and develop the products identified during Object-Oriented Analysis (OOA) [6, 11]. Object-oriented analysis refines the candidate classes and their associated objects and defines the related data structures and procedures. It then maps this information into an object-oriented programming language (OOPL) [8]. Well-known object-oriented languages include Ada, Java, Python, Beta, Agora, Eiffel, Blue and C++. The main emphasis of object-oriented design is on describing the operations associated with objects. Compared with structured analysis and design, object-oriented analysis and design are much closer to each other [13]. As a result, the object-oriented approach uses homogeneous terminology and graphical notation (a defined set of symbols) during analysis and the early stages of design.

Conventional object-oriented design paradigms also require the specification of various concepts that are often not represented in analysis, such as the types of a class's attributes or the logic of its methods [15]. Design can be thought of in two phases: high-level design and low-level design. High-level design is the first phase; it logically decomposes the target system into large, complex objects [16]. The second phase, low-level design, principally defines the attributes and methods of each individual object [16].

6. Methodology Used

Conventional natural language processing systems use rule-based approaches. Agents are another way to develop speech language based systems [19]. In this research, a rule-based algorithm has been designed and used that can robustly read, understand and extract the desired information. First, the basic elements of the language grammar, such as verbs, nouns and adjectives, are extracted. Further processing is then performed on the basis of this extracted information. In linguistic terms, verbs often specify actions, and noun phrases the objects that participate in the action [16]. The role of each noun phrase then specifies how its object participates in the action, as in the following example:

"Asif hit a ball with a racket."

A procedure that understands this sentence must discover three things. Asif is the agent, because he performs the action of hitting. The ball is the thematic object, because it is the object hit. The racket is an instrument, because it is the tool with which the hitting is done [7]. Complete sentence analysis thus finds information about the agent, co-agent, thematic object, beneficiary, etc. [9]. Identifying this information helps in understanding the meaning of the input sentence. A brief description of each of these roles is given below.

Agent: The agent causes the action to occur. In "Ahmed hit the ball," Ahmed is the agent who performs the task. In a passive sentence, the agent may also appear in a "by" phrase, as in "The ball was hit by Ahmed."
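Section 6 describes a two-step process. The basic grammatical elements are extracted first, and then each noun phrase's role is decided. The sketch below chains the lexical (4.1), syntactic (4.2) and role-assignment steps together for the example sentences. It is a minimal sketch under stated assumptions and is not the authors' algorithm. The tiny `LEXICON`, the function names `tokenize`, `tag`, `assign_roles` and `analyze`, and the passive-voice rule (a helping verb before the main verb plus a "by" phrase) are all hypothetical.

```python
import re

# Hypothetical toy lexicon (word -> category). The paper does not publish its
# dictionary; a real system would need a far larger one.
LEXICON = {
    "asif": "Noun", "ahmed": "Noun", "ball": "Noun", "racket": "Noun",
    "student": "Noun", "letter": "Noun", "pen": "Noun",
    "hit": "Verb", "writing": "Verb",
    "is": "Helping-Verb", "was": "Helping-Verb",
    "a": "Article", "the": "Article",
    "with": "Preposition", "by": "Preposition",
}


def tokenize(sentence):
    """Lexical phase: words and punctuation marks are both kept as lexicons."""
    return re.findall(r"[A-Za-z]+|[^\sA-Za-z]", sentence)


def tag(tokens):
    """Syntactic phase (Phase-I): look up each lexicon's word category."""
    return [
        (tok, LEXICON.get(tok.lower(), "Unknown" if tok.isalpha() else "Punctuation"))
        for tok in tokens
    ]


def assign_roles(tagged):
    """Assign thematic roles with a few hand-written (assumed) rules."""
    roles = {}
    seen_verb = False           # has the main verb been reached yet?
    helper_before_verb = False  # e.g. "was" in "The ball was hit"
    preposition = None          # most recent preposition awaiting its noun
    pre_verb_noun = None        # noun appearing before the main verb
    for word, category in tagged:
        if category == "Helping-Verb" and not seen_verb:
            helper_before_verb = True
        elif category == "Verb":
            seen_verb = True
            roles["action"] = word
        elif category == "Preposition":
            preposition = word.lower()
        elif category == "Noun":
            if not seen_verb:
                pre_verb_noun = word
            elif preposition == "with":
                roles["instrument"] = word
            elif preposition == "by" and helper_before_verb:
                roles["agent"] = word  # passive: "... hit by Ahmed"
            elif preposition is None:
                roles.setdefault("thematic_object", word)
            preposition = None
    # Passive voice: helping verb before the main verb plus a "by" agent.
    passive = helper_before_verb and "agent" in roles
    if pre_verb_noun is not None:
        roles["thematic_object" if passive else "agent"] = pre_verb_noun
    return roles


def analyze(sentence):
    return assign_roles(tag(tokenize(sentence)))


print(analyze("Asif hit a ball with a racket."))
# {'action': 'hit', 'thematic_object': 'ball', 'instrument': 'racket', 'agent': 'Asif'}
print(analyze("The ball was hit by Ahmed."))
# {'action': 'hit', 'agent': 'Ahmed', 'thematic_object': 'ball'}
print(analyze("The student is writing a letter with a pen."))
# {'action': 'writing', 'thematic_object': 'letter', 'instrument': 'pen', 'agent': 'student'}
```

The noun after "with" is labelled as the instrument. The third call therefore reads the pen as the tool used for writing rather than the thing written, which is the interpretation Section 4.3 asks for. The sketch assumes single-clause sentences with simple word order. Co-agents, beneficiaries and pronouns are not handled.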
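Sections 5 and 6 together suggest how extracted roles can feed the low-level design step. Verbs describe actions and noun phrases the participating objects, while low-level design assigns methods to each object. The following is a minimal sketch of that mapping. It assumes that (agent, action, thematic object) triples have already been extracted. The heuristic is hypothetical and is not the paper's published algorithm: nouns become candidate classes, and verbs become methods of the agent's class. The functions `design_classes` and `to_plantuml`, and the choice of PlantUML text as the output format, are also assumptions. Attributes are not handled.

```python
def design_classes(facts):
    """Map (agent, action, object) triples to candidate classes and methods.

    Assumed heuristic: every noun becomes a candidate class, each action
    becomes a method of the agent's class, and agent -> object is recorded
    as an association.
    """
    classes = {}          # class name -> set of method signatures
    associations = set()  # (source class, target class) pairs
    for agent, action, obj in facts:
        agent_cls, obj_cls = agent.capitalize(), obj.capitalize()
        classes.setdefault(agent_cls, set()).add(f"{action}({obj.lower()}: {obj_cls})")
        classes.setdefault(obj_cls, set())  # the object noun is a class too
        associations.add((agent_cls, obj_cls))
    return classes, associations


def to_plantuml(classes, associations):
    """Render the candidate design as PlantUML class-diagram text."""
    lines = ["@startuml"]
    for name in sorted(classes):
        methods = sorted(classes[name])
        if not methods:
            lines.append(f"class {name}")
            continue
        lines.append(f"class {name} {{")
        lines.extend(f"  +{m}" for m in methods)
        lines.append("}")
    lines.extend(f"{src} --> {dst}" for src, dst in sorted(associations))
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical facts, e.g. as produced by a role-assignment step.
facts = [("student", "write", "letter"), ("student", "use", "pen")]
print(to_plantuml(*design_classes(facts)))
```

For these two facts, the sketch yields a `Student` class with `use` and `write` methods, plus empty `Letter` and `Pen` classes linked to it by associations. Sorting classes, methods and associations keeps the generated diagram text stable from run to run.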
complete and written in simple and correct English. Within the scope of our project, the software performs a complete analysis of the scenario to find the classes, their attributes and their operations. It also draws the corresponding UML diagrams. A graphical user interface is provided for entering the input scenario in the proper format and for generating the UML diagrams.

9. Future Work

Work on the system for automatic object-oriented design of software applications began with a specific aim. The goal was software that can automatically create designs following object-oriented patterns. It does so by reading a scenario written in English and extracting the objects together with their attributes and methods. The current system can only produce class diagrams. Future versions could also draw other UML diagrams, such as activity, sequence, use case, component and deployment diagrams. There is also room for improvement in the algorithms that generate class diagrams. The current accuracy of the generated diagrams is about 80% to 85%. We expect that it can be raised to about 95% by improving the algorithms and adding a learning capability.

10. References

[1] A. Popescu, O. Etzioni, and H. Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proc. of IUI-2003.
[2] Biber, D., Conrad, S., & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge Univ. Press, Cambridge, U.K.
[3] M. Rickman, A Process for Combining Object Oriented and Structured Analysis and Design, [email protected]
[4] Dunsmore, M. Roper, and M. Wood, "Systematic Object-Oriented Inspection – An Empirical Study", appeared in Proceedings of the 23rd International Conference on Software Engineering 2001, pp. 135-144, May 2001.
[5] Y. He and S. Young. 2003. A data-driven spoken language understanding system. In IEEE Workshop on Automatic Speech Recognition and Understanding.
[6] U. S. Reddy. Objects as closures: Abstract semantics of object-oriented languages. In Proceedings of the ACM Conference on Lisp and Functional Programming, pages 289-297, 1988.
[7] World Wide Web Consortium. 2001. Speech Synthesis Markup Language Specification for the Speech Interface Framework. http://www.w3.org/TR/speech-synthesis.
[8] R. Grishman. 2001. Adaptive information extraction and sublanguage analysis. In Proc. of IJCAI 2001.
[9] Fagan, J. L. (1989). The effectiveness of a non-syntactic approach to automatic phrase indexing for document retrieval. Journal of the American Society for Information Science, 40(2), 115-132.
[10] J. M. Zelle and R. J. Mooney, Learning semantic grammars with constructive inductive logic programming, Proceedings of the 11th National Conference on Artificial Intelligence (AAAI Press/MIT Press, Washington, D.C., 1993), pp. 817-822.
[11] Krovetz, R., & Croft, W. B. (1992). Lexical ambiguity and information retrieval. ACM Transactions on Information Systems, 10, 115-141.
[12] Losee, R. M. (1996a). Learning syntactic rules and tags with genetic algorithms for information retrieval and filtering: An empirical basis for grammatical rules. Information Processing and Management, 32(2), 185-197.
[13] Manning, C. D., & Schutze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, Mass.
[14] Partee, B. H., Meulen, A. t., & Wall, R. E. (1990). Mathematical Methods in Linguistics. Kluwer, Dordrecht, The Netherlands.
[15] S. Weiss, C. Apte, F. Damerau, D. Johnson, F. Oles, T. Goetz and T. Hampp, Maximizing text-mining performance, IEEE Intelligent Systems 14 (1999) 63-69.
[16] Strzalowski, T. (1995). Natural language information retrieval. Information Processing and Management, 31(3), 397-417.
[17] Van Rijsbergen, C. (1977). A theoretical basis for use of co-occurrence data in information retrieval. Journal of Documentation, 33(2), 106-119.
[18] C. A. Thompson, R. J. Mooney and L. R. Tang, Learning to parse natural language database queries into logical form, Workshop on Automata Induction, Grammatical Inference and Language Acquisition (1997).
[19] Taffet, Mary D. (2001). GENTECH 2001 Scholarship Proposal: Automatic Tagging of Genealogical Data to Enhance Web-based Retrieval. Available at: <http://web.syr.edu/~mdtaffet/GENTECH_Scholarship_Proposal.htm>.
[20] Nerbonne, John (2003): "Natural language processing in computer-assisted language learning". In: Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics. Oxford: 670-698.
[21] Menzel, Wolfgang/Schröder, Ingo (1998): "Error diagnosis for language learning systems". Proceedings of NLP+IA'98 1: 526-530.
[22] Etchegoyhen, Thierry/Wehrle, Thomas (1998): "Overview of GBGen: A large-scale, domain independent syntactic generator". In: Hovy, Eduard (ed.): Proceedings of the 9th International Workshop on Natural Language Generation. Niagara-on-the-Lake: 288-291.
[23] Granger, Sylviane (2003): "Error-tagged Learner Corpora and CALL: A Promising Synergy". CALICO Journal 20/3: 465-480.
[24] Menzel, Wolfgang/Schröder, Ingo (1998): "Error diagnosis for language learning systems". Proceedings of NLP+IA'98 1: 526-530.
[25] Nerbonne, John (2003): "Natural language processing in computer-assisted language learning". In: Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics. Oxford: 670-698.