
Language Processing System

This document summarizes a research paper on developing a rule-based framework using natural language processing to automatically generate UML class diagrams from written requirements. The framework analyzes user-provided text to identify objects, attributes, and methods to model object-oriented application design. This allows generating diagrams faster than traditional CASE tools while reducing the time analysts spend understanding requirements. The framework incorporates lexical analysis, disambiguation, and a rule-based system to extract modeling elements from text and represent object-oriented designs.

Proceedings of 4th International Conference on Computer Applications 2006 Rangoon, Myanmar

23-24 February, 2006

Speech Language Processing Interface for Object-Oriented Application Design using a Rule-based Framework

Imran S. Bajwa, M. Asif Naeem, Riaz-Ul-Amin, Dr M Abbas Choudhary

Balochistan University of Information Technology and Management Sciences, Quetta, Pakistan

Abstract

The look and style of software engineering have changed completely over the last two decades. Many modern software paradigms have been introduced, notably object-oriented design and modeling techniques. Object-oriented design is concerned with developing an object-oriented model of a software system to implement the identified user requirements. Unified Modeling Language (UML) is one of the modern object-oriented modeling and mapping tools. A class diagram can be drawn if the objects and their associated attributes and methods are available. This paper presents a natural language processing based automated system for identifying the objects and their associated attributes and methods. This extracted knowledge can then be used to draw a UML based object-oriented application design. The user writes the requirements in simple English in a few paragraphs, and the designed system analyzes the given script. The designed system provides a quick and reliable way to generate UML diagrams, saving the time and budget of both the user and the system analyst.

Keywords: Automatic diagram generation, Knowledge engineering, Natural Language Processing, Object-Oriented design.

1. Introduction

In recent times, many changes have taken place in the various phases of the design and development of a software application. The look and style of software engineering have changed completely. Modern software paradigms such as object-oriented design and modeling techniques have been introduced. Object-oriented design is concerned with developing an object-oriented model of a software system to implement the identified requirements. Object-oriented design can yield various benefits, such as maintainability and reusability [3, 5]. Maintainability allows easier analysis, less complexity in system design, and easier verification by the user. Reusability, on the other hand, saves time and cost, as modules written once are reused to increase productivity [4].

At present, every phase of software engineering follows the rules of object-oriented design patterns. Various well-known object-oriented design, analysis and modeling paradigms have been in use for the last few decades [8]. Some of the important object-oriented design paradigms are Booch, Syntropy, OORAM, UML, etc. [3]. Unified Modeling Language (UML) is the most modern and best-known modeling tool used to model user requirements. Many Computer Aided Software Engineering (CASE) tools are used for unified modeling, such as Rational Rose, Visio, Design, Smart Draw, Ervin, etc. [3]. Rational Rose is one of the best-known CASE tools. These tools are very efficient in terms of producing the final output.

These modern CASE tools are reasonably good software, but they have many drawbacks. Drawing UML diagrams with these CASE tools is a tough job because the procedure is complicated and uncomfortable to perform. Traditionally, the system analyst has to do a lot of work deducing the business logic and understanding the user requirements before drawing the UML diagrams with conventional CASE tools [5]. Hence, much time is wasted because the available CASE tools are poorly suited to this scenario. In today's world everybody needs a quick and reliable service. Some sort of intelligent software was therefore needed to generate object-oriented designs automatically and save the time and budget of both the user and the system analyst.

In order to resolve all such issues, a framework is required which facilitates both users and software engineers. The major goal in designing this new system is that the time consumed to explore all the
facilities and services should be well under a minute. The user should also be able to produce output without special expertise or training.

2. Problem Statement

This research was primarily conducted to address the problems of time consumption and difficulty of use. It relates to the software analysis and design phase of the software development process. A few years ago, data flow diagrams were used to symbolize the flow of data and represent the user's requirements. At present, Unified Modeling Language is used to model and map the user requirements; it is a more comprehensive and authentic way of representation, and it is beneficial for the later stages of software development. Finding the information needed for object-oriented design is a very time-consuming job. The system analyst has to spend too much time finding the objects related to a business domain and then specifying the attributes and methods related to those objects.

What was needed is that any person involved in software development can obtain the required output accurately and in minimum time.

3. Solution Statement

A Speech Language Processing Interface based on a Rule-based Framework for Object-Oriented Application Design (OOAD) has been used in this research. The designed system extracts information from the user-supplied text and automatically finds the objects and their attributes and methods. After the user requirements have been mapped, UML diagrams such as the class diagram can also be drawn. An Integrated Development Environment is also provided for user interaction and efficient input and output. The functionality of the conducted research is domain specific, but it can easily be enhanced in the future according to requirements.

4. Speech Language Processing

Speech languages are also called "Natural Languages" [6]. The understanding and multi-faceted processing of natural languages is one of the topics of greatest interest in the field of artificial intelligence [10]. Typically, natural languages are irregular and asymmetrical, and they are based on informal grammars. Many geographical, psychological and sociological factors influence the behaviour of natural languages [17]. Their vocabulary is an open, undefined set of words that varies from area to area and over time. These variations and inconsistencies give rise to various flavours of a language; English, for example, has more than half a dozen well-known flavours around the world. These flavours differ in accent, vocabulary and phonology. Such discrepancies and inconsistencies make natural languages difficult to process compared with formal languages [13].

Speech Language Processing (SLP) technologies help a computer to read and understand the main language-oriented concepts. Usually the input is given to the computer as plain text or in question and answer (Q&A) format. Computers can then analyze, understand and generate human speech language content. Users can also communicate with the computer in German, English or another human language using an SLP interface. Communication with computers using written or spoken language will have a significant impact upon the work environment [17]. Speech languages can effectively improve the usability of computer applications based on a natural language interface.

In the process of analyzing and understanding natural languages, researchers usually face various problems. The problems connected to the greater complexity of natural language include verb conjugation, inflexion, lexical amplitude, ambiguity, etc. [11]. Of these, the one that causes the most difficulty is ambiguity. Ambiguity can be resolved at the syntactic and semantic levels by using a sound and robust rule-based system.

4.1. Lexical Assessment:

To understand and analyze the contents of a speech language, the first phase is lexical assessment. The lexical phase generates the valid tokens, also called symbols or lexicons [11]. These lexical tokens are then used to process the provided speech language context, as in the following example:

"Ali is typing a letter."

There are six lexicons in the given string: 'Ali', 'is', 'typing', 'a', 'letter' and '.'. The full stop and other symbols in the text are also considered lexicons, as they are used in the forthcoming phases for understanding the semantics of the sentence.

4.2. Syntactic Assessment:

Syntax analysis is performed at word level to recognize the word category [12]. First, the available lexicons are categorized into nouns, pronouns, prepositions, adverbs, articles, conjunctions, etc. The syntactic analysis must then be able to isolate subjects, verbs, objects, adverbs, adjectives and various other complements. It is a somewhat complex, multipart procedure.
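The lexical and syntactic phases of Sections 4.1 and 4.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `LEXICON` word list, the function names `tokenize`, `tag_words` and `assign_roles`, and the positional rule used for the second syntactic pass are all hypothetical, since the paper does not publish its rules.

```python
import re

# Hypothetical hand-written lexicon; the paper does not publish its word lists.
LEXICON = {
    "ahmed": "Noun", "ali": "Noun", "computer": "Noun",
    "office": "Noun", "letter": "Noun",
    "is": "Helping-Verb",
    "using": "Verb", "typing": "Verb",
    "in": "Preposition",
    "a": "Article", "the": "Article",
}


def tokenize(text):
    """Lexical phase: split text into word tokens and single-symbol tokens."""
    return re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)


def tag_words(tokens):
    """Syntactic pass I: look up the category of each word (symbols are skipped)."""
    return [(t, LEXICON.get(t.lower(), "Unknown")) for t in tokens if t.isalpha()]


def assign_roles(tagged):
    """Syntactic pass II: naive positional rule for simple active sentences.

    Assumption: the noun before the verb is the subject, the first noun after
    it is the object, and any later noun is treated as an adverbial.
    """
    roles = []
    seen_verb = False
    seen_object = False
    for word, category in tagged:
        role = None
        if category == "Verb":
            role, seen_verb = "Verb", True
        elif category == "Noun":
            if not seen_verb:
                role = "Subject"
            elif not seen_object:
                role, seen_object = "Object", True
            else:
                role = "Adverb"
        roles.append((word, category, role))
    return roles


if __name__ == "__main__":
    print(tokenize("Ali is typing a letter."))
    for row in assign_roles(tag_words(tokenize("Ahmed is using computer in the office."))):
        print(row)
```

A dictionary lookup keeps the sketch self-contained; a real system would need a far larger lexicon or a trained tagger, and the positional rule would fail on passive or compound sentences.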

"Ahmed is using computer in the office."

For this example, the output is as follows.

Lexicons   Phase-I        Phase-II
Ahmed      Noun           Subject
is         Helping-Verb   -------
using      Verb           Verb
computer   Noun           Object
in         Preposition    -------
the        Article        -------
office     Noun           Adverb

This is the final output of the syntactic assessment phase.

4.3. Semantic Assessment:

Semantic assessment determines the meanings and implications of the given strings. To analyze a phrase from the semantic point of view, particular meanings are assigned to the sentence [21]. Semantic ambiguities are the most common, because a computer is generally not in a position to distinguish between logical situations.

"The student is writing a letter with a pen."

The designed framework should analyze this sentence as "The student is using a pen to write a letter". It should not be read as "the student is writing a pen with letter". This complex problem is addressed in detail in section 6.

4.4. Pragmatic Assessment:

Pragmatic assessment is needed when communication takes place between two persons who do not share the same context [8]. The pragmatics of a sentence are understood and analyzed on the basis of the previous context. The preceding context helps to establish what the actual meaning of the sentence is, as in the following example:

"I will arrive at the airport at 10 o'clock."

In this example, if the speaker belongs to a different continent, the meaning can be totally different.

5. Object-Oriented Design

Object-oriented design helps to design and develop the products identified during Object-Oriented Analysis (OOA) [6, 11]. Object-oriented analysis refines the candidate classes and their associated objects, defines related data structures and procedures, and maps these sets of information into an object-oriented programming language (OOPL) [8]. Well-known object-oriented languages are Ada, Java, Python, Beta, Agora, Eiffel, Blue, C++, etc. The major emphasis of an object-oriented design is to describe the operations associated with objects. Compared with structured analysis and design, object-oriented analysis and design are closer to each other [13]. This property of object-oriented design brings homogeneity to the terminology and to the graphical representations, in the form of defined symbols, used during analysis and the early stages of design.

Conventional object-oriented design paradigms also need the specification of various concepts often not represented in analysis, such as the types of the attributes of a class or the logic of its methods [15]. Design can be thought of in two phases: high-level design and low-level design. High-level design is the first phase, which logically decomposes the target system into large and complex objects [16]. The second phase, low-level design, defines the respective attributes and methods for each individual object [16].

6. Used Methodology

Conventional natural language processing systems use rule-based systems. Agents are another way to develop speech language based systems [19]. In this research, a rule-based algorithm has been designed and used which is able to read, understand and extract the desired information. First, the basic elements of the language grammar, such as verbs, nouns, adjectives, etc., are extracted; further processing is then performed on the basis of this extracted information. In linguistic terms, verbs often specify actions, and noun phrases specify the objects that participate in the action [16]. Each noun phrase's role then specifies how the object participates in the action, as in the following example:

"Asif hit a ball with a racket."

A procedure that understands such a sentence must discover that Asif is the agent because he performs the action of hitting, that the ball is the thematic object because it is the object hit, and that the racket is an instrument because it is the tool with which the hitting is done [7]. Thus, complete sentence analysis finds information about the agent, co-agent, thematic object, beneficiary, etc. [9]. The identification of such information helps to understand the meaning of the input sentence. A brief description of each of these roles is given below.

Agent: The agent causes the action to occur, as in "Ahmed hit the ball," where Ahmed is the agent who performs the task. In a passive sentence, the agent may also appear in a by-phrase, as in "The ball was hit by Ahmed."

Co-agent: If the agent is working with a partner, that partner is called the co-agent. Both of them carry out the action together, as in "Ahmed played tennis with Ali."

Beneficiary: The beneficiary is the person for whom an action has been performed: "Ahmed brought the balls for Ali." In this sentence Ali is the beneficiary.

Thematic object: The thematic object is the object the sentence is really all about, typically the object undergoing a change. Often the thematic object is the same as the syntactic direct object, as in "Ahmed hit the ball." Here the ball is the thematic object.

Conveyance: The conveyance is something in which or on which the agent travels: "Ahmed goes by train."

Trajectory: Motion from source to destination takes place over a trajectory. In contrast to the other role possibilities, several prepositions can serve to introduce trajectory noun phrases: "Ahmed and Ali went to London from Islamabad."

Location: The location is where an action occurs. Several prepositions are possible, each of which conveys meaning in addition to serving as a signal that a location noun phrase is coming: "Ali studied in the library, at a desk, by the wall, under a picture, near the door."

Time: Time specifies when an action occurs. Prepositions such as at, before and after introduce noun phrases serving as time role fillers: "Ahmed and Ali left before evening."

Duration: Duration specifies how long an action takes. Prepositions such as since and for indicate duration: "Ahmed and Ali walked for an hour."

7. Designed System Anatomy

The designed system, a speech language processing interface for object-oriented application design using a rule-based framework, performs both phases of object-oriented design described in section 5. First, objects and classes are defined; then their attributes and methods are defined. After this, the designed system is able to draw class diagrams on the basis of the extracted knowledge. The system draws diagrams in four modules: text input acquisition, text understanding, knowledge extraction, and finally identification of object-oriented design information and drawing of class diagrams, as shown in the following figure.

[Figure: block diagram with boxes labelled Text, Text Acquisition, Text Understanding, OOD Information, Class Diagram and Output]

Fig 1: Architecture of the designed system

Following is a brief description of the various modules of the designed system.

7.1. Input text acquisition

This is the first module, which acquires the input text scenario. The user provides the business scenario in the form of paragraphs of English text. The module reads the input text as characters and concatenates them to generate words, called lexicons or tokens. These tokens are the basis of the syntax analysis and semantic analysis carried out in the next phases. This module implements the lexical phase; its output is the lexicons (tokens).

7.2. Text understanding

This module reads the output of the first module in the form of words or lexicons. These words are categorized into various classes such as verbs, helping verbs, nouns, pronouns, adjectives, prepositions, conjunctions, etc. These categories are further classified into subject, object, verb, adverb, etc.

7.3. OOD Information

All nouns and pronouns are treated as objects and classes. The attributes and methods of each object are then defined: all adjectives are marked as attributes and all associated verbs are marked as methods of that specific object. This knowledge is extracted on the basis of the input provided by the preceding module.

7.4. Class diagram generation

The extracted information is first confirmed with the user, who can change it. This is the last module, which finally uses the extracted objects and their related attributes and methods to draw class diagrams. Small diagrams are connected to construct a complete class diagram.

8. Conclusion

This research concerns the object-oriented design of software applications through dynamic generation of class diagrams by reading and analyzing a scenario provided by the user in English. The designed system can find the classes and objects and their attributes and operations using an artificial intelligence technique, natural language processing. UML diagrams such as the class diagram are then drawn. The accuracy of the software is expected to be up to about 80% with the involvement of the software engineer, provided that he has followed the prerequisites of the software in preparing the input scenario.
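As a rough illustration of the extraction rule stated in Section 7.3 (nouns and pronouns become classes, adjectives become attributes, verbs become methods), the following Python sketch builds a small class model from words that have already been tagged. It is a sketch under assumptions, not the authors' algorithm: the function names `extract_classes` and `render_text_diagram`, the attachment rule (an adjective belongs to the next noun, a verb belongs to the first noun of its sentence) and the plain-text output format are all hypothetical.

```python
def extract_classes(tagged_sentences):
    """Build {class name: {"attributes": [...], "methods": [...]}} from tagged words.

    Each sentence is a list of (word, category) pairs, as a text-understanding
    step might produce them.
    """
    classes = {}
    for sentence in tagged_sentences:
        pending_adjectives = []  # adjectives waiting for the noun they modify
        subject = None           # first noun of the sentence; assumed to own the verbs
        for word, category in sentence:
            if category == "Adjective":
                pending_adjectives.append(word.lower())
            elif category in ("Noun", "Pronoun"):
                entry = classes.setdefault(
                    word.capitalize(), {"attributes": [], "methods": []}
                )
                for adjective in pending_adjectives:
                    if adjective not in entry["attributes"]:
                        entry["attributes"].append(adjective)
                pending_adjectives = []
                if subject is None:
                    subject = entry
            elif category == "Verb" and subject is not None:
                if word.lower() not in subject["methods"]:
                    subject["methods"].append(word.lower())
    return classes


def render_text_diagram(classes):
    """Render the model as indented text: '-' marks attributes, '+' marks methods."""
    lines = []
    for name, members in classes.items():
        lines.append("class " + name)
        lines.extend("  - " + attribute for attribute in members["attributes"])
        lines.extend("  + " + method + "()" for method in members["methods"])
    return "\n".join(lines)


if __name__ == "__main__":
    scenario = [[
        ("The", "Article"), ("senior", "Adjective"), ("student", "Noun"),
        ("writes", "Verb"), ("a", "Article"), ("long", "Adjective"),
        ("letter", "Noun"),
    ]]
    print(render_text_diagram(extract_classes(scenario)))
```

Taken literally, the rule turns every noun into a class, which is why the paper's step of confirming the extracted information with the user matters: the user would still need to merge, rename or discard candidate classes before a diagram is drawn.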

The given scenario should be complete and written in simple and correct English. Under the scope of our project, the software performs a complete analysis of the scenario to find the classes, their attributes and operations. It also draws the diagrams. A graphical user interface has also been provided for entering the input scenario and generating UML diagrams.

9. Future Work

The intended system for automatic object-oriented design of system applications was started with the aim of building software that can automatically produce a design according to object-oriented patterns by reading a scenario given in English and extracting the objects and their associated attributes and methods. The current system can only produce class diagrams. In future, other UML diagrams such as the activity diagram, sequence diagram, use case diagram, component diagram and deployment diagram can also be drawn. There is also some room for improvement in the algorithms for generating class diagrams. The current accuracy of generating diagrams is about 80% to 85%. It can be raised up to 95% by improving the algorithms and adding a learning capability.

10. References

[1] A. Popescu, O. Etzioni, and H. Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proc. of IUI-2003.
[2] Biber, D., Conrad, S., & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge Univ. Press, Cambridge, U.K.
[3] M. Rickman, A Process for Combining Object Oriented and Structured Analysis and Design, [email protected]
[4] Dunsmore, M. Roper, and M. Wood, "Systematic Object-Oriented Inspection – An Empirical Study", appeared in Proceedings of the 23rd International Conference on Software Engineering 2001, pp. 135-144, May 2001.
[5] Y. He and S. Young. 2003. A data-driven spoken language understanding system. In IEEE Workshop on Automatic Speech Recognition and Understanding.
[6] U. S. Reddy. Objects as closures: Abstract semantics of object-oriented languages. In Proceedings of the ACM Conference on Lisp and Functional Programming, pages 289–297, 1988.
[7] World Wide Web Consortium. 2001. Speech Synthesis Markup Language Specification for the Speech Interface Framework. http://www.w3.org/TR/speech-synthesis.
[8] R. Grishman. 2001. Adaptive information extraction and sublanguage analysis. In Proc. of IJCAI 2001.
[9] Fagan, J. L. (1989). The effectiveness of a non-syntactic approach to automatic phrase indexing for document retrieval. Journal of the American Society for Information Science, 40(2), 115–132.
[10] J. M. Zelle and R. J. Mooney, Learning semantic grammars with constructive inductive logic programming, Proceedings of the 11th National Conference on Artificial Intelligence (AAAI Press/MIT Press, Washington, D.C., 1993), pp. 817–822.
[11] Krovetz, R., & Croft, W. B. (1992). Lexical ambiguity and information retrieval. ACM Transactions on Information Systems, 10, 115–141.
[12] Losee, R. M. (1996a). Learning syntactic rules and tags with genetic algorithms for information retrieval and filtering: An empirical basis for grammatical rules. Information Processing and Management, 32(2), 185–197.
[13] Manning, C. D., & Schutze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, Mass.
[14] Partee, B. H., Meulen, A. t., & Wall, R. E. (1990). Mathematical Methods in Linguistics. Kluwer, Dordrecht, The Netherlands.
[15] S. Weiss, C. Apte, F. Damerau, D. Johnson, F. Oles, T. Goetz and T. Hampp, Maximizing text-mining performance, IEEE Intelligent Systems 14 (1999) 63–69.
[16] Strzalowski, T. (1995). Natural language information retrieval. Information Processing and Management, 31(3), 397–417.
[17] Van Rijsbergen, C. (1977). A theoretical basis for use of co-occurrence data in information retrieval. Journal of Documentation, 33(2), 106–119.
[18] C. A. Thompson, R. J. Mooney and L. R. Tang, Learning to parse natural language database queries into logical form, Workshop on Automata Induction, Grammatical Inference and Language Acquisition (1997).
[19] Taffet, Mary D. (2001). GENTECH 2001 Scholarship Proposal: Automatic Tagging of Genealogical Data to Enhance Web-based Retrieval. Available at: <http://web.syr.edu/~mdtaffet/GENTECH_Scholarship_Proposal.htm>.
[20] Nerbonne, John (2003): "Natural language processing in computer-assisted language learning". In: Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics. Oxford: 670-698.
[21] Menzel, Wolfgang/Schröder, Ingo (1998): "Error diagnosis for language learning systems". Proceedings of NLP+IA'98 1: 526-530.
[22] Etchegoyhen, Thierry/Wehrle, Thomas (1998): "Overview of GBGen: A large-scale, domain independent syntactic generator". In: Hovy, Eduard (ed.): Proceedings of the 9th
International Workshop on Natural Language Generation. Niagara-on-the-Lake: 288-291.
[23] Granger, Sylviane (2003): "Error-tagged Learner Corpora and CALL: A Promising Synergy". CALICO Journal 20/3: 465-480.
[24] Menzel, Wolfgang/Schröder, Ingo (1998): "Error diagnosis for language learning systems". Proceedings of NLP+IA'98 1: 526-530.
[25] Nerbonne, John (2003): "Natural language processing in computer-assisted language learning". In: Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics. Oxford: 670-698.
