Programming by Voice VocalProgramming
3 authors, including:
John Goldthwaite
University of North Carolina at Asheville
All content following this page was uploaded by John Goldthwaite on 31 October 2014.
adopted these tools allowing them to work while recovering from CTS.

Dictation with the current generation of recognition software is much more difficult for programming tasks. Although keywords can be recognized, the majority of text in a program is variable and procedure names that are composites of words and abbreviations. These composites are not handled by recognition systems and must be spelled. Computer languages use punctuation symbols for equivalence, relational expressions, variable declaration and parameter handling in ways that are different syntactically from natural language. As a result, an entirely different type of recognition system is needed for programming. We propose to build a generator for voice recognition syntax-directed programming environments. The generated programming environments will take advantage of the formal grammar of programming languages to simplify programming by voice.

The potential for this technology is not limited to programming languages. Extensible Markup Language (XML), the emerging standard for Web page definition and authoring, is based on document type definitions (DTD) or XML Schemas. An XML DTD or Schema is a grammatical definition of the contents of a class of XML documents containing a given structure and representation. A particular XML DTD or XML Schema is a language that defines how well-formed XML documents of that type are written. We can use our voice generator technology to create a voice recognition syntax-directed environment for any XML DTD or XML Schema.

A major motivation for the design given here is to bundle the system for programming with popular commercially available voice recognition software. We expect the recognition rates of this commercially available voice recognition engine to improve over time. The improvements will be integrated into the voice recognition system for programming without significant programming efforts.

RELATED RESEARCH
The related research falls into two groups: research in programming languages and compilers that is fundamental to our work, and research on voice recognition to support computer programming. Because XML DTDs and XML Schemas can also be considered as specialized programming language specification systems, these are also related.

Among the research results fundamental to the work proposed here are formal grammars, in particular context-free languages [7]; compiler-compilers, like YACC [8]; syntax-directed editor generators, like the Cornell Program Synthesizer [12]; and grammars as a useful representation for data structures [5], [6].

John Edwards reports that the fundamental technology for continuous voice recognition significantly improved in 1997 [4].

Like Srinivasan and Vergo [11], our design is based on a commercially available voice recognition development environment. Their work focuses on making the development environment better. We are focusing on making a programming environment that can be controlled by voice alone.

Leopold and Amber [9] have developed a method of "writing" code for a Visual Programming Language that uses a combination of voice, handwriting, and [...]

VocalProgrammer is significantly different from VoiceXML [13]. VoiceXML is an XML language for writing voice enabled applications, not a general system for entering XML documents.

DESIGN
We have designed a generator for voice recognition syntax-directed programming environments.

This generator, VocalGenerator, takes as input a context-free grammar (CFG) for a programming language, such as Basic, C, C++, Java, or an XML DTD or XML Schema, together with a voice vocabulary for that language. The voice vocabulary includes the literals from the programming language; the names of classes and functions from the class and function libraries available to the programmer; and a list of class, function, and variable names specific to the program file. These are associated with a list of pronounceable words and phrases that the programmer would use to enter and edit a program in the language.

VocalGenerator generates as output a programming environment in which the programmer can write programs by voice input alone. This programming environment includes a voice recognition syntax-directed program editor. This editor aids the programmer by providing automatic completion of program text and appropriate navigation relative to the specific programming language. A voice recognition syntax-directed program editor can also be used to edit programs that have already been developed. It analyzes a given program and generates the vocabulary used in the specific program.

We refer to the generated voice recognition syntax-directed programming environments with names such as VocalBasic, VocalC, VocalC++, or VocalJava.

Once VocalGenerator has been built, it can be used repeatedly to generate programming environments that support voice recognition syntax-directed programming in specific programming languages. Relatively little effort is required to prepare the input grammar and voice vocabulary for each additional programming language.
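
The two inputs described above, a context-free grammar and a voice vocabulary, can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' design: the toy grammar, the spoken phrases, and the names `GRAMMAR`, `VOCABULARY`, `OPEN_CLASSES`, `terminals` and `unpronounceable_terminals` are all hypothetical. The check it performs is one a generator might plausibly run, namely that every fixed terminal of the grammar has some spoken form.

```python
from typing import Dict, List, Set

# Hypothetical toy grammar standing in for the CFG input.
# Keys are non-terminals; each alternative is a list of symbols.
GRAMMAR: Dict[str, List[List[str]]] = {
    "program": [["BEGIN", "statement-seq", "END", ";"]],
    "statement-seq": [["statement", ";", "statement-seq"], ["statement"]],
    "statement": [["IDENT", ":=", "NUMBER"]],
}

# Hypothetical voice vocabulary: spoken phrase -> program text.
VOCABULARY: Dict[str, str] = {
    "begin": "BEGIN",
    "end": "END",
    "semicolon": ";",
    "becomes": ":=",
}

# Assumption: identifiers and numbers are dictated, not fixed phrases.
OPEN_CLASSES: Set[str] = {"IDENT", "NUMBER"}


def terminals(grammar: Dict[str, List[List[str]]]) -> Set[str]:
    """Return every symbol that appears on a right-hand side but has no rule."""
    symbols = {s for alts in grammar.values() for alt in alts for s in alt}
    return symbols - set(grammar)


def unpronounceable_terminals(
    grammar: Dict[str, List[List[str]]],
    vocabulary: Dict[str, str],
    open_classes: Set[str],
) -> Set[str]:
    """Return the fixed terminals that no spoken phrase produces."""
    spoken = set(vocabulary.values())
    return terminals(grammar) - spoken - open_classes


if __name__ == "__main__":
    # An empty set means every fixed terminal can be spoken.
    print(unpronounceable_terminals(GRAMMAR, VOCABULARY, OPEN_CLASSES))
```

Adding a language in this picture means supplying a new grammar table and a new phrase table; the checking code itself does not change.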
A voice recognition syntax-directed program editor makes it easier to write a program by voice than a standard text editor does. Standard voice recognition text editors are not effective in entering and editing computer programs. Stephen C. Arnold (an author of this paper) has personally tried programming in a voice recognition editor intended for standard text input after developing bilateral carpal tunnel syndrome and was not able to write even small programs.

Syntax-directed editors provide two major capabilities that standard text editors do not: navigation and selection

[Figure residue: spoken commands ("begin", "program", "move down", "statement-seq", "statement") shown beside the program text they produce, first "BEGIN statement-seq END;" and then "BEGIN statement; statement-seq END;"]
Step 6: The cursor is pointing at the non-terminal, <condition>. The command, "k>5", enters that condition into the program.

Step 7: The cursor is pointing at the non-terminal, <condition>, just entered. The commands, "move down, statement", move the cursor to the first non-terminal, <statement>, and prepare for the entry of a statement.

Step 8: The cursor is pointing at the non-terminal, <statement>. The commands, "assignment, x:=5", enter that assignment statement into the program.

Step 9: The cursor is pointing at the assignment statement just entered.

Applications extend beyond computer programming to data entry and editing. A data entry and editing form has a structure that can be defined by a grammar. Figure 2 illustrates a grammar for a data entry and editing form.

CENSUS ::= RESIDENCE*
RESIDENCE ::= PERSON*, ADDRESS
PERSON ::= NAME, SEX
NAME ::= FIRST-NAME, LAST-NAME
SEX ::= MALE | FEMALE
ADDRESS ::= STREET, CITY, STATE, ZIP
STREET ::= NUMBER, STREET-NAME

Figure 2: Grammar for a data entry and editing form for the census

Once this grammar and a vocabulary for it have been defined, VocalGenerator will generate the voice interface supporting data entry and editing through the form in the same way it generated a programming interface for a programming language. The data entry and editing form generated by VocalGenerator on the grammar in Figure 2 is illustrated in Figure 3.

The example form in Figure 3 shows how we plan to deal with each one of the basic constructs in a CFG when generating the form interface:
• Sequence: Five non-terminals, RESIDENCE, PERSON, NAME, ADDRESS and STREET, are sequences. Each one is the root in a tree with the right-hand side non-terminals or terminals as branches.
• Iteration: Two non-terminals, PERSON and RESIDENCE, can be iterated. Additional data entry fields for PERSON and RESIDENCE can be generated by clicking the "more?" buttons.
• Choice: One non-terminal, SEX, is implemented as a "radio" button.

CENSUS
  RESIDENCE
    PERSON
      NAME    FIRST-NAME  LAST-NAME
      SEX     MALE  FEMALE
    more?
    ADDRESS
      STREET  NUMBER  STREET-NAME
      CITY
      STATE
      ZIP
  more?

Figure 3: Data entry and editing form generated from the grammar in Figure 2

One of the challenging problems in this research is dealing with multiple levels of nesting and with in-line use of nesting and iteration. One solution to the problem is to "normalize" or "denormalize" the grammar. However, the grammar must make sense to the user of the system. Users should not have to understand CFGs, but must be able to understand the relationships between different parts of the language they are programming in or the form they are entering data into.

A number of different types of commands are needed in the syntax-directed programming interface as well as in the forms-based data entry and editing interface. They include commands for
• operating system: file management, compilation, execution, debugging, etc.
• moving around: scroll-up, scroll-down, page-up, page-down, top, bottom, left, right, etc.
• typing: characters, such as "(", ")", "<", ">", "=", "+", "-", "*", "/". Erase, delete and tab. Shift and return.
• editing: ctrl, esc, highlight, cut, copy, paste, find, and replace.
• programming: vocabulary for a programming language, such as "integer", "string", "boolean", "begin", "end", "if-then-else", "while", "for", "repeat", "case", "procedure", "function" and "comment". Variable names and constants. Syntax-directed motion, such as "next
statement", "previous statement", "next non-terminal", "previous non-terminal", etc.

Fortunately, with the exception of commands for programming, most of these types of commands are provided for by voice recognition software, and we shall concentrate on commands for programming only.

FUTURE POTENTIAL
The potential users of this type of technology are not limited to computer programmers. A new extensible markup language, XML, will fast become a standard for information interchange on the Internet. The use of XML will be pervasive in eCommerce and on the Web in general within a very short period of time. XML is particularly interesting from the point of view of this project because of the concept of an XML Schema. An XML Schema is a grammatical definition of a collection of XML documents containing data with the same meaning, structure and representation. The same way a grammar for a programming language defines all well-formed and valid programs that can be written in the language defined by the grammar, an XML Schema defines all well-formed and valid XML documents that can be written in the language defined by the XML Schema. With some modifications, VocalGenerator can be cloned to understand not only BNF definitions of CFGs, but also XML Schema definitions.

EVALUATION OF VOCAL PROGRAMMER
Once we have built the VocalGenerator, we shall test the following two hypotheses:
• Hypothesis 1: Programming is faster using a syntax-directed voice recognition system than using a natural language recognition system.
• Hypothesis 2: A programmer with Carpal Tunnel Syndrome or other motor impairment will be able to program at a rate that is fast enough to maintain competitive employment. Typing speed is not the limiting factor for employment unless the rate is less than a critical minimum number of lines per day.

To test these hypotheses we will build VocalProgrammer with log capabilities that will capture vocal commands as interpreted by the voice engine and the voice engine's text output. Each speech command will be logged and time stamped for later analysis. We will also create a modified keyboard handler to capture typing outside of VocalProgrammer. The keyboard logger will be used to record the typing of both disabled and non-disabled experimental subjects. VocalProgrammer's log will record the voice command, the text entered in the editor, and a time stamp. The keyboard log will record each keystroke, the destination application, and a time stamp.

We will test this system on students with severe mobility impairment such as quadriplegia, carpal tunnel syndrome, cerebral palsy, or rheumatoid arthritis that restricts the ability to type. Some of these individuals will likely already use voice recognition as their primary input method. We will also recruit a sample of ten matching non-disabled programmers to develop baseline data about typing rate for individuals without disabilities.

We will provide Dragon NaturallySpeaking 4.0 (DNS) for the ten individuals with disabilities. We will give each programmer a two-day tutorial on the use of voice recognition software to ensure that each is fully trained in its use. We will follow their use of DNS voice recognition software for a period of six months, providing additional tutoring as needed. The six-month training period will ensure that all subjects are proficient in the use of voice recognition.

After six months, the programmers will be divided into two groups. Half will continue to use Dragon NaturallySpeaking; the other half will use Dragon with our syntax-directed editor. We will monitor their programming for a period of one month, logging all input. We will also monitor the matching programmers without disabilities.

To analyze the log data, we will develop a visualization program that will convert the log into a timeline. The timeline will display tracks for each program in use. The user's typing will be displayed with a graph of the moving average of the typing rate. It will measure the typing rate for each line as well as for user-selected segments of the timeline.

Using this tool, we will be able to analyze input for bursts of typing that are likely to occur in programming. It will also permit a finer analysis of verbal programming so that we can detect and document error patterns, error correction strategies as well as input and editing strategies employed by the subjects.

Analysis of the non-disabled programmers' work will give us input rates for various tasks. We will be able to compare these rates to those of the programmers using Dragon NaturallySpeaking and our syntax-directed voice system. We will determine whether the programmers are working at comparable rates and whether there is a significant difference between the three groups.

ENABLING TECHNOLOGIES
VocalGenerator combines two existing computer technologies, syntax-directed editor generators and voice recognition development environments. To the best of our knowledge, such a combination does not currently exist.

Voice recognition software has become practical to use on personal computers. It can easily run on personal computers and is often integrated into existing software development environments, such as Visual Studio.

Voice recognition software has been built to support text entry. Though such systems are still difficult to use, they are effective. The existing voice recognition software is
not effective for programming. The differences between natural language and computer languages are significant enough that attempting to use voice recognition software for programming quickly becomes so slow and frustrating that it is impractical.

Voice recognition development environments are commercially available. These environments make it possible to develop voice recognition applications without having to develop the voice recognition system itself. It is only a matter of using the existing voice recognition engine in the software system the developer decides to create.

Dragon NaturallySpeaking ActiveX Components is a commercially available voice recognition engine. It is the engine of the highly rated Dragon NaturallySpeaking voice recognition system. It is quite capable of providing the voice recognition capabilities necessary for building a voice recognition system to support programming. The Dragon NaturallySpeaking ActiveX Components contains software for building a library of vocabulary for a computer language. This library consists of the words that are used in writing a program. These words are the tokens of the programming language, the names of functions and procedures found in standard libraries, and the names of variables, functions, and procedures that exist in the specific program being written.

Syntax-directed editors have been a practical technology for almost 20 years. Syntax-directed editors take advantage of the highly regular structure of computer languages. Instead of requiring a programmer to type every character of a statement, the editor determines the remainder of the statement based on its beginning.

Since syntax-directed editors use a formal description of the computer language, once the system has been built for one computer language it will be very easy to modify to make it work for a different computer language. So, once a syntax-directed editor is built for a specific computer language, such as Java, it is a relatively easy task to make it work with another language, such as C++.

The Cornell Program Synthesizer is a syntax-directed editor capable of being customized to support specific computer languages. The technology is well understood and easily reproduced if necessary. It is quite capable of providing the syntax-directed editing needed to build a voice recognition system for supporting programming.

After VocalGenerator has been built for one programming language, only two things would be required for a different programming language: a new language specification and a new vocabulary library. The language specification should be very simple to find and would require only minor modifications to one that is either commercially available or available in the public domain. The vocabulary library for the new language should also be relatively easy to build. A list of names of procedures and functions will need to be obtained. A person would have to pronounce each of these names to the system.

It is clear that the technology needed to build VocalGenerator is readily available.

HOW WE WILL BUILD IT
VocalProgrammer will be constructed using two commercially available software packages: the Dragon NaturallySpeaking development tools and Microsoft Visual C++. VocalProgrammer will be made available as freeware. It will require the user to have purchased Dragon NaturallySpeaking Professional Version, because it requires the speech recognition engine within the Dragon NaturallySpeaking Professional Version. The investigators will only keep sufficient rights to the software system so that they can continue research on it.

From a design perspective, VocalProgrammer is a combination of two technologies: voice recognition and syntax-directed editor generators.

The voice recognition capabilities needed to produce VocalProgrammer are completely within the software of the Dragon NaturallySpeaking development tools. The programming environment generated for a specific programming language by VocalProgrammer will be able to be fully controlled by voice. The Dragon NaturallySpeaking SDK contains a library of calls that provide the ability to control a menu system by voice and enter text continuously [1]. The programming environment will be constructed such that all the commands of the environment can be executed from menus. Users will enter code using the continuous speech mode supported by the Dragon NaturallySpeaking SDK. The Dragon NaturallySpeaking Enhanced Runtime [2], which is contained within the Dragon NaturallySpeaking Professional Version, is capable of supporting a completely application-defined vocabulary. This will enable VocalProgrammer to replace the default natural language vocabulary of Dragon NaturallySpeaking with the vocabulary of the specific programming language.

The word set for each specific programming language can be extracted directly out of the documentation of that language. We are targeting Java because its full documentation set is available for free on-line. This word set will be turned into a vocabulary recognized by the Dragon NaturallySpeaking Extended Runtime by the Dragon NaturalVoc Tool [3]. This tool also creates statistical relationships between the words in the vocabulary produced by analyzing large quantities of text. This text can be generated from the public domain Java source code. This will produce a vocabulary for VocalJava.

Syntax-directed editor generators are a well-understood technology [12]. Syntax-directed editor generators are
constructed from two technologies: editors and parser generators. The functionality needed to construct an editor readily exists within Microsoft Visual C++. The source code for a number of parser generators exists in the public domain in C++ [10]. This source code can easily be ported to Microsoft Visual C++ if it is not already in a form that can be compiled by this system.

The core of the software development effort involved in building VocalProgrammer is integrating existing software systems.

These are not the only issues in the construction of VocalProgrammer. Computer languages are not spoken; they exist only in written form. A significant effort will be necessary to determine how to vocalize these languages. This is an issue that has not been explored yet and may have applications beyond programming by voice. When programmers discuss their code in situations such as code reviews, a standard way of speaking the code itself may be of value. In this way, this research may have value beyond its original objectives of helping disabled computer programmers program again.

REFERENCES
[1] Dragon Systems, Dragon NaturallySpeaking SDK, C++ and SAPI Guide and Reference, Newton, Mass. 1999.
[2] Dragon Systems, Dragon NaturallySpeaking SDK, Integrating a Runtime into Your Speech-Enabled Application, Newton, Mass. 1999.
[3] Dragon Systems, Dragon NaturallySpeaking NaturalVoc Tools, User's Guide, Newton, Mass. 1999.
[4] John Edwards, Voice Based Interfaces make better PC Listeners, Computer, August 1997.
[5] G.H. Gonnet and F. Tompa. A constructive approach to the design of algorithms and their data structures. Communications of the ACM, Vol. 26, No. 11, pp. 912-920, 1983.
[6] M. Gyssens, J. Paredaens, and D. VanGucht. A grammar-based approach towards unifying hierarchical data models (extended abstract). In Proc. ACM SIGMOD Int. Conf. on Management of Data, Portland, Oregon, 1989.
[7] J. Hopcroft and J. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley Publishing Company, Reading, Massachusetts, 1979.
[8] S.C. Johnson. YACC: Yet another compiler-compiler. Technical report, Bell Labs., 1978.
[9] J. Leopold and A. Amber, Keyboardless Visual Programming Using Voice, 1997 IEEE Symposium on Visual Languages (VL '97), Isle of Capri, Italy, September 23-26, 1997.
[10] David Muir Sharnoff, Catalog of Compilers: BNF, http://www.idiom.com/free-compilers/LANG/BNF-1.html, downloaded on Feb. 4, 2000.
[11] Savitha Srinivasan and John Vergo, Object Oriented Reuse: Experience in Developing a Framework for Speech Recognition Applications, The 20th International Conference on Software Engineering, Kyoto, Japan, pp. 322-330, 19-25 April 1998.
[12] T. Teitelbaum and T. Reps. The Cornell Program Synthesizer: A Syntax-Directed Programming Environment. Communications of the ACM, Vol. 24, No. 9, pp. 563-573, 1981.
[13] Voice XML Forum 2000, Voice XML Forum, Voice eXtensible Markup Language, http://www.voicexml.org/specs/VoiceXML-100.pdf, downloaded on May 8, 2000.