Build Your Own .NET Language and Compiler
EDWARD G. NILGES
Trademarked names may appear in this book. Rather than use a trademark symbol with every
occurrence of a trademarked name, we use the names only in an editorial fashion and to the
benefit of the trademark owner, with no intention of infringement of the trademark.
Lead Editor: Dan Appleman
Technical Reviewer: William Steele
Editorial Board: Steve Anglin, Dan Appleman, Ewan Buckingham, Gary Cornell, Tony Davis,
John Franklin, Jason Gilmore, Chris Mills, Steven Rycroft, Dominic Shakeshaft, Jim Sumser,
Karen Watterson, Gavin Wray, John Zukowski
Assistant Publisher: Grace Wong
Project Manager: Beth Christmas
Copy Manager: Nicole LeClerc
Copy Editor: Marilyn Smith
Production Manager: Kari Brooks
Production Editor: Kelly Winquist
Distributed to the book trade in the United States by Springer-Verlag New York, Inc., 175 Fifth
Avenue, New York, NY, 10010 and outside the United States by Springer-Verlag GmbH & Co. KG,
Tiergartenstr. 17, 69112 Heidelberg, Germany.
In the United States: phone 1-800-SPRINGER, email orders@springer-ny.com, or visit
http://www.springer-ny.com. Outside the United States: fax +49 6221 345229, email
orders@springer.de, or visit http://www.springer.de.
For information on translations, please contact Apress directly at 2560 Ninth Street, Suite 219,
Berkeley, CA 94710. Phone 510-549-5930, fax 510-549-5939, email info@apress.com, or visit
http://www.apress.com.
The information in this book is distributed on an "as is" basis, without warranty. Although every
precaution has been taken in the preparation of this work, neither the author(s) nor Apress shall
have any liability to any person or entity with respect to any loss or damage caused or alleged to
be caused directly or indirectly by the information contained in this work.
The source code for this book is available to readers at http://www.apress.com in the Downloads
section. You will need to answer questions pertaining to this book in order to successfully
download the code.
Contents at a Glance
About the Author
Acknowledgments
Introduction
Chapter 1 A Brief History of Compiler Technology
Chapter 2 A Brief Introduction to the .NET Framework
Chapter 3 A Compiler Flyover
Chapter 4 The Syntax for the QuickBasic Compiler
Chapter 5 The Lexical Analyzer for the QuickBasic Compiler
Chapter 6 QuickBasic Object Modeling
Chapter 7 The Parser and Code Generator for the QuickBasic Compiler
Chapter 8 Developing Assemblers and Interpreters
Chapter 9 Code Generation to the Common Language Runtime
Chapter 10 Implementing Business Rules
Chapter 11 Language Design: Some Notes
Appendix A quickBasicEngine Language Manual
Appendix B quickBasicEngine Reference Manual
Index
About the Author
Edward G. Nilges has programmed since 1970, when he learned machine language
for an 8KB IBM 1401 as part of an elaborate draft-dodging scheme that appears to
have gotten out of hand.
Early on, Edward discovered the power of languages and their translation in
"ordinary" management information systems (MIS) applications. He consoli-
dated several applications into one by creating a specifications language within
the 1401's constraints, and he also provided his university with a working Fortran
compiler. Some of his early adventures are relevant to today's challenges, and
this book contains some of Edward's unexpurgated war stories.
Edward has developed millions of lines of code for MIS, telecommunications,
naval architecture, and education applications. He has developed several compil-
ers, including internal compilers for telecommunications applications at Nortel,
the QuickBasic compiler of this book, and a compiler for the Mouse language that
fits in 1KB of storage. He has taught at Roosevelt University in Chicago and DeVry
University, and delivered training classes at Princeton University.
At Princeton, Edward was honored to assist the real-life protagonist of the
recent film A Beautiful Mind, John Nash, with a bug in the old Microsoft C com-
piler. Edward was also privileged to meet Cornel West, the noted American
philosopher, and Ralph Nader. He took classes in philosophy and computer sci-
ence, and gained access to Firestone Library (and has since paid fines accrued).
Currently, Edward is working in China on methodologies for transferring
client/server applications to the Web, while also studying written and spoken
Chinese.
Edward has two grown children and a former wife whom he honors as the
mother of those children. Indeed, he calls himself Edward G. Nilges to disambiguate
himself from his eldest son, Edward A. Nilges, who is studying philosophy at
the University of Illinois and has contributed errata to Bjarne Stroustrup's book,
The C++ Programming Language. His other son, Peter "Chauncey" "Zeit-Bug"
Nilges, recently graduated cum laude from DePaul.
Edward has published material on computer and general topics since 1976,
when he suggested in Computerworld that it was possible to write structured code
in assembly language, and got yelled at by Ed Yourdon. Recent articles include util-
ities for string conversion and display in Visual Studio, and a critical assessment of
the language we use in speaking about database theory, published in the Austrian
journal Labyrinths.
Current interests include .NET, art, running, reading, China, and world
philosophy.
Acknowledgments
THIS BOOK IS DUE to David Treadwell of Princeton and Microsoft, who
suggested a list of potential languages for .NET implementation,
including QuickBasic. Initial impetus was provided by Josef Finsel, author of The
Handbook for Reluctant Database Administrators, and I am in Mr. Finsel's debt
for this reason.
Dan Appleman's support and patience during the excitements of its development
are most appreciated, as are those of Marilyn Smith, Beth Christmas, Grace Wong,
Nicole LeClerc, Kari Brooks, Bill Johncocks, Kurt Krames, and Kelly Winquist, as well
as the accounting team at Apress.
Dan Appleman in fact volunteered his valuable time for a developmental
edit and put up with some of my deeper nonsense with a great deal of patience.
I need to thank the gang at the Evanston YMCA, as well as the operators of
various executive stay places around the world for providing working space at
various times, for my day jobs have taken me to the far corners of the world.
The Silicon Valley "out-to-lunch bunch," including Rick, Ragu, William, Bill,
and Jason, are also owed a debt of gratitude for their assistance, including Rick's
wireless card, Ragu's thoughtfulness, William's unfailing kindness, and Jason's
laptop, which is toast, I'm afraid.
Helmut Epp of DePaul University, Max Plager of Roosevelt University, the
late E. D. Klemke of Roosevelt University and Iowa State University, and Gilbert
Harman of Princeton University are all academics from whom I have learned
a prior dedication to the truth of the matter.
Long-suffering managers to whom this book is dedicated include Rita Saltz,
Robert Geiger (a Visual Basic authority in his own right from whom I learned
much), Jeff Burtenshaw, and Monsieur Hugh Levaux.
Tim Tyler was and remains a source of spiritual guidance before, during, and
after the writing of this book.
Lee Thé at Fawcette assisted with an earlier release of part of the software in
an article on Visual Studio and was a most patient and learned editor.
My strange friend Alex, "Sasha Alexandrovich" Gaydasch, is also owed a debt
of gratitude for his support and advice over a period of many years.
Of course, we all owe Edsger Wybe Dijkstra a debt for showing how integrity
goes a long way.
But the main dedication of this effort is to Darlene Nilges, Eddie Nilges, and
Peter Nilges (junglee Peter), for in dreams begin responsibilities.
Introduction
I mean, if 10 years from now, when you are doing something quick and dirty,
you suddenly visualize that I am looking over your shoulder and say to your-
self, "Dijkstra would not have liked this," well, that would be enough
immortality for me.
-Edsger Wybe Dijkstra
Let us not speak falsely now, the hour is much too late.
-Bob Dylan, "All Along the Watchtower"
DIJKSTRA DIDN'T PLAY for the Chicago Cubs baseball club, to my knowledge, nor
did he play for the Arsenal, Chelsea, Antwerp or Eindhoven football organizations
(nor did Dylan, but you knew that). Instead, Edsger Wybe Dijkstra was a found-
ing computer scientist who was involved in the early Algol language and either
invented or reported the invention of structured programming.
And since Dijkstra passed on in August 2002, he is rolling in his grave. Here
is a book on how to write a halfway decent compiler, using object-oriented tech-
niques (about which Dijkstra was skeptical) to compile Basic (which he felt was
a mental mutilation) that has the gall, the side, the cheek ... to quote the guy!
That is because Dijkstra was also one of the few computing scientists to keep
steadfast in his mind the true proposition that computing science is applied sci-
ence, and that is because Dijkstra refused to divorce theory and practice.
Furthermore, I cannot believe that Dijkstra would dislike the desire to know
how compilers work. I have set myself the task of communicating this, at a basic
level, to a wide audience of "ordinary" (ordinary?) programmers.
These are the numerous hard-working programmers who have written code,
probably, for Visual Basic and C++ COM and now are working, probably, in C#
and in Visual Basic .NET. I would like to show that a responsible compiler can be
written in Visual Basic. I would like to provide the complete, runnable, and mod-
ifiable, source code at the Apress Web site. I would also like world peace and
harmony, but I digress.
Why not C#? I didn't choose C# because of a simple theory of mine. All, or
nearly all, C# programmers know Visual Basic, but not all Visual Basic program-
mers have made the transition to C#. And despite the flash and glamour of C#,
there is nothing doable in C# that cannot be done in Visual Basic.
My goal is to demystify and to deconstruct a skill set that can be of actual
use in .NET and Java. Write-once, run anywhere is a goal that entails a need for
more compilers, and more generally, greater portability and ease of modifica-
tion, not only of code, but also of business rules, stored as data.
CHAPTER 1
A Brief History of
Compiler Technology
I would therefore like to posit that computing's central challenge,
viz. 'how not to make a mess of it,' has not been met.
-Edsger Dijkstra
THE LATE, HERO COMPUTER SCIENTIST, Edsger Dijkstra, was rather confident. He
seemed to know that computing's central challenge is not messing up. Some pro-
grammers and their managers might contend that the main challenge is achieving
user satisfaction.
In either case, writing parsers and compilers will be a challenge!
You can learn much about compilers from their history. Therefore, this chap-
ter describes the mainstream history of compilers and follows up with a look at
the sidestream history of Basic compilers.
1. In the 1940s and 1950s, the presumption on the part of the big shots was that some "girl"
could prepare programs for their hardware and perhaps find an up-and-coming graduate
student to wed. (I'm not making any of this up.)
for everything. If something goes wrong with his or her program, a programmer
is expected to fix it quickly and accurately.
John von Neumann thought that any use of the computer to assist in pro-
gramming was a waste of a valuable resource by lazy programmers. But one
early programmer, Grace M. Hopper, 2 discovered that the computer itself could
be an aid in preparing bug-free programs. Her work led to the first two major
computer languages: Fortran and Algol.
2. At the time Grace M. Hopper began exploring the use of the computer for programming (in
the late 1940s), she was a lieutenant in the United States Navy. She later became an admiral
in recognition of her accomplishments.
3. At the time, IBM was competing with Univac, now Unisys, for dominance of the computer
industry.
4. Edsger Dijkstra was an early Algol programmer. He noticed that Algol programmers could use
block structure to avoid goto statements, and he wrote a famous letter to the editor of a com-
puter journal, "Go To Considered Harmful."
produce. Fortran met this requirement, but Algol did not until about 1960, when
the Burroughs Corporation provided a machine whose hardware was able to run
Algol efficiently.
In this era, IBM would agree only to have a customer engineer show up in
a white shirt and tie, and make a "best effort" to solve the problem. 5 In this
case, the customer engineer did a credible job without seeing the real prob-
lem and without knowing the configuration of the machine. This was, in fact,
a best effort.
Max had shown us that the machine included the optional hardware, and
I simply removed the subroutine (working completely in machine language)
and replaced its call by instructions to multiply and divide. The machine then
compiled and ran several programs through to completion.
This ranks with my discovery of Visual Basic 3 as a true Eureka moment. Max
nearly fell out of his chair when, at the next meeting of the university computer
committee, I announced the fix. H. Chang Shih of the Physics department
bought me a drink at Jimmy Wong's watering hole, which was then across the
street from Roosevelt.
We used the Fortran compiler continually for teaching and support. Although it
did not generate very fast code, it was a great way to solve problems quickly.
For example, Fortran-II had a very complete Print statement with a format fea-
ture that supported both multiple lines and replacement of control sequences
by data. This made it easy to elegantly format reports. In contrast, coding
reports in assembler was very tedious.
Fortran and Algol were followed by a plethora of compilers, both famous and
infamous, and initially in the tradition of Fortran. For example, early Cobol pro-
grams, like early Fortran programs, were primarily single main procedures with
goto commands for flow control.
Cobol raised and then dashed some management expectations. An Air
Force general was overheard to say, "Now that we have Cobol, can we get rid of
all those beatnik programmers?" (In the early 1960s, "beatnik" meant "slacker.")
But managers have continued to depend on programmers throughout the beat,
hippie, Generation X, and slacker eras.
IBM introduced an ambitious programming language, Programming
Language One (PL/I), in 1964, when it introduced its System/360 line of
mainframes. This language owed much to Algol because it was fully block structured.
However, PL/I's scale and scope exceeded the capabilities of its designers and
compiler writers, and it wasn't until 1974 that truly useful compilers became
available for PL/I.
5. This "best effort" approach has been replaced by the vow to solve the problem no matter
what. The benefit is that, perhaps, more problems are solved. The downside is that many
"solutions" are hacks.
6. Of course, many C programmers have a variety of personal standards that prevent the choice
of C from being as dangerous as it could be. The problem is that, in general, they don't have
the organizational clout to enforce these standards over the complete system life cycle.
Both the Fortran and the Algol teams wanted to write compilers that would
generate highly efficient and optimized code. Their motto was "you compile
once, but you run many times." But Kemeny, and a separate group at Purdue
University, noticed that this is not true for student programs, which compile
many times and bomb out often.
Kemeny reasoned that a compiler for nonprofessionals should be fast and
should accurately link runtime errors to source code. His compiler, and the
Purdue University Fast Fortran Translator (PUFFT) system, used an "interpreter"
language and an interpreter to change instructions on the fly to actual machine
code at runtime.
Interpreters, as code that converts special codes to actual machine instruc-
tions every time the object code is run, are slower, by definition, than pure
object code. But because the special codes can be directly tied to source lines,
error reporting in interpreters can be highly accurate and understandable.
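The link between special codes and source lines can be illustrated with a minimal sketch. It is written in Python rather than the Visual Basic used for the book's compiler, and the three-field code format, the `PUSH` and `DIV` operations, and the `BasicRuntimeError` class are all hypothetical names chosen for illustration, not anything from Kemeny's compiler or PUFFT.

```python
# Hypothetical "special codes": each tuple is (source line, operation, operand).
PROGRAM = [
    (10, "PUSH", 6),
    (20, "PUSH", 0),
    (30, "DIV", None),
]


class BasicRuntimeError(Exception):
    """Runtime error that remembers which source line caused it."""

    def __init__(self, line, message):
        super().__init__(f"Line {line}: {message}")
        self.line = line


def run(program):
    """Interpret each special code in order, using a small value stack."""
    stack = []
    for line, op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "DIV":
            divisor = stack.pop()
            dividend = stack.pop()
            if divisor == 0:
                # The code being interpreted carries its own source line,
                # so the error can point straight at it.
                raise BasicRuntimeError(line, "division by zero")
            stack.append(dividend / divisor)
        else:
            raise BasicRuntimeError(line, f"unknown code {op}")
    return stack
```

Calling `run(PROGRAM)` raises an error whose message names line 30, which is the kind of report a student can act on without reading a memory dump.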
Just-In-Time Compilation
Just-in-Time, or JIT, compilation is sometimes confused with the older technique
of interpretation in which the source program was converted to an intermediate
form and then "executed" by the interpreter. The interpreter needed to translate
the intermediate form each time an instruction was executed.
The similarity is that processing of the "object" code representation occurs
after compilation. The difference is that JIT compilation, unlike interpretation,
generates actual machine code for reuse. Interpreters produce machine code
for immediate consumption each time an interpreted instruction is executed,
and this magnifies the effect of any preexisting loops in the interpreted code.
No such magnifying effect appears in JIT compilation.
The conventional wisdom is that interpretation is slow. Therefore, the JIT com-
pilation used with .NET and Java creates a perception that such environments
might run code more slowly.
However, the use of object-oriented programming (OOP), as you will see in this
book, means that the "data" (the source code) and the procedures (the JIT com-
piler routines) are close together, and this avoids unnecessary sequential passes
over large amounts of code. The result is that JIT compilation to .NET's CLR cre-
ates code that is often somewhat slower than raw, native COM applications, but
those applications are not as flexible as .NET applications.
Earlier interpreters, because of the cost of storage, had to "pass over" source
code and translate it entirely to the interpreted special codes. Then the inter-
preter needed to modify each special code into machine language repeatedly
throughout the execution of the program, magnifying the effects of loops.
Suppose instead that the interpreter could save its work in the form of the com-
pilation of special code to object code, on the fly. In a procedural language, this
opens a can of worms, in which tables must be efficiently constructed and
searched. OOP provides a straightforward association of code and working data.
OOP's tighter linkage of the compiler's instructions with its data (the source and
interpreted code) means that both compilation and interpretation can be per-
formed incrementally, or "just in time," and the output of the compilation in the
form of binary machine code can be saved with specific instances of executable
objects. This is because the data is, by definition, inside the object instance. For
this reason, there is far less overhead in finding the data.
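The difference between retranslating on every execution and saving the translation inside the object can be sketched as follows. This is a Python illustration under stated assumptions, not the .NET JIT compiler: "machine code" is stood in for by a Python callable, and `translate`, `Instruction`, `interpret`, `run_jit`, and `counter` are hypothetical names.

```python
# Counts how many times a special code is translated (illustration only).
counter = {"translations": 0}


def translate(code):
    """Stand-in for turning one special code into 'machine code' (a callable)."""
    counter["translations"] += 1
    op, arg = code
    if op == "ADD":
        return lambda acc: acc + arg
    if op == "MUL":
        return lambda acc: acc * arg
    raise ValueError(f"unknown code {op}")


class Instruction:
    """An executable object that keeps its compiled form once it is built."""

    def __init__(self, code):
        self.code = code
        self.compiled = None  # filled in on first execution

    def execute(self, acc):
        if self.compiled is None:
            self.compiled = translate(self.code)  # translate just in time, once
        return self.compiled(acc)


def interpret(codes, acc, loops):
    """Classic interpretation: translate every code every time it runs."""
    for _ in range(loops):
        for code in codes:
            acc = translate(code)(acc)
    return acc


def run_jit(instructions, acc, loops):
    """JIT style: each instruction object reuses its saved translation."""
    for _ in range(loops):
        for instruction in instructions:
            acc = instruction.execute(acc)
    return acc
```

With two codes in a loop of 100 iterations, `interpret` performs 200 translations while `run_jit` performs 2, and both return the same result. That is the loop-magnifying effect in miniature.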
For example, the compiler that I present in this book stores all the information
about a variable in an object. The procedure responsible for accessing a variable
no longer confronts, on entry, a huge table of variables-by definition, more
than it wants to know. This procedure does not need to search a table because it
is presented with one handle to all the information about the variable, including
its name, value, and type.
Of course, the compiler does have a table of the variables. However, other proce-
dures are responsible for obtaining the variable using its name. It's true that the
basics are the same, 7 but overall, in a well-designed OOP solution, there tends to
be less rummaging around, because once the object is found, a rich set of data
is linked to it.
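A minimal sketch of this division of labor, in Python rather than the book's Visual Basic, might look like the following. The `Variable` and `VariableTable` classes and the `assign` procedure are hypothetical, and treating names as case-insensitive is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """One handle to everything known about a variable."""
    name: str
    type_name: str
    value: object


class VariableTable:
    """The only place that knows how variables are found by name."""

    def __init__(self):
        self._by_name = {}

    def declare(self, name, type_name, value):
        variable = Variable(name, type_name, value)
        # Assumption: identifiers are case-insensitive, so store them uppercased.
        self._by_name[name.upper()] = variable
        return variable

    def find(self, name):
        return self._by_name[name.upper()]


def assign(variable, new_value):
    """Works on one handle; it never sees or searches the table."""
    variable.value = new_value
```

Once `find` has returned the object, procedures such as `assign` receive the name, type, and value together and do no further lookup.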
In the late 1960s, a number of computer scientists noted that, because computer
languages had to be strict and formally specified (unlike normal human languages),
not only was the process of writing a specific compiler itself the development of an
algorithm, it, in turn, could be algorithmically specified in a compiler generator.
However, this idea strained the capacities of mainframes and the abilities of earlier
programmers. Instead, two spin-offs from the overall effort became widely used.
These were the lex and yacc programs of the Unix operating system. lex accepts
a definition of the low-level syntax of the language, and yacc accepts a definition of
its high-level syntax. Together, they generate C or C++ code to parse the language.
The lex and yacc programs dominate good compiler design practice today.
However, effective use of lex and yacc requires knowledge of compiler internals,
since these are "white box" tools. 8
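The split between low-level and high-level syntax can be seen in a small hand-written example. This Python sketch does not use the two Unix tools or their generated tables; it swaps in a regular-expression tokenizer for the low-level part and a recursive-descent parser for the high-level part, and every function name in it is hypothetical.

```python
import re

# Low-level syntax: a token is a run of digits or any single other character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Split text into ('NUM', n) and ('OP', ch) tokens, skipping blanks."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUM", int(number)))
        elif op.strip():
            tokens.append(("OP", op))
    return tokens


def parse_number(tokens, pos):
    if pos >= len(tokens) or tokens[pos][0] != "NUM":
        raise SyntaxError(f"number expected at token {pos}")
    return tokens[pos][1], pos + 1


def parse_term(tokens, pos):
    """High-level rule: term := NUM (('*' | '/') NUM)*"""
    value, pos = parse_number(tokens, pos)
    while pos < len(tokens) and tokens[pos] in (("OP", "*"), ("OP", "/")):
        op = tokens[pos][1]
        right, pos = parse_number(tokens, pos + 1)
        value = value * right if op == "*" else value / right
    return value, pos


def parse_expression(tokens, pos=0):
    """High-level rule: expression := term (('+' | '-') term)*"""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in (("OP", "+"), ("OP", "-")):
        op = tokens[pos][1]
        right, pos = parse_term(tokens, pos + 1)
        value = value + right if op == "+" else value - right
    return value, pos


def evaluate(text):
    """Tokenize, parse, and evaluate one arithmetic expression."""
    tokens = tokenize(text)
    value, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]}")
    return value
```

The tokenizer knows nothing about operator precedence, and the parser never looks at individual characters. A generator tool automates each half from a declarative description, which is why knowing what each half does still matters.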
7. As you will see in Chapter 8, the Collection object provides a convenient hash-based search
to map variable names to variable objects.
8. White box tools assume that users know basically what the tools are doing on the users'
behalf.
Today, the Java and .NET "compile once, run anywhere" credo has created
some innovation in compiler development because this portability creates
demand for compilers. In addition, the increasing use of object-oriented devel-
opment and programming has produced compilers of higher quality, since the
tighter coupling of data and software means that the compiler developer no
longer needs to build enormous tables for the entire source.
Basic Compilers
As I've mentioned, the Basic language was invented by a group at Dartmouth
College in the 1960s. It initially targeted General Electric time-sharing machines,
but a few years later, programmers of Digital Equipment Corporation (DEC)
hardware (the progressively more powerful systems DEC PDP-8, DEC 10, and
DEC 20) developed a number of time-shared Basic compilers.
Early Basic
characters were not a problem in toy and demo programs (like Print 'Hello world'),
they made large programs for user solutions impossible to develop because the
Basic compiler was not able to read the entire source.
As you crazed coders out there can imagine, there were workarounds for
handling large programs, including using a disk to save part of the source code.
But consider that waiting for even a modern form of virtual memory to catch up
can be very irritating. Another approach was to reduce the usable symbols of
the Basic source code to the smallest possible code. For example, if there were
only 256 different identifiers-such as PAYRATE, GROSS, NET, and so on-in
a Basic program, you could save a lot of space in the interpreted code by replac-
ing the identifier with its position in a 256-position list. This index would take
only 1 byte.
By the early 1980s, many desktop developers had already used various
hacked Basic compilers to create quite a lot of business and other support for
real users. IBM shipped a solid Basic, GW-Basic, with its very rugged IBM PC in
1981. Microsoft offered the QuickBasic compiler and interpretive runtime, which
was used heavily on MS-DOS systems.
Visual Basic
During the 1980s, as desktops became powerful enough to support bitmap
graphics, developers discovered the graphical user interface (GUI). At that time,
there were two common ways of adding a GUI to a program. The most popular
way was to spend a small amount of time on hacking. However, this created con-
siderable inflexibility and unneeded complexity. The other way was to spend
a lot of time on the careful design of an underlying reusable engine for the GUI.
With luck, this would result in code that could be reused by different
applications and perhaps plugged into different systems. 10
The need to generate custom graphics engines was largely eliminated in the
1980s with the introduction of Microsoft Windows (the notable exception at the
time being games, which demanded better access to the hardware than Windows
could provide). Windows 3.1 provided considerable additional power in the form
of forms and controls for display and entry of data-at a high and somewhat
hidden price. To create the simplest command button, programmers in C and
assembler had to write large amounts of repetitious code.
Perhaps for this reason, Alan Cooper developed and sold an engine for
drawing forms and controls, known as the Ruby form engine, which was usable as
a set of Application Programming Interfaces (APIs) from a variety of languages.
He sold this product to Microsoft, who integrated Ruby with QuickBasic. In 1990,
Microsoft announced the stunningly high-quality product, Visual Basic 1.
I like to code. But I do not like to code the same instructions repeatedly.
Therefore, I was thrilled, when I adopted Visual Basic 3 in 1993, to be able to
summon up simple forms and their controls, even with a language I considered
clunky compared with C.
10. My experience is that this type of approach drives hard-working managers crazy. One
reason is that it's an investment not justified by most business cases. Also, in practical terms, it
is a license for ambitious programmers to spend too much time on the interesting, fun, and
possibly remunerative development of the GUI as a product they could, in some scenarios,
resell.
Summary
This chapter provided some historical background. At the bottom of the "dark
chasm and abyss of time," we don't see C. This is because time did not start in
January 1970 (nor will it end in 2038, when Unix runs out of bits to track time
since January 1970).
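The rollover date can be checked with a short calculation. This sketch assumes the conventional representation: a signed 32-bit count of seconds starting at midnight UTC on January 1, 1970.

```python
from datetime import datetime, timedelta, timezone

# Assumption: time is a signed 32-bit count of seconds since the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_SIGNED_32 = 2**31 - 1  # largest value the counter can hold

last_moment = EPOCH + timedelta(seconds=MAX_SIGNED_32)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00
```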
Brian Kernighan, the author of the basic book on C, The C Programming
Language, actually uses Visual Basic to teach introductory computer science to
non-majors at Princeton. Brian reasons that America's "best and brightest," who
will go on from Princeton to run the country in some cases, need to know about
real programming, much of which is MIS programming. MIS programming can
be intellectually challenging, but is thought to be For Dummies, with the result
that the challenges aren't adequately met, or are met by Dummies, who are
Dummies because of low self-esteem. Compiler design in Visual Basic is one
excellent way to master Visual Basic for other challenges.
Algol, not C, was truly groundbreaking, while Fortran showed it was possible
to develop a compiler that could outperform human programmers. Algol's key
concept-that a list of statements can, in turn, be a statement-generated struc-
tured programming, which Visual Basic inherits.
I conclude that programming is a human adventure and only accidentally
about programming languages and computers per se. Indeed, see Chapter 4's
introductory material for an alarming if not gnomic quote from a hero computer
scientist in this regard, which will help us to focus on the right stuff.
Challenge Exercise
Crazed coders like challenges. Using your existing programming experience,
consider tackling the following challenge. Otherwise, return to it after reading
Chapter 10.
Your end user's system is characterized by the need to enter and frequently
change business rules such as:
Justify developing the code for the business tier in Visual Basic, knowing that you
will need to frequently change the rules. How much time will be spent in main-
taining the code? What will happen if contradictory or conflicting rules exist in
the code, such as the sample rule, plus the following:
Resources
If you are interested in learning more about compiler history, I suggest the
following resources:
"On the Cruelty of Really Teaching Computing Science," CACM, Vol. 32,
No. 12, December 1989, page 1404; by Edsger W. Dijkstra. This article
gives a good idea of Dijkstra's contention that computer science really
is rather different and why it is hard.
"Go To Statement Considered Harmful," CACM, Vol. 11, No. 3, March 1968, page 147;
by Edsger W. Dijkstra. This article is a bit difficult to read but worthwhile.
It ranks as Dijkstra's invention of structured programming, although he
was too humble to say so.
"The History of Visual Basic and BASIC on the PC," by George Mack (2002),
http://dc37.dawsoncollege.qc.ca/compsci/gmack/info/VBHistory.htm. This
Web site describes the background of early Basic. Bill Gates will be the first
to admit that he did not invent Basic.
CHAPTER 2
A Brief Introduction
to the .NET Framework
Every few years, the modern-day programmer must be willing to perform
a self-inflicted knowledge transplant to stay current with new technologies.
-Andrew Troelsen
All that is solid melts into air, all that is holy is profaned...
-Karl Marx
WHEN MICROSOFT BROUGHT OUT the .NET Framework, it was a radical shift and
a wake-up call. This chapter describes the basics of this Framework.
According to people I've met at Microsoft, the Framework adds a computer
science level to Visual Basic. However, this doesn't mean that you need to return to
school. Instead, I recommend you refuel in flight. This book will help you to do so.
This chapter is an introduction to some of the issues that arise, in practice,
when code (including, of course, code inside compilers) is reused and how .NET
addresses some of the problems in reuse, including the infamous "DLL hell"
problem. This chapter will explain how the Common Type System (CTS)
and Common Language Specification (CLS) provide write-once, run anywhere
interoperability for code in multiple languages. We'll also take an in-depth look
at the Common Language Runtime (CLR) and identify the base class libraries
that support a large .NET toolkit.
.NET binaries provide a layer of information that avoids DLL hell. At the end of
this chapter, we'll briefly examine their structure to see how they accomplish this.
dynamic libraries, but they remain a part of the program. When the program
loads, it gets enough memory to load the largest dynamic library. Then when you
want to run the Accounts Receivable functionality, it loads the Accounts Receivable
library and uses it. When you need Accounts Payable, the program unloads the
Accounts Receivable library and loads the Accounts Payable library. These
dynamic libraries, however, are closely tied to the program, providing half of
the solution.
The second half of the solution comes from device drivers. In the past, if
you were lucky enough to have one of those RGB monitors that could do color
graphics, and you wanted to write a program to use it, you either had to write
directly to the hardware (not pretty) or write to a device driver.
Writing directly to the hardware isn't ugly because it is hard to do and requires
knowledge. Rather, it is ugly because successful use by your software depends on
a large number of preconditions, which have nothing to do with the needs of the
user. It is very annoying for the end user to need to keep old hardware alive just to
run needed packages.
Of course, just because your code worked with one company's device driver
was no guarantee it would work with someone else's. 1 This led to more dynamic
libraries, less for memory than to load the code that worked with the driver you
specified.
Driver troubles began to be resolved with Windows, which introduced virtual
hardware. 2 Rather than writing to the driver for the hardware, Windows allows the
programmer to write a consistent interface and let the operating system handle
writing to the hardware. This means that you need to write the code only once,
and it will work with any monitor, printer, keyboard, and so on.
It wasn't long before this idea spread beyond hardware, and all kinds of DLLs
were being written that provided some consistent interface for doing tasks. This
made many tasks easier. Rather than becoming a data-access expert, you could
use Open Database Connectivity (ODBC) or later, ActiveX Data Objects (ADO).
Learn a few basics of how to connect, how to request data, and how to update
data, and you could access any ODBC-compliant database.
Of course, ODBC and ADO provide new complexities and new issues, and
they do not always make your job easier. However, improvements in technology
improve your programs, without the need to change code. If the ODBC driver is
improved, then upgrading the driver makes your program work more efficiently.
The database system isn't locked to old hardware and operating systems. On the
1. Nonprogramming computer users are often astonished by the need to acquire new drivers
for new hardware. They are not amused by device driver conflicts, a consequence of the
original (1980s) vintage design of PCs.
2. Nonprogramming computer users may be ROTFL (rolling on the floor laughing) because
they still have problems with Windows drivers. However, they may fail to realize that driver
problems are an inevitable consequence of the new stuff they have to play with. Newer
software always tends to be more buggy, even though Plug and Play now works in the vast
majority of cases.
A Brief Introduction to the .NET Framework
other hand, accessing the new goodies does impose a converse requirement: the
forced upgrade when the new feature requires a current operating system. On
the whole and from a business perspective, however, this is much easier than
operating museums of computing legacy arcana just to support users.
With the introduction of DLLs and all the derivatives (COM, COM+; the alphabet
soup can be dizzying), programmers became more efficient. But most
managers didn't notice, since the programmers were asked to do more work. 3
No longer did programmers need to deal with the underlying plumbing.
Microsoft continued to make their jobs easier by introducing technology like
Microsoft Transaction Server (MTS) to handle transactional processing. Yes, life
was good as long as a few rules were followed.
7. Thou shalt put error checking within all thy code and handle any errors
thou can.
8. When thou encounters an error thou cannot handle, thou shall pass it up
with correct and documented error codes.
9. Thou shalt not change thy error codes, though thou may add new ones.
10. Thou shalt generate a complete executable for the entire system, starting
at day one of coding.
11. Thou shalt play nicely with other DLLs.
As long as you follow the rules for DLLs, everything works, and you save
vast amounts of programming resources. And following the rules isn't too hard,
unless you happen to live in the real world of users, operating systems, and
third-party controls. Then you could easily find yourself transported into DLL
hell. Let's take a simple (and all too common) example.
You have a project that uses version 1.5 of a common third-party widget
from the Acme Novelty Company. The widget is used a lot in your program,
which is used consistently by the president of your company to keep her fin-
gers on the pulse of the company.
Your company isn't a software company, and the president, like many other
managers, is focused on her job of running a company. She neither needs nor
wants to understand the minutiae of programming. However, she has enough
technical know-how to be able to download and install software. One day, a peer
recommends a demo program. He has the demo from when he installed it last
year, so he gives it to her, and she installs it.
And that program, during the installation, installs version 1.3 of the widget
your program relies on. Now it shouldn't have, because that violates one of the
11 commandments of DLLs. But it does. And the president doesn't notice any-
thing wrong while she's exploring this new software package. In fact, she doesn't
notice anything wrong until that afternoon, when she decides to run your pro-
gram, to check the pulse of the company; in other words, to do her job. And she
gets an error.
The egg hits the fan. She calls your tech support team because your software
is broken. Time, energy, and resources go into finding the cause. And how do you
explain that someone else broke your software? You sound like a weasel.
This brings us to .NET.
.NET: Beyond DLLs
One of the goals of .NET was getting rid of DLL hell. Another was making coding
and program interaction easier. And thus was born the .NET Framework. To under-
stand the Framework, you need to look at what it's made of, and that would be the
four Cs: CTS, CLS, CLR, and class libraries.
In later chapters, you'll explore the Framework pieces in detail, because the
QuickBasic compiler that we're going to build will use the four Cs. Here, you'll
just get an introduction to how these parts work. However, we'll spend some
time examining the CLR, because that will help you to understand the design
decisions in our compiler.
The CTS
The CTS is the Common Type System. This defines all of the possible data
types and constructs supported by the runtime environment. This means that
a 32-bit integer is a 32-bit integer everywhere. Providing the definitions of the
data types allows everyone to work together. Think of it as the metric scale for
programming languages: a way to standardize.
The CTS is, unfortunately for Visual Basic programmers, based on the C language.
In consequence, an Integer data type in Visual Basic .NET is a 32-bit integer
in the range -2^31 to 2^31-1, not a 16-bit integer in the range -2^15 to 2^15-1.
Arrays in Visual Basic .NET can no longer start at any index other than zero.
When you declare an array, such as strArray(5), you are specifying not the number
of elements, but the upper bound of the array. strArray(5) declares six elements
now numbered in .NET, at all times, from 0 through 5.
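The arithmetic behind these two rules is easy to check. The following is a minimal sketch in Python rather than Visual Basic .NET; the helper names signed_range and element_count are my own, invented for illustration, and are not part of .NET.

```python
# Illustrative sketch only; the helper names are hypothetical.

def signed_range(bits):
    """Return (minimum, maximum) for a two's-complement integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def element_count(upper_bound):
    """Number of elements in a zero-based array declared by its upper bound."""
    return upper_bound + 1

print(signed_range(32))   # the 32-bit Integer of Visual Basic .NET
print(signed_range(16))   # the older 16-bit Integer
print(element_count(5))   # a declaration such as strArray(5)
```

The point of the sketch is only that the declared number is an upper bound, so the element count is always one more than the number you write.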
Strings in the CTS are updated differently than strings in Visual Basic 6. Visual
Basic 6's runtime manages strings in their own region because, traditionally, Basic
languages have treated the string as an independent data type and imposed no
fixed limit on strings.
In .NET, Visual Basic strings are implemented using the .NET String object. In
.NET, when the string is altered, the alteration makes a copy of the original string,
throwing away the old string.
TIP In cases where you need to frequently alter the contents of a string, consider
using the .NET System.Text.StringBuilder object, which allows for
more efficient modification.
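Python strings happen to behave the same way (every alteration produces a new string), so the trade-off can be sketched outside .NET. In this hedged illustration, io.StringIO merely plays the role that StringBuilder plays in .NET; the two function names are hypothetical.

```python
import io

def build_by_concatenation(parts):
    """Each + creates a brand-new string and abandons the previous one."""
    result = ""
    for part in parts:
        result = result + part
    return result

def build_with_buffer(parts):
    """Append into one growable buffer, then produce the string once at the end."""
    buffer = io.StringIO()   # stands in for a StringBuilder
    for part in parts:
        buffer.write(part)
    return buffer.getvalue()

# Altering a string never changes the original object.
original = "net"
alias = original
original += "work"           # builds a new string; alias still sees the old one
print(alias, original)
```

Both functions return the same text. The difference is how many intermediate strings are created and thrown away along the way.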
The CLS
The CLS is the Common Language Specification. This part of the framework defines
what every language must implement to be a .NET language. It's a subset of the
CTS, because not all of the types defined in the CTS are in the CLS (for example,
Visual Basic .NET does not have the ability to declare an unsigned number).
However, as long as the code you write sticks to the CLS-defined types, it will
interact with code written in any other language without any problems.
The CLR
The CLR, or Common Language Runtime, is the heart of the .NET Framework. The
CLR handles loading your code, managing variables and objects, and providing
the interconnectivity between all .NET programs.
The CLR is structurally a simple virtual machine definable by an interpreter.
The CLR, however, doesn't have the performance penalty of classic software
interpreters, because on first execution the code is compiled, "just in time," to
native code.
The confusing fact is that while you can and should think of the CLR as
a traditional slow interpreter for a virtual machine, it actually converts to native
code behind the scenes. Individual compilers act as if they were creating purely
interpreted CLR code, but behind the scenes, JIT compilers tailor the instruc-
tions to the platform. In this way, the compiler does not need to be rewritten to
generate code for a new or different machine.
NOTE In this book, you'll see two programs (the runtime of Chapter 3's product
integerCalc and the testing runtime of the quickBasicEngine compiler
itself) that show how to develop a virtual machine, similar to the CLR but
much less efficient.
[Figure 2-1: The stack contains numbers and pointers to the heap. The heap contains reference objects of various shapes and sizes, including system objects like strings (such as "This is a string") and user-defined objects (such as a Customer object).]
As you can see in Figure 2-1, the numbers are represented directly in the stack.
Visual Basic strings, which can have widely varying lengths, are represented by
pointers to the heap. Value objects can also be stored on the heap, using a process
called boxing.
The heap is somewhat like your garage, before your significant other gets
you to clean it up. The garage is where you put objects that don't fit neatly in the
house. The major difference is that the objects in the heap are accessed more
frequently than the foam plastic things that hold electronic equipment, out-of-
date computers, out-of-date computer books, infant carriers for infants who are
now anguished teens, and back issues of National Geographic.
Objects that use the heap are subject to a process known as garbage collection.
Regularly, and at an interval not under your control (unless you force garbage collection
by calling GC.Collect), the Framework sweeps through the heap and deletes
objects that are no longer referenced by your program.
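The idea of "no longer referenced" can be observed in a small experiment. This is a sketch in Python, not .NET, and the analogy is loose: CPython combines reference counting with a cycle collector, while the CLR collector works differently. Here gc.collect() plays the part of GC.Collect, and the Customer class is a hypothetical stand-in for a user-defined object.

```python
import gc
import weakref

class Customer:
    """Hypothetical stand-in for a user-defined reference object."""
    def __init__(self, name):
        self.name = name
        self.partner = None

def make_unreachable_pair():
    """Create two objects that refer to each other, then drop every outside reference."""
    first = Customer("first")
    second = Customer("second")
    first.partner = second
    second.partner = first
    # A weak reference lets us observe the object without keeping it alive.
    return weakref.ref(first)

gc.disable()                         # keep the collector from running on its own schedule
probe = make_unreachable_pair()
alive_before = probe() is not None   # the pair still occupies the heap
gc.collect()                         # force a collection
alive_after = probe() is not None    # the pair has been swept away
gc.enable()

print(alive_before, alive_after)
```

Nothing in the program could reach the pair after the function returned, yet the objects stayed in memory until a collection actually ran. That is the "interval not under your control" in miniature.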
The alert reader may have a question here. If, as I said, a string is a reference
object, then if a string appears in the General Declarations section, shouldn't
this force the conscientious programmer to implement dispose? The answer is
that the string object does not and will not itself create "open-ended" reference
objects in the heap. The string will be in the heap, but it will never create any
reference objects on its own. The string is a "closed" object, which, as a string,
can be left on the heap for the garbage collector.
But a true reference object is permitted, now or after being modified, to create
reference objects as part of its state. If you do not implement a dispose for
a true reference object (or fail to call dispose when you're finished), runaway
use of the heap is possible.
It might be the case that you know a reference object is very simple and safe,
and you know that it does not create any further references. However, real
experience in real development teaches you to also know that this situation
might change.
The goal of a team standard dispose is to avoid open-ended situations that
result when a more complex reference object is not destroyed by executing its
own dispose, leaving it and any objects it declares (directly, or indirectly by creating
other open-ended reference objects) in the heap. The heap becomes cluttered with
dead soldiers, which appear, to the CLR, as needed objects. If you do not
implement a dispose, reference objects will clutter the heap until the system
gets around to freeing their storage.
Portability
A major design objective of the CLR was to enable "write once, run anywhere."
For example, you may want your software to run on the Web, on different servers,
and even on different hardware platforms (and your manager may want this even
more than you do). There are many cases where the cost to create software that
runs on more than one box can't be justified.
However, software should be designed whenever possible to run on more
than one box. This is natural in the university environment, for example, where
faculty members do not want to submit to the rules of a centralized facility. In
industry, we don't talk as much about this need, but the business reality may cre-
ate it anyway. We don't control, for example, an upper management decision to
change platforms or demand that the software run as a Web service.
The CLR deconstructs the idea that computing power takes a positive amount
of extra thinking time, proportional to the power increment. If you follow the rules,
and in a Microsoft turn of phrase "let go and let the CLR," the code will be transportable
for free.
Of course, we've heard this before. There have been cases where the promise
wasn't fulfilled. But in significant areas, as long as the platform supports the
Framework (which happens to be free), transportability exists to a much higher
degree than it did with COM.
Reliability
A second major design objective of the CLR was to make operations predictable
and to avoid bugs based on creative abuse of data types. A classic example is
using string data for a numeric operation and forcing overflow deliberately for
a result that you "know" will occur. The problem arises when the overflow does
not occur as planned when the software runs on a new machine or in a different
environment.
For example, a poorly written program might read a text field from a SQL
database and immediately try to do arithmetic on the field, creating a crash in
the field. Or the programmer might add one to a number such as 32,767 (the
maximum value of a Short integer) just to transfer control out of a deeply nested
set of procedures.
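To make the second abuse concrete, here is a hedged sketch in Python of a checked 16-bit addition, together with the trick of forcing an overflow just to escape nested calls and a more honest alternative. All of the names are hypothetical, and the checked addition is modeled by hand because Python integers do not overflow on their own.

```python
SHORT_MIN, SHORT_MAX = -32768, 32767   # range of a 16-bit signed integer

def checked_short_add(a, b):
    """Add two values, refusing any result that does not fit in 16 bits."""
    result = a + b
    if not SHORT_MIN <= result <= SHORT_MAX:
        raise OverflowError(f"{a} + {b} does not fit in a Short")
    return result

class LeaveNestedCalls(Exception):
    """An explicit, named reason for unwinding the call stack."""

def deep_work_abusive():
    # The trick: force an overflow purely to jump out of the nesting.
    return checked_short_add(SHORT_MAX, 1)

def deep_work_honest():
    # The honest version states its intent.
    raise LeaveNestedCalls("finished early")

def run(worker):
    """Call the worker and report how control came back."""
    try:
        worker()
    except OverflowError:
        return "left via overflow (works only where overflow is checked)"
    except LeaveNestedCalls:
        return "left via an explicit signal"
    return "ran to completion"

print(run(deep_work_abusive))
print(run(deep_work_honest))
```

The abusive version depends on the environment raising an error at exactly 32,767 plus one. Move it to a platform where the addition silently succeeds or wraps around, and the escape hatch disappears.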
Respect the CLR, for it encapsulates years of knowledge about how to create
reliable software, on schedule. Lessons learned by Microsoft and incorporated by
the developers of the CLR include the lesson of the stack, the lesson of typing,
and the lesson of just-in-time.
The lesson of the stack is that for solving problems, a machine or paradigm
with a stack is better than a machine or paradigm with a small number of
general-purpose registers. This is because even simple problems can and should
be broken down into simpler subsolutions. Not all managers see this, and this
may be why there has been some resistance to stacks. The stack, despite its inef-
ficiency (the admitted fact that it is a single bottleneck from the standpoint of
multiple parallel threads) represents a problem that has been modularized.
The lesson of data typing is that sometimes your need in C for a void pointer
(a pointer that points to untyped data) or in Visual Basic for a variant represents
poor design. What you really need is code that, if it compiles, will probably work.
If all your variables are of the precise type needed by the solution, you shorten
the time between a clean compile and a correct result.
Class Libraries
The final component of the .NET Framework is the base class libraries. You
can think of these as similar to the operating system DLLs prior to .NET, but they
provide access to all of the basics that you need:
• Data access
• Security
• XML/SOAP
• File I/O
• Debugging
• Threading
• User interface
All of this comes together to create the .NET Framework and to change how
your programs work. In classic Visual Basic, you might create an executable that needs the
VB*.DLL runtime to run, or use VC++ to create a stand-alone executable. In the .NET Framework,
you create a binary file that sits and waits until you run it. It's the same binary file,
regardless of whether it was written in Visual Basic .NET, C#, J#, Cobol.NET or Eiffel
.NET. And although these binaries may have a DLL or an EXE extension, they look nothing
like the pre-.NET files of the same type. Instead, they contain a quiet revolution. In
the next section, you'll see what I mean by this.
The quick response is that .NET does the extra work once and not continu-
ously. Its design lends itself to "just-in-time" processes. Missing is the
redundant and behind-the-scenes work traditional interpreters (including the
Visual Basic 6 interpreter) had to do continuously while an application pro-
gram was running.
There is a performance penalty in both .NET and the Java environment,
although it is not nearly as great as that of traditional interpretation. In return,
we get code that can run as a Web service (allowing authorized remote users
and remote software to use code on any platform). Additionally, DLL hell, as
we know it, goes away.
Summary
This chapter explained how modular libraries of code were necessary for reusable
code, but also created problems. Those problems motivated, and indeed explain,
the features of the .NET Framework. My experience as a programmer and mentor
is that the deepest appreciation of a feature comes through awareness of its
absence.
As Andrew Troelsen implies, the Framework is a major change, which deval-
ues deep hacker knowledge of the specifics of things like variants and the details
of the Visual Basic 6 runtime. The Framework is actually an artifact of computer
science. It uses tested but advanced techniques to enable us to write once, run
anywhere.
As Marx (Karl, not Groucho) saw, we make progress by means of "creative
destruction," and this is why we can't get comfortable with the deficiencies of
Visual Basic 6. Marx said, "All that is solid melts into air," which means that job
you had in COM has gone with the wind.
My advice, harsh as it may seem, is this: suck it up. Learning new stuff is
a great way to stay young.
We're now ready, in Chapter 3, to address another computer science artifact:
a simple front-end compiler, as a flyover of terrain we must walk over, starting in
Chapter 4.
Challenge Exercise
What are the 11 commandments for DLLs? Can you suggest any additional
commandments?
Resources
The following are some resources for more in-depth information about the
Framework.
Visual Basic .NET and the .NET Platform: An Advanced Guide, by Andrew
Troelsen (Apress, 2001). This book gives a good overview of the .NET
platform as it relates to Visual Basic .NET. Consider reading it and work-
ing through Andrew's code.
CHAPTER 3
A Compiler Flyover
For insofar as we understand, we can want nothing except what is necessary,
nor absolutely be satisfied with anything except what is true.
-Baruch Spinoza, Ethics, Of Human Freedom
BEFORE I VISIT A NEW TOWN on business, I often use Microsoft Flight Simulator to
buzz the city to get a general idea of the lay of the land. This chapter will consti-
tute a "flyover" of compiler theory and the hands-on application of the theory.
You will learn about the phases of a compiler and the three formal approaches
used by compiler developers to complete those phases: regular expressions,
Backus-Naur Form, and Reverse Polish Notation with stacks. To "keep it real,"
you will see how these approaches are applied in working code, the integerCalc
application. Understanding integerCalc is an excellent preparation for under-
standing the more complex quickBasicEngine described in Chapters 5 through 8.
• Scanning: Divide the source code into tokens above the level of a charac-
ter but below the level of statements and expressions.
• Parsing: Use the tokens developed in the first phase to parse identifiers,
expressions, statements, methods, and so on. This phase should create
either object code or an intermediate language.
These three compiler phases are conceptual. Early compilers were often
constructed of separate programs, each of which executed a separate pass. In
these passes, the compiler would read the entire source text to create a table or
file of tokens. Then the compiler would read the token file or table and parse the
tokens to generate object code.
Although the quickBasicEngine compiler we will build in later chapters uses the
serial approach, it is not necessarily the best approach. For a production-quality
compiler, in place of each pass, you can use object-oriented design to fully encapsulate
the result of any phase.
Our goal, in writing a compiler, is not to pass over the source code for the sake
of it. Instead, the phase one goal can be an object, a "scanner server," which keeps
track of its current state and can be called to get the next token. The state of the
scanner server can be an input file and a position in the input file.
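A scanner server of this kind might be sketched as follows. This is Python rather than Visual Basic .NET, and it is not the book's code: the class name ScannerServer, the method next_token, and the choice of a string (instead of an input file) as the source are all my assumptions. The token type names follow those shown by integerCalc.

```python
class ScannerServer:
    """Hands out one token per call; its whole state is the source and a position."""

    DIGITS = "0123456789"

    def __init__(self, source):
        self.source = source
        self.position = 0    # zero-based index of the next unread character

    def next_token(self):
        """Return (type, text) for the next token, or None at the end of input."""
        src = self.source
        # Skip white space between tokens.
        while self.position < len(src) and src[self.position].isspace():
            self.position += 1
        if self.position >= len(src):
            return None
        start = self.position
        ch = src[start]
        if ch in self.DIGITS:
            # A number is the longest run of digits starting here.
            while self.position < len(src) and src[self.position] in self.DIGITS:
                self.position += 1
            return ("number", src[start:self.position])
        self.position += 1
        if ch in "+-*/":
            return ("operator", ch)
        if ch == "(":
            return ("leftParenthesis", ch)
        if ch == ")":
            return ("rightParenthesis", ch)
        raise ValueError(f"unexpected character {ch!r} at index {start}")

scanner = ScannerServer("12+(3*4)")
tokens = []
token = scanner.next_token()
while token is not None:
    tokens.append(token)
    token = scanner.next_token()
print(tokens)
```

The caller never sees a token file or table. It simply asks for the next token when it needs one, which is what frees the parser from a separate scanning pass.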
However, for now, you can think of each phase as reading a large file. In the
scanning phase, this is raw source code (in some instances, preprocessed, per-
haps by the C++ preprocessor or the more limited pound-sign statements found
in Visual Basic). In the parsing phase, this is a token file. In the optimization
phase, this is a file of object code.
As a final phase, many compilers (such as Visual Basic releases 1 through 6)
include an interpreter, which executes the compiled code in an intermediate
form. Although interpreters can be slow, their interactive nature and ability to
modify code and values on the fly make them useful for quick results and offer
easy debugging.
NOTE The .NET CLR is not an interpreter. The intermediate code produced by
.NET compilers is further compiled into native code, as noted in Chapter 2, by
a separate step called JIT compilation.
• Regular expressions are associated with phase one of a compiler: the scan-
ning (lexical analysis) of source code.
• Backus-Naur Form (BNF) is associated with parsing and the initial object
code creation.
• Reverse Polish Notation (RPN) helps with a compiler's phase three, creating,
without making hardware, a computer on which to run your code.
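To give a feel for the third item, here is a minimal stack-based RPN evaluator, sketched in Python. It is not the integerCalc runtime; the function name evaluate_rpn is hypothetical, and the use of floor division for / is my assumption about integer-only arithmetic.

```python
def evaluate_rpn(tokens):
    """Evaluate a list of RPN tokens (integers and + - * /) using a stack."""
    stack = []
    for token in tokens:
        if token in ("+", "-", "*", "/"):
            # An operator consumes the two most recent values.
            right = stack.pop()
            left = stack.pop()
            if token == "+":
                stack.append(left + right)
            elif token == "-":
                stack.append(left - right)
            elif token == "*":
                stack.append(left * right)
            else:
                stack.append(left // right)   # integer division (assumed)
        else:
            # An operand is simply pushed.
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

# 1+2-3*4 written in RPN
print(evaluate_rpn("1 2 + 3 4 * -".split()))
# (256-1)/3+((23-5)+45)*8 written in RPN
print(evaluate_rpn("256 1 - 3 / 23 5 - 45 + 8 * +".split()))
```

Notice that RPN needs no parentheses and no precedence rules: the order of the tokens already says what to do, which is why it makes such a convenient instruction set for a software "computer."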
To keep it real, let's take a look at a very small compiler, integerCalc, which uses
the theories in code to do simple integer calculations. The source code and object
code for integerCalc are available from the Downloads section of the Apress Web
site (http://www.apress.com). You will find the code in the egnsf/apress/integerCalc
folder.
Open integerCalc/bin/integerCalc.exe and run it. Type a math expression,
calculating with integers only, and click the Evaluate button to see its calculated
value. Your screen will look something like Figure 3-1.
[Figure 3-1: The integerCalc window showing the expression (256-1)/3+((23-5)+45)*8, its value 589, and the More and Close buttons.]
NOTE If you check integerCalc's work with your Windows calculator, you will
find that it works with integers only. For the expression shown in Figure 3-1,
your Windows calculator would return 36.8235....
words together, such as dangling constructs, passive voice, and run-on sentences.
The grammar checker must use the spell checker's ability to form individual words.
Job one in a good compiler, or grammar checker for that matter, is lexical
analysis. The input of lexical analysis consists of the raw stream of characters,
including newline characters. The output consists of a stream of tokens. Each
token is a small data structure, indicating the start, length, end, type, and value
of a meaningful "word" in the text or programming language.
Consider, for example, the Visual Basic statements in Figure 3-2, over which
I have placed column numbers.
• One blank white space character starts at column 14. (White space is actually
shaded as gray space.)1
• From the point of view of lexical analysis, Mod is an identifier (that later will
be classified by the parser as an operator). It starts at column 26 and is
three characters.
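The token described here (start, length, end, type, and value) can be sketched as a small record. This is Python, not the book's Visual Basic code; the class and field names are my own, and I am assuming 1-based columns with an inclusive end, as in the column numbers used above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """One meaningful "word" of the source text."""
    type: str     # for example "identifier", "number", or "operator"
    start: int    # 1-based column of the first character
    length: int   # number of characters
    value: str    # the text of the token

    @property
    def end(self):
        """1-based column of the last character (inclusive)."""
        return self.start + self.length - 1

# The Mod identifier from the example: column 26, three characters.
mod_token = Token(type="identifier", start=26, length=3, value="Mod")
print(mod_token, mod_token.end)
```

Storing start and length and deriving end keeps the three values from ever disagreeing with one another.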
1. White space is a C language word that refers to the characters from 0 to blank (ASCII 32).
You can watch the integerCalc application tokenize its source code by run-
ning integerCalc.exe and clicking its More button to see an additional display, as
shown in Figure 3-3. Enter an expression and click Evaluate to scan, parse, and
evaluate the expression. The expanded display identifies (by type, start index, and
length) the position of each token in the source code on its left side.
leftParenthesis at 1-1
number at 2-4
operator at 5-5
number at 6-6
rightParenthesis at 7-7
operator at 8-8
number at 9-9
operator at 10-10
leftParenthesis at 11-11
leftParenthesis at 12-12
number at 13-14
operator at 15-15
number at 16-16
rightParenthesis at 17-17
operator at 18-18
number at 19-20
rightParenthesis at 21-21
operator at 22-22
number at 23-23
Figure 3-3. The More display of integerCalc shows how the expression is tokenized.
2. Windows newlines consist of the carriage return (hex D) followed by a newline (hex A). On
the Internet, this would be a single newline character.
• Brute-force coding
3. Don't inflict regular expressions on end users, but keep them handy for a fully precise specification,
and use them in code when you are confident that your maintenance programmers
will be comfortable with their use.
4. You have seen a form of this in file identifiers such as a*.txt, which defines a "language" consisting
of every file beginning with the letter a and ending with the extension .txt. However,
MS-DOS never really supported regular expressions, aside from a limited form apparently
based on a misunderstanding of the Unix grep command.
Figure 3-4. The relab.exe program provides a tool for evaluating a set of prewritten regular
expressions, entering your own regular expressions, and converting them to Visual Basic
declarations.
This tool provides several canned regular expressions for your use; it docu-
ments the most popular symbols; it allows you to create, document, and save
America's Funniest Regular Expressions; and it can convert regular expressions
with special characters (such as newlines) into readable Visual Basic expressions.
Most important, it allows you to test regular expressions, which prevents bugs in
the code where you use them.
\+|\-|\*|\/|\(|\)|[0-9]+
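A pattern of this shape (each single-character symbol as an alternative, plus a run of digits) can be tried out with any regular expression engine. The sketch below uses Python's re module rather than the .NET regular expression classes; the helper names classify and tokenize are hypothetical, and the token type names and 1-based positions are chosen to match the More display of integerCalc.

```python
import re

# Alternation of the operators, the parentheses, and one or more digits.
TOKEN_PATTERN = re.compile(r"\+|\-|\*|\/|\(|\)|[0-9]+")

def classify(text):
    """Name the token type for a piece of matched text."""
    if text == "(":
        return "leftParenthesis"
    if text == ")":
        return "rightParenthesis"
    if text in ("+", "-", "*", "/"):
        return "operator"
    return "number"

def tokenize(source):
    """Return (type, start, end) triples with 1-based, inclusive positions."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        # match.start() is zero-based; match.end() is one past the last character.
        tokens.append((classify(match.group()), match.start() + 1, match.end()))
    return tokens

for kind, start, end in tokenize("(256-1)/3+((23-5)+45)*8"):
    print(f"{kind} at {start}-{end}")
```

One caution about this sketch: finditer silently skips any character the pattern does not match, so a real scanner would also need to report unexpected characters as errors.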
Parsing is usually based on the representation of the language in BNF. The BNF
of a language is a series of productions, of the form nonterminal := grammarSymbols.
In each production, nonterminal is normally a single word that identifies a grammar
category, such as "noun" in a grammar for English or "statement" in a grammar for
Visual Basic.
Grammar Categories
There are two types of grammar categories: nonterminals and terminals.
Nonterminals are like sentences in the grammar for English, because a gram-
mar for English must provide productions that explain what a sentence is
(such as subject verb object). Terminals need no further explanation, and they
are detected not by the parser, but by the lexical analyzer. In English, they
would be words. In Visual Basic, they would be identifiers or numbers.
In a massively oversimplified grammar for English, we might have the fol-
lowing productions:
In this grammar, the nonterminals are sentence, noun, and verb. The terminals
are John, Mary, likes, and sees. The power of BNF is that this oversimplified
grammar for a Dick-and-Jane level of English nonetheless allows us to parse
a number of different sentences, such as "John sees" and "Mary likes John."
In a grammar for Visual Basic, the nonterminal statement might look like this.
statement := assignmentStatement
assignmentStatement := lValue = expression
Note that lValue is either a simple variable or a reference to an array in Visual Basic.
I call it an lValue as shorthand for location value. This is because in Visual Basic, the
left side of an assignment must refer to a storage location.
BNF Tools
Unlike for regular expressions, no object is shipped with .NET to transform the
BNF to code or interpret the BNF. However, such software exists for a variety of
platforms. The oldest example is the Unix program yacc. The problem with these
tools is that their effective use demands that you have written some parser code
by hand to get a feel for its complexities.
Many Visual Basic .NET authors encourage you to create .vb files with a simple
text editor outside the GUI to get a feel for the way in which forms and classes
are constructed. Similarly, I recommend that you write a parser that implements
the BNF to understand how to construct a BNF for a language.
BNF Design
Take another look at the source code in form1.VB for integerCalc, in the Parser
region. The parser is a series of recognizers for grammar categories of a simple
math grammar. Here is that complete grammar:
The first line declares our goal, which is to parse an expression, such as 1+2-3*4.
It divides the expression in this example between 1 (the addFactor) and the right
side (the expressionRHS), which is +2-3*4.
An addFactor is anything that can be a part of an addition operation. The
expressionRHS is anything that starts with a plus or a minus sign and appears on
the right-hand side (RHS) of an expression.
Now consider this BNF line:
The brackets mean that an add factor can be, but does not have to be, termi-
nated by an "add factor right-hand side." But let's set that aside for now, because
it looks like 1 does not have a right-hand side (the plus sign after 1 starts an
expressionRHS).
And, if you look at the BNF rule that defines term, you can see that a term is
an integer or an expression in parentheses.
1 is an integer, therefore, we have a term, which is also (going back up the tree)
an add factor.
Let's return to the top of the BNF. Since we have found an integer, which is
a term and which is a full-bodied addFactor, we can move to the right in the
expression.
Is the string "+2-3*4" an expressionRHS? It does begin with an addOp, because
a plus sign is an addOp (see the rule addOp := +|-). The addOp seems to be followed by
an addFactor, the number 2. Therefore, it looks like it starts with the expressionRHS +2.
We then see another call for an optional expressionRHS in brackets inside the
rule for expressionRHS, which means that any expressionRHS can embed one or
more smaller expressionRHS instances at its end.
So, we look for the addOp, which must start any expressionRHS according to
our rules, and find a minus sign. We find in the definition for addOp that a minus
sign, like a plus sign, is considered an addition operation.5 Then we look for the
smallest add factor and find the number 3.
However, take a look at the definition of addFactor. In the expression, 3 is not
followed by a plus sign or minus sign; it is followed by an asterisk. In the definition
of an addFactor, a term (such as the integer 3) may be followed by a different type of
RHS construct. This is the addFactorRHS, the right side of an add factor, which may
start with a multiplication or division symbol (see the rule mulOp := *|/). Since 3 is
followed by an asterisk and the integer 4, the *4 constitutes an addFactorRHS, and 3*4
is the addFactor that follows the minus sign.
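The walkthrough can be turned into a small recursive-descent recognizer, one method per grammar category. What follows is a hedged sketch in Python, not the parser in integerCalc. The nonterminal names come from the text, but the productions in the comments are my reconstruction from the walkthrough, and the class name Parser, the function evaluate, and the use of floor division are assumptions. Unlike the approach the book takes to parentheses, this sketch simply recurses into expression.

```python
import re

def tokenize(source):
    """Split the source into integers, operators, and parentheses."""
    return re.findall(r"[0-9]+|[-+*/()]", source)

class Parser:
    def __init__(self, source):
        self.tokens = tokenize(source)
        self.index = 0

    def peek(self):
        return self.tokens[self.index] if self.index < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.index += 1
        return token

    def expression(self):
        # expression := addFactor [expressionRHS]
        return self.expression_rhs(self.add_factor())

    def expression_rhs(self, left):
        # expressionRHS := addOp addFactor [expressionRHS]
        # The optional trailing expressionRHS becomes a loop, evaluated left to right.
        while self.peek() in ("+", "-"):
            op = self.take()
            right = self.add_factor()
            left = left + right if op == "+" else left - right
        return left

    def add_factor(self):
        # addFactor := term [addFactorRHS]
        return self.add_factor_rhs(self.term())

    def add_factor_rhs(self, left):
        # addFactorRHS := mulOp term [addFactorRHS]
        while self.peek() in ("*", "/"):
            op = self.take()
            right = self.term()
            left = left * right if op == "*" else left // right
        return left

    def term(self):
        # term := INTEGER | ( expression )
        token = self.take()
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected a right parenthesis")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

def evaluate(source):
    parser = Parser(source)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return value

print(evaluate("1+2-3*4"))
```

Because add_factor finishes all of 3*4 before expression_rhs applies the minus sign, multiplication binds more tightly than subtraction without any explicit precedence table. The shape of the grammar does that work.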
In applying the BNF, we are in effect making a tree of strings with longer and
more comprehensive strings at the top, and smaller and less comprehensive strings
at the bottom. Figure 3-5 illustrates this tree of strings.
Figure 3-5. The tree and the outline representation of the parse of 1+2-3*4
Figure 3-5 also shows that the tree can be alternatively represented in outline
form. This outline view of integer expressions is provided when you click the More
button on the integerCalc screen, in the Parse Outline box, and this box can be
zoomed (see Figure 3-6, later in this chapter).
The application of BNF described here may seem a little imprecise. This is
because BNF, in general, does not show you how to write code. The BNF must be
5. It makes sense, when you think about it, to treat subtraction as syntactically like addition. As
you know, in common programming languages, subtraction and addition have the same
precedence and are evaluated left to right.
39
Chapter 3
designed with care. The problem is that more than one BNF specification for
a language can be valid. Some BNFs make it easy to parse; other mathematically
valid BNFs create parsers with loops and ambiguity.
The ideal approach is to find a BNF specification for a language you would
like to convert to CLR code. If you must design a BNF specification, it is wise to
keep it simple. Try to find nonterminals that start with a small number of possible
symbols, as do both expressionRHS and addFactorRHS in the example shown here.
Find nonterminals that reduce to a single, smaller nonterminal, with an optional
trailer, such as expression and factor.
Another key concept in the BNF is the way in which we support parentheses.
It does leave one hole in the BNF considered as a design for coding, which
we do need to address. Parentheses, fortunately, are parsed at only one point.
To support nested expressions such as ((1+1)*3)/4, we declare, in the last
production term := INTEGER | ( expression ), that a term can be either an integer
or a complete expression surrounded in parentheses. While this nicely plugs in
recursion to any level, it does have a flaw: the right parenthesis inside the
production must not be confused with a right parenthesis inside the expression. In
term := INTEGER | ( expression ), expression can itself contain right parentheses.
We need to balance the parentheses to the left of the expression. We handle
this with straightforward code (see the procedure findRightParenthesis). For
example, in ((2+1)*3)-5, the leftmost parenthesis is not balanced by the first right
parenthesis, but by the second right parenthesis. Now, in a production compiler
it may not be feasible to do this, because it would involve a lookahead, which
would cause multiple reads of the input code. This can be handled, however, by using
a buffer or cache, and in this example, this consideration is not important.
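The balancing idea can be sketched in a few lines. What follows is not the book's findRightParenthesis; it is a minimal Python stand-in (the book's code is Visual Basic), and the name find_right_parenthesis, the zero-based indexes, and the one-character-per-token list are all assumptions made for illustration.

```python
def find_right_parenthesis(tokens, left_index):
    """Return the index of the ")" that balances the "(" at left_index, or -1."""
    if tokens[left_index] != "(":
        raise ValueError("left_index must point at a left parenthesis")
    depth = 0
    for i in range(left_index, len(tokens)):
        if tokens[i] == "(":
            depth += 1              # one more parenthesis waiting to be closed
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:          # this one closes the parenthesis we started at
                return i
    return -1                       # unbalanced input


# Each character of the sample expression is treated as one token.
tokens = list("((2+1)*3)-5")
outer_match = find_right_parenthesis(tokens, 0)
inner_match = find_right_parenthesis(tokens, 1)
```

Here the leftmost parenthesis is matched by the second right parenthesis, and the inner one by the first. The scan reads ahead in the token list, which is exactly the lookahead the paragraph above warns about.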
So, you've seen that designing a usable BNF can be a little bit tricky, but you
can use the technique described here to avoid headaches. The key is to write some
sample parsers, as you'll see in the next section.
Since an expression must start with an addFactor, if expression fails to parse
an addFactor in its first step, it returns False. If it succeeds, it can then call
expressionRHS, which returns a Boolean value, as a subroutine and ignore the result.
As you can see in the BNF rule for this procedure, we do not care if the RHS
expression fails to appear; it is "gravy."
Suppose the expression in this example is passed something simple like 1+1.
intIndex is 1; usrScanned contains the three tokens 1, plus sign, and 1; usrRPN is
empty; and intEndIndex is 4. The first line of code checks for an addFactor at
intIndex (at 1) and increments intIndex past the largest addFactor it finds at
index 1, which is the number 1. It cannot go any further because an addFactor is
either a term or a term followed by an addFactorRHS, which starts with a
multiplication operator, and a multiplication operator is not found. Therefore,
addFactor returns, and control passes to the second line, which calls expressionRHS.
This is successful at finding the expression RHS +1, and as a result, it increments
intIndex past the end of the expression RHS, setting it to 4, the end of the expression.
Two lines of code do all this work (take a look at the source), because they
both call routines that call other routines in a rather deep nest.
An obvious question at this point would be, "Well, there might not be an
expressionRHS after the addFactor, but what if there is garbage and line noise?"
Take a look in the source, above the expression method, and at the method
parseExpression. It contains a check of the index (named intIndex1 here) passed
by reference among the compiler procedures when the top-level recognizer
expression claims to be done. We report an error if that index is less than the
length (token count) of the source code.
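To make the two-line recognizer and the garbage check concrete, here is a minimal sketch. It is written in Python rather than the book's Visual Basic, and it rests on several assumptions: the function names are invented, indexes are zero-based, each recognizer returns the new index (or None on failure) instead of updating an index passed by reference, and parentheses are handled by plain recursion rather than by findRightParenthesis.

```python
import re


def scan(source):
    # Each integer is one token; any other non-blank character is its own token.
    return re.findall(r"\d+|\S", source)


def term(tokens, i):
    # term := INTEGER | ( expression )
    if i < len(tokens) and tokens[i].isdigit():
        return i + 1
    if i < len(tokens) and tokens[i] == "(":
        j = expression(tokens, i + 1)
        if j is not None and j < len(tokens) and tokens[j] == ")":
            return j + 1
    return None


def add_factor_rhs(tokens, i):
    # addFactorRHS := mulOp term [ addFactorRHS ]
    if i < len(tokens) and tokens[i] in ("*", "/"):
        j = term(tokens, i + 1)
        if j is not None:
            rest = add_factor_rhs(tokens, j)
            return j if rest is None else rest
    return None


def add_factor(tokens, i):
    # addFactor := term [ addFactorRHS ]
    j = term(tokens, i)
    if j is None:
        return None
    rest = add_factor_rhs(tokens, j)
    return j if rest is None else rest


def expression_rhs(tokens, i):
    # expressionRHS := addOp addFactor [ expressionRHS ]
    if i < len(tokens) and tokens[i] in ("+", "-"):
        j = add_factor(tokens, i + 1)
        if j is not None:
            rest = expression_rhs(tokens, j)
            return j if rest is None else rest
    return None


def expression(tokens, i):
    # expression := addFactor [ expressionRHS ]: the "two lines" of real work.
    j = add_factor(tokens, i)
    if j is None:
        return None
    rest = expression_rhs(tokens, j)      # optional; its failure is ignored
    return j if rest is None else rest


def parse_expression(source):
    tokens = scan(source)
    end = expression(tokens, 0)
    if end is None:
        return False, "no expression found"
    if end < len(tokens):                 # leftover tokens are garbage
        return False, "unexpected token %r at index %d" % (tokens[end], end)
    return True, ""
```

With this sketch, parse_expression("1+1") succeeds, while parse_expression("1+1 2") fails because the top-level recognizer stops at index 3 with one token still unread.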
Read the remaining source code to confirm that this code works. Also, note
the support routines, not only findRightParenthesis, described already, but also
checkToken and genCode.
checkToken is our workhorse interface to the scanner data structure. It is
overloaded because it has two jobs. The first is to check for any integer and
return its value by reference to the caller, and the second is to check for a
specified string operator or a string parenthesis. checkToken enforces an important
rule, where a grammar category, like the expression, is something capable of
appearing on the left side of a production:
This is because a grammar category always ends with some specific token.
Once checkToken has confirmed that the expected token (any integer, a specified
operator, or a parenthesis) occurs at the current position of usrScanned as
indexed by intIndex, it can increment intIndex. This enforces the general rule.
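The single-gateway idea behind checkToken might look like the following sketch. It is an illustration in Python, not the book's overloaded Visual Basic procedure: the TokenCursor class and its two method names are hypothetical, and returning the integer value (or None) stands in for the book's by-reference parameter.

```python
class TokenCursor:
    """The only code allowed to look at the token list and advance the index."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.index = 0

    def check_integer(self):
        # First job: accept any integer at the current position and return its value.
        if self.index < len(self.tokens) and self.tokens[self.index].isdigit():
            value = int(self.tokens[self.index])
            self.index += 1          # advance only after the token is confirmed
            return value
        return None

    def check_token(self, expected):
        # Second job: accept one specific operator or parenthesis.
        if self.index < len(self.tokens) and self.tokens[self.index] == expected:
            self.index += 1
            return True
        return False


cursor = TokenCursor(["12", "+", "3"])
```

Because the index moves only inside these two methods, the cursor always points just past the last confirmed token, which is what makes precise error positions possible later.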
In the much larger parser for the quickBasicEngine compiler we will build in
later chapters, this rule takes on added importance, because we want to report
errors precisely, along with other information that pinpoints offending, or
merely interesting, source code.
Therefore, here and in quickBasicEngine, a single interface is always used to
the scanner data structure, and all code makes a blood oath never to look at the
scanner data structure without going through checkToken.
Finally, genCode generates the code. For now, think of it as a black box,
because understanding genCode requires us to move on to our third and final the-
ory, in the next section.
But first take a break from slogging through the theory and run integerCalc
again. Click More and evaluate a complex expression. Look at the list box in the
middle of Figure 3-6. It presents the parser results as the list of nonterminals
parsed, which, as shown earlier in Figure 3-5, is just another way of represent-
ing a tree.
Figure 3-6. The list box shows the nonterminals as a nested outline, because unlike
scanning (which scans for nonoverlapping tokens), parsing looks for nonterminals
and terminals that nest and overlap each other.
The list box is an indented outline because, unlike the tokens produced by
the scanner, the grammar symbols nest. Everything in our simple language is an
expression. Expressions consist of addOp, addFactor, expressionRHS, and other
elements. Therefore, a tree-like display is best.
RPN Construction
Most logicians and mathematicians use, instinctively, the language that is com-
piled by integerCalc, where operations appear between operands. However, in
Poland before World War II (a period in which that country enjoyed brief free-
dom from Russians and Germans, and flourished as a result, as it has flourished
since the end of communism), a group of logicians discovered a more elegant
notation, which is named in their honor. 6 In this notation, you just write the
operands before their operator. This means that 1+1 becomes 1,1,+, where the
comma separates the operands.
You never need parentheses! 3*(4+1) is 3 (write operand), 4 (go to the next
operand, just noting the presence of multiplication), 1 (go to next, again remem-
ber add), + (the add is next, obviously since 4 and 1 use it), and * (we're finished).
Early computer designers realized that evaluating Polish expressions is much
simpler than evaluating non-Polish, or infix, expressions. Infix expressions (unless
you use a careful BNF structure, as designed in the previous section) can result in
complex code, which needs to move back and forth in source to finish its job.
Suppose you have a table, accessible only on one end by means of only two
operations: push will add an entry to the top of the table, and pop will remove the
most recently pushed entry.7 Perhaps surprisingly, this simple gimmick handles
parentheses logic well, because parentheses in infix expressions essentially
prioritize the contained operators, making operators of lower precedence wait until
the parenthesized operators complete.
6. One reason guys like Jan Lukasiewicz (the inventor of Polish notation) and Alfred Tarski (the
leading Polish logician before WWII) should be honored is that some of them were interned
and suffered during the war, along with other smart people, who totalitarian governments
don't like. Nazis appear to have been, among other things, guys who flunked math.
Take a look at genCode, in the Parser region of integerCalc. This builds an
RPN expression in the usrRPN data structure. It is passed an enumerator of type
ENUoperator, which can have the values add, subtract, multiply, divide, and push.
It is also passed an operand that is zero for all operators, except push.
When the parser recognizes a term that is an integer, in the method named
term, it calls genCode to append a push operator and the integer value of the term
to the end of the usrRPN data structure. usrRPN is not a stack, but rather an array
that represents the RPN of the expression to be evaluated.
When the parser recognizes the right side of an arithmetic operation, the
operands have already been "pushed" by the lower-level methods that recognize
the nonterminals in the operation. So, all the parser needs to do at recognition
time is call genCode to append the right operator (with a dummy and unused
operand) to the end of usrRPN.
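As a rough picture of what genCode does, consider this sketch. It is Python rather than the book's Visual Basic; the Operator enumeration mirrors the values the text gives for ENUoperator, but the names gen_code and rpn, and the use of (operator, operand) tuples, are assumptions for illustration only.

```python
from enum import Enum


class Operator(Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"
    PUSH = "push"


def gen_code(rpn, operator, operand=0):
    """Append one (operator, operand) entry; the operand matters only for PUSH."""
    rpn.append((operator, operand))


# The calls a parser would make, in order, while recognizing 3*(4+1):
rpn = []
gen_code(rpn, Operator.PUSH, 3)      # term 3
gen_code(rpn, Operator.PUSH, 4)      # term 4
gen_code(rpn, Operator.PUSH, 1)      # term 1
gen_code(rpn, Operator.ADD)          # right side of 4+1 recognized
gen_code(rpn, Operator.MULTIPLY)     # right side of 3*(...) recognized

rpn_text = ",".join(str(value) if op is Operator.PUSH else op.value
                    for op, value in rpn)
```

The list is only ever appended to, so it is an array of instructions in RPN order and not a stack; rpn_text comes out as 3,4,1,add,multiply.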
Stack Use
Now, take a look in the Interpreter region of integerCalc, at the interpretExpression
method. It declares a stack as a local variable of type Stack, and then uses a very
simple For loop to index left to right through the usrRPN data structure.
When the interpreter "sees" an operator of type Push, it uses the push method
of the stack object to place the value on the stack. It does so in the pushStack
method using a Try...Catch block, because at this point, we are leaving our code
and asking a system facility to accomplish something for us. We need to make
sure the external facility succeeds.
In the code for this book, I will always use this rule:
• When using a facility whose code is outside your code, check its result.
Just because you're paranoid doesn't mean you aren't being followed.
When interpretExpression sees an opcode,8 interpretExpression must "pop"
the two operands and, using the facilities of Visual Basic .NET, perform the
operation. The only complexities here are that we need to check for stack underflow,
and we need to properly order the operands. For example, stack underflow
occurs in the (invalid) sequence of commands push 1, add, which doesn't specify
what to add 1 to.
7. Many Visual Basic .NET and C# developers will be familiar with the Stack collection now
available, but it was quite simple to make stack-like arrays in older Visual Basic versions, or to
use collections as stacks.
8. This opcode is known as a zero-address opcode in some contexts, because the opcode, unlike
a typical Pentium opcode, does not need to find anything in memory. Its operands are
already in the stack.
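The interpreter loop described above can be sketched as follows. This is a Python illustration under stated assumptions, not the book's interpretExpression: opcodes are plain strings, the RPN is a list of (opcode, operand) pairs, and division uses Python's floor division, which differs from Visual Basic's integer division when the operands have different signs.

```python
def interpret_expression(rpn):
    """Evaluate a list of (opcode, operand) pairs left to right using a stack."""
    stack = []
    for opcode, operand in rpn:
        if opcode == "push":
            stack.append(operand)
            continue
        if len(stack) < 2:                       # stack underflow check
            raise ValueError("stack underflow at opcode %r" % opcode)
        right = stack.pop()                      # pushed last, so it is the right operand
        left = stack.pop()
        if opcode == "add":
            stack.append(left + right)
        elif opcode == "subtract":
            stack.append(left - right)           # order matters here
        elif opcode == "multiply":
            stack.append(left * right)
        elif opcode == "divide":
            stack.append(left // right)          # order matters here too
        else:
            raise ValueError("unknown opcode %r" % opcode)
    if len(stack) != 1:
        raise ValueError("malformed RPN: %d values left on the stack" % len(stack))
    return stack[0]


# 1+2-3*4 in RPN is 1,2,+,3,4,*,-
program = [("push", 1), ("push", 2), ("add", 0),
           ("push", 3), ("push", 4), ("multiply", 0), ("subtract", 0)]
result = interpret_expression(program)
```

Popping the right operand first is the whole of the ordering problem: subtraction and division give wrong answers if the two pops are swapped, while addition and multiplication would hide the bug.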
If the parser phase has no bugs, it will always generate a correct RPN expres-
sion in usrRPN, and no valid expression will cause stack underflow (1,+ is invalid).
However, we are paranoid and being followed, as noted earlier, so another rule is
as follows:
TIP If the display is too fast, you can check the box labeled Replay, click
Evaluate again, and then use the Step and Back buttons to review the steps.
Figure 3-7. The RPN box of the More display shows how the parsed input
expression has been converted to RPN.
Summary
This chapter has introduced three core theories that you can apply to developing
compilers for the CLR: scanning and regular expressions, parsing and BNF, and
interpreters and RPN.
We've had an almost complete flyover of the entire process of crafting a small
compiler. Although this compiler is of limited practical use, it could be expanded
to parse and evaluate business rules.
At this point, we haven't addressed compiling to the CLR, because witnessing
the internals of a very small runtime is excellent preparation for Chapter 9,
where we send some object code for QuickBasic to the CLR.
The next chapter describes the front-end scanning and parsing of the com-
plete quickBasicEngine example, which scale up from the methods demonstrated
in this chapter.
Challenge Exercise
As supplied, integerCalc calculates only with integers. In order to understand
how it works, consider updating it to calculate with real numbers.
Your most important task will be to change the lexical analyzer of integerCalc
to scan real numbers. This is the method named scanExpression in the code.
For best results, construct a regular expression that correctly scans all real
numbers, where a real number consists of the following parts:
• Then the value of the e exponent, which you can think of as the number
of positions an implied decimal point, located to the left of the leftmost
nonzero decimal digit at the beginning of the floating-point number,
should move. By default, and if the letter e is followed by a plus sign, the
implied decimal point moves right. If the letter e is followed by a minus
sign, the implied decimal point moves left.
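As a starting point for the challenge, here is one possible regular expression, shown with Python's re module rather than the .NET Regex class. It is only a sketch: it assumes a real number is a run of digits with an optional fraction, or a fraction with no leading digits, optionally followed by the letter e, an optional sign, and exponent digits. It deliberately leaves out a leading sign, on the assumption that the parser treats a leading minus as an operator. Check it against your own definition before relying on it.

```python
import re

# digits[.digits][e[+|-]digits]  or  .digits[e[+|-]digits]
REAL_NUMBER = re.compile(r"(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def is_real_number(text):
    """True when the whole string is one real number under the assumptions above."""
    return REAL_NUMBER.fullmatch(text) is not None


def scan_real_numbers(source):
    """Return every real-number token found in source, left to right."""
    return REAL_NUMBER.findall(source)
```

Because the exponent group requires at least one digit after the optional sign, a dangling e (as in 1e) is not absorbed into the number, which leaves it for the scanner to report as an error.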
Since the lexical analyzer represents the source code in the USRscanner data
structure, and since scanned values are represented in an object, you probably
do not need to alter anything other than comments in the regions named Scanner
and Parser. The interpretExpression procedure needs some modification in the
way it performs arithmetic, since, as delivered, it uses integers.
When you complete this challenge, you will have experienced modifying
a simple front-end compiler in the .NET platform, and you will have a full calcu-
lator for numeric expressions.
Resources
For more information about compiler theory, refer to the book Compilers:
Principles, Techniques, and Tools, by Alfred Aho, Ravi Sethi, and Jeffrey Ullman
(Addison-Wesley, 1986). This is the famous "dragon" book, which shows the
compiler developer, armed with theory, defeating the dragon of complexity.
While academic in its tone, it does constitute an excellent reference. In
particular, it contains information on the use of lex and yacc.
CHAPTER 4
THE LATE, HERO computer scientist was just wrong about Basic. Dijkstra's
comment is academic sociology at its worst. It creates the illusion that programming
skill derives from the use of politically correct platforms and languages.1 Dijkstra
was wrong because Visual Basic is Turing-complete, and it has a formal and
sensible syntax. Visual Basic is Turing-complete because you can use it to write any
program, as long as you disregard resource consumption.
I would revise Professor Dijkstra's aphorism. The use of Basic or Cobol as
representative of a good programming language in and of itself cripples the
mind and rots the teeth because Cobol and Visual Basic preserved (until Visual
Basic .NET) some standards and practices created in the Fortran era, which sim-
ply did not allow for effective problem breakdown. This was less a scientific fact
and more a result of a management illusion that programmers should merely
code specifications provided by the "real" experts, and not factor the problem
into subroutines, functions, and objects. Much sloppy programming results from
this false view of the field and the low self-esteem it creates in programmers, who
believe that a disreputable language permits mindless coding. In actuality, the
1. In an interview, Peter Neumann, long the hard-working moderator of the comp.risks
newsgroup, told me that Dijkstra struggled with depression most of his life. Many bright people
are depressed because they are powerless to stop other people from making mistakes. Dijkstra,
unlike many successful corporate MIS types, never restrained himself from speaking his
mind. His attempts at constructive criticism sometimes bothered people who had heavily
invested in a paradigm Dijkstra did not like. Paradoxically, all who knew Dijkstra personally
said he was easy to get along with.
The Syntax for the QuickBasic Compiler
The best way to understand the examples in this chapter is to run bnfAnalyzer
while you're reading it. As long as you have the Visual Basic 6 runtime on your
machine, you'll be able to run the program. To compile the source provided at
the Apress Web site, you'll need Visual Basic 6 Enterprise or Professional, or if
you have the Learning edition, you can organize the two projects shipped into
a single project.
bnfAnalyzer reads a text file containing the BNF grammar of a language. It
analyzes the BNF specification, finding many errors that would prevent you
from using the specification to write, or automatically generate, a compiler or
that might cause serious bugs. This tool produces a language reference manual,
which includes the following:
• A list of the nonterminals, which are the categories of the language (such
as expression in QuickBasic) that need to be defined in terms of smaller
sequences
• A list of the terminals, which are the categories of the language (such as
string in QuickBasic) that do not need to be broken down further
You can use this tool to analyze your own .NET language. bnfAnalyzer uses
many of the compiler techniques discussed in Chapters 5 through 8 of this book.
A final section of this chapter, "bnfAnalyzer Technical Notes," will discuss this topic,
but I suggest you read that section after you've studied Chapters 5 through 8.
BNF Syntax
Figure 4-1 shows the syntax of BNF. It is available in the file named bnfAnalyzer
test 6 (BNF of BNF).txt that comes along with the downloaded code for the
analyzer.
bnfGrammar := production +
production := [ nonTerminal ":=" productionRHS ]
              ( NEWLINE | EOF )
production := NEWLINE ' Allows for empty lines
nonTerminal := IDENTIFIER
productionRHS := sequenceFactor [ sequenceFactor ]
sequenceFactor := mockRegularExpression
                  [ alternationFactorRHS ]
mockRegularExpression := mreFactor [ mrePostfix ]
mreFactor := nonTerminal |
             UPPERCASESTRING |
             STRING |
             "(" productionRHS ")" |
             "[" productionRHS "]"
mrePostfix := "*" | "+"
alternationFactorRHS := "|" mockRegularExpression
                        [ alternationFactorRHS ]
Now, this is confusing. Figure 4-1 shows the syntax of BNF, a formal syntax
for programming languages, although I just told you that BNF is not a
programming language. To make things worse, I am presenting the rules of BNF in BNF,
as if you knew BNF all along or in a former life.
BNF isn't a programming language, but all programming languages are formal
languages (but not the reverse). The syntax of all formal languages, by definition,
can be specified in a formal notation like BNF.
Let's walk through some of the rules to get a feel for the use of BNF. The first
line says, when read properly from left to right, that a bnfGrammar is one or more
productions, where a production specifies possible components of a grammar
category. (The plus sign in production + means one or more repetitions, just as
it does in regular expressions.)
Okay, cool. What's a production?
Glad you asked. Go to the second line. A production is normally a
nonterminal, followed by a colon and an equal sign (:=), followed by a production
right-hand side (RHS), followed by either a newline character or end of file (EOF).
Note that the nonterminal, :=, and RHS are optional, which means that blank
lines are allowed.
A nonterminal is an identifier as seen in Visual Basic. An RHS is more com-
plex. It is a sequence factor, perhaps followed by another sequence factor.
Okay, what's a sequence factor?
A sequence factor is a mock regular expression, optionally followed by an
alternation factor RHS. A mock regular expression (so-called because it isn't a
full-scale regular expression) is a simplified regular expression that consists of a
mock regular expression factor (mreFactor), optionally followed by a mock regular
expression postfix (mrePostfix). An mreFactor can be one of several things: a
nonterminal, a completely uppercase string (which, in our language, represents a
terminal symbolically), a quoted string using Visual Basic conventions, a
parenthesized production RHS, or a production RHS in square brackets.
As this example shows, you read BNF by following branches of a tree (and if
you examine the code of bnfAnalyzer, you'll see that it represents the source BNF
in a tree data structure). Each branch is less an instruction than a timeless law,
which is always true no matter if the cows come home or not, as we say on the
farm. The comfort of this is that the rules never change.
But let's step back a bit.
Like a programming language, BNF has operators. These include the
alternation stroke | and the mock (as in not complete) support for the regular expression
operators asterisk (*) and plus (+). Oddly, when white space occurs on the RHS of
a production, between two grammar categories, it is an operator that specifies
that the material on its left is followed by the material on its right. Also, a
grouping of square brackets is an operator, which specifies that the material it contains
is optional.
As in a programming language, these operators have precedence. The
sequence operator (consisting of blanks or white space) has lowest precedence,
followed by the alternation stroke, and then the mock regular expression
operators. However, parentheses can be used to group operations and change this
precedence. For example, if in your language, an a nonterminal consists of a b or
the sequence c d, the production is a := b | ( c d ). The square brackets change
precedence in the same way, while also specifying that the bracketed material is
optional.
Continued lines and new lines: Lines that contain individual
productions (definitions) may be continued, simply by making sure that the
first character of the continuation line is a blank. A newline suitable for
the environment in which the analyzer is run (carriage return and
linefeed on Windows; linefeed on the Web) is a "real" newline only when it
is followed by a nonblank character or end of file. Suppose a Windows
newline is the NEWLINE terminal of your language. In your lexical
analyzer, this would correspond to a small routine that checks for the
proper newline at the current position and advances a scan pointer.
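The continuation rule can be pictured with a short sketch. It is Python, not the analyzer's own Visual Basic 6 code, and it assumes that "blank" means a space character and that the whole file is already in memory as one string; the function name logical_lines is made up for the example.

```python
def logical_lines(text):
    """Join continuation lines (first character is a space) onto the line above."""
    lines = []
    for raw in text.splitlines():          # handles both CR-LF and LF newlines
        if raw.startswith(" ") and lines:
            # Not a "real" newline: the production continues on this line.
            lines[-1] = (lines[-1] + " " + raw.strip()).rstrip()
        else:
            lines.append(raw.rstrip())
    return lines


sample = 'mrePostfix := "*"\r\n   | "+"\r\nnonTerminal := IDENTIFIER\r\n'
productions = logical_lines(sample)
```

After this pass, each entry in productions holds one complete production, so the parser never needs to know that continuation lines existed.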
Identifiers: Identifiers follow Visual Basic 6 conventions: starting with
a letter, they should contain letters, digits, and the underscore
exclusively. There is no limit to the length of identifiers, except common
sense. However, unlike Visual Basic identifiers, identifiers are
completely case-sensitive, as in the case of C++. The case of the first letter
of the identifier shows its type.
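A sketch of the identifier rule follows, in Python for brevity. The regular expression comes straight from the description above. The classification is an assumption drawn from later in this chapter, where identifiers that start with an uppercase letter (EOF, NEWLINE, and so on) are treated as terminal symbols; I take a lowercase first letter to mean a nonterminal. The function name is hypothetical.

```python
import re

# A letter, then any number of letters, digits, and underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def classify_identifier(text):
    """Return "terminal", "nonterminal", or "invalid" for one candidate identifier."""
    if IDENTIFIER.fullmatch(text) is None:
        return "invalid"
    # Case-sensitive: only the first letter decides the type in this sketch.
    return "terminal" if text[0].isupper() else "nonterminal"
```

For example, classify_identifier("NEWLINE") gives "terminal" and classify_identifier("mreFactor") gives "nonterminal".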
A Grammar Test
We'll now test (if you are following along with downloaded software) the
grammar for BNF using the bnfAnalyzer program. Open the bnfAnalyzer.vbp
file using Visual Basic 6 Enterprise or Professional and compile this project,
producing bnfAnalyzer.exe. Run bnfAnalyzer.exe.
The first screen presented when you run bnfAnalyzer will be a general
announcement screen, as shown in Figure 4-2. Most of the software in this
book will include these "about" screens, which appear the first time the soft-
ware is run. Subsequently, they are available using a button and/or a menu
item labeled About.
Then you will see the main screen of bnfAnalyzer. In the list of directories
on the left side of the main screen, find and double-click Test Files, as shown in
Figure 4-3.
In the list of files in the lower-left corner, find and select the file named BNF
analyzer test 6 (BNF of BNF).txt. Click the Create Reference Manual button. After
a sequence of progress reports, you'll see the Reference Manual Options form.
Enter BNF as the language name, as shown in Figure 4-4.
This form allows you to tailor the reference manual. bnfAnalyzer supports
two formats for the reference manual. The default text format uses the mono-
spaced Courier New font to format the manual. An option is also provided to
create an XML reference manual.
Click the button labeled Close (and create manual) to see the reference
manual, as shown in Figure 4-5. This report can be selected, copied, and pasted
into a Notepad or Word file.
Figure 4-5. Start of the language reference manual produced by the BNF analyzer
Nonterminals
Scroll down the reference manual screen to examine the nonterminal symbols,
as shown in Figure 4-6. These are the grammar categories of BNF that have
further expansions.
NOTE If a nonterminal has null in the Where Used list, it is a start symbol.
The reverse is not true. If your language defines the major construct
recursively as smaller instances of itself, the start symbol column will be blank. For
this reason, I suggest you do not define the start symbol recursively.
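To see why the note holds, here is a small sketch of how a Where Used list could be computed. It is Python, not bnfAnalyzer's actual code, and it makes simplifying assumptions: the grammar is a dictionary from each nonterminal to the text of its right-hand side, quoted strings are ignored, and any identifier on a right-hand side that is also a key counts as a use. All names are invented for the example.

```python
import re


def where_used(productions):
    """Map each nonterminal to the set of nonterminals whose RHS mentions it."""
    used = {name: set() for name in productions}
    for lhs, rhs in productions.items():
        rhs_without_strings = re.sub(r'"[^"]*"', " ", rhs)
        for symbol in re.findall(r"[A-Za-z][A-Za-z0-9_]*", rhs_without_strings):
            if symbol in used:
                used[symbol].add(lhs)
    return used


def start_symbols(productions):
    """Nonterminals that nothing uses, in the order they were defined."""
    return [name for name, users in where_used(productions).items() if not users]


# A fragment of the BNF of BNF: nothing refers back to bnfGrammar.
bnf_fragment = {
    "bnfGrammar": "production +",
    "production": '[ nonTerminal ":=" productionRHS ] ( NEWLINE | EOF )',
    "nonTerminal": "IDENTIFIER",
    "productionRHS": "sequenceFactor [ sequenceFactor ]",
}

# The integerCalc grammar: term refers back to expression.
calc_grammar = {
    "expression": "addFactor [ expressionRHS ]",
    "expressionRHS": "addOp addFactor [ expressionRHS ]",
    "addFactor": "term [ addFactorRHS ]",
    "addFactorRHS": "mulOp term [ addFactorRHS ]",
    "term": 'INTEGER | "(" expression ")"',
}
```

In the first grammar, bnfGrammar has an empty Where Used set and shows up as a start symbol (sequenceFactor is not a key of this fragment, so it is ignored here). In the second, expression is used by term, so no start symbol is reported, which is the situation the note warns about.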
Terminals
Scroll down further to see the list of terminals, as shown in Figure 4-7. Terminals
in our BNF are of two types: strings and symbols that at least start with an
uppercase character.
Like the nonterminal list, the first column in this list identifies the terminals
in alphabetical order. Note that several strings are terminals and operators of
BNF. We've also identified some terminals as symbols.
Because they start with uppercase, the identifiers EOF, IDENTIFIER,
NEWLINE, STRING, and UPPERCASESTRING are treated as symbols in the grammar
that will be understood by the lexical analyzer. These follow the convention that
symbolic terminals should be in all uppercase characters, for maximum visibility
in a medium or large BNF.
These symbolic terminals can be used to express the fact that a terminal that
will be recognized by the lexical analyzer is a potentially infinite number of
symbols, such as IDENTIFIER, STRING, or UPPERCASESTRING; or a nonprintable
string not expressible as a Visual Basic string; or a condition that is not a string
at all, such as EOF (at end of file). In fact, NEWLINE, in the parser for BNF, is
both a string and a condition. When the BNF lexical analyzer (in the procedure
BNFcompile_scanner_findNewline) finds a newline, it then checks to see if the
newline is followed by a space and an underscore (the continuation indicator), and
if so, it cancels the newline.
The lexical analyzer's mission in life, which we will revisit in the next chapter,
is to make life easy for the parser in any way it can. Here, it does this by replacing
funky terminals by simple symbols.
Recall that regular expressions represent the conditions "start of input line or
string" and "end of input line or string" using the characters caret and dollar sign.
In general, conditions and sequences of characters can be usefully considered by
the lexical analyzer as simple characters and abstracted as simple tokens.
Syntax Outline
Scroll down to see the (pardon my French) pièce de résistance, or reference
manual outline of BNF (or any other language expressible in valid BNF), as shown in
Figure 4-8.
Any valid BNF can be transformed into an outline of the language, suitable
as a basis for a complete reference manual. This overcomes a major, and quite
valid, managerial objection to the very idea of forming our own language for
business rules, by providing accurate documentation of the language.
We need a less proprietary and more flexible way to format the output, so we can
use Visio and other tools to create documents based on our reference manual.
XML is the best choice.
Click Continue on the reference manual outline screen, and then click Create
Reference Manual again on the main screen. This time, the Reference Manual
Options form will appear immediately, since the manual has already been parsed.
Click the XML format radio button. Make sure the first XML format option, labeled
The XML should include the BNF source as a comment, is unchecked. The other
three options-Individual tags should appear on separate lines, Add BNF source
code as an attribute to production tags, and Comment end tags with the associate
name in the start tags-should be checked. These choices will avoid replicating
the BNF source in a leading XML comment, place newlines between XML tags,
and enhance the tags with source BNF. Click the button labeled Close (and create
manual) to see the screen shown in Figure 4-9.
Figure 4-9. The XML reference manual screen
You can format this XML. Using Visual Studio .NET, create a new Windows
application and choose to add a new item. At the prompt, select XML file and
paste in the XML, commencing with the comment tag <!--. Click Browse with,
save the file, and select Internet Explorer. You will see the formatted XML, as
shown in Figure 4-10.
-->
<BNF>
  <nonterminals>
    <bnfGrammar />
    <production />
    <nonTerminal />
    <productionRHS />
    <sequenceFactor />
    <mockRegularExpression />
    <alternationFactorRHS />
    <mreFactor />
    <mrePostfix />
  </nonterminals>
  <terminals>
    <X_colon_equals_>:=</X_colon_equals_>
    <NEWLINE>NEWLINE</NEWLINE>
    <EOF>EOF</EOF>
    <IDENTIFIER>IDENTIFIER</IDENTIFIER>
    <UPPERCASESTRING>UPPERCASESTRING</UPPERCASESTRING>
    <STRING>STRING</STRING>
    <X_leftParenthesis_>(</X_leftParenthesis_>
    <X_rightParenthesis_>)</X_rightParenthesis_>
    <X_leftBracket_>[</X_leftBracket_>
    <X_rightBracket_>]</X_rightBracket_>
Figure 4-10 shows the beginning of the XML reference manual, which lists
the nonterminals and then the terminals of your language. Scroll down to see
the actual BNF productions, as shown in Figure 4-11.
<bnfProductions>
  <GS name="bnfGrammar">
    <OP name="production">
      <OP name="oneTripRepeat">
        <production />
      </OP>
      <!-- End oneTripRepeat -->
    </OP>
  </GS>
  <!-- End bnfGrammar -->
  <GS name="production">
    <OP name="production">
      <OP name="sequence">
        <OP name="optionalSequence">
          <OP name="sequence">
            <nonTerminal />
            <OP name="sequence">
              <X_doubleQuote_colon_equals_doubleQuote_ />
              <productionRHS />
            </OP>
          </OP>
          <!-- End sequence -->
        </OP>
        <!-- End optionalSequence -->
        <OP name="alternatives">
          <NEWLINE />
          <EOF />
        </OP>
        <!-- End alternatives -->
      </OP>
      <!-- End sequence -->
Note that each production will start with a GS (grammar symbol) tag and
contain one or more OP (operator) tags. Each GS tag will name the grammar sym-
bol and show its BNF (if the appropriate Reference Manual Options setting is in
effect). Each OP tag will identify the operator. For example, the first OP tag identi-
fies the production operator :=. The ending OP tag will identify the start tag in
a comment if the corresponding Reference Manual Options setting is in effect.
You can use the XML format with a large variety of formatting tools to view the
language reference. The basic text format is more suitable in simpler documents.
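As one rough illustration of processing the XML form with an off-the-shelf tool, the sketch below uses Python's standard xml.etree.ElementTree module to list a grammar symbol and the operators nested inside it. The embedded XML is a hand-trimmed fragment modeled on Figure 4-11, not real bnfAnalyzer output, and the name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed fragment modeled on Figure 4-11 (an assumption, not real tool output).
SAMPLE = """
<bnfProductions>
  <GS name="bnfGrammar">
    <OP name="production">
      <OP name="oneTripRepeat">
        <production />
      </OP>
    </OP>
  </GS>
</bnfProductions>
"""

def outline(element, depth=0):
    """Return indented lines naming each GS, OP, and grammar-symbol reference."""
    label = element.get("name", element.tag)  # symbol references have no name attribute
    lines = ["  " * depth + f"{element.tag}: {label}"]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(SAMPLE)
for gs in root.findall("GS"):
    print("\n".join(outline(gs)))
```

Each GS element becomes the top of a small indented outline, which is roughly the shape a browser shows when it renders the file.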
In this section, you've learned quite a lot about BNF, in the way in which many
programmers want to learn: by getting your hands dirty. You've learned how to
use the free tools provided with this book to get started with language design.
In the next section, you'll see how the much larger grammar for our
QuickBasic was built.
2. For the same reason a film can't show all the action and lack of action in a person's life,
a metaprogram discussion cannot reproduce, at base, the writing of each line of code, its
debugging, and its modification. Before "extreme programming," this was endured in isola-
tion by the programmer. Today it is, in extreme programming, a sort of MTV Real World or
Survivor show, with the boring parts left in, and fewer hotties overall.
3. Microsoft operates a Microsoft Museum in Redmond, and while waiting for my ride after the
2001 author's event, I was able to see the paper tape Bill and Paul created for their Altair Basic
compiler.
This declares the "what" and not the "how." It says in English, "An immediate
command is a single immediate command, followed by zero, one, or more occur-
rences of a colon, followed by an immediate command." It is a recursive but not
circular definition, because we can tell, by just looking at it, that the immediate
command on the right side of the production is shorter than the immediate command
being defined, by at least one character: the colon.
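The "recursive but not circular" point can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: a single immediate command is taken to be any text up to the next colon (colons inside string literals are ignored), and the function name is hypothetical.

```python
def parse_immediate_command(text):
    """Split text into single immediate commands using the recursive rule."""
    head, colon, rest = text.partition(":")
    commands = [head.strip()]
    if colon:
        # The remainder is strictly shorter than text (by at least the colon),
        # so the recursion must terminate.
        commands.extend(parse_immediate_command(rest))
    return commands

print(parse_immediate_command("Let A=1+1 : A+2 : A*3"))
```

Every recursive call receives a shorter string, which is exactly why the definition is not circular.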
We then define a single immediate command:
' Assignment
assignmentStmt := explicitAssignment | implicitAssignment
explicitAssignment := Let implicitAssignment
implicitAssignment := lValue "=" expression
lValue := typedIdentifier [ "(" subscriptList ")" ]
subscriptList := expression [ Comma subscriptList ]
4. As early as Microsoft's release of QuickBasic in the 1980s, Go To was useless to skilled pro-
grammers except in error handling. This has been only recently fixed in the provision, in
Visual Basic .NET, of Try...Catch...End Try error handling.
Expressions
We need to define the precedence of operators from the very low-precedence
operators And and Or to the very high-precedence multiplication, division, and
exponentiation operators, and we need to account for parentheses.
Figure 4-14 shows how we define expressions. There are a lot of "gotchas"
here, so pay attention.
' --- Expressions
expression := orFactor [ orOp expression ]
orOp := Or
orOp := OrElse
orFactor := andFactor [ andOp orFactor ]
andOp := And
andOp := AndAlso
andFactor := [ Not ] notFactor
notFactor := likeFactor [ notFactorRHS ]
notFactorRHS := Like likeFactor [ notFactorRHS ]
likeFactor := concatFactor [ likeFactorRHS ]
likeFactorRHS := "&" concatFactor [ likeFactorRHS ]
concatFactor := relFactor [ concatFactorRHS ]
concatFactorRHS := relOp relFactor [ concatFactorRHS ]
relFactor := addFactor [ relFactorRHS ]
relFactorRHS := relOp relFactor [ relFactorRHS ]
addFactor := mulFactor [ addFactorRHS ]
addFactorRHS := mulOp mulFactor [ addFactorRHS ]
mulFactor := powFactor [ mulFactorRHS ]
mulFactorRHS := powOp powFactor [ mulFactorRHS ]
powFactor := [ "+" | "-" ] term
term := unsignedNumber |
        string |
        lValue |
        True |
        False |
        functionCall |
        ( expression )
functionCall := functionName "(" expressionList ")"
functionName := Abs | Asc | Ceil | Chr | Cos | Eval | Floor | Int |
                Iif | Isnumeric | Lbound | Lcase | Left | Len | Log |
                Max | Min | Mid | Replace | Right | Rnd | Sin |
                Sgn | String | Tab | Trim |
                Ubound | Ucase | Utility
unsignedNumber := ( UnsignedRealNumber | UnsignedInteger )
                  [ numTypeChar ]
typedIdentifier := identifier [ typeSuffix ]
typeSuffix := numTypeChar | CurrencySymbol
numTypeChar := PERCENT | AMPERSAND | EXCLAMATION | POUNDSIGN
identifier := Letter LettersNumbersUnderscores
string := DoubleQuote AnythingExceptDoubleQuote DoubleQuote
relOp := "<" | ">" | "=" | "<=" | ">=" | "<>"
addOp := "+" | "-"
mulOp := "*" | "/" | "\" | "Mod"
powOp := "**" | "^"
It is one thing to define an abstract BNF that validly expresses the set of pos-
sible sentences in a language, but early compiler designers found that it is quite
another to design BNF from which actual debugged code can be generated,
whether by hand or by using a parser generator. As I mentioned in the previous
chapter, you need to design a real-world BNF with care.
Expression Operators
NOTE Remember that looping isn't recursion. Recursion is applying the same
code to a smaller integer or a smaller set of data. Looping is getting stuck so
that no matter what your code does, it re-creates the same state, including the
position in the parse.
5. Some of my students complain I use math too much. This isn't math. It's symbolic logic. That
is even worse. However, symbolic logic, unlike traditional math, does not require an extensive
background to understand. To understand college calculus, you need to have succeeded at
four years of high school math. To understand these formal notations, you need only read
this book, do the examples on your computer, and, like Billy Crystal's Second Gravedigger in
the Kenneth Branagh film of Hamlet, "cudgel thy brains."
a subtract factor, followed by the smaller subtract RHS -c-d. This is the best
sequence because it simplifies generating subtraction operators left to right, as
you'll see in Chapter 5.
Were we to parse a-b-c-d as a factor, a minus sign, and an expression, some-
thing unexpected would happen. In the first parse, the a would be the or factor,
and the expression would be b-c-d. We could generate code to get the value of a,
but we could not generate code to subtract! That's because we first need to gen-
erate code to calculate the value of b-c-d. But if we generate code to calculate
b-c-d, the two subtractions to the right of the first subtraction will be generated
before the leftmost subtraction, and this is wrong. The second subtraction will
happen after the third subtraction. Suppose a=1, b=2, c=3, and d=4. Properly
evaluated, a-b-c-d is -8. But if the subtractions are executed right to left by an
incorrect parser, c-d is calculated first, giving -1. This value is then subtracted
from b to give 3, since the value of c-d is negative. This is then subtracted from
1, giving -2!
Similar problems happen in real and integer division. Basically, subtraction
and division are left and not right associative; therefore, for the productions cor-
responding to these, we define the RHS as starting with the operator.
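The arithmetic above can be checked with a small Python sketch. Both function names are hypothetical: eval_left mimics "a factor followed by RHS pieces that each start with the operator," while eval_right mimics the incorrect "factor, minus sign, expression" parse.

```python
def eval_left(values):
    """Factor followed by RHS pieces, each starting with the operator: ((a-b)-c)-d."""
    result = values[0]
    for operand in values[1:]:  # each RHS piece is "- operand"
        result -= operand
    return result

def eval_right(values):
    """Factor, minus sign, expression: a-(b-(c-d)), the incorrect grouping."""
    if len(values) == 1:
        return values[0]
    return values[0] - eval_right(values[1:])

operands = [1, 2, 3, 4]      # a, b, c, d
print(eval_left(operands))   # -8
print(eval_right(operands))  # -2
```

The loop in eval_left consumes one operator and one operand at a time, moving strictly left to right, which is the behavior the RHS-style productions are designed to produce.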
The attractive feature of defining the RHS of a binary operator in this way is
that the production on the right side has a very simple handle, which is the sym-
bol with which it must begin. We only need to look for the symbol to see, moving
from left to right in the source code, whether the entire sequence is present. This
avoids a bane of early compiler developers called backtracking, 6 which is retreat-
ing from right to left in the source text, because the parser has realized that what
it thought it had does not occur. Backtracking is a problem because we want the
parser to be responsible for various tasks, including the generation of object code,
all of which would have to be undone. This gets nasty.
In general, and in parser theory, the handle is the set of terminal symbols
that can occur validly at the start or end of a nonterminal. The left handle is the
set of terminals that can occur at the beginning. The right handle is the set that
can occur at the end.
In producing a pragmatic production, a rule of thumb is to watch for ambi-
guity, in the form of adjacent nonterminals, whose right and left handles are sets
of symbols that intersect. For example, expression orFactor is an ambiguous
sequence. An expression's right handle happens to be the set consisting of all
identifiers, the right parenthesis, and all numbers. An orFactor's left handle is the
set consisting of all identifiers, the left parenthesis, and all numbers. Since these
two sets have a large intersection, there is no easy way of telling where the or
6. Backtracking was a bane of early compiler developers, both on mainframes of the 1950s
and micros of the 1970s, because whenever the input text was on a serial medium such as
magnetic or even audiocassette tape, the medium had to rewind. The rewind had a high
"fwizz" factor in that it was fun to watch but it wasted time.
factor begins! For example, a+1 and b might be the expression a and the or factor
1 and b, or it might be the expression a+1 and the or factor b. You might object that
we know that and has lower precedence than +, but we cannot use this "knowledge,"
because we're constructing it as the BNF itself.
Compiler generators such as yacc are developed to efficiently determine sets
of handle symbols in order to find out whether the grammar is ambiguous.
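To make the handle idea concrete, here is a minimal Python sketch that computes left and right handles for a toy grammar and intersects them. The two-rule grammar is a deliberately tiny stand-in for the real expression BNF, IDENTIFIER and NUMBER are placeholder terminal names, and the sketch assumes no alternative is empty; it is not how yacc does the job.

```python
# Toy grammar (an assumption for illustration): nonterminal -> list of alternatives.
GRAMMAR = {
    "expression": [["orFactor"], ["orFactor", "Or", "expression"]],
    "orFactor": [["IDENTIFIER"], ["NUMBER"], ["(", "expression", ")"]],
}

def handle(symbol, end, seen=None):
    """Terminals that can begin (end=0) or finish (end=-1) the given symbol."""
    seen = set() if seen is None else seen
    if symbol not in GRAMMAR:
        return {symbol}          # a terminal is its own handle
    if symbol in seen:
        return set()             # already expanded in this search; avoid looping
    seen.add(symbol)
    result = set()
    for alternative in GRAMMAR[symbol]:
        result |= handle(alternative[end], end, seen)
    return result

right_of_expression = handle("expression", -1)
left_of_or_factor = handle("orFactor", 0)
overlap = right_of_expression & left_of_or_factor
print(sorted(overlap))
```

A nonempty overlap is the warning sign: placing expression directly before orFactor leaves the parser unable to tell where one ends and the other begins.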
Parentheses
Finally, let's look at the way in which we support parentheses. This is a topic we
touched on in Chapter 3.
Look at the definition of a term. A term is the smallest component of an expres-
sion, and typically it is an identifier, string, or number. However, what parentheses
do in a language like Visual Basic is make a simple term out of complex expressions;
therefore, a possible term is left parenthesis, expression, and right parenthesis.
The ambiguity here is that working from this definition alone, an expression
such as (a*(b-c)) with nested parentheses would be parsed improperly. This is
because a typical implementation in code of a BNF grammar would find the leftmost
parenthesis and conform to the BNF, because BNF, by itself, does not specify
where to stop.
The BNF definition for term shows the alternatives unsignedNumber, string,
LValue, True, False, functionCall, and the parenthesized ( expression ).
If the candidate string for a term starts with a left parenthesis, we know that
the only valid possibility is a parenthesized expression, because, as we can see
by examining the BNF, no other candidate starts with a left parenthesis. The left
handle of an unsignedNumber is plus, minus, and the digits. The left handle of
a string is a double quote, and so on. None of these left handle sets include the
left parenthesis.
The problem, as we've seen in the miniparser of Chapter 3, is that we need
to pass a substring to the expression parser. If we pass the entire source program
one character beyond the left parenthesis, it will be rejected as a valid expression
because it will end with unbalanced material, as in a+1)-b.
Just as we did with integerCalc in Chapter 3, in Chapter 5, we will implement
a simple code workaround as a submethod in the term recognizer, and search
ahead for the balanced right parenthesis.
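The look-ahead for the balanced right parenthesis might be sketched as follows. This is a Python illustration of the idea only, not the book's Visual Basic code; the function name is hypothetical, and the sketch does not skip over parentheses that appear inside string literals.

```python
def find_balanced_right_paren(source, left_index):
    """Return the index of the ')' matching the '(' at left_index, or -1."""
    if source[left_index] != "(":
        raise ValueError("left_index must point at a left parenthesis")
    depth = 0
    for index in range(left_index, len(source)):
        char = source[index]
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                return index
    return -1  # ran off the end: the parentheses are unbalanced

text = "(a*(b-c))-d"
right = find_balanced_right_paren(text, 0)
print(text[1:right])  # the substring to hand to the expression parser
```

Only the text strictly between the matched pair is passed down, so the expression parser never sees the unbalanced trailing material.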
We've reviewed the critical parts of the BNF. You've seen that with only one
exception, the use of code as a workaround to balance parentheses (a strategy
that can also be used with languages of the C family), our version of QuickBasic
can be formally specified. Let's now take the complete BNF and run it through
the analyzer to see what happens.
[x] Include syntax
[x] Include lists showing where symbols are used, in addition to their definitions,
    in the syntax
[ ] Include formal Backus-Naur definitions
The output with these options is just too large to fit in a text box. Therefore,
you will see the prompt shown in Figure 4-16. This will allow you to also store
the output in the named text file.
The reference manual may not fit into the display because it
is 76495 characters. It may be truncated at the end.
Click OK to save the reference manual in the file
"C:\egnsf\bnfAnalyzer\bnf.TXT", change the file id to a
preferred value and click OK, or just click Cancel to proceed.
[OK] [Cancel]
Figure 4-16. Saving the reference manual for QuickBasic as a text file
You will then see the reference manual, commencing with the nonterminals,
as shown in Figure 4-17. Note that to obtain the properly formatted effect, you
will need to copy the text from the (pink!) dialog box and paste it into Notepad,
because it wraps in the dialog box.
REFERENCE MANUAL FOR THE QUICKBASIC LANGUAGE

NONTERMINAL SYMBOLS

addFactor        relFactor
addFactorRHS     addFactor and addFactorRHS
addOp            Start symbol
andFactor        orFactor
andOp            orFactor
andOp[2]
asClause         dimDefinition and formalParameterDef
assignmentStmt   statementBody
TERMINAL SYMBOLS

"&"      likeFactorRHS
"("      lValue and functionCall
"()"     formalParameterDef
")"      lValue and functionCall
"*"      mulOp
"**"     powOp
"+"      powFactor and addOp
Figure 4-18. Start of terminal list and part of the reserved words for QuickBasic,
including extended reserved words for the trace instruction
Finally, scroll down further to see a complete syntax reference for our version
of QuickBasic, synthesized in English from the BNF alone. Figure 4-19 shows the
beginning of this reference. This is a comprehensive outline for QuickBasic pro-
grams, organized as a sequence of rules. 7 For example, the first rule declares that
an immediate command (used to type an expression for immediate evaluation as
seen in the Immediate window of Visual Basic) is a single immediate command,
followed by zero, one, or more sequences of the form: immediate command.
The following are the rules of the language
Take a look at outline item (3). It claims, truly, that an expression can consist
of an or factor, followed by an optional sequence consisting of one of the or
operators (Or or OrElse) and an expression. It claims, truly, that an expression
7. As a "word" person, who went into computing as part of an elaborate draft-dodging scheme
that got completely out of hand, I have always been underwhelmed, to say the least, by the
absence of truly automatic documentation from source code. This is my two cents.
can appear in many contexts, including single immediate commands such as Let
A=1+1 (where 1+1 is the expression), implicit assignments such as A=1+1, lists of
subscripts as in array(A, B+1), and so forth.
This is useful information, although the raw outline needs to be "decorated"
with tutorials, examples, illustrations, and witty remarks to be an actual reference
manual. If you want to enhance the reference outline extensively, you should use
the option to convert it to XML.
2. Validate the BNF by making sure that bnfAnalyzer, Bison, or yacc processes
the BNF completely and without any errors, before you code the com-
piler, or expect to use the output of Bison, yacc, or any parser generator.
3. If the BNF tool you are using supports comments, comment the BNF
with descriptive information. Consider accompanying each rule with
a complete explanation in your natural language.
6. When coding BNF, take your special programmer hat off and wear your
special requirements hat. BNF is a formal language, rather than a pro-
gramming language. It specifies the set of possible sentences in your
language, not how to parse.
7. In spite of rule 6, develop the BNF for the programmer of the parser and
not for posterity.
8. Don't show the user the BNF. Instead, use bnfAnalyzer to create the basis
of a rules manual. Don't show the user this manual. Instead, use it as
a basis for a presentation that shows you have done your due diligence
and a reference manual is available for posterity.
8. Many pawnshops will no longer accept old laptops and will make loans only for contempo-
rary laptops that can show DVD movies, since their clientele will often want to watch movies.
The tree that represents the compiled BNF (in COLparseTree inside General
Declarations, inside frmBNFanalyzer.frm) is just a Collection. But while most
vanilla collections are simple one-dimensional arrays and hash tables, COLparseTree
exploits the fact that any collection item can be a variant, and in particular, it may
be a collection!
This means that recursive data structures can be, without too much loss of
efficiency, represented as collections that contain subcollections.
For example, item(1) of COLparseTree is a one-dimensional collection of all
nonterminals found in the BNF. Item(2) is a similar collection of all terminals. Item(3)
is the root of a tree, as described in a comment header placed in the code before
COLparseTree, which represents the compiled BNF.
COLparseTree is, in fact, a virtual object, since, for all intents and purposes, it
encapsulates the complete parse in one place that is easily passed between rou-
tines. When a new input file is selected, it is set to Nothing, and then rebuilt when
the user chooses to create a reference manual. However, I was far more concerned
with the reliability than the efficiency of this approach.
The problem with legacy, plain vanilla collections in Visual Basic 6 is that they
can contain any variant value in any item. The parallel problem in .NET is that
plain collections can contain any object in any item. This means that the lan-
guage and the Framework won't enforce any rules on our behalf, and if we mess
up, incorrect results will occur without warning!
Figure 4-20. Part of the COLparseTree inspection report
Notice that the inspection report explains in painful detail what it needs to
find, as you will see as you scroll.
Am I going overboard? No, I'm not.
A large object, whether it is a true object or a UDT that dreams of being an
object, is something with a determinate state, and that state is either correct or
bad. A bane of maintaining legacy, non-object programs was the way they, as
large collections of disconnected variables, could easily get into a bad state,
never to return.
The inspection routine is not only accessible from the menu, it is also called
when bnfAnalyzer parses an input file. This provides valuable ongoing quality con-
trol. We are, in other words, doing real quality control on the ground.
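The idea of an inspection routine for an untyped, nested collection can be sketched in Python, where a list (like a Visual Basic collection) will hold anything. The three-item layout follows the description of COLparseTree above; the specific checks and the function name are assumptions made for illustration and are far less thorough than the real inspection report.

```python
def inspect_parse_tree(tree):
    """Return a list of problems found; an empty list means the tree passed."""
    problems = []
    if not isinstance(tree, list) or len(tree) != 3:
        return ["tree must be a list of exactly three items"]
    for position, label in ((0, "nonterminals"), (1, "terminals")):
        names = tree[position]
        if not isinstance(names, list):
            problems.append(f"item {position + 1} ({label}) must be a list")
        elif not all(isinstance(name, str) and name for name in names):
            problems.append(f"item {position + 1} ({label}) must hold non-empty strings")
    if not isinstance(tree[2], list):
        problems.append("item 3 (grammar tree root) must be a list")
    return problems

good = [["expression", "term"], ["+", "("], [["expression", 1]]]
bad = [["expression", 42], "oops", None]
print(inspect_parse_tree(good))
print(inspect_parse_tree(bad))
```

Because the language will not enforce these rules, the routine states each one explicitly and reports every violation rather than stopping at the first.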
used in our .NET projects). clsUtilities contains a tool for converting any collec-
tion to a readable string called, unimaginatively, collection2String. To see how
this displays COLparseTree, open the Tools menu and select Dump the parse tree.
The result is shown in Figure 4-21.
vbCollection
(
 vbCollection
 (
  vbCollection
  (vbString("immediateCommand"), vbLong(1)),
  vbCollection
  (vbString("singleImmediateCommand"), vbLong(2), vbLong(1)),
  vbCollection
  (vbString("expression"), vbLong(3), vbLong(2), vbLong(43), vbLong(46), vbLong(28),
  vbLong(61), vbLong(23), vbLong(33), vbLong(34), vbLong(25), vbLong(67), vbLong(68),
  vbLong(26), vbLong(3), vbLong(97)),
  vbCollection
  (vbString("explicitAssignment"), vbLong(4), vbLong(2), vbLong(18)),
  vbCollection
  (vbString("sourceProgram"), vbLong(5), vbLong(11)),
  vbCollection
  (vbString("optionStmt"), vbLong(6), vbLong(5)),
  vbCollection
  (vbString("sourceProgramBody"), vbLong(7), vbLong(5)),
  vbCollection
  (vbString("sourceProgram[2]"), vbLong(8)),
  vbCollection
  (vbString("logicalNewline"), vbLong(9), vbLong(5), vbLong(11), vbLong(104), vbLong(105)),
  vbCollection
This shows the collection and its members, which are subcollections, using
parentheses whenever a subcollection occurs. This is what I call the "decorated"
approach to showing Visual Basic values in Visual Basic 6 and .NET. The deco-
rated approach serializes variant types and values in Visual Basic 6, and object
types and values in .NET. As you can see, it explicitly identifies not only the val-
ues of collection items, but also their types.
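A decorated dump of this kind is easy to sketch with a recursive function. The Python below is only an analogy for collection2String: the type labels (Collection, String, Long, Boolean) are chosen to echo the figure, and the function name decorate is hypothetical.

```python
def decorate(value):
    """Serialize a value together with its type, recursing into sublists."""
    if isinstance(value, list):
        return "Collection(" + ", ".join(decorate(item) for item in value) + ")"
    if isinstance(value, str):
        return f'String("{value}")'
    if isinstance(value, bool):      # test bool before int: bool is a subclass of int
        return f"Boolean({value})"
    if isinstance(value, int):
        return f"Long({value})"
    return f"{type(value).__name__}({value!r})"

print(decorate([["immediateCommand", 1], ["singleImmediateCommand", 2, 1]]))
```

Showing the type next to every value is what lets a reader spot an item of the wrong kind at a glance.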
bnfAnalyzer Tools
There are a number of other options available in bnfAnalyzer, all accessible from
the Tools menu, shown in Figure 4-22.
• Create language reference manual: Performs the same function that the
button performs.
• Reference manual options: Calls up the options screen for the refer-
ence manual.
• Create parse tree: Creates the parse tree structure, without parsing the
BNF or showing the reference manual.
• Parse the BNF: Creates the parse tree structure and parses whatever file
is selected, without showing the reference manual.
• Destroy the parse tree: Destroys the parse tree, by freeing all subcollections
and then setting the tree to Nothing. Freeing all subcollections is
important to avoid COM clutter.
• Dump the parse tree: Creates the collection dump of the parse tree (see
Figure 4-21).
• Parse tree to XML: Converts the parse rules to an XML file, as shown in
the previous sections of this chapter.
• Inspect the parse tree: Inspects the parse tree (see Figure 4-20).
• View the source BNF: Allows you to examine the source code, but not to
change it. (You can use Notepad to change the source BNF.)
• List terminal symbols: Provides only the list of terminals, which is also
embedded in the reference manual.
• Dump scanTable: Prints the scanned BNF in a readable form. This is the
lexical analysis, as shown in Figure 4-23. The lexical analysis starts with
several newlines because the input text file starts with several comment
lines, and comments are ignored by the scanner. The scanner captures the
token type, start index, length, and value of each token. The value is dis-
played using tools in clsUtilities, which serialize unprintable ASCII to
a viewable form.
Token Type              Start Index  Length  Value
newline                          40       2  "<newline>"
newline                          43       2  "<newline>"
newline                         117       2  "<newline>"
newline                         168       2  "<newline>"
newline                         242       2  "<newline>"
newline                         299       2  "<newline>"
newline                         302       2  "<newline>"
newline                         305       2  "<newline>"
newline                         331       2  "<newline>"
nonterminalIdentifier           333      16  "immediateCommand"
productionAssignment            350       2  ":="
• Inspect scanTable: Audits the scanned BNF for internal errors and pro-
duces the long and very boring inspection report shown in Figure 4-24.
Figure 4-24. Scan table inspection report
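The scan table described above records a token type, start index, length, and value for each token. A minimal Python sketch of such a scanner follows. The token type names echo those in the lexical analysis dump, but the pattern, the 1-based start index, and the scan function are assumptions for illustration, and comment handling is omitted.

```python
import re

# Alternatives are tried in order; spaces match nothing and are skipped.
TOKEN_PATTERN = re.compile(r"""
    (?P<newline>\r\n|\n)
  | (?P<productionAssignment>:=)
  | (?P<nonterminalIdentifier>[A-Za-z][A-Za-z0-9_]*)
  | (?P<string>"[^"]*")
  | (?P<other>\S)
""", re.VERBOSE)

def scan(text):
    """Return (type, start index, length, value) tuples; start is 1-based."""
    table = []
    for match in TOKEN_PATTERN.finditer(text):
        value = match.group()
        # Show unprintable newlines in a viewable form.
        shown = "<newline>" if match.lastgroup == "newline" else value
        table.append((match.lastgroup, match.start() + 1, len(value), shown))
    return table

for row in scan('a := b ":" a\r\n'):
    print(row)
```

Keeping the start index and length with each token means later phases can always point back to the exact source text that produced it.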
You can select to see a detailed status report from the main bnfAnalyzer form.
To see how this works, run bnfAnalyzer and, on the main form, select a small file,
such as BNFanalyzer test 4, from the list in the lower-left corner of the screen.
Notice that three levels of status report are available in a Parse Status group box
on the main form. Select the highest level of detail: Complete report.
Parse Status
( ) No report
( ) Simple report
(*) Complete report
Click Create Reference Manual, and then cancel the options form to return to
the main form. Click the Zoom box in the upper-right corner of the form. You'll see
another pink dialog box, as shown in Figure 4-25. This one logs the status of scan-
ning, parsing, and all other steps. Scroll through it to see how the BNF is compiled.
This progress report outlines the top-down recursive descent algorithm used
in bnfAnalyzer. This same approach is used to parse QuickBasic, as explained in
Chapter 7. A goal is set (parse the scanned BNF), then broken down into sub-
goals, and then narrated in this level of detail.
In summary, bnfAnalyzer is itself a form of compiler, which compiles documentation
rather than code. I did a lot of extra work in constructing it in the form
of inspection and dump, so that I could rely on its output. Starting in Chapter 5,
you'll see how these core methodologies allow you to create solid compiler objects.
Summary
You have seen that BNF can be specified using BNF, and, by processing the BNF
file, you've seen how to use the bnfAnalyzer tool included in the sample code.
We've examined how the large BNF for our QuickBasic was developed and pushed
this file through the analyzer to make sure it is valid. And you've read eight rules
for using BNF as a requirements definition language.
We can now write our compiler, which will consist initially of a lexical ana-
lyzer, a parser, and our own "Nutty Professor" interpreter.
Challenge Exercise
Develop the BNF for a simple language that uses letters as logical variables and
the logical operators And, Or, and Not. Your language must support operator prece-
dence such that Or has low precedence, And has medium precedence, and Not has
high precedence. Your language must support parentheses.
Use bnfAnalyzer to make sure that your specification produces lists of non-
terminals and terminals, as well as the reference outline, without error.
Remember how to support parentheses: define parenthesized groups at the
same level as simple variables.
Resources
As noted in Chapter 3, a good reference for compiler theory is Compilers: Principles,
Techniques, and Tools, by Alfred Aho, Ravi Sethi, and Jeffrey Ullman (Addison-Wesley,
1986). This book contains an excellent discussion of BNF.
CHAPTER 5
YOUR BNF DEFINITION of the language, expanded into a reference manual perhaps
using the bnfAnalyzer software described in Chapter 4, is the detailed design, or
requirements document, for your compiler. It explains, in enough usable detail,
the semantic effect at runtime of user statements.
This chapter will enable you to get started with your own .NET compiler. It
describes the big picture, which starts with lexical analysis in support of parsing.
You'll learn some of the theory behind lexical analysis, just enough to help you
see how code implements theory (whether it wants to or not). You'll then see
how a scanner object (qbScanner) produces scanned tokens for lexical analysis.
The final section of the chapter describes object-oriented design principles
as they apply to the scanner object. These principles will be used consistently in
the rest of the QuickBasic compiler project.
• The lexical analyzer, which reads the raw input text (almost always
a stream of ASCII and Unicode characters) and synthesizes meaningful
lexical units, passing them onward and upward to the parser
Note that software tools for working with source code, other than compilers,
might have this structure but replace the code generator with another form of
generator. For example, the bnfAnalyzer tool used in Chapter 4 scans (lexically
analyzes) BNF and parses it to create an internal representation of its structure.
However, bnfAnalyzer generates documentation instead of object code. The yacc
product generates C source code in place of object code, as do many preproces-
sors. This is, in fact, why this book stresses the front end of the compiler, as
opposed to code generation for MSIL. The lexical analyzer and the parser, which
together form the front end, are utilities that allow you to craft, in any language, tools
to make your job easier.
There are other conceptual units in commercial compilers. For example,
popular units might be optimizers, which take either the source code or the
object code and improve its performance by transformations that are known to
be valid.
Our QuickBasic subset compiler, for example, notices degenerate operations
in the source code, such as division by one or addition to zero. These operations
are degenerate because their result is known: adding zero to a number always
results in that number without change. 1 Our compiler can, as you will see in
Chapter 6, remove these operations.
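As a hedged sketch of what removing a degenerate operation can look like, the Python function below simplifies the two cases named above: addition of zero and division by one. The (operator, left, right) triple and the function name are hypothetical and are not the representation used by the QuickBasic compiler.

```python
def simplify(op, left, right):
    """Return a simpler operand when the operation is degenerate, else the triple."""
    if op == "+" and right == 0:
        return left              # a + 0 is just a
    if op == "+" and left == 0:
        return right             # 0 + a is just a
    if op == "/" and right == 1:
        return left              # a / 1 is just a
    return (op, left, right)     # nothing to remove

print(simplify("+", "a", 0))
print(simplify("/", "a", 1))
print(simplify("+", "a", 10))
```

The transformation is valid because the result is known before the program ever runs, so no code needs to be generated for the operation.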
Our compiler also contains an assembler that resolves cross-references in
the generated object code. The assembler will be discussed in Chapter 6.
1. Degenerate operations are not like staying out late in clubs and having fun. Mathematicians
call operations like a+0 degenerate because, compared with useful operations like a+10, a+0
is a waste of my time and yours.
The Lexical Analyzer for the QuickBasic Compiler
or an underscore, and it must contain one character. 2 A .NET identifier may con-
tain letters, numbers, and underscores up to an arbitrary length, but keep it short
for your sanity's sake.
The rule can be expressed in BNF:
However, notice that the right side of the informal BNF is actually using a notation,
which you may already be familiar with: the regular expression as seen in .NET.
In Chapter 3, we briefly touched on the topic of regular expressions. Here, we
will look at regular expressions and their relationship to formal automata, including
Turing machines and a specialized, limited abstract machine called the finite
automaton. This discussion should illuminate not only the tools for scanner generation,
but also the manual writing of a scanner. It will show you how to think
before you code the lexical analyzer for your language.
Regular expressions specify in a formal notation the rules for a class of strings.
They originally appeared in Unix and are supported in Linux, as well as in objects
shipped with COM and .NET. Regular expressions are a terse, if not gnomic, way
of expressing the format of expected data. They are used to create good lexical
analyzers, and thinking in terms of regular expressions is an important skill for
the compiler developer. As you'll see, understanding a regular expression allows
you to make predictions about what strings will satisfy it, and this is what makes
a regular expression so very ... regular. 3
2. The ability of a .NET identifier to start with an underscore is a new, and somewhat useful,
feature, added to Visual Basic as of .NET to bring Visual Basic in line with C++ practice. I use
this ability in the code of this book. Shared variables in classes, which are not part of the
object instance's state and which are, as their name implies, shared between objects, start in
my code with an underscore. This reminds the reader that "we're not in COMsas anymore,
Toto," and we are using a new .NET feature.
3. Mathematicians call regular expressions regular not because the regular expressions are regu-
lar; indeed, regular expressions appear rather irregular. However, the strings they specify have
a regular and predictable structure once the regular expression is known.
Figure 5-1. Regular expressions in BNF (generic; may miss some features of actual
processors)
   2.1. This sequence:
      2.1.1 An Alternation Factor
      2.1.2 An Alternation R H S
   A Sequence Factor can appear in a Regex and an Alternation R H S
3. An Alternation Factor can consist of the following:
   3.1. This set of alternatives:
      3.1.1 This sequence:
         3.1.1.1 A Postfix Factor
         3.1.1.2 This optional sequence:
            3.1.1.2.1. A Postfix Op
      3.1.2 A Zero Operand Op
   An Alternation Factor can appear in a Sequence Factor
4. An Alternation R H S can consist of the following:
   4.1. This sequence:
      4.1.1 A STROKE
      4.1.2 A Sequence Factor
   An Alternation R H S can appear in a Sequence Factor
5. A Postfix Factor can consist of the following:
Figure 5-2. bnfAnalyzer output for regular expression syntax
NOTE For specifics on the Regex object available in Visual Basic .NET, see your
Help system (you did install it, didn't you?). The lexical analyzer described here
implements regular expressions without using a Regex object. Instead, it implements
an understanding, in code, of the regular expression model of QuickBasic
syntax, at the lexical level.
Metacharacters
Any ordinary string can be a regular expression. For example, the regular expression
A specifies only those strings consisting of the uppercase A. However, it's the
use of special metacharacters that makes regular expressions so powerful. You
saw a few examples of regular expression metacharacters in Chapter 3.
Asterisk
Any regular expression followed by an asterisk specifies all strings that meet the
requirements of that regular expression, repeated zero, one, or more times (some-
times called zero-trip, because zero "trips" are allowed). Note that the asterisk
allows null strings to satisfy its rule and that it allows a potential infinity of strings. 4
For example, A* is satisfied by a null string or any string consisting of uppercase As
only. Also note that this is a recursive definition, because it uses the concept in the
definition ("a regular expression followed by an asterisk"). This is not cheating,
since the inner regular expression is shorter and must meet all the rules of regular
expressions.
Plus Sign
Any regular expression followed by a plus sign specifies all strings that meet its
requirements repeated one or more times (called one-trip). For example, A+ is
satisfied by the letter A and any string of As.
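The zero-trip and one-trip rules are easy to check with any regular expression processor. The following is a minimal sketch in Python's re module, used here only for illustration (it is not the Visual Basic .NET Regex object, and the helper name satisfies is an assumption made for this example).

```python
import re

def satisfies(pattern, text):
    """Return True when the whole of text conforms to pattern."""
    return re.fullmatch(pattern, text) is not None

# A* is zero-trip: the null string and any run of As satisfy it.
print(satisfies("A*", ""))     # null string
print(satisfies("A*", "AAA"))  # run of As

# A+ is one-trip: at least one A is required.
print(satisfies("A+", ""))     # null string is rejected
print(satisfies("A+", "AAA"))  # run of As
```

The only difference between the two patterns is whether the null string is accepted.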
Curly Braces
4. We say a regular expression is satisfied by a string; this means that the string conforms to the
regular expression.
Parentheses
Just as we have already used parentheses to group BNF elements, parentheses are
metacharacters that can be used to group and clarify complex regular expressions,
whether for the sanity of the reader or for correct execution. For example, A(BC)*
is different from the regular expression (AB)C*. The first regular expression is satisfied
only by strings that start with A followed by zero or more repetitions of BC.
The second regular expression is satisfied by strings that start with AB followed
by zero or more repetitions of C.
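To see the effect of grouping, the two expressions can be tried against the same strings. This is a small sketch in Python's re module, chosen for illustration; the function name compare is hypothetical.

```python
import re

def compare(text):
    """Report whether text satisfies A(BC)* and (AB)C*, in that order."""
    first = re.fullmatch(r"A(BC)*", text) is not None
    second = re.fullmatch(r"(AB)C*", text) is not None
    return (first, second)

for sample in ["A", "AB", "ABC", "ABCBC", "ABCCC"]:
    print(sample, compare(sample))
```

Only ABC satisfies both expressions; ABCBC satisfies the first alone, and ABCCC the second alone.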
Vertical Stroke
The vertical stroke character (|) may be used to specify that, at the point where
it occurs, the regular expression on its left is alternated or Or'd with the regular
expression on its right. The regular expression (A*)|(B+) uses parentheses (which
are actually unnecessary) to specify the valid "set of all strings," consisting of
the null string (because the left side uses zero-trip iteration), a string of As, or
a string of at least one B.
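A quick check of the alternation example, again as a sketch in Python's re module (an illustrative choice; the helper name accepts is an assumption). It also confirms that the parentheses make no difference in this case.

```python
import re

def accepts(pattern, text):
    """Return True when the whole of text satisfies pattern."""
    return re.fullmatch(pattern, text) is not None

for sample in ["", "AAA", "B", "BBB", "AB"]:
    with_parens = accepts(r"(A*)|(B+)", sample)
    without_parens = accepts(r"A*|B+", sample)
    print(repr(sample), with_parens, without_parens)
```

The mixed string AB is rejected, because alternation chooses one side or the other for the whole string.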
Concatenation
To place a regular expression next to another regular expression is actually to use
an invisible or implied operator, that of regular expression concatenation. This
arrangement specifies that the regular expression on the left is followed by the
regular expression on the right, and that correspondingly valid strings must sat-
isfy the regular expression on the left and then that on the right, moving in that
direction. This invisible operator is comparable to concatenation in BNF. For
example, A*B+ specifies zero, one, or more occurrences of the letter A, followed
by one or more Bs.
Backslash
a simple example whose specific treatment may vary from one processor to
another, with no error indication being typically provided, depending on the way
the processor is implemented.
Consider each regular expression processor a new language, potentially dif-
ferent from the regular expression processor it replaces. This is very powerful
stuff and confusing as hell. In the MIS world, as opposed to the more gnomic
Unix world, you need to document regular expressions and avoid complexity for
its own sake. Avoid ambiguous constructions.
In BNF, two adjacent grammar symbols should not share a right handle and
a left handle, because this ambiguity can result in parse bugs. For example, the
sequence "identifier identifier" is probably not valid, since an identifier may
start with an alphabetic character or an underscore, and this set of characters
overlaps the set of characters that may end an identifier (alpha, underscore, or
digit). Likewise, it is usually not a good idea to concatenate two subexpressions
in a regular expression such that the set of characters that ends the first subexpression
overlaps the set of characters that start the second subexpression. For
example, ([abc]d*)*([def]g) may behave in unpredictable fashion, because the
d might end the first part or start the second part.
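The overlap can be observed directly. The sketch below uses Python's re module, which is a backtracking processor; other processors may resolve the overlap differently, which is exactly the hazard. The function name split_parts is an assumption for this example.

```python
import re

PATTERN = re.compile(r"([abc]d*)*([def]g)")

def split_parts(text):
    """Return (last repetition of the first part, second part), or None."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    return (match.group(1), match.group(2))

# In "addg" the first part greedily takes "add", then has to give back
# one d so that the second part can start with it.
print(split_parts("addg"))
print(split_parts("adg"))
```

In this processor the final d silently migrates to the second part. Nothing warns the author that the d was contested.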
And bear in mind that regular expressions can be used, not as a program-
ming language driving a regular expression engine, but as a way of formally and
in detail specifying a syntax for coding. Indeed, this is what has been done in
quickBasicEngine.
Turing Machines
The most famous abstract machine was also the earliest. Alan Turing described
his 1936 Turing machine to show the limits of what is computable.
Now, what's a Turing machine? Is it a real computer in a museum, like the
Commodore or Speak and Spell, from long ago? No, the Turing machine is a purely
paper machine, the ultimate "Nutty Professor" computer- my affectionate term
for computers that are described but never built.
Finite Automata
Figure 5-3. Use relab.exe to test, save, and document regular expressions.
Notice the list of regular expressions under the label Regular Expressions
Available. Double-click the last visible entry, which starts with "Simple Visual
Basic identifier (release 6 and before)." Now, drop down to the light-gray area
under the label Test Data (the darker gray area under that label is for storage of
your favorite test strings), and enter the string 12345 _a abce identifier1.
Before you do anything else, ask yourself, "What is the first (leftmost) iden-
tifier in the string?" Write down your answer.
Now click the Test button in the lower-left corner of the form. Since one of
the purposes of the laboratory is to allow you to save your regular expressions
and test cases, you will see a rather poisonously green screen, as shown in
Figure 5-4.
Figure 5-4. Trust me, this screen, which prompts you to identify your test data, is
a yucky green.
The green screen also allows you to control the way in which you save test
data. As you can see, you can tell relab never to prompt for or save test
strings, or you can tell relab to always save without prompting for a description.
The Tools menu of the main form will allow you to bring up this screen without
adding test data.
When you are returned to the main form, be sure to click to the left of the test
string and in the light-gray test string area, since the tester will always start at the
location you specify. Click the Test button in the lower-left area of the form.
Oops, why was a in _a highlighted? This is because regular expressions have
no opinion about the strings that surround them, and this is something to keep
in mind. When used to find a string, they simply search for the handle of that
string, consisting of any one of the set of characters that can start a string that
satisfies the regular expression. For the regular expression we've selected, this set
is [A-Za-z] (in regular expression set notation). Therefore, this is where the cursor
has moved. Regular expressions, unlike BNF, do not care about context.
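The same experiment can be reproduced outside relab. The sketch below uses Python's re module and assumes that the "simple Visual Basic identifier" expression is roughly [A-Za-z][A-Za-z0-9_]*; the exact expression stored in relab is not shown here, so treat the pattern and the function name find_identifiers as assumptions.

```python
import re

# Assumed form of the simple identifier expression: a letter, then
# letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def find_identifiers(text):
    """Return (zero-based start index, matched text) for each match, left to right."""
    return [(m.start(), m.group()) for m in IDENTIFIER.finditer(text)]

for start, value in find_identifiers("12345 _a abce identifier1"):
    print(start, value)
```

The first match is the lone a inside _a: the search simply stops at the first character in the handle set and pays no attention to the underscore to its left.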
Press the right-arrow key, and then click the Test button to see the next match,
abce. Press the right-arrow key and click Test once more to see the
last match, identifier1.
Try a new regular expression. In the text box under Regular Expression at the
top of the screen, enter the regular expression (^[ ]+[ ]+)*. Our goal is to find
a series of blank-delimited words.
NOTE This regular expression has a bug. What is it? If you know what it is,
enter the buggy regular expression anyway, without fixing it, in order to fol-
low the text.
Press the Tab key to exit the text box. You'll see a blue screen that allows you
to add a regular expression description. Enter the description Parse words, as
shown in Figure 5-5, and click OK.
Figure 5-5. The blue screen (but not of death) allows you to describe the purpose of
regular expressions.
Move back to the light-gray Test Data area and enter Moe Larry Curley,
with a few spaces before Moe, perhaps between the other Stooges,5 but no spaces
at the end! Click Test to see the green Add Test String screen (Figure 5-4), enter
a description for the test string, and click OK.
Click all the way to the left of the test string in the light-gray box, and then
click Test again.
Strange: only the blanks in front of Moe are highlighted. This makes no
sense, since our goal was to find a sequence of blank-separated words, and we
entered "find a nonblank sequence, find some blanks, and repeat."
But we made a simple clerical error. We entered the caret before the square
bracket. In this position, it matches "the start of the input."
Fix the problem in the black-on-white text box, and tab out to be prompted
for a description of the new regular expression. (The Delete button above the
Regular Expressions Available list box allows you to delete old regular expressions.)
Click at the far left of the light-gray test data box and click Test. Oops,
something is wrong.
5. The Three Stooges were three American comedians of the 1930s who lost title to their films.
As a result, their films were repeatedly shown on American television during the 1950s. Their
name became a byword for cluelessness. They may correspond in Russia to The Five Stupid
Guys, or in India to The Junglee Fools from the Country.
The relab application allows us to "cudgel our brains" in isolation from code
using our regular expressions and to focus on cleaning them up. It's no fun to debug
a regular expression inside the business tier, in the server room, at 3:00 AM.
If you don't see why no match was found, ask yourself (cudgel thy brains)
what is the handle of the leftmost unit. Since the leftmost meaningful unit is
a nonblank, the regular expression doesn't start at the beginning of the string
Moe Larry Curley, with three blanks.
But shouldn't the search for the regular expression find, starting at the first
blank, the regular expression that starts with M? It does not, because we're starting
inside characters that are valid inside the regular expression handle.
Furthermore, there is a bug inside the regular expression. It doesn't allow the
input string to start with blanks. Many text strings will start with blanks. And it
actually requires that the input string ends with a blank, which is not true of most
input strings.
In other words, the regular expression is completely broken, showing the
value of relab. It actually needs to be ([ ]*[^ ]+)*.
Strangely, the best way to express the fact that spaces occur between words
is to start with zero-trip spaces, because the one-trip nonspaces required by
[^ ]+ will always parse the word. Placing the zero-trip spaces first defines a word
as "that which is preceded by zero or more spaces."
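The three attempts can be compared side by side outside relab. This is a sketch in Python's re module (an illustrative stand-in for the processor behind relab); the helper name matched_prefix is an assumption, and it reports what the expression matches when anchored at the far left of the test string.

```python
import re

def matched_prefix(pattern, text):
    """Return the text matched at the start of the string, or None if no match."""
    match = re.match(pattern, text)
    return match.group(0) if match else None

TEST = "   Moe Larry Curley"   # three leading blanks, none at the end

# Caret outside the brackets: "start of input", then blanks, then blanks.
print(repr(matched_prefix(r"(^[ ]+[ ]+)*", TEST)))

# Caret moved inside the brackets: nonblanks followed by blanks.
# Zero repetitions succeed at the first blank, so nothing useful is found.
print(repr(matched_prefix(r"([^ ]+[ ]+)*", TEST)))

# Zero-trip blanks first, then one-trip nonblanks.
print(repr(matched_prefix(r"([ ]*[^ ]+)*", TEST)))
```

In this processor the first expression matches only the leading blanks, the second matches the null string, and the third matches the whole test string.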
Enter the correct solution, and document it in the Add Regular Expression
box (Figure 5-5) if you like (or press Cancel to skip this step; note that this will
cause the solution not to be stored). Click again at the left side of the test string
to see the correct answer finally highlighted.
NOTE The relab program doesn't actually convert the regular expression to
code that doesn't use a regular expression. Instead, it produces the formatted
definition of the regular expression commented with the definition. It is
a more formidable task to convert the regular expression to code, although
you can do that using the lexical analysis and parsing methods described in
this book. However, you also need to know how the regular expression is trans-
lated to a nondeterministic finite automaton and from that to a deterministic
finite automaton. If you're interested, refer to Aho, Sethi, and Ullman's "dragon
book," Compilers: Principles, Techniques, and Tools (Addison-Wesley, 1986).
When you leave the regular expression laboratory, it will save all of your test
expressions and test strings in the Registry in a standard location. You will have
your stash of tested and documented regular expression tools: your very own
gnome factory.
TIP To create a laboratory for a programming team, in which you can share
regular expressions and test data, you can convert the source code for relab to
save information in Microsoft Access or SQL Server. See the methods
form2Registry and registry2Form for the code that should be modified.
• The regular expression for a Visual Basic comment that starts with an
apostrophe, extends to the end of the line, and contains no tabs or other
white space characters other than the blank
• The regular expression for a Visual Basic comment including end of line
• The newline for Windows (carriage return and linefeed) or the Web (linefeed
only)
TIP See the Visual Studio Help system for more application-oriented regular
expressions, including regular expressions for phone numbers and ZIP codes.
Most of the included regular expressions have to do with parsing source code.
However, I don't recommend their use in a full compiler. This is because the com-
mon regular expressions do not take context into account and blindly accept the
next string that meets their rules.
For example, a Visual Basic identifier by itself is a valid formal parameter dec-
laration when Option Strict is not in effect in Visual Basic .NET, as in Private Sub
A(B). The ByVal|ByRef clause is not required (it defaults to ByRef in COM and to
ByVal in .NET), nor is the As clause, although omitting the As clause is always bad
practice in COM and .NET. This means that if the common regular expression is
used in the middle of arbitrary source code to find the next formal parameter, it
will return a false positive when an identifier occurs to the left of the first formal
parameter. In Private Sub A(B), the identifier A will be mistakenly recognized as
a Visual Basic 6 formal parameter definition using the regular expression supplied
in relab, as shown here.
The regular expression can be used only after a procedure header has been
located, along with an immediately following left parenthesis.
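The false positive, and the effect of locating the header first, can be sketched as follows. Python's re module is used for illustration; the parameter expression is reduced to a bare identifier and the header expression covers only Sub procedures, so both patterns and both function names are simplifying assumptions rather than the expressions shipped with relab.

```python
import re

IDENT = r"[A-Za-z][A-Za-z0-9_]*"
PARAMETER = re.compile(IDENT)                       # assumed: a bare identifier
HEADER = re.compile(r"\bSub\s+" + IDENT + r"\s*\(")  # assumed: Sub name(

def naive_first_parameter(source, start=0):
    """Blindly accept the next string that meets the parameter rule."""
    match = PARAMETER.search(source, start)
    return match.group() if match else None

def anchored_first_parameter(source):
    """Locate the header and its left parenthesis, then apply the rule there."""
    header = HEADER.search(source)
    if header is None:
        return None
    match = PARAMETER.match(source, header.end())
    return match.group() if match else None

source = "Private Sub A(B)"
print(naive_first_parameter(source, source.index("A(")))  # the procedure name
print(anchored_first_parameter(source))                   # the real parameter
```

Started from the very beginning of the line, the naive search does even worse and reports the keyword Private.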
The common regular expressions were developed for a variety of software
tools that read and examine Visual Basic source as quick solutions to client problems,
including the need to identify all procedures of a certain type. They use an
ad-hoc or "lazy" approach to full-scale parsing of Visual Basic that is interested
only in certain strings.
Full-scale parsing, almost of necessity, involves parsing not characters (as
these regular expressions do) but scanned tokens, such as those produced by an
object like qbScanner, described in the next section.
The final feature of relab that I would like to show you is its regression test of
the common regular expressions. Although the regular expressions are hard-coded
and inside a Shared (static) class, utilities.dll, I was nervous about getting them
right, so I included the regression test feature. Also, since you have the source,
you might change them.
Click the button labeled Test the common regular expressions, on the bottom-
right side of the form. You will see a success dialog box, followed by a Zoom box,
which provides a text box view of a report, as shown in Figure 5-6. The report
shows a series of test cases applied to the common regular expressions, such
that the application is tested against expected results.
We've revisited regular expressions. The next step is to see how we've wrapped
the hand-coded lexical analysis into an object, which gives us a reusable tool for
scanning QuickBasic and languages with related syntax.
NOTE We could downsize the lexical analyzer out of existence, since what it
actually does is a low-level parse. We could use the techniques described in
Chapter 7 instead. But this would make the parser much too complex.
Token Types
• Apostrophe
• Ampersand
• Colon
• Comma
• QuickBasic identifier, which has the same syntax as Visual Basic identifiers
prior to .NET
• Arithmetic and other operators (a single token type that excludes the
ampersand)
• Semicolon
• String
• Unsigned numbers
• Pound sign
• Dollar sign
Note that our scanner does not limit the length of identifiers, as did older
QuickBasic and Visual Basic processors. This was necessary in older compilers,
where tables had to be carefully allocated in C or even assembler to preserve
scarce memory. In our implementation of quickBasicEngine, we have the nearly
unlimited length String data type, so this limitation is not enforced.
NOTE Several complex strings are provided in qbScannerTest. They test lexical
analysis of strings and the Basic rule that doubled internal double quotes
represent double quotes.
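The doubled-quote rule is compact enough to state as a regular expression. The following is a sketch in Python's re module, not the hand-coded logic in qbScanner; the pattern and the function name scan_string are assumptions for illustration.

```python
import re

# A quote, then any mix of non-quote characters and doubled quotes, then a quote.
STRING_LITERAL = re.compile(r'"(?:[^"]|"")*"')

def scan_string(source, start=0):
    """Return (literal text, represented value) for a string at start, or None."""
    match = STRING_LITERAL.match(source, start)
    if match is None:
        return None
    literal = match.group()
    # Strip the delimiters, then collapse each doubled quote to a single quote.
    value = literal[1:-1].replace('""', '"')
    return (literal, value)

print(scan_string('"This string is ""fancy""" and more'))
```

An unterminated literal produces no match at all, which leaves the decision about how to report it to the caller.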
Whether or not we use a regular expression to scan these tokens, any sensi-
ble code will be, in effect, an implementation of a regular expression. This raises
a problem with token schemes, including any possible proposal to make signed
numbers into tokens.
Ideally, each distinct token type should have a different handle: a different
set of characters that may appear at the beginning of the token. In Chapter 4, the
handle of a grammatical class was the symbol, or set of symbols, that could begin
the grammatical class. For example, a Visual Basic expression starts with an iden-
tifier, a number, a plus sign, or a minus sign.
We would like to simply scan left to right for any one of the set of characters
that can start a token, for each token, and take the first token we find left to right.
There are, as I will show, dangers in this simple plan, but basically it's a good idea.
In fact, each token type in this data model has a different handle. The very
simple token types for single characters (apostrophe, ampersand, colon, comma,
semicolon, dollar sign, and pound sign) each has a unique handle: the character
to which it corresponds, which does not appear anywhere else. Parentheses are
restricted to the parenthesis token type. Identifiers start with letters and under-
scores, which appear nowhere else. Here are some examples:
• A letter starts an identifier, or, possibly, an operator like Mod, which has the
form of an identifier.
NOTE The ampersand is not included with the operators because it does double
duty in QuickBasic as the string concatenation operator and a type suffix
in an identifier. (QuickBasic has a legacy and ugly feature, which was preserved
in Visual Basic through release 6: you can define the type of a variable
using a special character at the end of its name.)
6. The exponent represents shifting of the value by powers of ten. Exponents are used by nutty
professors, mad scientists, and disturbed engineers to represent the very large and the very
small.
are ways to code around the problem. The difficulty is that there are many ways
to code this, and nearly all of them are wrong.
You could look for a number to the right of the operator or sign, but this
means either one of two things: you have code for a token type of unsigned num-
ber anyway, which is useless to your user but that supports your actual number,
or else you merely move to the right and look for a digit. Oops, remember, a valid
unsigned number can start with a decimal point, and your code has to work for
-.1, which is valid. Consider a- -1, which is ugly but valid. If we defined a signed
number as a token, when we encountered the minus or plus sign, we would need
to back up and examine context.
Another consideration is that your scanner would need to violate its com-
mitment to basically act like a finite automaton, and move left to right. Where
did this commitment come from? It came from the fact that our objective is to
broadly define the scanner object's behavior in clear and understandable terms,
as an implementation of a finite automaton, and a finite automaton moves left
to right only. Also, if the scanner can move backward, and undo scanning, this
makes scanner progress reporting all the more complex. As discussed in the "The
Scanner Object Model" section later in this chapter, qbScanner exposes events to
let user code manage progress reporting. If the scanner moves backwards, the
user will be confused by progress reports that back up. More generally, any extra
features of the scanner generate work as the scanner moves from left to right,
and this work would need to be unperformed and undone on backup. Finally, if
you want to use the scanner in multiple threads or as a scanner server that pro-
vides tokens on demand, a scanner that backs up will make the final product
very complex.
Fortunately, there is a simpler solution, and this is to simplify the scanner.
The scanner promises to give the parser unsigned real numbers only. It lets the
parser worry about the difference between a plus sign and a unary positive sign,
and between a minus sign and a unary negative sign. The result is that we can
implement the scanner by scanning forward for the handle of each distinct
token at any instant in scanning.
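To illustrate the division of labor, here is a sketch of how code downstream of the scanner might tell a unary sign from a binary operator by looking only at the token already seen, never backing up. It is written in Python for illustration; a real parser would make this decision through its grammar, and the token type names and the function classify_signs are assumptions.

```python
OPERAND_TYPES = ("Identifier", "UnsignedInteger", "UnsignedRealNumber")

def classify_signs(tokens):
    """Relabel + and - tokens as UnarySign or BinaryOperator.

    tokens is a list of (type, text) pairs as a scanner might deliver them.
    A sign is binary only when something that can end an operand precedes it.
    """
    result = []
    previous = None
    for kind, text in tokens:
        if kind == "Operator" and text in ("+", "-"):
            ends_operand = previous is not None and (
                previous[0] in OPERAND_TYPES or previous == ("Parenthesis", ")")
            )
            kind = "BinaryOperator" if ends_operand else "UnarySign"
        result.append((kind, text))
        previous = (kind, text)
    return result

# a - -1: the scanner delivers two minus tokens and an unsigned number.
scanned = [("Identifier", "a"), ("Operator", "-"), ("Operator", "-"), ("UnsignedInteger", "1")]
print(classify_signs(scanned))
```

The scanner's output never changes; only the consumer's interpretation of the two minus signs differs.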
Scanner Implementation
Unfortunately, Visual Basic .NET does not provide an easy way (as do C and C++
with strspn and strcspn) to find any one of a set of characters, other than by using
the regular expression object (Regex), which is overkill for such a simple task.
Instead, if you examine the qbScannerTest project in the source code, you will
find a project and a stateless class (one with no variables that occupy storage at
runtime in General Declarations) called the utilities class. This class exposes
the verify utility, which scans for a set of alternative characters, or for the com-
plement of this set: the set of all characters not in the specified set.
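The idea behind such a utility fits in a few lines. The sketch below is in Python and is not the signature of the book's verify method, which is not shown here; the name verify_sketch, its parameters, and the choice of -1 for "not found" are all assumptions.

```python
def verify_sketch(text, start, charset, complement=False):
    """Return the index of the first character at or after start that is in
    charset, or that is NOT in charset when complement is True.
    Return -1 when no such character exists."""
    for index in range(start, len(text)):
        if (text[index] in charset) != complement:
            return index
    return -1

# Find the next delimiter, in the manner of C's strcspn.
print(verify_sketch("abc;def", 0, ";,"))
# Skip leading blanks, in the manner of C's strspn.
print(verify_sketch("  abc", 0, " ", complement=True))
```

One routine covers both "find a handle character" and "skip everything that is not a handle character," which is all a forward-only scanner needs.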
The proposed design looks simple in structured pseudocode:
Do until done
    Find the leftmost token
    Add it to the scanner's collection of tokens
End Do
However, there are two problems with this initial design. We will need to
have some sort of index to characters in source to keep track of position. This
initial design does not tell us how to manage the index, and this management
is tricky. Also, the pseudocode might result in slow real code, since Find the
leftmost token implies an inner loop.
Here is how we can manage the index:
This more refined design neglects the possibility of blanks between tokens.
This is a fairly simple issue to resolve-just place a simple loop, which won't
make the code much slower (because, generally speaking, one blank will appear
between tokens in source code) under the Do until and before the Find the
leftmost token.
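One way to read the refined design, with the index managed explicitly and a blank-skipping loop at the top, is sketched below in Python. The token patterns are a small assumed subset, regular expressions stand in for the hand-coded finders, and the name simple_scan is hypothetical.

```python
import re

# Assumed, simplified finders for a few token types.
PATTERNS = [
    ("String", re.compile(r'"(?:[^"]|"")*"')),
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("Operator", re.compile(r"[-+*/]")),
    ("UnsignedInteger", re.compile(r"[0-9]+")),
]

def simple_scan(source):
    tokens = []
    index = 0
    while index < len(source):
        # Skip blanks between tokens.
        while index < len(source) and source[index] == " ":
            index += 1
        # Inner loop: find the leftmost token starting at index.
        leftmost = None
        for name, pattern in PATTERNS:
            match = pattern.search(source, index)
            if match and (leftmost is None or match.start() < leftmost[1].start()):
                leftmost = (name, match)
        if leftmost is None:
            break
        name, match = leftmost
        tokens.append((name, match.group()))
        # Add the length of the leftmost token to move past it.
        index = match.start() + len(match.group())
    return tokens

print(simple_scan('"Identifier" x'))
```

Every trip through the outer loop reruns every finder, including the ones whose results fall inside the token just selected; that repeated work is the cost this simple design pays.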
However, the wasteful inner loop (or straight-line sequence of tests, which
has the same constant effect on runtime) remains. It is wasteful because, in the
case of tokens that overlap, it will return false positives. For example, if we are
scanning the string "Identifier" (a string containing quote, Identifier, quote),
the inner loop will find the Identifier one position beyond the start of the
string. It won't return this bogus Identifier as a real token, because when the
string is found to be the real token, Add the length of the leftmost token will
shift the index over the false Identifier. But in a more complex case, such as
the string "*/Identifier", the scanner will scan two bogus arithmetic operators
unnecessarily.
In the case of the string "*/Identifier", the programmer obviously means "a
string containing an asterisk, a slash, and the word Identifier." However, the inner
loop in the preliminary pseudocode will find a string at positions 1..14, an asterisk
at position 2, a slash at 3, and the identifier Identifier at 4.
It would be better to create an "anticipatory" scan, and this was implemented
in qbScanner. In this algorithm, each trip through the major scan loop goes through
all or some tokens. For each token that hasn't already been found to the left of the
index, the scanner locates this token. It doesn't do this in an inner loop. Instead, it
uses InStr. Then it selects the leftmost and widest token. The next loop can simply
ignore false positive tokens, which are inside the leftmost and widest token.
Consider the quoted string "*/Identifier". The first time through, the inner
For loop will find all three token types: a string, an asterisk and a slash, and an
identifier. But it can also note that the string is the leftmost and widest token.
Therefore, the string will be selected as the next token.
Of course, the scanner could also find other tokens beyond the string. Suppose
the string is followed by a number, one beyond the end of the string at position
15. Since the number is a different token type than the asterisk, slash, identi-
fier, or string, it is also recorded in an array in the For loop. In fact, consider
what happens when control returns to the For loop a second time: the main
scan index will have been increased by 14 characters, since this is the length of
the string. The second execution of the For loop will simply move beyond all
preset tokens when they occur to the left of the main scan index, and scan for
the next occurrence starting at the main scan index. Recall that its loop text is
"find its leftmost token starting at index"!
NOTE Don't be overly concerned that "*/Identifier"123 is, in syntactical
terms, complete garbage. Recall that it is the parser's job to worry about garbage
at this level. As far as the scanner is concerned, this string is just fine, and it consists
of two tokens: a string followed by an integer.
Note that completely to the left of the leftmost token "candidate" means
that it is not enough to compare the starting index of the candidate with the
starting index of the anticipated token, because tokens can overlap. Instead, the
start index plus the length of the candidate token must be less than the start
index of the anticipated token.
All this seems fairly complex. In a nutshell, the algorithm does the following:
1. Finds the leftmost and widest token, while also finding a set of useless
tokens inside the leftmost and widest tokens, and more usefully finds
another set of tokens fully to the right of the leftmost and widest token
2. Adds the leftmost and widest token to the output list of tokens, sets the
character index one past the end of the leftmost and widest token, and
repeats until done
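The two steps can be sketched in a few lines. This is a Python illustration under stated assumptions, not the qbScanner source: regular expressions stand in for the InStr-based finders, only four token types are covered, and the names anticipatory_scan and anticipated are hypothetical.

```python
import re

TOKEN_PATTERNS = [
    ("String", re.compile(r'"(?:[^"]|"")*"')),
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("Operator", re.compile(r"[-+*/]")),
    ("UnsignedInteger", re.compile(r"[0-9]+")),
]

def anticipatory_scan(source):
    tokens = []
    index = 0
    anticipated = {}  # token type -> next known match, or None when exhausted
    while True:
        for name, pattern in TOKEN_PATTERNS:
            known = name in anticipated
            match = anticipated.get(name)
            # Search again only when nothing is recorded yet, or when the
            # recorded match starts to the left of the main scan index.
            if not known or (match is not None and match.start() < index):
                anticipated[name] = pattern.search(source, index)
        candidates = [(n, m) for n, m in anticipated.items() if m is not None]
        if not candidates:
            return tokens
        # Leftmost first; among equals, widest first.
        name, match = min(candidates, key=lambda c: (c[1].start(), -len(c[1].group())))
        tokens.append((name, match.group()))
        index = match.end()  # one past the end of the selected token

print(anticipatory_scan('"*/Identifier"123'))
```

On the second trip through the loop the integer found earlier is still to the right of the index, so it is reused without another search, while the asterisk, slash, and identifier inside the string are discarded.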
A Scan Test
Let's try running the scanner for the test tokens. Click the Test button on the
qbScannerTest GUI (Figure 5-7). You will get a Yes/No message box announcing
success. Click Yes to see the output, as shown in Figure 5-8. As you can see, the
output contains a test string, the expected results, and the actual results.
[Figure 5-8: qbScannerTest output showing the test string and the expected results, one serialized token per line; the screen shot is not reproduced here.]
The test string exercises qbScanner for each token type to regression test the
scanner when its source code is changed. It also includes marginal tokens to
make sure the scanner works for these cases. The test string uses XML notation
for the nondisplayable characters in a newline.
The expected and the actual results list tokens in a serialized form, where to
serialize an object is to convert it to printable characters. We have converted
newline in the test string to a displayable form given our Windows internationalization
locale (which, for us, is ASCII).
In the scanner data model, each token is an object of type qbToken, and its
toString method creates the view of the token in the actual results.
For each token, the scanner has named its type, identified its starting and
end index, shown its line number (to the left of the colon), and displayed its
value in an expression of the following form:
We really want the line number, by the way, since this will help display errors.
Programmers can't find character indexes as readily as they can spot line numbers.
To ensure that nondisplayable ASCII and Unicode characters are displayed
properly, a utility function (string2Object, available in utilities.dll and for which
source code can be found in utilities.vb) converts nonprintable values to their
XML format, which is ampersand and pound, followed by the five-digit decimal
value of the ASCII or Unicode character. I developed string2Object to support
serialization because it drives me nuts when nonprintable characters such
as newline sequences appear in output.
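The conversion described here is simple to sketch. The Python function below is an illustration of the stated format (ampersand, pound, five decimal digits), not the source of the book's utility; the name to_displayable and the choice of ASCII 32 through 126 as the printable range are assumptions.

```python
def to_displayable(text):
    """Replace each character outside printable ASCII with &# plus a
    five-digit decimal character code."""
    parts = []
    for ch in text:
        code = ord(ch)
        if 32 <= code <= 126:
            parts.append(ch)
        else:
            parts.append(f"&#{code:05d}")
    return "".join(parts)

print(to_displayable("identifier\r\n+"))
print(to_displayable("\u201cThis string uses smart quotes\u201d"))
```

A Windows newline becomes a visible pair of codes (13 and 10), and the smart quotes become 08220 and 08221, so every character in a test report can be seen and counted.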
Scroll down to see the complete actual results and how numbers are handled,
as shown in Figure 5-9.
[Figure 5-9: the actual results in the qbScannerTest output, one serialized token per line; the screen shot is not reproduced here. Rows that remain legible include:]
UnsignedInteger@45..49:2 32767
Operator@51..51:2 -
UnsignedRealNumber@52..60:2 32767e-12
The pure model of an unsigned number has actually been changed! It's true
that the real number -32767e-12 has been divided into a unary minus followed
by an unsigned real number. However, note that in the case of real numbers
only, we've slightly violated the rule that two distinct token types must have two
distinct sets of leading characters.
Unsigned integers may start with any digit; unsigned real numbers may start
with a decimal point or any digit. The character set "any digit" is shared by two
distinct token types.
The string 32767 was scanned as an unsigned integer, while -32767e-12 is
scanned as an operator, followed by an unsigned real number. We've kept our
promise not to include signs in numbers and let the compiler sort them out; how-
ever, we return two different, and apparently overlapping types, integer and real,
which of course will have common handles. This is readily explained.
We could have simplified the scanner to just parse integers and let the
parser synthesize real numbers. Real numbers have a sensible BNF syntax. But
overall, it is the scanner's job to make life easy for the parser.
Instead, the token type Unsigned Real is a synthetic type that is based on
Unsigned Integer. When it's time to find a real number in the inner, anticipatory
scanner loop, the real number finder looks for an integer, followed optionally by
a decimal point, followed optionally by another integer, e, exponent sign, and
a third integer. The result is that in the anticipatory table described earlier in
pseudocode, there will be overlapping entries. To select the correct entry, the
scanner not only takes the leftmost token, but it also takes the widest token.
I have rather slyly postponed this discussion because I wanted to show how to
stick to an ideal as long as possible, and then compromise. This issue could be
a bug, but it isn't as long as the anticipatory loop selects the widest token. And it is
an example of the kind of low-level and painful issues that arise in lexical analysis.
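The "leftmost, then widest" rule can be sketched in a few lines. This is a minimal illustration in Python rather than the Visual Basic of qbScanner, and the two regular expressions and the name widest_token are assumptions made for the example, not the scanner's actual finders.

```python
import re

# Hypothetical patterns for the two overlapping token types.
UNSIGNED_INTEGER = re.compile(r"\d+")
UNSIGNED_REAL = re.compile(r"(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def widest_token(source, start):
    """Return (type_name, text) for the widest candidate at start, or None."""
    candidates = []
    for name, pattern in (("UnsignedInteger", UNSIGNED_INTEGER),
                          ("UnsignedRealNumber", UNSIGNED_REAL)):
        match = pattern.match(source, start)
        if match:
            candidates.append((len(match.group()), name, match.group()))
    if not candidates:
        return None
    # max() keeps the first of equally wide candidates, so a plain run of
    # digits stays an integer instead of becoming a real number.
    width, name, text = max(candidates, key=lambda candidate: candidate[0])
    return name, text
```

Both patterns match at the first digit of 32767e-12, but the real-number candidate is wider, so it wins. For 32767 alone the two candidates tie, and the integer entry, listed first, is kept.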
7. I find that while I am at times late with my software, especially when I have no say on the
delivery date or am being passively aggressive, what I deliver is sound, as long as I have taken
the time to use a structured approach. That way, I don't get any business as a "consultant"
because the software works (oops). Seriously, the time-to-market statistic doesn't completely
capture software quality.
The Lexical Analyzer for the QuickBasic Compiler
• The end index of the token in the source code (which, as a property of the
token, can be calculated from the start index and the length)
Note that there is no need to store the actual value of the token-doing so is
a waste of space. The source code is a part of the scanner state, as are the start index and
length of each token; for this reason, we can always get the token's text by taking
the Mid (substring) of the source code using the token's start index and length.
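A minimal sketch of this idea, in Python rather than Visual Basic: the token stores only its type, start index, and length, and both the end index and the text are derived on demand. The class and member names are assumptions for illustration; start indexes are 1-based, as with Mid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_type: str
    start_index: int  # 1-based, as with Visual Basic's Mid
    length: int

    @property
    def end_index(self):
        # Calculated from the start index and length, never stored.
        return self.start_index + self.length - 1

    def text(self, source):
        # Equivalent of Mid(source, start_index, length).
        begin = self.start_index - 1
        return source[begin:begin + self.length]
```

For the source string x = 32767, a token with start index 5 and length 5 yields the text 32767 and an end index of 9, without the token ever holding a copy of the text.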
Classification of Objects
The approach classifies objects into stateless and stateful objects. Stateless
objects consist of pure code. These objects define static (shared) methods and
properties, and never need to be created using New.
NOTE A good example of stateless objects in the code for this book is
utilities.dll, a large collection of string handlers, math gizmos, and other methods
I have found useful over the years.
state, the facts about a variable's type, but it also exposes a set of shared methods
for working with types in the abstract, including a shared method for telling
whether two types enclose each other.
All stateful objects have a Name property so we can identify different objects in
output. Name defaults to classNamennnn date time, where nnnn is the sequence
number of the object as created in the process.
Take a look at the New constructor for qbScanner in qbScanner.vb. Note that
it references a Shared variable named _INTsequence. It starts with an underscore
because it is shared. It then contains the Hungarian prefix for its type INT, and
then contains a descriptive name, sequence. I uppercase the Hungarian prefixes
of variables in the Common Declarations section.
We use the threading model to increment the sequence number for the
default Name, because simply adding one to it would make the object unusable
when more than one instance of the object is running in multiple threads.
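The following sketch shows the same idea in Python, not the book's Visual Basic: a shared sequence number is incremented under a lock, and the default name is built as the class name, the sequence number, and a date and time. The class name StatefulObject and the exact timestamp format are assumptions for illustration.

```python
import itertools
import threading
from datetime import datetime


class StatefulObject:
    # Shared (class-level) state: one counter and one lock for all instances.
    _sequence = itertools.count(1)
    _sequence_lock = threading.Lock()

    def __init__(self, name=None):
        # Take the next sequence number under the lock so that two threads
        # can never receive the same number.
        with StatefulObject._sequence_lock:
            number = next(StatefulObject._sequence)
        if name is None:
            stamp = datetime.now().strftime("%m/%d/%Y %I:%M:%S %p")
            name = f"{type(self).__name__}{number:04d} {stamp}"
        self.name = name
```

Because the number is taken while the lock is held, objects created at the same instant in different threads still receive distinct default names.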
[Message box: "Inspection of the test scanner has succeeded: click Yes to view the report; click No to return to the main form," with Yes and No buttons.]
This message must appear. I mean it. This is because inspect, in qbScanner
and the other stateful objects shipped, checks for errors that would be the result
of serious internal problems.
Let's look at what it checks. Click Yes in the message box to see the inspec-
tion report, as shown in Figure 5-10.
[Figure 5-10: inspection report. One recoverable line reads: "If the code is null and indicated as fully scanned, the scan count must be zero: OK"]
Four assertions have been tested against the state, the General Declarations
variables of the qbScanner instance. Figure 5-11 is the declaration of the object
state, as the TYPstate structure followed immediately by the state instance,
USRstate. These assertions concern usability, the scanned tokens, line numbers,
and the source code.
NOTE These rules should not fail. If they do, this means one of two things.
Either my ham-fisted original code has a bug (otherwise known as an issue or
feature) or you have modified the code. If it is my bug, send me e-mail at
spinoza1111@yahoo.com. If it is your bug, fix it.
Usability
The first assertion is that the object must be usable. Usability is part of the core
approach.
Object-oriented design with stateful objects raises an interesting problem.
This problem existed before "objects," but only object-oriented design gives us
a way of facing the problem squarely.
Old programs of the legacy sort often have thousands of variables and con-
ditions. Often, only a few combinations of these variables and conditions are
actually valid. Statistics, along with Murphy's Law, predict that these old pro-
grams might enter a state-a combination of values-which is unexpected, and
of course, they do.
The mentality of a non-object-oriented programmer in a language like C is
that "my program is special and will not enter a bad state-ever." This neglects
the fact that, as hero computer scientist Dijkstra pointed out, any one program is
best viewed as a set of related solutions that evolves over time. A payroll program
is a member of a set of related solutions, some of which the user wants, some of which
the user would prefer, some of which the user will put up with, some of which
the user will need next year, and so on-you get the picture.
This means that in a non-object-oriented language, it is just too easy to add
variables over a life cycle in such a way that invalid combinations occur. Many of
these combinations are benign tumors in the sense that they don't change results;
others are malignant.
Using an inspect procedure, especially if it is executed automatically or at
a regular interval, can tell the object if it is "sane" and has valid combinations of
values (depending, of course, on how many conditions are actually tested). What
is interesting is what the object can do if it does find a problematic state. The
object, unlike legacy code, knows it is no longer in a state of grace. And, unlike
legacy code, the object can do something with this knowledge. (Note that this
discovery has nothing to do with compilers but everything to do with building
good software.)
It can, as the objects in the compiler do, set a variable in its state (called
booUsable in the code) to False.
Whenever a Public procedure (property, method, or event handler) is exe-
cuted subsequently, the object can raise an error and return a sensible default,
instead of making life worse for itself and the rest of the world by returning
bogus results, or, worse, doing damage to data outside itself. Therefore, at the
conclusion of the New constructors for objects with state, we explicitly inspect the
initial state, and, if it is valid, we set usability to True.
Many objects with state contain reference objects that occupy .NET's heap
storage, and we consistently follow a .NET rule. This is to expose a Public proce-
dure (called, consistently, dispose in our code), which the object user is urged,
on pain of 20 lashes with a wet noodle, to use when the object is no longer
needed. The original purpose of dispose was to ensure that reference objects
did not clutter the CLR heap unnecessarily. However, in the QuickBasic com-
piler code, objects consistently self-inspect when dispose is executed as a sort of
global sanity check.
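The usability cycle described in this section can be sketched as follows. This is an illustration in Python, not the compiler's Visual Basic code; the class LineCounter and its single rule are hypothetical, and where the book's objects raise an error and return a sensible default, the sketch simply raises an exception.

```python
class UnusableObjectError(Exception):
    """Raised when a public procedure is called on an unusable object."""


class LineCounter:
    """Hypothetical stateful object with an inspect/usable/dispose cycle."""

    def __init__(self):
        self.line_number = 1
        self.usable = False
        # At the end of construction, inspect the initial state and mark
        # the object usable only if the inspection passes.
        self.usable = self.inspect()

    def inspect(self):
        ok = isinstance(self.line_number, int) and self.line_number >= 1
        if not ok:
            self.usable = False  # the object knows it has left a valid state
        return ok

    def _check_usable(self):
        if not self.usable:
            raise UnusableObjectError("object is not usable")

    def advance(self):
        self._check_usable()  # every public procedure checks first
        self.line_number += 1
        return self.line_number

    def dispose(self):
        # Self-inspect on disposal as a final sanity check.
        result = self.inspect()
        self.usable = False
        return result
```

Once inspect finds a bad state, every later public call fails fast instead of returning bogus results.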
Scanned Tokens
The next assertion is about the tokens that were scanned. Since each is a stateful
object, each must pass its own qbToken.inspect procedure. Also, the tokens must
always form an ascending series of nonoverlapping tokens. They don't need to
be contiguous and won't be contiguous in general, because blanks may appear
between them. If this rule is violated, the entire meaning of the scanner has
been damaged.
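The ordering part of this assertion is easy to state in code. Here is a small Python sketch, assuming each token is reduced to a (start index, length) pair with 1-based start indexes and a length of at least one; the function name is an assumption for illustration.

```python
def tokens_in_order(tokens):
    """Check that (start_index, length) pairs ascend without overlapping."""
    previous_end = 0
    for start, length in tokens:
        # Each token must begin after the previous one ends. Gaps are
        # allowed, because blanks may appear between tokens.
        if length < 1 or start <= previous_end:
            return False
        previous_end = start + length - 1
    return True
```

The list (1,3), (5,2), (7,1) passes, since the gap at position 4 is harmless. The list (1,3), (3,2) fails, because the second token begins inside the first.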
Line Numbers
The third rule is trivial compared with the other rules. The line number starts at
one. Therefore, it must start at one. It can't be zero or negative.
Collection Structure
We need a collection structure rule whenever we use the old standby Collection
object to structure a tree or other data structure, by including collections as
members in collections. Here, the collection is an index to map line numbers to
character positions. The key of colLineIndex is the line number, prefixed with an
underscore (because the legacy collection would otherwise treat the line number
as a numeric index and not a key). Its data is the line number, the start
index, and the length of the line. This is represented as a three-item, unkeyed
subcollection.
In developing this collection with structure, we are performing object design
without explicit stateful objects such as "line number index entry" and "collection
of line number index entries." Logistically, it is unnecessary to go crazy and
develop a full-dress object for each and every potential object. Doing so creates
some source bloat, because a project and a project's files exist for each
possible object. Therefore, it makes sense for simple objects, such as an object
that maps line numbers to character positions, to use a collection in a structured
fashion. However, it does require inspect to check the structured collection for
correct structure, since this is not enforced by an explicit object model.
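As a sketch of what such a structured collection and its inspection might look like, here is a Python version in which a dictionary stands in for the legacy Collection object. The function names and the specific checks are assumptions for illustration, not the code in qbScanner.vb.

```python
def add_line(index, line_number, start_index, length):
    # The key is the line number prefixed with an underscore; the data is a
    # three-item, unkeyed subcollection.
    index["_" + str(line_number)] = [line_number, start_index, length]


def inspect_line_index(index):
    """Check the structure that no explicit object model enforces for us."""
    for key, entry in index.items():
        if not (isinstance(entry, list) and len(entry) == 3):
            return False
        line_number, start_index, length = entry
        if key != "_" + str(line_number):
            return False
        if line_number < 1 or start_index < 1 or length < 0:
            return False
    return True
```

Nothing stops a careless caller from storing a two-item entry or a mismatched key, which is exactly why the inspection has to look.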
Collections of Collections
You'll see more of these collections-on-steroids in the code; get used to them.
Ordinarily, we think of the collection as a hash table or one-dimensional array.
However, the true power of the collection is obtained only when you realize
that its entries can be objects, in particular, collections.
Because collections can contain collections, a collection can represent not
only a simple array of basic data type but also an array of records, as has been
shown here.
Recall the bnfAnalyzer tool introduced in Chapter 4. It stores the "parse tree" of
the input BNF in a collection, which contains collections.
Of course, we can go overboard in using the collection in this way. I may have
done so, in fact, in bnfAnalyzer. It might actually have been better to develop
the parse tree as a separate COM object (recall that bnfAnalyzer uses COM).
However, COM objects don't play as nice as .NET objects do, and I wanted to
get the project done quickly.
Source Code
The final inspection rule tests the source code against the scanned tokens. The
first scan token must coincide with the first nonblank character in the source
code; the last scan token must coincide with the final nonblank characters in the
source code.
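A minimal Python sketch of this last rule follows. It assumes the source has been fully scanned, that "blank" means the space character only, and that tokens are (start index, length) pairs with 1-based start indexes; all names are illustrative.

```python
def tokens_cover_source(source, tokens):
    """Check that the first and last tokens match the nonblank extent."""
    if not tokens:
        # No tokens is only valid when the source holds nothing but blanks.
        return source.strip(" ") == ""
    if source.strip(" ") == "":
        return False
    first_nonblank = len(source) - len(source.lstrip(" ")) + 1
    last_nonblank = len(source.rstrip(" "))
    first_start, _ = tokens[0]
    last_start, last_length = tokens[-1]
    return (first_start == first_nonblank
            and last_start + last_length - 1 == last_nonblank)
```

For a source of two blanks, x = 1, and two more blanks, the tokens must start at position 3 and end at position 7.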
[Screenshot: the qbScanner state serialized as XML. A leading box comment describes the class: qbScanner scans input source code and provides, on demand, scanned source tokens and lines of source code, using "lazy" evaluation. Commented elements follow for the object instance name, the usable flag, the source code (truncated to 100 characters), the last token array entry in use, and the tokens parsed (truncated to 100 tokens).]
This XML has been formatted for easy readability, and it is heavily com-
mented. In particular, the paragraph that describes the class is also available as
the value of qbScanner's read-only, shared About property (another core proce-
dure, which in most objects, will supply information about the purpose of the
class, as well as my name, e-mail address, and Web site).
Options of the XML object, corresponding to check boxes and text boxes on
the scanner test form, allow you to suppress either the leading box comment or
the line comments that describe each state variable.
Clicking the Test button in the qbScannerTest program's GUI executes the test
method of the scanner, which presents a test instance with a string containing
all possible tokens and some marginal difficult cases (see Figure 5-8, earlier in
the chapter). It compares the serialized list of actual results with the serialized
expected results. If they match, the object does not complain. If they do not
match, the form will display an error message, and the object will mark itself as
not usable.
A concern in the scanner (and, as you will see in Chapters 7 and 8, also in the
compiler and the interpreter) is the ability to accurately report progress in scan-
ning, parsing, and interpreting large source programs. I wanted to avoid the
irritating and vague progress reports we sometimes see in Windows.
At the same time, it is a bad mistake to make an object with code that builds
forms, but whose mission is not to draw pretty forms on the screen. This is
because this code must then import and reference System.Windows.Forms, which
bloats it for no good reason, and worse, locks the code into the Windows client
environment.
8. Dijkstra was less an ivory tower theorist than someone who actually believed that you cannot
separate theory from practice, high-level design from mindless code, and so on. Perhaps for
this reason, two of his results (structured programming and semaphores) are actually useful
to ordinary slobs.
If, all of a sudden, you decide that the code in question would make a spiffy
Web service, you are in a world of hurt when the object, like the scanner or the
quickBasicEngine compiler itself, is large and complex. You must go through the
object and find each and every line of code that has to do with presentation and
make this code conditional on the mode of presentation.
You wind up with Frankencode, a dismal monster howling on the blasted
heath for its author's ass, because it knows, as did Mary Shelley's famous mon-
ster, that its life has been destroyed by its very fabrication as "a thing of shreds
and patches." Unlike Alice Cooper, singing "Feed My Frankenstein" in Wayne's
World, your code might be good looking but will be evil.
Therefore, we need a way to separate presentation from logic and to have
a way for the nonvisual object to display its progress. One way would be to
have the presentation logic inherit the nonvisual object. This makes some
sense in a language that allows multiple inheritance. However, Visual Basic
doesn't allow multiple inheritance, meaning that the presentation logic can
present only one object. Also, it doesn't make much sense to say that a mere
progress report is-a compiler. This, in Shakespearean terms, dresses the
progress report in "borrowed robes."
Instead, we use an event model in the scanner and elsewhere to transmit
events, which can be ignored, used to display progress on a Windows form, or
used to display progress in a Web service. Figure 5-13 shows the event model of
the scanner. Note that to actually obtain these events, qbScanner must be declared
using the WithEvents keyword and inside General Declarations.
The scanEvent event fires each time a new token is found. It provides the token
object, its start index, its length, and the total number of tokens found so far. The
value and type of the token object is found using its properties. This allows the GUI
to extend a progress bar, highlight the code being scanned, or both.
The scanErrorEvent event fires each time a user-related error such as unrec-
ognizable characters occurs. It describes the error, identifies where it occurs by
absolute character position and line number, and, in some cases, provides addi-
tional tips.
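The separation of logic from presentation can be sketched with plain callbacks. The Python below is an illustration only: it is not the WithEvents mechanism of Visual Basic, the toy scanning rule (runs of letters and digits are tokens, anything else except a blank is an error) is an assumption, and it omits the line number that the real error event reports.

```python
class Scanner:
    """Hypothetical nonvisual scanner that reports progress via callbacks."""

    def __init__(self):
        self.scan_event_handlers = []
        self.scan_error_handlers = []

    def _raise(self, handlers, *args):
        # With no subscribers, the event is simply ignored.
        for handler in handlers:
            handler(*args)

    def scan(self, source):
        count = 0
        position = 0
        while position < len(source):
            ch = source[position]
            if ch == " ":
                position += 1
            elif ch.isalnum():
                end = position
                while end < len(source) and source[end].isalnum():
                    end += 1
                count += 1
                # Token text, 1-based start index, length, tokens so far.
                self._raise(self.scan_event_handlers,
                            source[position:end], position + 1,
                            end - position, count)
                position = end
            else:
                self._raise(self.scan_error_handlers,
                            f"unrecognizable character {ch!r}", position + 1)
                position += 1
        return count
```

A form, a Web service, or a test can subscribe by appending a function to scan_event_handlers; the scanner itself never references any presentation code.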
I have followed the object-oriented practice described in this section consis-
tently in all stateful objects of the compiler. When I make additions to the core
set, I will note them in the book. In particular, further objects will incorporate
a test method, which will, inside an object instance, allocate a test instance as
a local variable and then run a series of prepared tests. This will allow me to
expose on object test forms, similar to the form of qbScannerTest, a Test button
that runs portable regression tests, not inside the form (as is the case here), but
inside the object.
Summary
This chapter described the development of the first major objects (qbScanner,
which has qbTokens) of the compiler for its first task: lexical analysis. We have
used modern techniques to support a legacy language because object-oriented
development makes compiler development a much more visible and less arcane
process.
Before object-oriented development, developing compilers involved a great
number of tables interlinked in complex ways. They had a tendency to get into
combinations of states that resulted in bugs, some of which were exploited by
the compiler's user community and became features.
Object-oriented development does not dramatically increase the speed at
which compilers are developed. In this case, I promised Dan Appleman (this book's
editor) that I would refactor and make more intensively object-oriented the orig-
inal compiler for QuickBasic that I had demonstrated to him at the Visual Studio
rollout festival in 2002. I had decided to do so because it is very hard to explain
a compiler in depth without showing that it is made up of distinct modules and
without using the Windows form to exhibit internal behavior.
Refactoring each object demanded a heavy investment of time, not just in
coding, but also in preparatory documentation. In the preparatory documenta-
tion, I defined the object model, the supporting object state, and the behavior
of each Public procedure. I implemented the core procedures, including inspect
and object2XML. I built a form to show off the object, which is a pain when you're
born to code and not to be a glorified Etch-a-Sketcher.9
Too often, quality is mapped onto time to market. Dan told me he wanted
a quality book, with quality software, and I hope that the use of the
object-oriented paradigm here and in the rest of the book will ensure this. The flyover
mini-compiler of Chapter 3 and the bnfAnalyzer COM object of Chapter 4 were
merely small applications by comparison.
9. In particular, it made me crazy to have to select label colors. What is the color of a scanner?
What is the color of a variable type? "Colorless green ideas sleep furiously" (Noam Chomsky).
The overall goal is to have each form highlight its labels with a memorable primary or bright
color, like a property card in the game of Monopoly. Thus, the scanner's color is dark blue.
Challenge Exercise
From the code of the scanner and/ or the text of this book, reverse-engineer the
regular expression that defines each token type supported: identifier, operator,
number, string, and so on.
For example, a string is defined using the regular expression:
"([^"]*(""){0,1})*"
Resources
For more information about regular expressions and compiler design, refer to
the following:
CHAPTER 6
QuickBasic Object Modeling
The law wishes to have a formal existence.
- Stanley Fish
IN THE PREVIOUS CHAPTER, you saw how the lexical analyzer, or scanner, trans-
forms the raw characters of source code into a stream of token objects, where
each token object has a start index and a length. In the next chapter, you'll see
how this stream of token objects is converted to a nested structure of BNF gram-
mar categories, as described in Chapter 4, while also emitting output code for
a "machine," which exists purely as a software simulation of the Nutty Professor
machine.
But before we get to the flagship object quickBasicEngine in the next chapter,
we need to build two .NET objects, qbVariableType and qbVariable, to represent
data types and their values. And to do that, we need to model the data, since it's
always a bad idea to develop a language, whether for the .NET CLR or any other
platform, without a clear idea of how to represent the values and types of values
of the target language. We did not need to concern ourselves with these issues in
the flyover compiler of Chapter 3, because all the values in that example are num-
bers that were easily mapped to the double-precision number type (which is able
to handle integers as well as real numbers). However, in scaling up to our QuickBasic
compiler, we need to do some hard work. We want to make sure that no variable
is instantiated in our compiler without complete, strong typing. The payoff is
that all parts of the quickBasicEngine speak the same language about variables.
In this chapter, we will go through the same cycle of design, code, and test as
in previous chapters. The GUIs for testing the implementations of the QuickBasic
variable type model will give you a hands-on demonstration of what is, honestly,
rather dry (but necessary!) material.
Scalars
Scalars are simple Basic values that can be any one of the following types:
• Boolean-True or False
• String-Strings of characters
NOTE Older Visual Basic developers may remember that strings in Visual
Basic through release 3 were limited to 64KB, in all probability because the
C-language runtime represented string length in an unsigned 16-bit C integer,
which can range from 0 to 2^16-1 (64KB). QuickBasic shared this limit before
release 4, which produced interesting bugs and fascinating hacks for longer
strings. For example, in those pre-object days, I wrote a procedure in a classic
Visual Basic module that stored 64KB chunks in an array. Visual Basic 4 made
it possible to rewrite this as a "long string" object much more elegantly and
without exposing the array, but it also removed any need for the object, since
Visual Basic 4 increased the string limit to about 2^32.
1. QuickBasic shared this surprisingly narrow integer range with Visual Basic releases 1 through 6.
Its narrowness results from the fact that in QuickBasic's salad days, microcomputers still often
worked in words (units) of memory that were only 16 bits long.
decimal point left in, or (when the decimal point is unspecified) implied to the
left of the mantissa. Here is an example: -1.2e-3 = -1.2 * 10^-3 = -.0012.
Variants
You are probably familiar with the variant, which is a variable no longer supported
by .NET. Considered strictly as a type, the QuickBasic or old Visual Basic variant
is a container for another type. The contained type can be a scalar, a null, an
unknown, or even an array or UDT, but it cannot be another variant type. Variants
cannot nest within each other.
NOTE If a variant could contain a variant, this would raise the hard-to-model
possibility of multiple-level variants. Code using such variants would
be hard to debug, and in the absence of object-oriented design, complex vectors
and tables would be needed. However, this situation is easy to model in
object-oriented design. The variant-containing variant would simply contain
an object: a distinct variant. One problem would be avoiding loops, presumably
in object inspection, where the same object appears more than once and
the variant directly or indirectly refers to itself.
We need the abstract variant for arrays of variants because, in the model,
we cannot declare that an array type is "array of variant integer." In QuickBasic,
a variant array is declared simply as a variant, and it's not possible to declare that
"my array is of variants that must contain integers."
In the pre-.NET runtime, each variant had to carry type information, familiar to
coders of APIs, and each variant was a vector of storage. As such, some extra code
had to be executed to get to the value of the variant, or to "unbox" the value. And,
of course, the extra bits took space.
But note that experts on software performance, including Steven Skiena in his
1997 book The Algorithm Design Manual, urge programmers to avoid penny-
wise, pound-foolishness. The major determinant of efficiency is, as Skiena shows,
the overall form of the module's execution time formula. For example, if it mul-
tiplies the number of input records by itself in an MIS program, the program
will run fine for small sets of test data, but it will crash when it goes live.
Many MIS programs still have a classic loop form, which means that their exe-
cution time formula is the multiplication of the constant time for processing
a record times N, where N is the number of records. Skiena's lesson is that you
don't want this to be multiplied by N itself. For example, it's obvious that when
sequentially processing a large table, you should not search the entire table or
a table of nearly equivalent size for each record in the table.
The straightforward formula N * K for the efficiency of a program (number of
records times a constant time) will not be substantially changed by replacing
a scalar with a variant, especially when there is a good reason for doing so. The
replacement does not alter the overall execution time formula; it changes only
the time for processing a record by a small, fixed amount.
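To make the arithmetic concrete, here is a small Python sketch. The per-record costs (10 units for a scalar, 12 for a variant) are invented numbers chosen only to show the shape of the two formulas.

```python
def loop_cost(n, k):
    """Classic loop: a constant cost k for each of n records."""
    return n * k


def nested_cost(n, k):
    """Searching a table of similar size for every record: n times n times k."""
    return n * n * k
```

With 1,000 records, loop_cost(1000, 10) is 10,000 units and loop_cost(1000, 12) is 12,000: the variant costs 20 percent more, and the formula keeps its form. By contrast, nested_cost(1000, 10) is 10,000,000 units, a thousand times worse than the classic loop, and that gap is the one that matters.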
In COM, there was often a good reason for using variants: they provided a lim-
ited object-oriented capability. They were useful for a primitive, and admittedly
unsafe, form of polymorphism when the COM programmer needed to represent
fuzzy data. For example, a real-world application might exist with an orderQty
(order quantity) in which the user needs to represent a fixed and known num-
ber, a completely unknown value, or a range of values. An example is that some
customers might like a minimum quantity in cases where the order cannot be
fulfilled from the warehouse in full and on time. If the variant effectively and
in context represents this situation-as an integer, a null for the completely
unknown scenario, a string representing a minimum quantity, or even a com-
plex formula (translatable by the compiler itself as a business rule, as I will
show in Chapter 9)-then the variant is a technique for representing an object
in a rather lightweight fashion.
It is also said that variants take too much space. However, depending on the
type, they take a small, constant amount of space. Variants take too much
space only when their additional bytes are repeated in large arrays.
Urban legends about variants can therefore be laid to rest.
However, there are many stressed-out programmers trying to maintain code in
which the overuse of the variant has created a toxic smog. In this toxic smog,
you cannot tell when you look at vntFoobar what it might contain! If you have
experience with older Active Server Pages (ASP) or Visual Basic for Applications
(VBA), you know what I mean-in VBA for ASP, everything was a variant, and it
drives you nuts.
Arrays
Arrays should have the following properties:
• Upper bound: This denotes the variable type of each array entry, as a contained
qbVariableType (each qbVariableType has a qbVariableType delegate).
• Scalars
• Variants
• Arrays
• UDTs
2. This feature has been largely found to be useless, or not useful enough to warrant modifying
the architecture of .NET. .NET architecture is based on the runtime semantics of C, and C did
not include this dubious feature.
Unlike variants, UDTs are recursive and can contain UDTs. Within a QuickBasic
UDT, a COM Visual Basic UDT, or a .NET structure, the As clause can be itself a UDT,
with one exception: it cannot be the same UDT as is being defined.
UDTs cannot contain unknown or null data types.
type b when all possible values of type a can, without loss of information or error,
be assigned to a variable of type b. Table 6-2 shows the convertibility of types in
the model.
Note that two arrays cannot be converted to each other in this sense, because
the assignment of an entire array is not supported, and because changing any
attribute of an array makes a distinct type. While the assignment of UDTs is sup-
ported, this is only possible when their list of members is identical, thus no
conversion is involved.
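For the integral types, this definition reduces to a range check. The sketch below is in Python and covers only Byte, Integer, and Long, using their standard ranges in Basic; the function name is an assumption, and Table 6-2 remains the authority for the full model, including the other types.

```python
# Value ranges of the integral types (Byte is unsigned 8-bit, Integer is
# signed 16-bit, Long is signed 32-bit).
INTEGRAL_RANGES = {
    "Byte": (0, 255),
    "Integer": (-32768, 32767),
    "Long": (-2147483648, 2147483647),
}


def is_contained(type_a, type_b):
    """True when every value of type_a can be assigned to type_b without loss."""
    low_a, high_a = INTEGRAL_RANGES[type_a]
    low_b, high_b = INTEGRAL_RANGES[type_b]
    return low_b <= low_a and high_a <= high_b
```

Every Byte fits in an Integer, so Byte converts to Integer; the reverse fails, because 32767 and every negative Integer have no Byte representation.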
Figure 6-1 shows some examples of how the object should behave. In the figure,
boxes with a heavy border represent full-scale qbVariableType objects, with
state; boxes with a light border denote the "lightweight" enumerator (ENUvarType)
that represents the variable categories as one of unknown, null, Boolean, byte,
integer, long, single, double, string, variant, array, or UDT.
[Figure 6-1: example types. An Integer; an Array of Integer and an Array of Variant, each with an entry type, a number of dimensions, and lower and upper bounds (LB, UB); and a Variant.]
NOTE The list of lower and upper bounds for an array is sometimes referred to
as a dope vector. This has nothing to do with the Three Stooges or Five Stupid
Guys. It merely provides the "dope" about the array: the information.
3. What I mean is that it's based on the palindrome-a string like "aha," which reads the same
forward and backward.
qbVariableType Serialization
We need a language for describing types, since only the simple scalar types can
be identified by enumeration. There are, strictly speaking, an infinite number of
different array types, because each change in dimension or bounds makes a new
array type.
There are also an infinite number of different UDTs. Variants, which can
contain arrays, only complicate the issue. Therefore, the toString method of
qbVariableType and qbVariable should return an expression that, in all cases, is
acceptable to the fromString method and re-creates a clone of the original object.
For a simple scalar, null, or unknown type, this may be the name of the
type: one of null, unknown, Boolean, byte, integer, long, single, double, or
string. For a concrete variant known to contain a scalar, unknown, or null type,
this can be an expression of the form Variant,type, where type is the name of the
simple type. For example, a variant that contains an integer is represented as
Variant,Integer.
The fun starts when we deal with arrays. We need to identify the fact that we
have an array, identify the type of its entry, identify its number of dimensions (one
dimension is probably used in 90% of all MIS programs, but you never know), and
in QuickBasic (as for Visual Basic COM), we need to support nonzero lower bounds.
Therefore, for an array, the type expression should be Array,type,boundList. The
type is the type of each entry. The boundList specifies both dimensionality and
bounds, since it will be a list of 2*n numbers, where n is the number of dimensions:
one lowerBound,upperBound pair per dimension. For example, a two-dimensional
array of integers might be Array,Integer,1,10,0,5 if it contains ten
rows (numbered 1..10) and six columns (numbered 0..5).
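As a sketch of how such an expression breaks down, here is a small Python parser. It is not the fromString method of qbVariableType: it handles only a simple, unparenthesized entry type, and the function names are assumptions for illustration.

```python
def parse_array_type(expression):
    """Split Array,type,boundList into the entry type and a list of bounds."""
    parts = [part.strip() for part in expression.split(",")]
    if len(parts) < 4 or parts[0].lower() != "array":
        raise ValueError("expected Array,type,boundList")
    entry_type = parts[1]
    numbers = [int(part) for part in parts[2:]]
    if len(numbers) % 2 != 0:
        raise ValueError("boundList needs a lower and an upper bound per dimension")
    # Pair the numbers up: (lower, upper) for each dimension.
    bounds = list(zip(numbers[0::2], numbers[1::2]))
    for lower, upper in bounds:
        if lower > upper:
            raise ValueError("lower bound exceeds upper bound")
    return entry_type, bounds


def entry_count(bounds):
    """Total number of entries implied by the bounds."""
    total = 1
    for lower, upper in bounds:
        total *= upper - lower + 1
    return total
```

Parsing Array,Integer,1,10,0,5 gives the entry type Integer and the bounds (1, 10) and (0, 5): two dimensions, ten rows by six columns, 60 entries in all.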
The need for the abstract variant arises at this point. As noted earlier in the
chapter, in specifying a Visual Basic COM or QuickBasic array, you cannot say
"this array is restricted to variant integers." Therefore, the toString/fromString
language for arrays cannot allow the syntax in Array,(Variant,Integer),1,10,0,5.
Instead, the syntax must be Array,Variant,1,10,0,5, and the type should be an
abstract variant that contains the unknown type.4
A UDT is represented as UDT,typelist, where typelist is a comma-separated
series of parenthesized type expressions. Each parenthesized type expression
should be in the form name,type. In this form, name is the member name, and type
4. I created the unknown type because of an ultimate goal to write a compiler and an inter-
preter for full symbolic evaluation of business rules and source programs with partially or
fully unknown values, but I was delighted to find a use in this context for this feature.
Chapter 6
QuickBasic Object Modeling
implemented in our compiler. Note that this syntax allows a variant to contain
an array, as in Variant,(Array,Integer,1,10).
qbVariable Serialization
The goal of the qbVariable object is to represent the value and reference the type.
Therefore, the overall syntax of the fromString/toString language exposed by
qbVariable should be type:value. The type should be an expression in qbVariableType's
language. The colon is a safe delimiter, because if you examine the qbVariableType
BNF shown in Figure 6-2, you will see that the colon cannot occur anywhere in
a qbVariableType expression. Colons can appear in values (for example, inside
quoted strings), but what matters is that when both a type and a value are
specified, the leftmost colon is always the delimiter between them.
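The leftmost-colon rule can be sketched in a few lines of Python. This is only an illustration, not the compiler's Visual Basic .NET code: split_type_and_value is a hypothetical name, and the sketch assumes that an expression containing a colon always starts with a type.

```python
def split_type_and_value(expr):
    """Split a serialized variable at its leftmost colon.

    Returns (type_expression, value_expression); the value is None when
    the expression holds no colon. Assumes (for this sketch only) that an
    expression with a colon always begins with a type.
    """
    type_part, colon, value_part = expr.partition(":")
    if not colon:
        return expr, None
    return type_part, value_part


print(split_type_and_value("Variant,Integer:32767"))
print(split_type_and_value('String:"time: 12:30"'))
```

Because a type expression can never contain a colon, any later colons (such as those inside the quoted string in the second call) stay with the value.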
NOTE One primary reason for using BNF as described in Chapter 4 is the
ability to examine the BNF, by hand or automatically with a tool similar to
bnfAnalyzer, and make sweeping generalizations about the language, such
as that a colon cannot appear in a serialized qbVariableType.
• If the value is a single string in quotes and using Visual Basic rules, it is
converted to a string.
• If the value is the keyword UNKNOWN or NULL, the type is the corresponding type.
fromString := fromStringType
fromString := fromStringValue
fromString := fromStringWithValue
fromString := fromStringType COLON fromStringValue
fromString := COLON fromStringValue
fromStringType := baseType | udt
baseType := simpleType | variantType | arrayType
simpleType := [VT] typeName
typeName := BOOLEAN | BYTE | INTEGER | LONG | SINGLE | DOUBLE | STRING |
            UNKNOWN | NULL
variantType := abstractVariantType COMMA varType
varType := simpleType | (arrayType)
arrayType := [VT] ARRAY,arrType,boundList
arrType := simpleType | abstractVariantType | parUDT
parUDT := LEFTPARENTHESIS udt RIGHTPARENTHESIS
udt := [VT] UDT,typeList
typeList := parMemberType [ COMMA typeList ]
parMemberType := LEFTPAR MEMBERNAME,baseType RIGHTPAR
abstractVariantType := [VT] VARIANT
boundList := boundListEntry | boundListEntry COMMA boundList
boundListEntry := BOUNDINTEGER,BOUNDINTEGER
simpleType := [VT] typeName
typeName := BOOLEAN | BYTE | INTEGER | LONG | SINGLE | DOUBLE | STRING |
            UNKNOWN | NULL
variantType := abstractVariantType,varType
varType := simpleType | (arrayType)
arrayType := [VT] ARRAY,arrType,boundList
arrType := simpleType | abstractVariantType
abstractVariantType := [VT] VARIANT
boundList := boundListEntry | boundListEntry,boundList
boundListEntry := BOUNDINTEGER,BOUNDINTEGER
fromStringValue := ASTERISK | fromStringNondefault
fromStringNondefault := arraySlice [ COMMA fromStringValue ]
arraySlice := elementExpression | ( fromStringNondefault )
elementExpression := element [ repeater ]
element := scalar | decoValue
scalar := NUMBER | VBQUOTEDSTRING | ASTERISK | TRUE | FALSE
decoValue := quickBasicDecoValue | netDecoValue
quickBasicDecoValue := QUICKBASICTYPE ( scalar )
netDecoValue := SYSTEM PERIOD IDENTIFIER
                LEFTPARENTHESIS ANYTHING RIGHTPARENTHESIS
repeater := LEFTPAR ( INTEGER | ASTERISK ) RIGHTPAR
String String
Array Collection
Of course, this isn't as straightforward as it looks. Let's see how the map-
ping works.
Scalar Mapping
We start off easily, with Booleans represented by .NET Booleans and bytes repre-
sented by .NET bytes. QuickBasic integers are represented accurately in .NET by
short integers. However, .NET integers are 32-bit and cannot accurately repre-
sent QuickBasic integers.
NOTE Of course, all QuickBasic integers in the 16-bit word can indeed be
represented by .NET integers in the 32-bit word. However, code that depends
on the word size for accuracy won't work the same. You may think that code
should not depend on the word size, and it shouldn't in most cases; nonetheless,
the compiler and runtime must account for this fact.
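To make the word-size point concrete, here is a small Python sketch (not the compiler's own code). It assumes that a result outside the 16-bit range should be reported as an overflow error rather than silently kept; the name add_qb_integer is hypothetical.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def add_qb_integer(a, b):
    """Add two values as 16-bit QuickBasic-style integers.

    Assumption for this sketch: a result outside -32768..32767 is an
    error, even though a 32-bit integer could hold it without trouble.
    """
    result = a + b
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError("result does not fit in 16 bits: %d" % result)
    return result


print(add_qb_integer(32000, 767))
```

A program that relies on 32767 + 1 failing behaves differently if the sum is quietly carried out in a 32-bit word, which is why the narrower type has to be modeled explicitly.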
QuickBasic long integers, which are 32 bits, are represented by .NET integers.
Now things get a bit messier. QuickBasic single-precision reals are represented
inaccurately by .NET singles. The representation is inaccurate because no .NET
floating-point tool, out of the box, provides support for QuickBasic floating-point
values precisely. To represent QuickBasic single-precision reals, we would need to
write a software simulation for this type of floating-point representation (and we
won't do this).
QuickBasic double-precision reals are represented, again inaccurately, by .NET
double-precision reals. .NET is more mathematically accurate, but the older inac-
curacy isn't accurately simulated (whew).
QuickBasic strings are represented accurately by strings in the compiler
because a limit of 64KB characters is actually enforced, both to be accurate and
also as a sort of trip down memory lane, back to when strings, outside the C lan-
guage, were severely restricted in length. This retro feature can be suppressed by
a compiler option, but you need to go to the code for details.
Array Mapping
Arrays are represented by a collection with the following constraints, which will
be checked by the core inspect method of qbVariable:
It would be a bad mistake to map QuickBasic arrays onto .NET arrays. Each
.NET array has a fixed dimensionality, even when it is dynamically allocated using
ReDim. We would go crazy trying to represent a QuickBasic array using a .NET array,
creating and re-creating arrays.
It would be a simpler matter to just represent any QuickBasic array using
a single .NET array of objects with one dimension, and convert multidimensional
subscripts to a single .NET subscript. The math is easy. However, the overriding advan-
tage of the collection approach is that much of the access can be pushed down
into an independent object that is outside the compiler. This object is named
collectionUtilities, which provides several tools for working with the classic col-
lection, including tools for serialization and deserialization, and accessing recursive
collections that contain subcollections. The collectionUtilities tool is available
from the Downloads section of the Apress Web site (http://www.apress.com), in the
egnsf/collectionUtilities/bin folder.
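For the record, the "easy math" of the rejected single-array alternative looks roughly like this in Python. This is a sketch only: flat_index is a hypothetical name, and row-major ordering is an assumption made for the illustration.

```python
def flat_index(subscripts, bounds):
    """Map multidimensional subscripts onto one zero-based index.

    bounds is a list of (lower, upper) pairs, one per dimension, so
    nonzero lower bounds are supported. Row-major order is assumed.
    """
    if len(subscripts) != len(bounds):
        raise IndexError("wrong number of subscripts")
    index = 0
    for subscript, (lower, upper) in zip(subscripts, bounds):
        if not lower <= subscript <= upper:
            raise IndexError("subscript %d is out of bounds" % subscript)
        # Scale by the size of this dimension, then add the offset within it.
        index = index * (upper - lower + 1) + (subscript - lower)
    return index


bounds = [(1, 10), (0, 5)]  # ten rows numbered 1..10, six columns numbered 0..5
print(flat_index((2, 0), bounds))
```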
of a type at all. Furthermore, inheritance would not solve the problem that the .NET
representation of the type might be at odds with the inherited type attributes.
Therefore, I decided to use delegation rather than inheritance. The next sections
describe the details of the qbVariableType and qbVariable implementations.
qbVariableType State
In terms of object taxonomy, it's obvious that qbVariableType will have a state.
Figure 6-4 shows the State section of the code.
Most fields in the state are self-explanatory, but the objTag is a bit of a mys-
tery. It supports the core Tag property shared with a number of the compiler
objects, including qbVariable. This allows us to use code to add objects and data
to a variable type instance in a spontaneous manner to meet extra needs in the
compiler, or in any program that uses qbVariableType as a stand-alone object.
The Tag property is not visible to the user of the QuickBasic system; rather, it is
a convenience for using the object.
The additional fact that the qbVariable object must, in many cases, delegate
when a variant, an array, or a UDT contains one or more types means that a full-
fledged stateful object, using the core methodology introduced in Chapter 5, is
needed. Therefore, qbVariableType informally implements the core methodology
as a stateful object.
At all times, a qbVariableType instance is either usable or not usable. The
instance becomes usable at the end of a successful new constructor call, and it
remains usable until the object is disposed of or a serious internal error occurs.
qbVariableType also has a Name property, which defaults to qbVariableTypennnn
date time to identify the type in XML output and other messages. This doesn't
name the type; it names the object instance.
<!--
***********************************************************************************
*
* variableType
*
* This class represents the type of a quickBasicEngine variable, including support
* for an unknown type and Shared methods for relating .Net types to Quick Basic
* types.
*
* This class was developed commencing on 4/5/2003 by
*
* Edward G. Nilges
* spinoza1111@yahoo.COM
* http://members.screenz.com/edNilges
*
* This instance represents the following variable type:
*
* Type:
*   Member1: Variant containing 32-bit Long integer in the range -2**31..2**31-1
*   Member2: Variant containing 32-bit Long integer in the range -2**31..2**31-1
*   Member3: Variant containing Boolean
* End Type: total size is 3
*
* CACHE INFO
*
* A cache of recently parsed variable types is maintained to save time: here is
* the state of the cache.
*
* Cache status: available
* Cache maximum size: 100
* Cache current size: 7
* Cache contains: "Unknown", "Variant,Long", "Long", "Variant,Boolean", ...
*
***********************************************************************************
-->
<qbVariableType>
<!-- Indicates the usability of the object -->
<booUsable>True</booUsable>
<!-- Identifies the object instance -->
<strName>qbVariableType0001 3/4/2004 6:25:56 AM</strName>
<!-- Identifies the variable's type -->
<enuVariableType>vtUDT</enuVariableType>
<!-- Identifies the type of a contained variable -->
<objVarType>
(1
&quot;Member1&quot;
Variant,Long)
(2
As shown in Figure 6-6, this XML includes type information when the
instance contains a type as part of the objVarType tag. In all cases, this can be just
the toString method output (with commas changed to newline for readability)
because it contains all the information about the embedded type, rather than
the complete XML for the embedded type. In the example, a UDT is shown with
three members as seen in the comment block.
<qbVariableType>
<!-- Indicates the usability of the object -->
<booUsable>True</booUsable>
<!-- Identifies the object instance -->
<strName>qbVariableType0001 3/4/2004 6:51:17 AM</strName>
<!-- Identifies the variable's type -->
<enuVariableType>vtUDT</enuVariableType>
<!-- Identifies the type of a contained variable -->
<objVarType>
(1
&quot;Member1&quot;
Variant,Long)
(2
&quot;Member2&quot;
Variant,Long)
(3
&quot;Member3&quot;
Variant,Boolean)
</objVarType>
<!-- Identifies the bounds of an array type -->
<colBounds>EmptyCollection</colBounds>
<!-- Identifies type ordering -->
<colTypeOrdering>noCollection</colTypeOrdering>
<!-- Identifies type containment -->
<booContained>Unallocated</booContained>
<!-- User's tag -->
<objTag>&quot;&quot;</objTag>
</qbVariableType>
Inspection of "qbVariableType0001 3/4/2004 6:57:44 AM" (UDT,(Member1,Variant,Long),(Member2,Variant,Long),
(Member3,Variant,Boolean)) at 3/4/2004 6:57:50 AM

Object instance must be usable: OK
Type must be compatible with contained value and/or bounds: OK
Type must be compatible with contained value and/or bounds: OK
Since the container is a UDT, the contained type should be a collection of members
Contained variable type(s) must pass their own inspection: OK
Figure 6-7. qbVariableType inspection
Note that inspection is two layers deep, because a UDT containing three
scalars as seen in Figure 6-5 is two object levels, and qbVariableType always
inspects its constituent objects. For each object, the instance must be usable.
The type must be compatible with the contained value and bounds. For exam-
ple, a user type must have a nonempty collection of members.
Inspection also clones the object to make sure the clone returns a toString that
matches the original toString. For all objects, object2.fromString(object1.toString)
must create the clone of objectl in object2, and the compare method for the two
objects must return True. Both assertions are checked in the inspection.
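The two assertions can be pictured with a toy class in Python. This is a deliberately simplified stand-in, not qbVariableType itself: ToyType, its methods, and round_trip_ok are hypothetical names, and the "type" is just a comma-separated list of words.

```python
class ToyType:
    """Toy stand-in for a serializable type object (illustration only)."""

    def __init__(self, expr="Unknown"):
        self.from_string(expr)

    def from_string(self, expr):
        # Normalize by trimming blanks around each comma-separated part.
        self.parts = [part.strip() for part in expr.split(",")]

    def to_string(self):
        return ",".join(self.parts)

    def compare(self, other):
        return self.parts == other.parts


def round_trip_ok(original):
    """Check both assertions: equal toString output and a True compare."""
    clone = ToyType()
    clone.from_string(original.to_string())
    return clone.to_string() == original.to_string() and clone.compare(original)


print(round_trip_ok(ToyType("Variant, Integer")))
```

The check is only as strong as to_string is complete: any state that the serialized form leaves out cannot be verified by the round trip.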
A Note on Inspection
Multilevel inspection is time-consuming, and objects that may have multiple
levels are inspected when they are disposed. This may turn out to be exces-
sively time-consuming in a production compiler. In a lab compiler, however, it
is a benefit.
As I scale up, I may need to eliminate, or make optional, the inspection that
now occurs in qbVariableType (and most stateful objects) when the object is
disposed. For a large object, it will clone the object and its constituents, and
this could be too time-consuming even in the lab.
But I would rather preserve the inspect routine and just limit its scope when it
"notices" that the object is "large," because software objects should be reliable.
Unreliable objects give object-oriented programming a bad reputation.
Hardware is built with all sorts of self-checks. I do not understand why any
form of checking in production is suspicious in the real world. But perhaps in
reaction, I sometimes overdo it.
qbVariableType Testing
The .NET application qbVariableTypeTester.exe is available from the Downloads
section of the Apress Web site (http://www.apress.com), in egnsf\apress\qbVariable\
qbVariableTester\bin. You can use this application to test the qbVariableType object.
Run it to see the screen shown in Figure 6-8 (after an introductory and one-time
screen).
[Figure 6-8 screenshot: the qbVariableTypeTester main form, with a list of variable types, a Create Random Types button, and a Status box. The visible Status text ends: "Finally, the 'Stress Test' button will stress-test this object. Click Test, sit back, and watch the progress of testing in the status box. This will provide some assurance if you've changed the source code." It is followed by the log entries "3/4/2004 8:07:16 AM end of About information" and "3/4/2004 8:07:17 AM Loading complete".]
Click the Inspect button to see the internal inspection report, as shown in
Figure 6-9. This applies a series of assertions about the state of the object to the
state, as we've seen. Here, the contained type is a full-scale type in its own right
and cannot (as in Figure 6-7) be represented as a toString.
[Figure 6-9 screenshot: the inspection report, which includes the lines "Inspection of contained data type object qbVariableType0002 3/4/2004 8:14:41 AM" and "Inspection of "qbVariableType0002 3/4/2004 8:14:41 AM" (Integer) at 3/4/2004 8:19:54 AM". A qbVariableTypeTester message box describes the type: "2-dimensional array with 10 rows (from 1 to 10) and 10 columns: (from 1 to 10): each element has the type 16-bit Integer in the range -32768..32767: total size is 100".]
Note that each time you create a test object using this interface, the object will
be inspected. If an internal error is found, a report will appear. These reports won't
appear for syntax errors. For syntax errors, a simple dialog box will appear, and
the object will not be created.
Converting to XML
Click the object2XML button to convert the object state to XML. Figures 6-10
and 6-11 show the results.
<!--
***********************************************************************************
* variableType
* 2-dimensional array with 10 rows (from 1 to 10) and 10 columns: (from 1 to 10):
* each element has the type 16-bit Integer in the range -32768..32767: total size
* is 100
***********************************************************************************
Figure 6-10. The XML description of the data type starts with a comment block.
<qbVariableType>
<!-- Indicates the usability of the object -->
<booUsable>True</booUsable>
<!-- Identifies the object instance -->
<strName>qbVariableType0001 3/4/2004 1:02:57 PM</strName>
<!-- Identifies the variable's type -->
<enuVariableType>vtArray</enuVariableType>
<!-- Identifies the type of a contained variable -->
<objVarType>Integer</objVarType>
<!-- Identifies the bounds of an array type -->
<colBounds>(1,10),(1,10)</colBounds>
<!-- Identifies type ordering -->
<colTypeOrdering>noCollection</colTypeOrdering>
<!-- Identifies type containment -->
<booContained>Unallocated</booContained>
<!-- User's tag -->
<objTag>""</objTag>
</qbVariableType>
Figure 6-11. The XML description of the data type contains the object state.
Note that the object2XML output of Figure 6-10 shows a cache. The cache is used
to avoid unnecessary parsing of fromString expressions. Each time a new fromString
expression is presented to the object, the object parses the expression and saves it
in a keyed Collection, such that the key is the fromString expression. The object is
saved by forming its comprehensive, or "deep," clone. Later on, when a fromString
is presented, the object can check the cache quickly for a copy of the required
object.
The XML comment describes the state of the cache. In the example, four
fromString expressions have been parsed and cached. Up to 100 fromString
expressions can be saved in this way. Caching saves time. For example, running
the nFactorial QuickBasic program using the Nutty Professor interpreter intro-
duced in Chapter 8 takes about 60 seconds on a contemporary system from start
to finish when no caching is performed; it takes about 40 seconds with caching.
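The caching scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the compiler's implementation: ParseCache is a hypothetical name, and dropping the oldest entry when the cache is full is an assumption, since the text gives only the limit of 100 entries.

```python
import copy
from collections import OrderedDict


class ParseCache:
    """Cache of parsed objects keyed by their fromString expression."""

    def __init__(self, parse, max_size=100):
        self._parse = parse          # function that does the real parsing
        self._max_size = max_size
        self._entries = OrderedDict()

    def __len__(self):
        return len(self._entries)

    def get(self, expr):
        if expr in self._entries:
            # Hand out a deep clone so callers cannot alter the cached copy.
            return copy.deepcopy(self._entries[expr])
        parsed = self._parse(expr)
        if len(self._entries) >= self._max_size:
            # Assumed eviction policy: discard the oldest entry.
            self._entries.popitem(last=False)
        self._entries[expr] = copy.deepcopy(parsed)
        return parsed


cache = ParseCache(lambda expr: expr.split(","))
print(cache.get("Variant,Long"), cache.get("Variant,Long"), len(cache))
```

The deep clone costs time on every hit, so the saving depends on parsing being more expensive than cloning.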
Stress Testing
You can conduct a comprehensive stress test of most of the functionality of the
object. Click the Stress Test button to see a progress report, as the object exe-
cutes about 50 self-tests. On completion, click Yes to see the test report shown
in Figure 6-12.
[Figure 6-12 screenshot: the stress test report, which opens with the variableType comment header: "This class represents the type of a quickBasicEngine variable, including support for an unknown type and Shared methods for relating .Net types to Quick Basic types."]
Scroll down through the test report to see a series of random (but repeatable)
tests that exercise the functionality of the object and the syntax of its fromString
expressions. You can also click the Randomize button for nonrepeatable tests. The
object is continually self-inspecting during these tests. 5
5. While passing these tests proves nothing, in the sense that nothing proves software correct,
the tests have been invaluable to me in regression testing while changing the source code.
They will be of equal value to you if you change the source code.
• defaultValue(type) provides the default .NET value for a type. For exam-
ple, it will return 0 for an integer type and a null string for the string type.
All of these shared methods and all the unshared properties, methods, and
events of qbVariableType are fully documented in the qbVariableType reference
manual (see Appendix B).
• Record the variable name (different from the object instance name in the
Name property)
qbVariable State
The value of qbVariable is recorded in the generic .NET object, objValue, as part
of the object state, as shown in Figure 6-13.
NOTE The objTag is the same as in the qbVariableType state, described earlier
in the chapter. It supports the core Tag property shared with a number of the
compiler objects, including qbVariableType. This allows using code to add
objects and data to a variable instance in a spontaneous manner to meet
extra, unforeseen needs. For example, the compiler stores the variable's index
in its collection of variables in the variable Tag.
The objValue object has the following values, which are checked for consis-
tency with the qbVariableType delegate, which is in objDope (so-called because the
variable type gives us the "dope" about the variable). This check is performed by
the qbVariable.inspect method.
It would be very glamorous in the academic sense to make the scalar entries
of arrays qbVariable values instead of .NET values. This would appear to reduce
multiple objects to one object. But if a large array collection consists of a mas-
sive collection of stateful qbVariable values, each will burden the CLR heap, and
each will take time to allocate, create, and access. To avoid this overhead, array
elements are .NET scalars. They are protected against outside tampering by the
array qbVariable.
qbVariable Testing
The qbVariableTest.exe application, available from the Downloads section of the
Apress Web site (http://www.apress.com), in egnsf/apress/quickBasic/qbVariable/
qbVariableTest/bin, will test the qbVariable object and allow you to enter expres-
sions in qbVariable's toString/fromString language that specify value and type.
Figure 6-14 shows the program's interface.
[Figure 6-14 screenshot: the qbVariableTest form, with buttons including Clone, Inspect, Clear, and Dispose, and a Status box. The visible Status log reads "3/4/2004 1:34:00 PM Loading", "3/4/2004 1:34:00 PM Loading complete", and "3/4/2004 1:35:16 PM Creation of the variable type Variant,Integer:vtInteger(32767)", followed by entries reporting that creation is complete and that an inspection report is available to view.]
[Figure screenshot: object2XML output for a qbVariable. A comment header ("This class represents the type and value of a Quick Basic variable") is followed by a qbVariable element whose entries include booUsable (True; indicates the usability of the object), strName (identifies the object instance), strVariableName (identifies the variable), booVariableNameDefaults (True; indicates that the variable name has its default value), and objDope (the dope, as a qbVariableType).]
In effect, a data type is, in many cases, a business rule. Whether the CEO wills
it or not, restricting gross pay values to a 16-bit integer representation means
(1) all employees must be paid in whole dollars and (2) no employee will earn
more than $32,767.00 per pay period.
qbVariable and qbVariableType allow you to obtain one-dimensional serializa-
tion in the form of toString/fromString expressions, or more self-descriptive
and two-dimensional XML tags. Either can be stored in a database field without
loss of fidelity. However, these objects do not support an xml2Object method.
I have never found a fully satisfactory XML parser (commercial parsers tend to
enforce goals I don't necessarily share). XML is easy to parse using recursive
descent, but the issues are very complex when you consider the size to which
XML can grow and the resulting need for incremental parsing of part of an
XML file.
This book could, I suppose, include a chapter on "XML parsing for fun and
profit," because an XML parser, like a classic compiler, can consist of a lexical
analyzer and parser. The problem is that there may be no profit in getting
involved in XML wars. It's also no fun to fight with the boss over which parser
to select.
look at my wonderful code, and good luck." But that wouldn't be very helpful.
Instead, I needed to describe the data architecture of the compiler.
Managers may tear their hair when programmers break a problem down into
unforeseen components; the user wanted X, not X, Y, Z. But one lesson that can be
derived from any number of disasters is that objects, and before them reusable
procedures, want very much to have a formal existence.
In actual programming, it is common for a programmer to see needs beyond
the formal requirements written by nonprogrammers, and it is important to be
able to communicate these needs. For example, the programmer may understand
that a problem is best solved by a language that describes instances of the prob-
lem and write a compiler for that same language.
No tedious recitation of individual procedures and what they do can replace
the broad understanding afforded by describing what an object is. "This object
represents all possible QuickBasic variables" summarizes an intuition and is, for
this reason, better than a list of procedures, all of which must run in harmony to
provide the unmentioned variable object.
In this chapter, you've seen the design, development, and testing of the
qbVariableType and qbVariable objects. Like the qbScanner object introduced in
Chapter 5, creating these objects involves a disciplined methodology for require-
ments analysis, detail design before coding, and developing the core procedures
(including Name, inspect, and object2XML). This takes time.
I have found the payoff for this object-oriented approach to be large, and one
that fulfills the unmet promise of the structured methods of old. Structured pro-
gramming promised, but did not (for the most part) deliver, chunks of code that
would snap together with a satisfying "thunk," "ka-chunk," or "bada-bing." The
modular and structured methods often failed to obtain the desired productivity
gains, because they only seemed to make bad designs worse through incorrect
problem analysis. Object-oriented design, done right, seems indeed to deliver the
hoped-for "ka-chunk" sound. Because the prerequisite for delivering the object is
so much analysis, it has a tendency to concentrate the mind (kind of like the
prospect of a hanging).
But both structured programming and object-oriented development require
a big time investment. Because of this, management put the structured methods
on a very back burner in the 1970s and now gives object-oriented development
low priority. For this reason, it is best to practice the arts described here, and ear-
lier in Chapter 5, in secret. In particular, it is just wrong to describe this art as
"better," when at your office, better means faster.
Summary
This chapter has been a real forced march, and I do hope you are as tired as
I am, but not tired of me.
We have done a requirements analysis as mere programmers of what it
means to fully support a classic variable, whether in QuickBasic or old Visual
Challenge Exercise
Translate the following variable types and variables to the fromString/toString
notation of qbVariableType and qbVariable. Make sure you can create and inspect
the variable types using qbVariableTypeTester. Make sure you can create and
inspect the variables using qbVariableTest.
• As a type: an integer
6. Of course, you may have other ideas. Please e-mail me your thoughts at
[email protected].
• As a value: the array in the previous item with zeros in all entries, repre-
sented using the shortest possible fromString
Resources
For more information about data modeling for compilers, refer to the following:
CHAPTER 7
IN HOLLYWOOD'S TERMS, this is the chapter where the widow in arrears is tied to
the railroad track by the heinous landlord, young Jack overtakes the onrushing
locomotive to rescue the distressed widow, gets the widow a second mortgage
on the Web, sends the villainous landlord to rehab, and organizes a men's retreat
for the boys back at the ranch.
Or, if you prefer, this is where Luke Skywalker defeats the Dark Side of the
Force and finds that dad is Darth Vader, which just goes to show you.
After much object development at the mother ship, we have arrived at the flag-
ship, and indeed the largest object in our compiler object fleet, quickBasicEngine.
A brilliant editor of mine has said, in so many words, that programmers are homeys
who be chillin' when they see code. What I think he meant was that I need to
supplement a theoretical discussion with at least a mad dash through the over-
all solution architecture of quickBasicEngine and its roughly 10,000 lines of code.
I will postpone discussion of the onboard Nutty Professor interpreter that is
included in quickBasicEngine until the next chapter. Here, I will cover the parser
algorithms as generated from the BNF definition of the language described in
Chapter 4.
This chapter will show how the parsing procedures can be manually, but
rapidly, cranked out as multiple-algorithm implementations using a simple set
of rules. You will see how the compiler generates individual instructions as
objects and how this allows us to associate as much data as is appropriate to
each instruction, including data that ties the instruction to the source code to
aid in debugging. Just because we're implementing a legacy language, there is
no reason for using retrograde methods from the dawn of man.
I will also introduce the fascinating topic of compiler optimization, demon-
strating how constant expressions are evaluated by default during parsing and
how the compiler eliminates unnecessary operations in a safe manner. Finally,
I will present an end-to-end example of the compilation and execution of a very
simple program ('Hello world,' everyone's favorite).
Recursive-Descent Approaches
Recursive descent is one of the oldest parsing algorithms. It is not a compiling
algorithm, per se, because it has little to do with scanning or code generation.
It has to do with recognizing the language.
Hero computer scientist Niklaus Wirth, the creator of the Pascal language, 1
said that recursive descent must be used for block-structured languages such
as Pascal. This is extreme, but as a manual parsing method, it is the most
understandable.
Two general approaches to parsing exist: top-down and bottom-up. In the
top-down, or goal-oriented, method of recursive descent, you decide on an over-
all goal or task and break it down into smaller tasks. In bottom-up algorithms,
you instead run through the series of scanned tokens with auxiliary data struc-
tures, and enter a variety of higher states as these symbols are seen to build
higher structures. Basically, in top-down algorithms, you start with program and
go down to token; in bottom-up algorithms, you start with token and go up to
program.
Both approaches can be automated by parser generators, but on the whole,
bottom-up is better automated because of its complex data structures. Top-down
recursive descent is easier to understand, and, as a tactical solution to quick pars-
ing, it is nonpareil. So, top-down is the method used for our compiler's parsing
procedures.
1. Niklaus Wirth was an early proponent of safe, as opposed to merely efficient, computing.
The Parser and Code Generator for the QuickBasic Compiler
Recursive-Descent Procedures
If you know how to code individual recursive-descent procedures as methods in
Visual Basic .NET or another language, then using a generator may be overkill
for small to medium parsing tasks, including parsing common languages.
The simplest case occurs when the production is of the form a := b, where b
as the right-hand side (RHS) may be complex but does not contain any direct or
indirect recursive reference back to a; that is, a does not occur in b, nor does any
part of b break down into an RHS that includes a. This simple case can be handled
by a series of checks for terminals and grammar symbols. In quickBasicEngine, for
example, each grammar symbol method is a Boolean function that returns True
or False. On success, each advances the token index one symbol beyond the end
of the context parsed.
In quickBasicEngine, each terminal is checked by calling the procedure
compiler_checkToken_. This procedure checks for either a literal terminal value (such
as a comma when expected) or a class of terminal values (such as an identifier).
NOTE As part of the coding convention for the compiler, all Private methods
are terminated with an underscore. All members that are called exclusively by
a higher procedure, including all procedures called exclusively by the Private
method compiler_, are prefixed by the name of the caller. For example, the
name of the procedure responsible for checking both expected token values
and expected types is compiler__checkToken_, with two underscores after
compiler. The first underscore shows that compiler_ is private; the second
separates its name from its suffix.
If b Then Return(True)
If Not c Then Return(False)
If Not d Then Return(False)
Return(True)
This simplified code ignores "Cain's amulet": the index of the next token, which
is known to all compiler procedures and available for modification by all of
them. In the actual code of quickBasicEngine, the index is passed by reference
to the compiler procedures as intIndex. Nearly all modifications of intIndex
(the "amulet") take place at the lowest possible level, when compiler_checkToken_
is called to check for a token.
In quickBasicEngine, nearly all procedures can assume that a successful parse
of a grammar category has advanced intIndex exactly one token beyond the end
of the code that corresponds to the grammar category, and that an unsuccessful
parse will leave the index unchanged. For this reason, many of the more complex
procedures start with the declaration and saving of intIndex, so that they can
reset it cleanly on a False exit.
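The index discipline described above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python rather than the Visual Basic .NET of quickBasicEngine; it assumes the production a := b | c d, and the names check_token and parse_a are hypothetical. Python has no by-reference parameters, so each recognizer returns the new index alongside its Boolean result.

```python
# Hypothetical sketch of a recognizer for the production  a := b | c d.
# Tokens are plain strings; "b", "c", and "d" stand in for terminals.

def check_token(tokens, index, expected):
    """Return (True, index + 1) if the token at index matches, else (False, index)."""
    if index < len(tokens) and tokens[index] == expected:
        return True, index + 1
    return False, index


def parse_a(tokens, index):
    """Recognize a := b | c d starting at index."""
    saved = index                      # save the index for a clean False exit
    ok, index = check_token(tokens, index, "b")
    if ok:
        return True, index             # first alternative matched
    ok, index = check_token(tokens, index, "c")
    if not ok:
        return False, saved
    ok, index = check_token(tokens, index, "d")
    if not ok:
        return False, saved            # "c" was consumed, so restore the index
    return True, index                 # one token beyond the end of "c d"
```

On success the returned index is one past the parsed text; on failure it is exactly the index on entry, which is the invariant the surrounding text relies on.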
NOTE I emphasize precision in handling the index because, for error messages
and debugging, it is important to be precise as to the scope of any one
grammar symbol!
Figure 7-2 shows the beginning of the procedure body. The first step raises
an event that indicates to the GUI that a parse is starting. Note that the parse is
tracked using three events: parseStartEvent, parseEvent, and parseFailEvent. In
the "The Dynamic Big Picture" section later in this chapter, you will see how the
GUI uses these events for progress reports.
The next step is to check whether the token index is beyond the end of the
code. The end of the code is passed by value as intEndIndex. It is not
necessarily the end of the source program or immediate expression, because when
compiling inside a parenthesized subexpression, the end index for the expression
parser, of which addFactorRHS is a part, is the position of the closing
parenthesis. The expression parser calls itself recursively when it finds a
parenthesized subexpression. This step is not needed in recognizers that can assume
when called that the main index is not beyond the end of the context. But here,
addFactorRHS is called in a recursive loop when it finds a multiplicative operator
and a multiply factor (see the BNF in Figure 7-1). On entry, it needs to know
whether it is past the end. If this index is past the end, failure is indicated by
calling the compiler_parseFail_ procedure, which calls parseFailEvent and
returns False.
Then compiler_mulOp_ is called to check for a multiplication operator (aster-
isk, forward slash for normal division, backward slash for integer divide, or Mod).
Note that we have, in terms of American baseball, struck out if a multiplicative
operator is not found and must call the parse failure routine in this case. This is
because the BNF requires that the addFactorRHS start with a multiplicative operator.
The next two Dim statements exploit an elegant new feature of .NET: its
allocation of local (Dim) variables just in time. The first Dim both declares a save area
for the token index and initializes it to intIndex. The next Dim just declares a string
work area.
NOTE Earlier editions of Visual Basic, through Visual Basic 6, allocated all local
variables on entry and deallocated them on exit, and their name scope was the
complete procedure. In Visual Basic .NET, variables are assigned storage and their
default value (or the value assigned in the Dim statement) when the Dim statement
is encountered. If the Dim statement is encountered in an If..Then..Else..End If
structure, or inside a loop, the variable may be referred to inside that
structure. However, if the variable is "Dim'd" between Then and Else, it may
not be referred to after the Else and before the End If. The variable loses its
place after control leaves the structure. For practical coding, this means
that variables can be placed near the point of use, which makes code more
readable. The only execution penalty occurs when the variable is defined in
a Do or For loop, and for this reason, variables should be allocated outside
Do and For loops.
As shown in Figure 7-3, the next step checks for a multiply factor to match
the multiplicative operator. If this isn't found, the compile fails. Otherwise, we
can emit code in compiler_binaryOpGen_, which will emit the interpreter's
Multiply opcode.
NOTE It's true that addition and multiplication are "symmetrical" such that
they can be evaluated either way. But quite apart from the issue of Mod, it is
bad practice for compiler writers to depart from the language specification,
which typically will tell them how to evaluate and in what order. This is
because of the finite precision of computer numbers, which will give different
answers if the evaluation order is changed.
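The point about finite precision is easy to check. The following is a small illustration in Python (not part of quickBasicEngine) using ordinary double-precision values; it regroups an addition, and then shows that Mod cannot be regrouped with multiplication at all.

```python
# Regrouping a floating-point sum changes the rounded result.
a, b, c = 0.1, 0.2, 0.3
left_first = (a + b) + c       # evaluate left to right
right_first = a + (b + c)      # regroup to the right
print(left_first == right_first)

# Mod at multiplication precedence: grouping changes the answer outright.
left_mod = (7 * 4) % 5         # left to right: 28 Mod 5
right_mod = 7 * (4 % 5)        # regrouped: 7 * 4
print(left_mod, right_mod)
```

Both comparisons differ, which is why a compiler should follow the evaluation order the language specification gives rather than rearranging operands for convenience.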
Then we can proceed to the rest of the RHS, shown in Figure 7-4, for the
case where there are several operators of multiplication precedence, such as
a*b/c Mod d. We note the current position and recursively call ourselves. If this
returns False, and the noted index is the same as intIndex, this means that we
have gone past the end of the RHS and are finished. Whatever lies to the right
(probably a newline) is the concern of another grammar symbol (probably the
statementBody).
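The whole addFactorRHS walkthrough can be condensed into a sketch. This is a simplified Python illustration, not the book's Visual Basic .NET code: the opcode names other than the multiply opcode are assumptions, events and error reporting are omitted, and the index is returned rather than passed by reference. It follows the BNF addFactor := mulFactor [addFactorRHS] and addFactorRHS := mulOp mulFactor [addFactorRHS].

```python
# Hypothetical opcode names; only the idea of a multiply opcode is from the text.
MUL_OPS = {"*": "opMultiply", "/": "opDivide", "\\": "opIntegerDivide", "Mod": "opMod"}


def parse_mul_factor(tokens, index, end_index, code):
    """Stand-in for mulFactor: accept one operand token and emit a push."""
    if index <= end_index and tokens[index] not in MUL_OPS:
        code.append(("opPush", tokens[index]))
        return True, index + 1
    return False, index


def parse_add_factor_rhs(tokens, index, end_index, code):
    """addFactorRHS := mulOp mulFactor [addFactorRHS]"""
    if index > end_index:              # past the end of the context: fail
        return False, index
    saved, saved_len = index, len(code)
    op = tokens[index]
    if op not in MUL_OPS:              # the RHS must start with a mulOp
        return False, saved
    ok, index = parse_mul_factor(tokens, index + 1, end_index, code)
    if not ok:
        del code[saved_len:]           # discard anything emitted on this attempt
        return False, saved
    code.append((MUL_OPS[op], None))   # emit the operator after both operands
    ok, next_index = parse_add_factor_rhs(tokens, index, end_index, code)
    return True, (next_index if ok else index)


def parse_add_factor(tokens, index, end_index, code):
    """addFactor := mulFactor [addFactorRHS]"""
    ok, index = parse_mul_factor(tokens, index, end_index, code)
    if not ok:
        return False, index
    ok, next_index = parse_add_factor_rhs(tokens, index, end_index, code)
    return True, (next_index if ok else index)
```

For the tokens of a*b/c Mod d, the emitted code pushes a and b, multiplies, pushes c, divides, pushes d, and applies Mod, so evaluation proceeds left to right. Unlike the real compiler, this sketch treats a dangling operator as "not part of this symbol" instead of reporting a compile error.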
2. Saul Rosen's 1968 anthology of compiler papers, Programming Systems and Languages, showed
it was clearly the Europeans who liked the stack, while Americans like John Backus were
optimizing small sets of registers, swapping data in and out. Later on, Charles Moore's Forth
language showed that in terms of expressivity, deep stack languages were better than
register-oriented languages, including Fortran, for complex languages. In 1979, I
implemented a compiler and interpreter in 1KB on a programmable calculator for the stack
language Mouse, a simplified Forth. This language had the semantic power of Visual Basic,
including recursion, but used single characters for operations to save space.
Each opcode is a stateful object, rather than a mere opcode enumerator value
as in the example in Chapter 3. Each opcode exposes the operation defined as an
enumerator, an operand in some cases, a comment, and its source as the start
and the end of the source code responsible for the opcode.
Figure 7-5 shows the following for each of the five opcodes generated for the
Print 'Hello world' program:
• Each instance is usable (it better be), and its name is qbPolishnnnn, where
nnnn is the sequence number of the object, as generated within one com-
piler invocation.
• The opcode is specified next. The first opcode is pushLiteral, which causes
the Nutty Professor interpreter to push its operand (the string "Hello world")
on the interpreter's stack.
• It is important for debugging tools that the qbPolish object model support
linkage of the object code back to source. Therefore, the next two tags spec-
ify the first scanned token responsible for generating this opcode and the
total number of scan tokens responsible. Only one scan token, the quoted
string "Hello world", is literally responsible for emitting pushLiteral.
• The next XML tag provides the literal operand for pushLiteral.
• The last XML token for each Polish opcode is a comment that can be asso-
ciated with the opcode.
NOTE To keep the display of the Polish code in XML to a manageable size,
I used the GUI to set an option in the compiler (on the GUI's Tools > Options
menu). It suppresses the generation of Polish opcodes for Rem statements and
the addition of comments to Polish opcodes. However, by default, the compiler
generates Polish opcodes for Rem statements and comments opcodes to allow
the Polish code to self-document at some cost to its speed. When the option to
suppress this material is not in effect, the last XML token's value will be push
string constant.
As you can see, a collection of Polish opcodes is created by the compiler, and
it can be converted, along with the rest of the compiler's state, to XML.
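To make the idea of a stateful opcode object concrete, here is a minimal Python sketch. It is not the qbPolish class itself: the class name PolishOp, the field names, and the XML tag names are all assumptions chosen to mirror the properties listed above (name, opcode, source token range, operand, comment).

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape


@dataclass
class PolishOp:
    """Hypothetical stand-in for one Polish opcode object."""
    name: str                      # instance name, such as qbPolish0001
    opcode: str                    # operation, such as pushLiteral
    token_start: int               # first scanned token responsible
    token_length: int              # number of scanned tokens responsible
    operand: Optional[str] = None  # operand, when the opcode has one
    comment: str = ""              # optional self-documentation

    def to_xml(self) -> str:
        """Serialize the opcode state; the tag names are illustrative only."""
        fields = [
            ("name", self.name),
            ("opcode", self.opcode),
            ("tokenStart", self.token_start),
            ("tokenLength", self.token_length),
            ("operand", "" if self.operand is None else self.operand),
            ("comment", self.comment),
        ]
        body = "".join(f"<{tag}>{escape(str(value))}</{tag}>" for tag, value in fields)
        return f"<qbPolish>{body}</qbPolish>"


op = PolishOp("qbPolish0001", "pushLiteral", 2, 1, "Hello world", "push string constant")
print(op.to_xml())
```

Keeping the source token range on each opcode is the design choice that lets a debugger map object code back to the source text.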
Code Optimization
Ever since the first compilers, compiler developers have noticed that compilers
can assist the programmer in generating efficient code. In this section, I'll
demonstrate two entry-level techniques for code optimization. In MSIL code
generation, more advanced techniques, such as the global analysis of blocks and
their structure, are not important in the front end of a compiler, since JIT compilers
do a lot of optimization behind the scenes.
Constant Folding
A surprisingly large number of programs contain expressions like 32767-2, where
a piece of the expression or the entire expression consists of constants, and it's
pretty obvious that this expression is mathematically and computationally
equivalent to 32765. The reason can be clarity of expression, the use of symbolic
constants, or the generation of code automatically.
I suppose you could write a preprocessor to simplify the source code. However,
this would complicate your life. That's because a decent preprocessor would need to
parse the complete source program. Even if you could cleverly factor the job and
reuse the same parser in both your preprocessor and the compiler, there still would
be two passes in the old style, and the optimization pass would be a waste of time
for most programs not containing constant expressions. Instead, it's relatively easy
to perform such calculations during parsing.
Take a look at Figure 7-1 again. One of its numerous parameters (which,
as I mentioned earlier, could be part of state) is objConstantValue, a qbVariable
described in Chapter 6.
In the example of 32767-2, on entry to addFactorRHS, objConstantValue will actually
be 32767. This is because addFactorRHS is called exclusively from addFactor, and
addFactor's first BNF step (in its BNF addFactor := mulFactor [addFactorRHS]) is to
check for a mulFactor.
The check for a mulFactor will descend to compiler_term_, whose job it is
to check for the basic term of any expression. This can be a number; a string;
a subscripted or unsubscripted identifier; a function call; or a complete, inner,
parenthesized expression.
In the scenario, however, compiler_term_ will find the scanner token corre-
sponding to the number 32767. Because it has found a constant, compiler_term_
will set its by reference objConstantValue parameter to a qbVariable of the most
appropriate type (QuickBasic integer) for the token value, which in this case, just
fits into a QuickBasic integer, represented by a .NET short integer.
NOTE You can turn off constant folding by using the Constant Folding property
of the quickBasicEngine object.
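Folding constants during the parse, rather than in a separate pass, can be sketched as follows. This Python illustration is a deliberate simplification and not the quickBasicEngine code: it handles only a flat chain of + and - over integer literals and identifiers, the opcode names are assumptions, and it ignores the 16-bit range of the QuickBasic integer.

```python
def push_for(token):
    """Emit a push for a literal or a variable (hypothetical opcode names)."""
    if token.isdigit():
        return ("pushLiteral", int(token))
    return ("pushVariable", token)


def compile_sum(tokens, fold=True):
    """Compile  term {(+|-) term}  to stack code, folding constants as we go."""
    code = [push_for(tokens[0])]
    # While lhs_const is not None, the code so far is a single pushLiteral.
    lhs_const = int(tokens[0]) if tokens[0].isdigit() else None
    i = 1
    while i + 1 < len(tokens):
        op, rhs = tokens[i], tokens[i + 1]
        rhs_const = int(rhs) if rhs.isdigit() else None
        if fold and lhs_const is not None and rhs_const is not None:
            # Both sides are known at compile time: compute now, emit nothing new.
            lhs_const = lhs_const + rhs_const if op == "+" else lhs_const - rhs_const
            code[-1] = ("pushLiteral", lhs_const)
        else:
            code.append(push_for(rhs))
            code.append(("add" if op == "+" else "subtract", None))
            lhs_const = None       # the running value is no longer constant
        i += 2
    return code


print(compile_sum(["32767", "-", "2"]))
print(compile_sum(["32767", "-", "2"], fold=False))
```

With folding on, 32767-2 compiles to a single push of 32765; with it off, the subtraction is left for run time. The fold flag plays the role that the note above assigns to the engine's constant folding property.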
• The evaluate method evaluates an expression using all the options and
data of the engine running evaluate. The expression may be a single
expression or a series of assignment statements, each prefixed by the key-
word Let, followed by an expression using the assigned variables.
The evaluate, eval, and run functions are not available in their full glory in
Visual Basic and other commercial products, for a very good reason. If a prod-
uct with the power of Visual Basic exposed the general ability to interpret the
code of the language submitted as strings, developers could rather easily
license copies of Visual Basic to others, by using Visual Basic to build a GUI.
Of course, corporate developers are allowed to extend the power of the VBA
language engine to users with no fees for internal applications, but shrink-wrap
vendors cannot do this. One of the joys of developing software for open release
is the fact that you don't need to worry about giving away the store, since you
have already done so.
Evaluation
The evaluate method takes a string consisting of a QuickBasic expression, com-
piles the string, and returns its value as a qbVariable. In fact, an evaluate function
is provided as part of the language: evaluate(string) will return its value as
a QuickBasic type. This function has been used in several Basic compilers, and it
was supported in the Rexx language. It is very useful because it allows the devel-
oper to extend to the user the ability to specify logic as data and business rules.
Here, compiler_binaryOpGen_ can call compiler_constantEval_ to do the eval-
uation using the current settings of quickBasicEngine, using an internal evaluate
method. compiler_binaryOpGen_ also implements the second form of optimiza-
tion, known as lazy evaluation.
Lazy evaluation is the elimination of mathematical, logical, and string oper-
ations known to be unnecessary. Examples include A+O (always the same as A),
B And False (always False), and C & "" (always the same as C).
Lazy evaluation is related to math arcana in the form of the theory of groups.
In math, a group is a set of elements (such as numbers, Boolean values, or strings)
and a set of operations defined over those elements, usually an additive opera-
tion and a multiplicative operation. Also, groups have a unity element and a zero
element. The unity element of a group is characterized by the fact that whenever
it is applied to another element using the multiplicative operator, the value of
the second element is unchanged. The zero element has the same effect when it
is applied using the additive operator. The unity element of the group is one. The
zero element is, of course, zero. The unity element in the group that consists of
the Boolean values (True and False) and their operators (Or and And) represents
truth, whereas its zero element is falsehood.
The group consisting of strings has, strictly speaking, no multiplicative oper-
ator, but it has addition cognate to string concatenation. Therefore, although
strings have no unity, their zero is the null string.
This means that compiler_binaryOpGen_ can apply the same logic to the binary
operator when one of its parameters objConstantValueLHS or objConstantValueRHS is
the unity, or zero, element of their group. Here are some examples:
• When either is zero and both are numeric, the code for stacking the alter-
nate element can be generated instead of addition.
• When either is one and both are numeric, the same code can be generated
instead of multiplication. In fact, this code replaces division when the
RHS is one.
• When either is False and both are Boolean, the code for False can be gen-
erated instead of And.
• When either is True and both are Boolean, True can be generated
instead of Or.
• When either is zero-length and both are strings, the non-null string can
be stacked instead of using a concatenation.
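The rules in the list above amount to a small decision table. The Python sketch below is an illustration under stated assumptions, not the logic of compiler_binaryOpGen_: the function name simplify_binary and the returned action strings are hypothetical, a value of None means "this operand is not a compile-time constant", and the operator is taken to imply the operand types.

```python
def _is_num(value):
    """True for numeric constants; Python's bool is excluded on purpose."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def simplify_binary(op, lhs, rhs):
    """Decide what to generate for  lhs op rhs  given any constant operands."""
    if op == "+":
        if _is_num(lhs) and lhs == 0:
            return "use rhs"            # 0 + A is A
        if _is_num(rhs) and rhs == 0:
            return "use lhs"            # A + 0 is A
    elif op == "*":
        if _is_num(lhs) and lhs == 1:
            return "use rhs"            # 1 * A is A
        if _is_num(rhs) and rhs == 1:
            return "use lhs"            # A * 1 is A
    elif op == "/":
        if _is_num(rhs) and rhs == 1:
            return "use lhs"            # A / 1 is A (but 1 / A is not A)
    elif op == "And":
        if lhs is False or rhs is False:
            return "push False"         # A And False is False
    elif op == "Or":
        if lhs is True or rhs is True:
            return "push True"          # A Or True is True
    elif op == "&":
        if lhs == "":
            return "use rhs"            # "" & C is C
        if rhs == "":
            return "use lhs"            # C & "" is C
    return "emit operator"              # no shortcut applies
```

One caution, which goes beyond the text: replacing A And False with False also drops the evaluation of A, so this shortcut is only safe when A has no side effects, such as a function call that changes state.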
3. Of course, there is no reason why in yacc the optimizations could not be inserted in tags, but
overall, the manual method provides a little more insight into what's going on under the hood.
quickBasicEngine np Interpreter
The Nutty Professor interpreter is embedded in the code, and therefore more
closely bound to, or bolted onto, the engine. This is mostly an artifact of scheduling
pressures; ideally, the Nutty Professor interpreter would be a separate object.
Figure 7-7 shows the solution architecture of the testing GUI, qbGUI. Most of
the projects in the solution should be either familiar or understandable. qbGUI is
the startup project and the only form-based project. You will recognize old friends
described in previous chapters, including qbScanner, qbVariable, and qbVariableType.
Also notice that documentation can be attached to a .NET project. Here, each
project has a readme.txt file, which describes the project goals, changes, and
open issues. The solution as a whole has a solutionReadMe.txt project with the
solution goals, changes, and open issues.
In the next section, we'll run the compiler, which includes the ability to
examine its operations. This will give you the "dynamic" big picture.
4. In general, I try to stay away from developing a lot of my "own" visual controls, so forms don't
become too "welcome-to-my-world." However, the zoom project seemed to fulfill a genuine
need.
5. You'll notice the quotations on the qbGUI Easter egg. I first saw quotations in Bill McKeeman's
excellent, now out-of-print book on how to write a parser generator in PL/I, A Compiler
Generator. In introducing the need for a formal BNF notation, he quoted American poet
Emily Dickinson: "After great pain, a formal feeling comes." It was a reminder that the ulti-
mate guarantor of software correctness and efficiency is the person behind the machine.
This class compiles and interpretively runs a subset of the Quick Basic language. Note that the
phrase Quick Basic is the intellectual property of the Microsoft corporation.
Edward G. Nilges
spinoza1111@yahoo.com
http://members.screenz.com/edNilges
To Darlene, Eddie and Peter (junglee Peter): for in dreams begin responsibilities.
"But the man who knows the relation between the forces of Nature and action, sees how some
forces of Nature work upon other forces of Nature, and becomes not their slave."
- Bhagavad-Gita
"I could be bound in a nutshell and count myself king of absolute space."
- Shakespeare, Hamlet
NOTE qbGUI will use your registry with sensitivity. It will create one folder in
the proper place (VB and VBA Program Settings) labeled qbGUI. At any time,
you can reset the product simply by deleting qbGUI.
Click OK to see the simple qbGUI window shown in Figure 7-9. This screen
is meant for the public. Since you've come this far with me on the arcana of the
compiler, you might as well click the More button to see the full monty.⁶
Once you have the expanded display, click the Replay check box in the
lower-left corner of the window. Then select File > Load Source > Code,
navigate to egnsf/Apress/QuickBasic, and obtain the file helloWorld.bas, to see the
display shown in Figure 7-10. This will allow you to examine a simple compile
operation step by step.
6. Full monty is British slang for "the whole thing." The term became better known after the
release of the film The Full Monty in 1997.
Figure 7-10. The full monty, obtained by checking the Replay box in the lower-left
corner and loading helloWorld.bas
The full display contains the code for the famous 'Hello world' program.
Click Run to see the screen shown in Figure 7-11. The most obvious effect is the
"green screen" output of running 'Hello world,' but more interesting is the his-
tory at the bottom (obtained by checking Replay).
Let's review this history to get an overview of how a very simple program is
handled by the QuickBasic compiler.
tokenTypeIdentifier on line 5 at 1 to 5
tokenTypeString on line 13 at 7 to 19
Figure 7-12. Scan result
[Screen capture: the qbGUI Zoom RPN and parse outline panes. The parse outline lists program,
sourceProgram, sourceProgramBody, openCode, statementBody, unconditional, print, and
expressionList, each with its source code range; the RPN pane begins with opPushLiteral.]
The parse outline may look intriguing. Click the Zoom buttons. From the
Zoom box, copy the parse outline to a Notepad or Word file (with Courier New as
a monospace font) to see the outline, as shown in Figure 7-14.
7. A major difference between QuickBasic and Visual Basic is that in QuickBasic, executable
statements, as opposed to module-level declarations, can exist outside functions and subrou-
tines and form part of an implicit main procedure. I call this "open" source (not to be
confused with either free software or free beer).
Everyone then piles on the user to convince her that she was wrong in wanting
too much. I would ask her, like Leonard Cohen in Bird on a Wire, "Why do you
ask for so much? Why not ask for more?" That's because a user with many ele-
ments that combine may want, without being able to express it, not a set of
cases, but a language for describing cases, and programmers who can design
a language.
Of course, giving people a new language is a venture fraught with hazards. My
experience is that they are never grateful, and like Shakespeare's Caliban (in The
Tempest) are likely instead to say, "Thou taught'st me language, and my profit
on't, is I know how to curse." To avoid this, you can actually hide the language
in a GUI, or as we have done, make it strictly an internal language for produc-
tion and consumption by objects. The beauty of this gesture is that it typically
generates a more powerful system that is easier to debug and maintain.
For example, I worked at one firm that was trying to debug a program that ana-
lyzed phone records for billing purposes. The problem was that conference-calling
and other features interacted to produce an unlimited number of cases. I devel-
oped a language that specified the state transitions of the underlying switch
and an interpreter that simulated the switch in Cobol by reading the state tran-
sitions. I got the engineers to approve the state transitions, and then produced
bills by essentially simulating the calls. Case closed.
1. The first instruction is the opPushLiteral, which pushes its operand as
a qbVariable onto a stack found in the interpreter_ method.
3. The next statement simply concatenates the string and the newline.
Finally, click the Step button a few more times to see the execution of the
program and its effect on the stack. There will be more about actual interpreta-
tion in the next chapter.
<!--
***************************************************************
*                                                             *
* quickbasicEngine                                            *
*                                                             *
* This class compiles and interpretively runs a subset of the *
* Quick Basic language. Note that the phrase Quick Basic is   *
* the intellectual property of the Microsoft corporation.     *
*                                                             *
* This class was developed by                                 *
*                                                             *
* Edward G. Nilges                                            *
* [email protected]                                           *
* http://members.screenz.com/edNilges                         *
*                                                             *
***************************************************************
-->
<quickbasicEngine>
<!-- Indicates object usability -->
<booUsable>True</booUsable>
<!-- Object instance's name -->
<strName>quickbasicEngine0001 3/12/2004 9:58:29 AM</strName>
In Notepad, scroll down to see the XML of the scanner delegate and the collec-
tion of qbPolish instructions. The end of the XML will show a variety of properties,
as you can see in Figure 7-17. Of course, for a simple program, most of these values
are default.
The XML for the 'Hello world' program will consist of the usability and name
of the object, followed by the XML of the scanner state, as described in Chapter 5.
It will contain a null collection of variables (with the XML name colVariables),
because the 'Hello world' program doesn't contain any variables. It will contain
a Polish collection of opcodes identical to the one shown in Figure 7-5, and it
will end with a miscellaneous set of values for the engine, such as its queue of
information for the legacy Read Data statement.
XML allows us to capture not only a set of business rules as a QuickBasic
expression, but also the execution properties that will change, subtly
or in the extreme, their evaluation. Using XML, we have a shot at capturing some
logic, including its execution environment, and placing it in a file.
Error Taxonomy
Three types of potential errors are recognized and handled by the
quickBasicEngine object:
Errors in logic of the code: There are two subtypes of this type of error:
bugs in the code I have developed and any errors you may add while
modifying the compiler. Errors in the logic of the code detected by this
code itself (such as in the inspect method) will result in calls to the low-
level errorHandler utility exposed by the utilities object, which typically
displays a message box. These errors will usually mark the instance object
as not usable, so it does not damage your data.
Errors in using the object interface: This type of error could be in a GUI
or an object using the QuickBasic engine as a Web service. Mistakes in
calling the object also result in calls to utilities.errorHandler, but do
not mark the object as unusable.
Summary
We have, I trust, cut to the chase. You have learned how to generate recursive-
descent parsers for a sizable language, as well as how to generate code, including
optimized code that eliminates unnecessary runtime computation.
We then ran the qbGUI program to step through an exceedingly simple program
and see how it is parsed. The same overall approach was used as in Chapter 3's
flyover compiler, but here, the object model means that the elements handled are
themselves reference objects on the .NET heap, rather than simple values on the
.NET stack.
I hope I have shown that writing a compiler is a nontrivial task, but one that
is doable; for the dialogue in programming is always between simplicity and
complexity. In order to give the user a simple experience, we have to wrestle with
complexity.
The methods are not unique to Basic and can transfer to your own language
development. You may want to develop a language for the disabled and "parse"
their gestures. You may want to develop a language for dancers and "parse" their
leaps. You may want to teach language to gorillas in the wild. You may want to
develop a language to dodge responsibility and spin events to your best advan-
tage. You may have some killer ideas for a programming language, as did the
developers of Ruby and Python. Or you may wish to help a user who needs to
compile very old source code, for which the compiler has disappeared ("retro"
computing, or computing for old guys).
Challenge Exercise
Run the qbGUI program. What happens when you click the Test button in the
Customer Engineering Zone of the full monty display?
Try testing some simple expressions. If you enter a math expression and
click the Evaluate button, the "green screen" will show its value. Try entering
a real program and clicking the Run button.
Resources
See Compilers: Principles, Techniques and Tools, by Alfred Aho, Ravi Sethi, and
Jeffrey Ullman (Addison-Wesley, 1985) for much more information about parsing
and compiler optimization. This is the famous "dragon" book referenced in ear-
lier chapters.
CHAPTER 8
Developing Assemblers
and Interpreters
No, I'm not interested in developing a powerful brain. All I'm after is just
a mediocre brain, something like the President of the American Telephone
and Telegraph Company.
-Alan Turing
TURING'S HOPE has not been realized; the CEO of AT&T was a pretty smart cookie.
Furthermore, if Dijkstra is right, the nature of computer intelligence is constituted
in the ability to manipulate symbols and not conscious choice and awareness.
However, Turing made a discovery in 1936 all the same. Obeying an algorithm
is itself obeying an algorithm...which manages to be obvious, or profound, or
stupid, or all three: "To follow the rules, follow the rules for following the rules."
In the last chapter, you learned how the quickBasicEngine generates code.
In this chapter, I will discuss the details of assembling code with jumps and with
Go To instructions and related issues having to do with assemblers.
The compiler generates qbPolish objects and stores them in a collection. These
qbPolish objects reference each other using labels, and in a rather clerical (but rather
tricky) operation, these labels must be translated to numeric addresses.
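The clerical operation of translating labels to numeric addresses is usually done in two passes, and a keyed collection makes it short. The following Python sketch is an illustration only, not the assembler embedded in quickBasicEngine: the opcode names, the tuple layout, and the function name assemble are assumptions.

```python
# Hypothetical jump opcodes whose operand is a label rather than a value.
JUMP_OPS = {"jump", "jumpZero"}


def assemble(ops):
    """Replace label operands of jumps with numeric addresses.

    ops is a list of (label_or_None, opcode, operand) tuples.
    """
    # Pass 1: record the address (list position) of every labeled instruction.
    addresses = {}
    for address, (label, _opcode, _operand) in enumerate(ops):
        if label is not None:
            if label in addresses:
                raise ValueError(f"duplicate label: {label}")
            addresses[label] = address

    # Pass 2: rewrite jump operands; forward references are now known.
    resolved = []
    for _label, opcode, operand in ops:
        if opcode in JUMP_OPS:
            if operand not in addresses:
                raise ValueError(f"undefined label: {operand}")
            operand = addresses[operand]
        resolved.append((opcode, operand))
    return resolved


program = [
    (None, "pushLiteral", 0),
    ("top", "jumpZero", "done"),   # forward reference
    (None, "jump", "top"),         # backward reference
    ("done", "end", None),
]
print(assemble(program))
```

The tricky part is the forward reference: a jump to a label that has not been seen yet cannot be resolved in a single pass, which is why the addresses are collected first.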
A more exciting operation, discussed in the second half of this chapter, is the
simulation of a computer by an interpreter. Rather surprisingly, even a complex
computer architecture can be completely imitated by another computer architec-
ture (even one less powerful or less complex) using software. In most cases, the
simulation will be slower than a native implementation, but this is not necessarily
the case when the computer doing the imitation is several orders of magnitude
faster.
This chapter will discuss assembly in the context of the simple assembler
embedded in the QuickBasic compiler. I will then discuss the design of the onboard
Nutty Professor interpreter, a software machine for executing the qbPolish objects
emitted by the compiler.
Assemblers
Let's take a look at assemblers in general and in their historical context, and then
examine the simple assembler embedded in the quickBasicEngine.
Assemblers, in General
Assemblers have been around since the earliest days of computers, although para-
doxically, assemblers may postdate compilers. This is because the earliest computer
scientists, including Charles Babbage, John von Neumann, and Konrad Zuse, did
not work as lowly programmers. Instead, they prepared equations for the earliest
programmers to enter by keying or setting switches.
Of the three pioneers I have mentioned, Konrad Zuse also developed, in the
early 1940s, Plankalkül, which was a prototype of a high-level "compiled"
language. It predates the first assemblers, and in that sense the first compilers
predated the first assemblers (cf. A History of Modern Computing, Second Edition
by Paul E. Ceruzzi [MIT Press, 2003]).
Later in the 1940s, Grace Murray Hopper (an early programmer of the Harvard Mark I
and an officer in the United States Navy) started to "reuse" the code for common
equations by borrowing tapes containing Mark I codes and lending her own code in return.
This activity related more to the early compilers than to assemblers, but Hopper's
team was, as noted in Chapter 1, the first to see the economic value of saving
programmer time.
The first assemblers were developed by working programmers to avoid hav-
ing to code in straight binary machine language. In fact, John von Neumann (the
Hungarian emigre mathematician who, at Princeton's Institute for Advanced Study,
is credited with the stored program concept) did not think that an expensive and
rare computer should be used at all to make its programmer's life easier.
Nonetheless, the earliest commercial mainframes of the 1950s were shipped
with assemblers after managers discovered that programming was much more
time-consuming than originally thought, and because compilers were harder to
develop at the time than assemblers (modern compiler theory not yet having been
developed).
These early assemblers required the early programmer to specify actual
machine operations, but allowed him or her to identify storage locations with
mnemonic names. Such assemblers took over the job of assigning numeric
locations to the names.
The writing of a basic assembler is easily mastered (if you ignore efficiency)
in any language that supports keyed collections.
The assembler must scan each line of assembler source code for constituent
parts: usually including an instruction label, a mnemonic op code, one or more
operands, and comments describing the operation. Older assemblers (including
the IBM 1401 "Symbolic Programming System") that I used forced the program-
mer to put these fields in fixed columns, usually on an IBM punch card.
Paper tape and newer assemblers (commencing with IBM 1401 Autocoder
and IBM 360 BAL) gave programmers more freedom because they allowed fields to
be separated with blanks or commas and used a primitive form of lexical analysis
as described in Chapter 5 to separate the individual tokens.
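The free-form scanning just described can be sketched in a few lines. This is a minimal illustration in Python rather than Visual Basic, and the syntax it accepts is an assumption made for the example (a label ends with a colon, a semicolon starts the comment, operands are separated by commas). It is not the syntax of Autocoder or BAL, and the names AsmLine and parse_line are hypothetical.

```python
from typing import List, NamedTuple, Optional

class AsmLine(NamedTuple):
    label: Optional[str]
    opcode: Optional[str]
    operands: List[str]
    comment: str

def parse_line(text: str) -> AsmLine:
    # Split off the comment first so blanks and commas inside it are ignored.
    code, _, comment = text.partition(";")
    tokens = code.split(None, 1)            # first field, then the remainder
    label = None
    if tokens and tokens[0].endswith(":"):  # assumed convention: "NAME:" is a label
        label = tokens[0][:-1]
        tokens = tokens[1].split(None, 1) if len(tokens) > 1 else []
    opcode = tokens[0].upper() if tokens else None
    rest = tokens[1] if len(tokens) > 1 else ""
    operands = [field.strip() for field in rest.split(",") if field.strip()]
    return AsmLine(label, opcode, operands, comment.strip())
```

Calling parse_line("LOOP: ADD COUNT, ONE ; bump the counter") yields the label LOOP, the op code ADD, the operands COUNT and ONE, and the comment text. A fixed-column assembler would instead slice each field out of known card columns.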
Operators were typically looked up in a fixed table of operators and their
numeric codes, typically using the well-known algorithm called a binary search.
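As a rough sketch of that lookup, the following Python uses the standard bisect module to search a sorted table by repeated halving. The mnemonics and numeric codes are invented for the example and belong to no real machine.

```python
from bisect import bisect_left

# Hypothetical opcode table, kept sorted by mnemonic so it can be searched by halving.
OPCODES = [("ADD", 1), ("HALT", 7), ("JUMP", 4), ("LOAD", 2), ("STORE", 3), ("SUB", 5)]
MNEMONICS = [name for name, _ in OPCODES]

def lookup_opcode(mnemonic: str) -> int:
    """Return the numeric code for a mnemonic, or raise KeyError if it is unknown."""
    index = bisect_left(MNEMONICS, mnemonic)
    if index < len(MNEMONICS) and MNEMONICS[index] == mnemonic:
        return OPCODES[index][1]
    raise KeyError(mnemonic)
```

Because the table is fixed, it can be sorted once when the assembler is written, and each lookup then needs only about log2(n) comparisons.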
The data labels were slightly more complex to find because a fixed table
could not be used. Instead, the best programmers of assemblers built tables of
the operands used and employed a hash method to access these tables.
In a hash method, a large number of names has to be mapped onto a limited
space for fast retrieval. Of course, if efficiency does not matter, you can simply
build (for example, using Redim in Visual Basic) a table that grows as more and
more distinct names are found and use a linear search.
However, the execution time grows rapidly as the number of variables
increases. Each time a variable is used, it must be looked up with, on average, n/2
probes of the list of n distinct variables. For m occurrences of variables in the
source, the total number of probes is therefore about m*n/2. As m or n grows,
assembly slows dramatically.
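To make the estimate concrete, here is a small Python sketch (the function name count_probes is hypothetical) that counts the comparisons made by a growing table searched linearly.

```python
def count_probes(references):
    """Count the comparisons made when each reference is looked up by linear search."""
    table = []      # distinct names, in order of first appearance
    probes = 0
    for name in references:
        for known in table:
            probes += 1
            if known == name:
                break
        else:
            table.append(name)   # not found: the table grows by one entry
    return probes
```

For 100 distinct names that are each referenced 51 times in rotation (m = 5,100 and n = 100), the function counts 257,450 probes, close to the estimate m*n/2 = 255,000.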
Early assembler programmers therefore developed variants of hash tables,
and this technology was used by Microsoft to build the collection with a key.
A hash algorithm considers an identifier as a number and takes
some part of this number, which is bounded by the space available for the hash
table (like Hamlet, quoted in the qbGUI Easter egg of Chapter 7, the hash table
can be bound in a nutshell but counts itself king of infinite space). For example,
if 256 table entries are allocated, a fairly good (by no means optimal) hash
algorithm might take the last byte of the operand name as the hashed index; the last
byte is probably better than the first byte for most input programs to the
assembler, as the first byte might have distinctly nonrandom prefixes (many identifiers,
for example, might start with the letter I, and the use of systematic prefixes for
identifiers, sometimes known as Hungarian notation, will tend to create many
identifiers with the same first letter).
However, because more than one operand can have the same last byte and
thus the same index in the table, the algorithm needs a plan for a "collision." It
turns out the most effective plan is simply to proceed to the next empty entry
of the table (wrapping back to the start of the table as needed) and use this as
the entry.
At worst, a small linear search, usually restricted to one or two entries, results.
One further complication occurs when entries have to be deleted in an assembler
or compiler that allows symbols of limited scope, which have to be thrown away
once their context has been compiled; for example, a compiler that supports variables
local to procedures, like the Visual Basic compiler, must throw away local
variables. The Visual Basic 6 compiler had to throw away all local variables at the
end of the procedure; the Visual Basic .NET compiler must throw away variables
at the end of each block as well as at the end of the procedure, because Visual
Basic .NET supports the declaration of variables inside For loops, Do loops, With
clauses, and other blocks.
The deleted symbol's hash table entry must be located and tagged as free,
but a freed entry is not quite the same as an entry that has never been used.
Suppose another symbol collided with the deleted symbol and was therefore stored
in a later entry. A search for that symbol passes through the freed entry, and
the freed entry must not stop the search.
Ordinarily, in searching a hash table just to find a symbol, the first unused
entry encountered shows that the search is complete, and has failed, because if
the symbol hashing initially to entry n was in the table, it would be in the first
available entry to the right of n, wrapping around to the beginning. But if
a deleted entry is found, the symbol being sought may have been stored beyond it
while that entry was still occupied; therefore, the search must continue.
The solution is to mark deleted entries specially so that they are distinct
from empty entries. For example, a .NET solution might be to set deleted entries
to a blank while making sure empty entries are Nothing. Many of these tech-
niques were discovered by early writers of assemblers.
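The scheme described over the last few paragraphs (a last-byte hash, a move to the next free entry on a collision, and a special mark for deleted entries) can be sketched as follows. This is an illustration in Python, not code from the quickBasicEngine; the class name SymbolTable and its methods are hypothetical, and names are assumed to be non-empty.

```python
EMPTY = None          # a slot that has never been used: a failed search may stop here
DELETED = object()    # a freed slot: a search must continue past it

class SymbolTable:
    """Open-addressing hash table keyed by non-empty symbol names."""

    def __init__(self, size=256):
        self.size = size
        self.keys = [EMPTY] * size
        self.values = [None] * size

    def _probe(self, name):
        # Hash on the last byte of the name, then visit each slot once, wrapping around.
        start = name.encode("utf-8")[-1] % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def find(self, name):
        """Return the slot holding name, or None if it is not in the table."""
        for slot in self._probe(name):
            if self.keys[slot] is EMPTY:
                return None              # never-used slot: the name cannot lie beyond it
            if self.keys[slot] == name:  # a DELETED marker never equals a string
                return slot
        return None

    def add(self, name, value):
        slot = self.find(name)
        if slot is None:
            # Reuse the first empty or deleted slot on the probe path.
            for slot in self._probe(name):
                if self.keys[slot] is EMPTY or self.keys[slot] is DELETED:
                    break
            else:
                raise MemoryError("symbol table is full")
            self.keys[slot] = name
        self.values[slot] = value

    def delete(self, name):
        slot = self.find(name)
        if slot is not None:
            self.keys[slot] = DELETED
            self.values[slot] = None
```

With a table of 8 entries, the names AX, BX, and CX all hash to entry 0 and occupy entries 0, 1, and 2. After BX is deleted, a search for CX still succeeds because the marker in entry 1 does not end the search, and a later insertion of DX reuses entry 1.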
But keep in mind you may never need to create a hash table for your parsers
and language tools because the classic collection of VB and other languages and
the .NET Framework collections solve the problem.
You can write a better-performing collection than those provided in the
.NET Framework by taking advantage of recent research in hash algorithms, or
by encapsulating knowledge of the keys to be hashed. But my experience in
doing this produced at best only a marginal speed advantage of about 15%.
One problem with the collection, whether used in Visual Basic 6 or in Visual
Basic .NET, is that it is a collection of untyped objects. This means in practice
that code that uses collections (of any type) might be altered, erroneously, to
contain objects of the wrong type. Also, retrieval is slowed because the pure
objects used by the collection need to be converted to the right type.
There are, however, many solutions for this problem.
The collection can be an object that inherits the collection member as
a base type. Or, you can wait for the next edition of Visual Basic .NET, which will
allow you to use "generic types." Or, you can bite the bullet and implement
a strongly typed hash table as an array with a strong type. Or, you can use the
solution of the quickBasicEngine, which is to use collections and to inspect them
for correct types.
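Two of these options can be sketched briefly in Python: a wrapper that refuses members of the wrong type when they are added, and an after-the-fact inspection of an ordinary untyped collection. The names TypedCollection and inspect_collection are hypothetical, and the sketch does not reproduce the quickBasicEngine's own inspection code.

```python
class TypedCollection:
    """A keyed collection that accepts only members of one type."""

    def __init__(self, member_type):
        self.member_type = member_type
        self._items = {}

    def add(self, key, item):
        # Reject the wrong type at the door instead of at retrieval time.
        if not isinstance(item, self.member_type):
            raise TypeError("%r: expected %s, got %s" % (
                key, self.member_type.__name__, type(item).__name__))
        self._items[key] = item

    def item(self, key):
        return self._items[key]

def inspect_collection(collection, member_type):
    """Return the keys of an untyped collection whose values have the wrong type."""
    return [key for key, value in collection.items()
            if not isinstance(value, member_type)]
```

The wrapper catches the error where it is made; the inspection approach leaves the collection untyped and reports problems only when the inspection is run.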
Assemblers often have features to make programmers' lives easier. For
example, it was a chore to have to use a literal, such as the number one, by nam-
ing it and defining its position as a labeled instruction. Therefore, early common
assemblers allowed the programmer to use literal values, usually numbers, and
the assembler took over their assignment to storage.
A significant development was the macro assembler, which allowed the pro-
grammer to define sequences of defined opcodes as a new opcode. And it was then
a short step to the conditional macro assembler.
Conditional macro assemblers select sequences of code for assembly based
on conditions and the values of symbols. In fact, the Visual Basic preprocessor
statements that commence with the pound sign (such as #If...Then, #Else, and
#End If) represent a conditional compiler version of this facility.
Conditional macro assembly was used mostly by manufacturers to ship cus-
tomizable and modifiable source code. For example, the IBM mainframe system of
the 1970s and 1980s, Virtual Machine, Conversational Monitor System (VM/CMS),
was shipped to large clients in the form of source code.
The client would set symbols in a special area or through a primitive GUI,
and the assembler would then use these values to select the actual source code
for that client's installation.
The modern C and C++ preprocessor is an almost complete conditional macro
"compiler" that supports the definition, assignment, and computation of compile-
time values in addition to traditional if .. then .. else statements.
Some conditional macro assemblers include the ability to branch to labels,
which meant that the engine underlying the conditional macro assembler was in
fact a general-purpose, simulated computer available at assembly or compiler
time that could engage in complex calculations to determine the final source
code presented to the assembler.
In fact, some of these products were not even used to generate code to the
assembler at all. Instead, they generated code for other environments or even, in
some cases, documents.
The nearly extinct language PL/I, developed for IBM mainframe programming,
extended full computational power to the macro writer, with all the power,
and obfuscatory potential, that this implied. Today, proprietary software
vendors rarely deliver source code to their customers; therefore, there is little
reason to deliver, as before, highly and generally customizable source to
customers. But this may change. There is increasing interest in obtaining source
code instead of object code because of the greater quality and safety of the former,
whether as "open" source or as a commercial product. This may cause a return to
the shipment of source that can be customized, using a preprocessor.
Also, writing a full preprocessor would be a useful exercise and would create
a product not tied to any one programming language, because there is no reason
why the macro processor has to care about the language it processes. It would
provide the ability to have a single source image of a large software system rather
than multiple copies with changed code, with one limitation: it might have
trouble with the fact that today, source code is created not as flat files but as project
"trees."
For example, a large Visual Basic source could be used with a preprocessor
to generate either Visual Basic 6 or VB .NET source code.
Here is an example. The following code uses a C preprocessor to condition-
ally generate a debugging statement:
#if (debugMode)
#define DEBUG(proc,msg) MsgBox("Debug message from ...
#else
#define DEBUG(proc,msg) ' No debugging
#endif
The DEBUG symbol is a macro symbol. When debugging is in effect, the DEBUG
symbol is replaced here by a MsgBox that includes the parameter names proc and
msg; when debugging isn't in effect, the DEBUG symbol is replaced by a comment.
The preceding example is not usable inside the Visual Studio GUI (because
the C preprocessor is not called for VB .NET programs), but you could create an
external build system that would work if you needed to.
The advantage of the approach is that one source representation can sup-
port debugging after the system is placed into production, with no runtime cost.
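To suggest what such an external step might look like, here is a toy conditional macro processor in Python. It is a sketch under stated assumptions, not the C preprocessor: it understands only #if with a single symbol, #else, #endif, and function-like #define, it splits macro arguments naively on commas, and the function names preprocess and expand are hypothetical.

```python
import re

def preprocess(source, symbols):
    """Expand #if/#else/#endif and function-like #define macros, line by line."""
    macros = {}      # macro name -> (parameter names, body text)
    keep = [True]    # one "are we emitting?" flag per open #if, plus the outer level
    output = []
    for line in source.splitlines():
        words = line.split()
        directive = words[0] if words else ""
        if directive == "#if":
            condition = bool(symbols.get(words[1].strip("()"), False))
            keep.append(keep[-1] and condition)
        elif directive == "#else":
            # Flip the innermost flag, but only if the enclosing block is active.
            keep[-1] = keep[-2] and not keep[-1]
        elif directive == "#endif":
            keep.pop()
        elif not keep[-1]:
            continue     # inside a rejected branch: emit nothing, define nothing
        elif directive == "#define":
            match = re.match(r"#define\s+(\w+)\(([^)]*)\)\s*(.*)", line.strip())
            name, params, body = match.groups()
            macros[name] = ([p.strip() for p in params.split(",")], body)
        else:
            output.append(expand(line, macros))
    return "\n".join(output)

def expand(line, macros):
    """Replace each macro call in the line by the macro body with arguments substituted."""
    for name, (params, body) in macros.items():
        def substitute(call):
            args = [a.strip() for a in call.group(1).split(",")]
            text = body
            for param, arg in zip(params, args):
                text = re.sub(r"\b%s\b" % re.escape(param), lambda _m: arg, text)
            return text
        line = re.sub(r"\b%s\(([^)]*)\)" % re.escape(name), substitute, line)
    return line
```

Run with debugMode set to True, a source line such as DEBUG("main","starting") is replaced by the body of the first definition with the two arguments substituted; with debugMode set to False, the same line becomes the Visual Basic comment from the second definition.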
Macro assembly and preprocessors, especially the C++ preprocessor, however,
have a terrible reputation, and programmers who use the C and C++ preprocessor
have been known to be punished by 20 lashes with a wet noodle.
There are two reasons for this. One was pointed out to me by my son several
years ago when he taught me about object design. Many of the jobs that were
formerly performed by macro processing are now accomplished in a cleaner and
safer way using OO concepts such as overloading and encapsulation. If (for
example) some customers want version A of a method, which exposes an extra parameter,
and others don't want this parameter to be exposed, it might make more sense
to use overloading to provide both versions rather than using a preprocessor to
generate the desired signature.
Another reason for the unpopularity of the preprocessor is the way in which
extensive use of macro processing creates unnecessary complexity.
However, I happen to disagree with the many writers on this topic (such as
the very droll Bill Blunden, who has written Software Exorcism: A Handbook for
Debugging and Optimizing Legacy Code [Apress, 2003], a guide to maintaining
code and pronouncing curses upon the original authors) who feel that using
extended definitional facilities is always and everywhere the sign of a flawed char-
acter. That's because the whole point of this book is that at times it makes sense to
develop a language for a problem solution. In fact, important solutions have been
developed using the C and C++ preprocessor, including Bjarne Stroustrup's first
C++ compiler, which was written using C preprocessor statements.
Although the era of macro processing in programming languages may be
over, the technique is still important in areas including text processing, and
may represent for you a problem solution when you have to process text with
substitution.
2. The compiler "decorates" the output code with remarks to show what
instructions are generated from which lines of source code, as shown
in Figure 8-2: this decoration needs to be removed when a user option
(available on an options form in the GUI as I will describe) is set.
Figure 8-2. Assembler code list, prior to assembly, of part of the nFactorial
program
For now, ignore the details of the individual instructions. As you can see
from the "decoration" consisting of opRem instructions and opNop instructions
(neither of which has any effect on execution and both of which can be
removed by the assembler), the compiler has generated many instructions for
each line of source code. I'll discuss what these instructions do in the next sec-
tion of this chapter.
The listing in Figure 8-2 for the nFactorial QuickBasic program starts with
four remarks, the first of which repeats the header comment, the second and
third of which include declarations, and the fourth of which heads the generated
assembler code to print the prompt for the value of N. The assembler should
remove these lines.
Take a look at line 21 in Figure 8-2. opJumpZ examines the top of the "stack"
maintained by the interpreter (a LIFO stack similar to that seen in Chapter 3) for
zero, and goes to LBL1 when the top of the stack is zero.
NOTE You may wonder why we use Go To: is not Go To a thing of darkness?
The answer is that although Go To is not absolutely necessary in machine
language, I use it here because so many languages at the machine level do so.
At the end of Shakespeare's Tempest, Prospero says, "This thing of darkness
I acknowledge mine."
It is the assembler's job to track down all pseudo opcodes of the type
opLabel, record their position in a keyed collection (where the key is the label
and the data is the position), and replace each occurrence of each label by its
position. This job is complicated, of course, by the fact that when you remove
labels (and sometimes the comment decoration), the position of the label
changes and has to be adjusted; this is the "tedium" of the assembler's clerical
task I mentioned in the previous section.
The compiler's assembler code is found in the quickBasicEngine Private
method assemble_. The assembler makes two complete passes over the input
source, which is in the collection of qbPolish tokens named colPolish.
NOTE I will discuss the qbPolish object in more detail in the next section.
Here, understand that it is an object with state that represents one instruction
to the Nutty Professor interpreter and which is called a Polish token in honor
of the Polish logicians mentioned in Chapter 3.
The first pass is the most difficult because it must remove labels, opRems and
opNops from the code while tracking the effect of removal on the value of labels;
opRems are instructions that do nothing but contain compiler comments, and opNops
are instructions that do nothing, period.
The first pass therefore has the form of a Do while loop that proceeds through
the code, and inside this loop you find a For loop. The For loop's job is to advance
from the current point to the next "real" instruction that is not a label. If this For
loop finds labels, it must record their position and value in a temporary labels
collection.
Each time the first For loop finds a real instruction, a separate For loop, also
inside the Do loop, backs up toward the previous real instruction and deletes each
label, remark, and no operation it passes. The deletion process consists of executing
the dispose method of the qbPolish object and removing it from the colPolish
collection.
Then all new labels found inside the first inner For loop are added to the
real label table. It is very possible that during the pass through the first loop,
you did not know the position of a label. However, because you moved the
labels to a temporary area during the first For loop, all the labels found now
have a known address.
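The overall effect of the two passes can be sketched in Python. This is a simplification rather than a transcription of assemble_: instead of deleting entries in place with nested loops, it builds a new list, and it represents each instruction as a hypothetical (opcode, operand) pair instead of a qbPolish object. Positions are numbered from 1.

```python
REMOVABLE = {"opLabel", "opRem", "opNop"}
JUMPS = {"opJump", "opJumpZ", "opJumpNZ"}

def assemble(code):
    """Take a list of (opcode, operand) pairs and return the assembled list."""
    labels = {}   # label name -> 1-based position of the next real instruction
    kept = []
    # Pass 1: drop pseudo-instructions, remembering where each label now points.
    for opcode, operand in code:
        if opcode == "opLabel":
            labels[operand] = len(kept) + 1
        elif opcode not in REMOVABLE:
            kept.append((opcode, operand))
    # Pass 2: replace symbolic jump targets with numeric positions.
    return [(opcode, labels[operand]) if opcode in JUMPS else (opcode, operand)
            for opcode, operand in kept]
```

Because a label is recorded as one more than the number of real instructions kept so far, its value already accounts for every remark, no operation, and label removed ahead of it. Forward jumps are resolved in the second pass, once every label is known.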
This is the surprising tedium of assembly I mentioned. Assembly, and code
generation inside a compiler, rather resembles DLL hell both in the tedium and
because you are close to the goal when the tedium occurs.
The quickBasicEngine also allows you to assemble without removing any
labels, remarks, or no operations. The qbGUI application gives you access to this
option as shown in Figure 8-3. Assembly is, of course, simpler when the option
to keep labels is retained.
[Figure 8-3: qbGUI options form. Options visible include Constant Folding, Remove comments & labels during assembly, Remove degenerate operations, Inspect compiler objects, the Parse Display choices (No parse display, Outline parse display), and the Tracing choices (Source trace, Object trace).]
Take a look at line 13 (the JumpZ, jump on zero, instruction). Ignoring until later
its meaning in context, just note that it now jumps not to LBL1, but to instruction 25,
which is the first real instruction after LBL1.
The compiler's graphical user interface, qbGUI, available at the Apress Web site
(http://www.apress.com) as an executable, allows you to run to the end of assembly
using menu commands to see the result. Bring up qbGUI and, using the File menu,
navigate to egnsf\apress\quickBasic and load the file nFactorial.BAS to see the
window shown in Figure 8-5. (Because you've already run the compiler, it should
expand to the More info screen; but if it does not, click the More button.)
Using the Tools menu, click Compile to get a compiled version of the RPN
object code, and if you use the Zoom button on the RPN label after compiling,
you will see the assembly code in Figure 8-1. Again, using the Tools menu, click
Assemble to get the assembly with labels, remarks, and no operations removed.
This section has explained how assembly basically creates an efficient machine
language representation of code. The next step is to see how it is possible to
"build your own computer" without soldering computer parts together, starting
fires, or chipping your nails on top of a real computer by crafting an interpreter.
Interpreters
In this section, I'll cover interpreters in general and historical context, and then
examine the Nutty Professor interpreter embedded in the quickBasicEngine.
Interpreters, in General
Interpreters may have been invented in Alan Turing's 1936 paper "On Computable
Numbers, with an Application to the Entscheidungsproblem." The formidable
insertion of a monster German word in the title of Turing's paper is reflective of the fact
that prior to WWII, German was like English is today: a lingua franca of science
owing to the prestige of German science and mathematics. The
"Entscheidungsproblem" was the problem of the decidability of mathematics, and whether or not
all mathematical statements could be proved and/or whether mathematics was
even consistent.
Turing's concern in the paper was to formalize the notion of following a rule,
that is, executing an algorithm. He developed the ultimate paper computer, the
ultimate Nutty Professor computer, and the ultimate Reduced Instruction Set
Computing (RISC) machine because of its simplicity. This computer is called
a Turing machine, and you read about it in Chapter 4. Turing machines were an
important discovery in the history of software, for without them, we would not
be nearly as free to represent logic as data.
Back in the real world, interpreters came into commercial use when early
models of new computers were being designed and programmed before the
hardware was fully available.
When IBM introduced the IBM System/360 in 1964, a large number of busi-
ness customers had invested quite a lot of effort in coding programs for older
IBM 7094 and IBM 1401 architectures. A form of interpretation came to their res-
cue courtesy of hardware-assisted interpretation... also known as emulation.
Firmware in the 360 caused the operation codes and operands of older
machines to be unpacked and simulated in native 360 instructions, often at
a faster rate than the original computer executed its native instruction set.
This allowed 1401 developers to migrate to the 360 without losing their exist-
ing investment in software.
There is, in other words, a range of interpreters, commencing with slow and
simple interpreters, more complex interpreters such as the Nutty Professor inter-
preter for the quickBasicEngine, and virtual machines.
Slow and simple interpreters have a well-deserved reputation as being ineffi-
cient, and, in fact, the Nutty Professor interpreter belongs in this class.
Op              Description
opEvaluate      Evaluates stack(top) as a QuickBasic expression using heavyweight
                evaluation. A new quickBasicEngine with the same options as the
                current engine is used to evaluate stack(top).
opFloor         Replaces stack(top) with first integer n < stack(top).
opForIncrement  Increments or decrements the For control value in a For loop.
opForTest       Jumps to the for exit when contents of control variable location
                are greater than final value (when step value is positive); jumps
                to the for exit when contents of control variable location are less.
opIif           Replaces stack(top-2)..stack(top) with stack(top-1) when
                stack(top-2) is True, with stack(top) otherwise.
opInput         Reads a number or a string to stack(top) by generating the
                compiler input event.
opInt           Replaces stack(top) with integer part.
opInvalid       Invalid marker op intended in certain contexts to flag the opcodes
                with a deliberate error; not used in the current compiler.
opIsNumeric     Replaces stack(top) with True when stack(top) is a number,
                False otherwise.
opJump          Jumps to location in operand; expects integer.
opJumpIndirect  Jumps to location identified at the top of the stack.
opJumpNZ        Jumps to location when stack(top) <> 0 (pops the stack top).
opJumpZ         Jumps to location when stack(top) = 0 (pops the stack top).
opLabel         Identifies position of a code label or statement number; inserted
                by the compiler and removed by the assembler.
opLCase         Replaces the string at stack(top) with its lowercase translation.
opLen           Replaces stack(top) by its length as a string.
opLike          Compares two strings at the stack top for a pattern match,
                replacing them by True or False.
opLog           Replaces stack(top) by its natural logarithm.
opMax           Replaces stack(top) and stack(top-1) with the maximum
                value found.
opMid           Replaces stack(top-2)..stack(top) with the substring of
                stack(top-2) starting at stack(top-1) using the length at
                stack(top).
The interpreter in the method interpreter_ is a very large case statement that
moves through the Polish collection and jumps to individual support routines.
A stack collection keeps the working elements in the form of qbVariable objects.
The interpreter is rather slow because it imposes a "strongly typed" frame
on top of the stack. For each operation as defined with its description in qbOp,
the stack frame expected is specified in a string form that lists the expected
types of the operation: for example, the expected types of the Add operation
are "number, number". A pop routine obtains the expected stack as an array, and
returns it to the caller.
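The shape of such an interpreter, including the typed stack frame check, can be sketched in Python. This is an illustration only: it uses plain Python values where the real interpreter uses qbVariable objects, it handles a handful of opcodes, and the names FRAMES, pop_frame, interpret, and opAdd are assumptions made for the example.

```python
# Expected stack frame for each opcode, written as a string of type names.
FRAMES = {"opAdd": "number,number", "opSubtract": "number,number",
          "opConcat": "string,string", "opJumpZ": "number"}
TYPES = {"number": (int, float), "string": (str,)}

def pop_frame(stack, opcode):
    """Pop the operands an opcode expects, checking the type of each one."""
    expected = FRAMES[opcode].split(",")
    if len(stack) < len(expected):
        raise RuntimeError(opcode + ": stack underflow")
    frame = stack[-len(expected):]
    del stack[-len(expected):]
    for value, type_name in zip(frame, expected):
        if not isinstance(value, TYPES[type_name]):
            raise TypeError("%s: expected %s, got %r" % (opcode, type_name, value))
    return frame

def interpret(code):
    """Run a list of (opcode, operand) pairs; positions are numbered from 1."""
    stack = []
    pc = 1
    while True:
        opcode, operand = code[pc - 1]
        pc += 1
        if opcode == "opPushLiteral":
            stack.append(operand)
        elif opcode == "opAdd":
            left, right = pop_frame(stack, opcode)
            stack.append(left + right)
        elif opcode == "opSubtract":
            left, right = pop_frame(stack, opcode)
            stack.append(left - right)
        elif opcode == "opConcat":
            left, right = pop_frame(stack, opcode)
            stack.append(left + right)
        elif opcode == "opJump":
            pc = operand
        elif opcode == "opJumpZ":
            (value,) = pop_frame(stack, opcode)
            if value == 0:
                pc = operand
        elif opcode == "opEnd":
            return stack
        else:
            raise ValueError("unknown opcode " + opcode)
```

Every arithmetic step pays for a dictionary lookup, a string split, and a type test per operand before any work is done, which gives some feel for why checking the frame on each operation slows execution.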
About the only advantage of its slow rate of execution is that you can sit back
and watch how it works in the GUI, or go to the fridge for a beer, or read Motor
Trend. Also, as in the case of the flyover compiler in Chapter 3, you can replay the
interpretation, as well as the scanning and compilation, in the GUI.
To see how the interpreter executes the nFactorial program, call up the qbGUI
application, and load the nFactorial.BAS program. Being sure that the More screen
is shown, go to Tools > Options to set up the options shown in Figure 8-6. In this
form, enable the Object trace option in the Tracing box.
[Figure 8-6: qbGUI options form. Options visible include Constant Folding, Remove comments & labels during assembly, Remove degenerate operations, Event Log, and Inspect Quick Basic Engine.]
Click the Run button to watch the nFactorial program execute. It will prompt
you for input when the interpreter sends the GUI an input event: try 5. You should
see the screen in Figure 8-7.
Figure 8-7. qbGUI screen after execution of the nFactorial program with Object
trace enabled
Check the result, which should be 120. Then, click the Zoom button on the
Storage list box to see the strongly typed storage of the quickBasicEngine, which
is contained in the colVariables collection of the quickBasicEngine state (see
Figure 8-8).
1 N Variant,Byte:vtByte(5)
2 F Variant,Byte:vtByte(120)
3 N2 Variant,Double:vtDouble(1)
Because the Dim statements for N, F, and N2 (the input number, the factorial,
and the "work" copy of N used in the For loop) are untyped, N, F, and N2 are each
variants. Note that the displays of each storage item are the output of the toString
method for qbVariable as described in Chapter 6.
Because you selected the Object trace as shown in Figure 8-6, the upper-right
hand of the screen will contain trace blocks like those in Figure 8-9. Figure 8-9
shows the trace block for the concatenate instruction that finishes the display of
the result.
Figure 8-9 shows the interpreter's situation just prior to the execution of the
opConcat opcode to join two strings; the strings being joined are "THE FACTORIAL
OF 5 IS 120" and the newline that terminates the Print operation. The trace
block shows the opcode, and it documents the function of the opcode.
As in the case of the flyover compiler of Chapter 3, you can click the Replay
check box at the bottom of the screen to save and store each scan, parse, and
execution step, and replay the complete process in detail.
I'd like to show you one final feature of the compiler at this point: its test
method, which tests the complete compiler after options or source code is
changed. It exercises most of the functions of the compiler.
Recall from the discussion of core methodology in Chapter 4 that I prefer to
write complex objects with their own test method so that they can be rapidly
tested after a change or in installing the software.
Click the Test button in the customer engineering zone. You may get slightly
different results than the ones you see in Figure 8-10 if I add more tests between
this writing and installation of the compiler software on the Apress Web site, but
you should see the number of test cases actually run, the total time for the tests,
and the important message "The test succeeded".
The report will start with three expressions, and it will contain several pro-
grams including Hello World and nFactorial. Note that the Print statements don't
affect the screen of the qbGUI because the test method creates a quickBasicEngine
and intercepts its Print events to test them against expected results.
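The idea of intercepting Print events can be sketched in Python with a callback standing in for the .NET event. TinyEngine is a hypothetical stand-in that understands only Print statements with a quoted literal; it is not the quickBasicEngine, and run_tests is not its test method.

```python
class TinyEngine:
    """Hypothetical stand-in for an engine that raises a print event."""

    def __init__(self, on_print):
        self.on_print = on_print   # handler called once per printed line

    def run(self, source):
        # Only Print "literal" statements are understood by this stand-in.
        for line in source.splitlines():
            line = line.strip()
            if line.upper().startswith("PRINT "):
                self.on_print(line[6:].strip().strip('"'))

def run_tests(cases):
    """Run each (source, expected output lines) case and return the failures."""
    failures = []
    for source, expected in cases:
        captured = []
        TinyEngine(captured.append).run(source)   # output goes to the list, not the screen
        if captured != expected:
            failures.append((source, expected, captured))
    return failures
```

An empty list returned by run_tests plays the role of the "The test succeeded" message. Because the handler is supplied by the test, nothing reaches the screen.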
Thus like qbScanner, qbVariableTypeTester, and qbVariableTest, qbGUI can test
the underlying object if you alter the source code.
Summary
In this chapter, you've seen how to "assemble" the code by translating symbolic
labels inserted by the compiler to numeric indexes and optionally removing labels,
source code information, and comments. You've then run the Nutty Professor
interpreter to simulate the quickBasicEngine on your system, and you've seen
(depending on your hardware's capabilities) why the interpretation process is slow.
You also learned how to continually test the quickBasicEngine.
Therefore, let's see if you can generate more efficient code by using the
Common Language Runtime, which I'll discuss in the next chapter.
CHAPTER 9
Code Generation to the Common Language Runtime
We all live in a virtual machine, a virtual machine, a virtual machine.
-Sung at IBM Share conferences in the era of
Conversational Monitor System/Virtual Machine
I FEAR THIS CHAPTER may be a bit of an anticlimax, and this is because in this book
I have deliberately focused primarily on the front end of a compiler, including
language design, language specification using Backus-Naur Form, lexical analy-
sis, and parsing.
In order to avoid the distracting issues of the Common Language Runtime
and ilasm, I even defined the onboard Nutty Professor interpreter (a stack-based
virtual machine tuned towards the needs of QuickBasic) with machine language
that directly supports QuickBasic needs.
It was also important to develop data types as objects because I wanted to
avoid a common pitfall of the tyro language designer: defining a language that
encourages the use of untyped variables including the .NET object or the COM
variant.
There are many sources of valuable content for using the Common Language
Runtime and its associated tools. At the entry level, Visual Basic .NET and the .NET
Platform: An Advanced Guide, by Andrew Troelsen (Apress, 2001) is still solid on
basic interaction through the Reflection types with the CLR.
At a more advanced level, Serge Lidin, the actual developer of Microsoft's
ilasm (which assembles CLR code into machine language), has written Inside
Microsoft .NET IL Assembler (Microsoft Press, 2002); this book describes not only
how to get started writing assembler code, but also how to write real code, since
the author gives comprehensive reference information on the opcodes and the
important issue of the loader, which combines multiple assemblies and links
them into a run unit.
In this chapter, I will simply describe how the quickBasicEngine is able in a pro-
totype sense to generate CLR code ... which runs much faster than Nutty Professor
code, though without the benefit of allowing you to see its inner workings.
What I've implemented transforms a subset of possible QuickBasic expres-
sions, the part that does math with constants, into CLR instructions for fast
execution.
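As a rough picture of what this transformation involves, the following Python sketch maps a short list of stack instructions for constant arithmetic onto the text of the corresponding IL instructions (ldc.r8, add, sub, mul, div, and ret). It is an assumption-laden illustration: the (opcode, operand) pairs, the names opAdd, opMultiply, and opDivide, and the choice to emit text rather than to call the .NET code generation classes are all made up for the example.

```python
# Hypothetical mapping from Polish opcodes to IL instruction mnemonics.
IL_FOR = {"opAdd": "add", "opSubtract": "sub", "opMultiply": "mul",
          "opDivide": "div", "opEnd": "ret"}

def to_il(code):
    """Translate (opcode, operand) pairs for constant arithmetic into IL text lines."""
    lines = []
    for opcode, operand in code:
        if opcode == "opPushLiteral":
            # ldc.r8 pushes a 64-bit floating-point constant on the evaluation stack.
            lines.append("ldc.r8 " + repr(float(operand)))
        elif opcode in IL_FOR:
            lines.append(IL_FOR[opcode])
        else:
            raise NotImplementedError(opcode + " has no translation in this sketch")
    return lines
```

The translation is nearly one for one because both machines are stack machines: a push of a literal becomes a load of a constant, and each arithmetic opcode replaces the two values at the top of the stack with a result.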
For the rest, I shall have to use the sleazy academic practice of leaving the
fun part of coding full code generation to you, the reader. I do expect that
because you have fully documented source code through the Apress Web site
(http://www.apress.com), this will be a relatively easy task, and I've dedicated the
final part of this chapter to showing how this can be done.
.. ClHltOnHH Engineenng.zan.
I
4 opPushLiteral 100: Push numflric constant
Evaluate 5 opSubtract : Replace atack(top) by opSubtract(stack
(top-l), stack(top»
3/14/2004 ~ 3'
6 opEnd : Generated at end of code
3/14/2004 5 3'
3/14/2004 5 3:
J/1</2004 5 J"
Sc-!lnnedTot..
230
Code Generation to the Common Language Runtime
Even for this small amount of code, the interpreter took a noticeable amount
of time. Let's up the ante and use MSIL!
Go to the Tools menu and select the menu item named Run the MSIL Code,
and in a flash you should see the results shown in Figure 9-3.
NOTE As a side benefit, I will show you how the compiler implements full
thread capability simply so that multiple instances of the compiler can run
simultaneously, the compiler can be running while the user interacts with
a form, and multiple procedures in one instance of the compiler can run
simultaneously. The compiler is fully threadable.
All the actual functionality of the msilRun method is contained in the Private
msilRun_ method because in general you should try to keep Public routines sim-
ple shells around Private code. But note that you call msilRun by way of a private
dispatcher routine.
The purpose of this dispatcher routine is to make the quickBasicEngine mul-
tithreaded by placing the needed threading logic in one place. You must lock the
state on entry to the dispatcher and release the lock on exit.
This is because at any time in the dispatcher itself or in a Private procedure
called by the dispatcher, you may need to reference the variables concentrated
in the user data type usrState (of type TYPstate) in the General Declarations sec-
tion of the quickBasicEngine.
If two procedures running in separate threads reference and change these
variables simultaneously, the execution of the program will be unpredictable
and buggy.
This strategy is rather primitive and broad. While any thread is executing
code inside the dispatcher, other threads trying to run procedures will simply
queue up and pound, as it were, on the door of the SyncLock ... waiting their
turn, like the kids in the house with one bathroom. A more fine-grained approach
would be to identify specific zones of the dispatcher that specifically interact
with the state and lock only these zones.
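As a minimal sketch of this coarse-grained strategy (not the quickBasicEngine's actual code), the following shows Public shells calling one private dispatcher that takes a SyncLock on entry and releases it on exit. The class MiniEngine, its members, and the procedure names are hypothetical stand-ins for the engine, its usrState, and its Public methods.

```vbnet
Imports System
Imports System.Threading

' Hypothetical stand-in for the engine: one lock guards all shared state.
Public Class MiniEngine
    Private ReadOnly objStateLock As New Object()
    Private intCalls As Integer = 0        ' shared state, playing the role of usrState

    ' Public routines stay thin shells around the dispatcher.
    Public Function Increment() As Object
        Return dispatch_("INCREMENT")
    End Function

    Public Function Calls() As Object
        Return dispatch_("CALLS")
    End Function

    ' All threading logic lives here: lock on entry, release on exit.
    Private Function dispatch_(ByVal strProcedure As String) As Object
        Dim objReturn As Object = Nothing
        SyncLock objStateLock
            Select Case strProcedure.ToUpper()
                Case "INCREMENT"
                    intCalls += 1
                    objReturn = intCalls
                Case "CALLS"
                    objReturn = intCalls
                Case Else
                    Throw New ArgumentException("Invalid dispatch method " & strProcedure)
            End Select
        End SyncLock
        Return objReturn
    End Function
End Class

Module DispatcherDemo
    Private ReadOnly objEngine As New MiniEngine()

    Private Sub Worker()
        For intCount As Integer = 1 To 1000
            objEngine.Increment()
        Next
    End Sub

    Sub Main()
        Dim objThreads(3) As Thread          ' four worker threads
        For intIndex As Integer = 0 To 3
            objThreads(intIndex) = New Thread(AddressOf Worker)
            objThreads(intIndex).Start()
        Next
        For Each objThread As Thread In objThreads
            objThread.Join()
        Next
        ' Every increment ran under the lock, so none was lost.
        Console.WriteLine(objEngine.Calls())   ' 4000
    End Sub
End Module
```

Removing the SyncLock from this sketch makes the final count unreliable, which is the unpredictable behavior described above. The price is that the four workers never run the dispatcher body at the same time.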
Several different procedures come to the dispatcher and identify the specific
functionality they need. The drawback of this design is that every time a new
Public property or method is created, the dispatcher must be upgraded with
a new execution case. The advantage is that it allows you to concentrate logic,
here locking, in one place and to wrap it around the transaction center.
Here is the dispatcher code, somewhat shortened:
Case Else
errorHandler_("Internal programming error: " & _
"too many parameters", _
"dispatch_", _
"Marking object unusable and returning Nothing", _
Nothing)
OBJstate.usrState.booUsable = False
Return (Nothing)
End Select
End Function
' --- Returns the reference value
Private Overloads Function dispatch_(ByVal strProcedure As String, _
ByRef strOutstring As String, _
ByVal objDefault As Object, _
Case "MSILRUN"
objReturn = msilRun_
Case Else
errorHandler_("Invalid dispatch method " & _
_OBJutilities.enquote(strProcedure), _
"dispatch_", _
"Marking object unusable and " & _
"returning default", _
Nothing)
.booUsable = False
End Select
End With
End If
End SyncLock
SyncLock OBJthreadStatus
OBJthreadStatus.stopThread()
End SyncLock
Return (objReturn)
End Function
Note that the dispatcher locks and releases not only usrState but also
a minor player, OBJthreadStatus, which keeps track of running threads so
that you can monitor them in the qbGUI form. To see this, run the qbGUI executable
and bring up the Options form used in the last chapter. Click the check box
labeled "Stop button", return to the main form, and load the nFactorial.bas
demonstration program used in Chapter 8. Run this to see a new form that will
allow you to stop a runaway compile or run (see Figure 9-4).
As you can see in the dispatcher code, there is a case for the msilRun method
that sets an object to the value returned from msilRun_, which does all the work of
translating into CLR code using Reflection.
The code in msilRun_ requires the following imported namespaces identified
at the beginning of the quickBasicEngine:
System.Reflection provides the basic tools that allow you to discover the
properties of your types, methods, and fields. System.Reflection.Emit provides you with
the ability to create code in the CLR. DotNetAssembly is what you need to create and
load a single dynamic assembly that will execute the compiled functions.
Here is the code of msilRun_ itself:
Return Nothing
End If
Dim objAsmName As AssemblyName
Dim objAsm As AssemblyBuilder
Dim objClass As TypeBuilder
Dim objILgenerator As ILGenerator
Dim objMethod As MethodBuilder
Dim objModule As ModuleBuilder
Try
objAsmName = New AssemblyName
objAsmName.Name = "msilRun"
objAsmName.Version = New Version("1.0.0.0")
objAsm = _
AppDomain.CurrentDomain.DefineDynamicAssembly _
(objAsmName, AssemblyBuilderAccess.Run)
objModule = objAsm.DefineDynamicModule(objAsmName.Name)
objClass = objModule.DefineType(objAsmName.Name, _
TypeAttributes.Public)
objMethod = objClass.DefineMethod(objAsmName.Name & "_", _
MethodAttributes.Public, _
Type.GetType("System.Double"), _
Nothing)
objILgenerator = objMethod.GetILGenerator
Catch objException As Exception
errorHandler_("Not able to initialize MSIL generation: " & _
Err.Number & " " & Err.Description, _
"msilRun_", _
"Returning nothing", _
objException)
End Try
Dim intIndex1 As Integer
Dim objArgument As Object
Dim objNextOpcode As OpCode
Dim objNextValue As Object
With .colPolish
For intIndex1 = 1 To .Count
loopEventInterface_("Generating MSIL code", _
"collection item", _
intIndex1, _
.Count, _
0, _
"")
Else
If .Opcode = ENUop.opPushLiteral Then
If UCase(.Operand.GetType.ToString) = _
"QBVARIABLE.QBVARIABLE" Then
objNextValue = _
CType(.Operand, qbVariable.qbVariable).value
Else
objNextValue = .Operand
End If
Try
objILgenerator.Emit(OpCodes.Ldc_R8, _
CDbl(objNextValue))
Catch
Exit For
End Try
End If
End If
End With
Next intIndex1
If intIndex1 <= .Count Then
errorHandler_("Not able to convert Polish code to MSIL", _
"msilRun_", _
"Returning Nothing", _
Nothing)
Return Nothing
End If
objILgenerator.Emit(OpCodes.Ret)
Dim objReturn As Object
Try
objClass.CreateType()
Dim objType As Type = objAsm.GetType(objClass.Name)
Dim objInstance As Object = Activator.CreateInstance(objType)
Dim objMethodInfo As MethodInfo = _
objType.GetMethod(objMethod.Name)
objReturn = objMethodInfo.Invoke(objInstance, Nothing)
Catch objException As Exception
errorHandler_("Not able to run MSIL: " & _
Err.Number & " " & Err.Description, _
"msilRun_", _
"Returning Nothing", _
objException)
Return Nothing
End Try
Return (objReturn)
End With
End With
End Function
Private Function msilRun__qbOpcode2MSIL_ _
(ByVal enuPolishOpcode As qbOp.qbOp.ENUop, _
ByRef enuMSILopcode As OpCode) As Boolean
Select Case enuPolishOpcode
Case ENUop.opAdd : enuMSILopcode = OpCodes.Add
Case ENUop.opAnd : enuMSILopcode = OpCodes.And
Case ENUop.opDivide : enuMSILopcode = OpCodes.Div
Case ENUop.opEnd : enuMSILopcode = OpCodes.Nop
Case ENUop.opMultiply : enuMSILopcode = OpCodes.Mul
Case ENUop.opNegate : enuMSILopcode = OpCodes.Neg
Case ENUop.opNot : enuMSILopcode = OpCodes.Not
Case ENUop.opOr : enuMSILopcode = OpCodes.Or
Case ENUop.opSubtract : enuMSILopcode = OpCodes.Sub
Case Else
Return False
End Select
Return (True)
End Function
msilRun_ is a function with no parameters because it gets all its input from
the colPolish collection in usrState, which contains all the Polish operations.
msilRun_ will iterate through colPolish and compile what operations it can. If it
finds an operation it cannot convert, it will give up, report failure through an
error handler that throws an error, and return Nothing to the caller.
However, the first and rather formidable job of msilRun_ is to create the
objects needed to build a dynamic assembly in the first place. The job is
formidable because the objects need to tie together in one and only one way.
objAsm = AppDomain.CurrentDomain.DefineDynamicAssembly _
(objAsmName, AssemblyBuilderAccess.Run)
objModule = objAsm.DefineDynamicModule(objAsmName.Name)
objClass = objModule.DefineType _
(objAsmName.Name, TypeAttributes.Public)
objMethod = objClass.DefineMethod(objAsmName.Name & "_", _
MethodAttributes.Public, _
Type.GetType("System.Double"), _
Nothing)
objILgenerator = objMethod.GetILGenerator
Catch objException As Exception
errorHandler_("Not able to initialize MSIL generation: " & _
Err.Number & " " & Err.Description, _
"msilRun_", _
"Returning nothing", _
objException)
End Try
• objAsmName: The name of the assembly to contain the compiled code. Note
that this is an object and not a string, because the CLR requires (primarily
for security and interoperation) a structured name that includes the string
name, the version, and the locale information.
• objAsm: The assembly itself, a container for the type and the method. Many
assemblies contain multiple classes (also referred to as types), but this
assembly will contain one class.
• objClass: The class is a type builder because you need to not only use it,
but also assign its properties.
• objILgenerator: This is the object that will enable you to emit code to the
method itself.
• objMethod: Again, the method is a builder for the same reason the class is
a type builder.
The next step, in the Try..Catch block, is to create the objects.
You first create the assembly "name" as an object that contains the string name
and the version: in this simple application, you don't need to identify the locale.
The next step is to define a new, "dynamic" assembly with the structured name
and an access mode. The mode you select is Run, because all you want to do is run
the code. You can also choose Save if all you need to do is save the code to disk, or
RunAndSave to do both.
Inside the new assembly, you create the single module and then the one-
and-only class (also known as type).
For the one-and-only method, you need to specify its name (msilRun_), its
scope as Public, and its return type; the return type is Double because all you can
compile are arithmetic operations. The final argument of DefineMethod would specify
the parameters expected by the method if it had any.
Finally, you need to assign an IL generator to the method as the hose that
will transmit specific CLR instructions.
The next segment of code is pretty straightforward, so I won't reproduce it
here: instead see the complete listing. It loops through the colPolish collection,
converting each entry to a qbPolish.
Note how in more than one place the compiler needs to convert collection
entries to specific types. This is because as of Visual Studio .NET 2003 a "generic"
facility is absent. This would allow you to specify that the colPolish collection
always contains objects of type qbPolish. This feature is scheduled for the
next release of Visual Studio for Visual Basic and C#.
Then the code calls msilRun__qbOpcode2MSIL_ to see if the colPolish opcode can be
"transliterated" one for one to a CLR opcode. As the listing shows, this function
returns True and passes the IL opcode back through its ByRef parameter, or returns
False on failure. If the opcode cannot be translated, it is one of the large
number of opcodes not supported by the prototype.
In Nutty Professor machine and assembler code, the pushLiteral opcode has an
operand that is a constant represented either as a .NET value or as a qbVariable. You
determine what type it is and try to emit an instruction for the CLR Ldc_R8 opcode.
Note that Ldc_R8 loads a constant, carried as the instruction's operand, that is
a double-precision value represented in 8 bytes. What we call a push in the Nutty
Professor interpreter is a load in the CLR. Furthermore, note that the Nutty
Professor interpreter has one instruction that obtains the type of the operand from
an object or the GetType of a .NET value, whereas the CLR is more strict: it uses
distinct opcodes for distinct data types, which makes the CLR more reliable and
efficient at the same time.
If the generation of the IL for the value fails, then the operand in the inter-
preter's code cannot be converted to double, and the expression deals with strings
that the demo object code generator cannot handle.
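To make the translation step concrete, here is a small self-contained sketch that compiles a list of Polish instructions to IL and runs the result. It is not the book's msilRun_: the MiniOp enumeration and MiniPolish structure are hypothetical stand-ins for ENUop and qbPolish, and I am assuming the lightweight DynamicMethod class in place of the assembly, module, and type builders so that the example stays short.

```vbnet
Imports System
Imports System.Reflection.Emit

Module PolishToIL
    ' Hypothetical stand-ins for the engine's Polish opcodes.
    Public Enum MiniOp
        PushLiteral
        Add
        Subtract
        Multiply
        Divide
    End Enum

    ' One Polish instruction: an opcode plus an optional numeric operand.
    Public Structure MiniPolish
        Public Op As MiniOp
        Public Operand As Double
        Public Sub New(ByVal enuOp As MiniOp, Optional ByVal dblOperand As Double = 0)
            Op = enuOp
            Operand = dblOperand
        End Sub
    End Structure

    ' Translate the Polish code one for one to IL, then run it.
    Public Function CompileAndRun(ByVal usrCode() As MiniPolish) As Double
        Dim objMethod As New DynamicMethod("miniRun", GetType(Double), Type.EmptyTypes)
        Dim objIL As ILGenerator = objMethod.GetILGenerator()
        For Each usrNext As MiniPolish In usrCode
            Select Case usrNext.Op
                ' The operand must be a Double so that the Emit overload
                ' taking an 8-byte floating-point value is chosen.
                Case MiniOp.PushLiteral : objIL.Emit(OpCodes.Ldc_R8, usrNext.Operand)
                Case MiniOp.Add : objIL.Emit(OpCodes.Add)
                Case MiniOp.Subtract : objIL.Emit(OpCodes.Sub)
                Case MiniOp.Multiply : objIL.Emit(OpCodes.Mul)
                Case MiniOp.Divide : objIL.Emit(OpCodes.Div)
            End Select
        Next
        objIL.Emit(OpCodes.Ret)      ' return the value left on the stack
        Dim objRun As Func(Of Double) = _
            CType(objMethod.CreateDelegate(GetType(Func(Of Double))), Func(Of Double))
        Return objRun()
    End Function

    Sub Main()
        ' (100 - 58) * 2 in Polish (postfix) order
        Dim usrCode() As MiniPolish = { _
            New MiniPolish(MiniOp.PushLiteral, 100), _
            New MiniPolish(MiniOp.PushLiteral, 58), _
            New MiniPolish(MiniOp.Subtract), _
            New MiniPolish(MiniOp.PushLiteral, 2), _
            New MiniPolish(MiniOp.Multiply)}
        Console.WriteLine(CompileAndRun(usrCode))   ' 84
    End Sub
End Module
```

The sketch does no validation: it assumes the Polish code is well formed and leaves exactly one Double on the stack, as the code produced by the compiler for a constant expression should.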
The "piece of resistance," of course, is where the code is actually run.
objILgenerator.Emit(OpCodes.Ret)
Dim objReturn As Object
Try
objClass.CreateType()
Dim objType As Type = objAsm.GetType(objClass.Name)
Dim objInstance As Object = Activator.CreateInstance(objType)
Dim objMethodInfo As MethodInfo = _
objType.GetMethod(objMethod.Name)
objReturn = objMethodInfo.Invoke(objInstance, Nothing)
Catch objException As Exception
errorHandler_("Not able to run MSIL: " & _
Err.Number & " " & Err.Description, _
"msilRun_", _
"Returning Nothing", _
objException)
Return Nothing
End Try
Return (objReturn)
To complete the method, you must emit the Ret opcode to return to the caller.
The Try block then uses objClass (which, you'll recall, is a class builder) to "bake"
the ingredients into an executable pie. You then have to obtain the created type
as a Type, and make an instance of that type.
Note that having to create an instance is a somewhat unnecessary requirement
because the very simple method you create is stateless and could be, in
terms of both C# and C++, a static method; in Visual Basic, it could be a Shared
method in a class with no variables in General Declarations.
Using the instance object, you invoke, or run, the method without any
parameters. Because it leaves one value on the stack, the CLR returns this as the
function value. This is returned by the code.
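The remark about a static method can be tried out directly. The following sketch, which is my own illustration and not code from the quickBasicEngine, defines the generated method as Public and Static and then invokes it with Nothing as the target, so no Activator.CreateInstance call is needed. The names sharedDemo, sharedDemoClass, and run_ are hypothetical, and I am assuming the static AssemblyBuilder.DefineDynamicAssembly method (available in later versions of .NET) in place of AppDomain.CurrentDomain.DefineDynamicAssembly.

```vbnet
Imports System
Imports System.Reflection
Imports System.Reflection.Emit

Module SharedMethodDemo
    ' Build a one-class dynamic assembly whose only method is Shared (static)
    ' and simply returns the constant passed to this function.
    Public Function RunSharedConstant(ByVal dblValue As Double) As Double
        Dim objAsmName As New AssemblyName("sharedDemo")
        ' Assumption: the static factory method stands in for the
        ' AppDomain.CurrentDomain.DefineDynamicAssembly call used in the chapter.
        Dim objAsm As AssemblyBuilder = _
            AssemblyBuilder.DefineDynamicAssembly(objAsmName, AssemblyBuilderAccess.Run)
        Dim objModule As ModuleBuilder = objAsm.DefineDynamicModule(objAsmName.Name)
        Dim objClass As TypeBuilder = _
            objModule.DefineType("sharedDemoClass", TypeAttributes.Public)
        Dim objMethod As MethodBuilder = _
            objClass.DefineMethod("run_", _
                                  MethodAttributes.Public Or MethodAttributes.Static, _
                                  GetType(Double), _
                                  Type.EmptyTypes)
        Dim objIL As ILGenerator = objMethod.GetILGenerator()
        objIL.Emit(OpCodes.Ldc_R8, dblValue)
        objIL.Emit(OpCodes.Ret)
        Dim objType As Type = objClass.CreateType()
        ' A Shared method has no instance: pass Nothing as the target object.
        Return CDbl(objType.GetMethod("run_").Invoke(Nothing, Nothing))
    End Function

    Sub Main()
        Console.WriteLine(RunSharedConstant(42.5))
    End Sub
End Module
```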
Challenge Exercise
Develop the MSIL object code generator for the quickBasicEngine.
Summary
Both the Nutty Professor interpreter and the CLR are stack machines that exe-
cute opcodes that interact with the stack. In Chapter 8, you saw how a simulator
for the CLR itself could be written in a similar manner to the interpret_ method
of the quickBasicEngine.
Unlike many Microsoft products, .NET and the CLR were developed in an
open spirit. Officially, we don't know how the Visual Basic interpreter for VB 6 pro-
grams worked apart from the fact that it was a C++ program that at times imposed
some strange performance penalties. But the operation of any valid interpreter for
the CLR is well defined, even if you don't have its code, by the governing standard.
The open standards create an aftermarket of opportunities for unemployed
compiler writers, not only for .NET compilers for traditional languages, but also
for business rule compilers for expressing the rules of the organization in a highly
maintainable, and reasonably efficient, form.
Beyond the books by Andrew Troelsen and Serge Lidin on basic and
advanced use of the Common Language Runtime and ILASM, a considerable
amount of content is shipped with Visual Studio on the CLR and ILASM.
CHAPTER 10
Implementing
Business Rules
It's not simple unless it's complete.
-Larry Ellison, CEO of Oracle
shop floor. It appears from Weinberg's story that the requirements were being
translated into sequential assembly steps with many dependencies.
The code was, according to Weinberg, a rather confusing mess, and a
programmer was assigned to rescue the project in a way that will be completely
familiar to modern programmers (the story happened long ago).1 After examining
the code, the programmer realized that he could rewrite the buggy program
faster than he could fix it, by creating a program that read tables, looked up the
configuration requests in the table, and found the specific sequence of steps.
Weinberg doesn't go into detail about what the programmer
went through, but it is clear that he recognized logic as data, and that the
existing, buggy solution did not make manifest the business rules. The business
rules were hidden in the complex code, and for this reason weren't being cor-
rectly implemented. Weinberg's hero seems to have rethought his problem as
less a question of implementing a specific set of business rules and more as an
exercise in implementing a second level of logic, in the form of a system that
read, "compiled," and processed the actual engineering requirements.
such as "caller removed the phone from the hook."2 Therefore, I needed to get
from an event to a call. How could I do this?
I realized that the mere Cobol program had to act as if it were a digital switch,
and thereby create calls from events. I knew in general how the Cobol program had
to act to simulate a Private Branch Exchange (PBX), which is a limited but
general-purpose (in Turing's sense) digital computer, since these switches are
commonly state machines.
Starting in a "start" state, they wait for symbols that consist of atomic user
events, such as user picks up the phone, user dials digit x, user hangs up, and user
throws phone across room (sorry, just making sure you're awake).
Characteristic of this type of state machine is the fact that a state and sym-
bol fully determine a new state and perhaps a list of actions. I demonstrated to
the client that the simple transitions could be obtained from the original
designers of the switch and placed into a fixed table defined using the Cobol OCCURS
clause. Then, as the program read the file of events, it would actually retrace the
sequence of events the original switch had encountered. The billing people were
then able to identify at which points an actual call completed and how to appor-
tion the bill, for example, by dividing the cost of the call by the number of
conference callers. The solution worked (lucky me), and the client was happy.
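A table-driven state machine of this kind can be sketched in a few lines of Visual Basic. The states, events, and transitions below are invented for illustration and are far simpler than those of any real switch; the original was a Cobol table, for which a Dictionary stands in here.

```vbnet
Imports System
Imports System.Collections.Generic

Module SwitchSimulator
    ' The transition table: "state|event" -> next state.
    ' States and events are illustrative, not those of a real PBX.
    Private ReadOnly colTransitions As New Dictionary(Of String, String) From { _
        {"idle|offHook", "dialTone"}, _
        {"dialTone|digit", "dialing"}, _
        {"dialing|digit", "dialing"}, _
        {"dialing|answer", "connected"}, _
        {"dialTone|onHook", "idle"}, _
        {"dialing|onHook", "idle"}, _
        {"connected|onHook", "idle"}}

    ' Replay a sequence of events and count the calls that completed,
    ' that is, the transitions from "connected" back to "idle".
    Public Function CountCompletedCalls(ByVal strEvents() As String) As Integer
        Dim strState As String = "idle"
        Dim intCalls As Integer = 0
        For Each strEvent As String In strEvents
            Dim strNext As String = Nothing
            If Not colTransitions.TryGetValue(strState & "|" & strEvent, strNext) Then
                Continue For     ' the event has no meaning in this state: ignore it
            End If
            If strState = "connected" AndAlso strNext = "idle" Then intCalls += 1
            strState = strNext
        Next
        Return intCalls
    End Function

    Sub Main()
        ' One answered call, then one abandoned attempt.
        Dim strEvents() As String = {"offHook", "digit", "digit", "answer", "onHook", _
                                     "offHook", "onHook"}
        Console.WriteLine(CountCompletedCalls(strEvents))   ' 1
    End Sub
End Module
```

Because the state and the event fully determine the next state, changing the behavior of the simulated switch means editing the table, not the loop.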
2. As a recovering philosophy major, I call these moments ontological moments. Ontology isn't the
study of how to get onto the bus. It's the theory of the fundamental constituents of reality. This
was an ontological moment because I realized that the entity analysis of just what a phone call
might be had never been carried out, and as a result, the programmer had no clear pathway
from event to call. Business rules must be based on a clear business ontology in this sense,
which ordinary users know without having to study philosophy.
The Dilbert factor is the belief, subtly fostered by the strip, that (1) everything
interesting and worthwhile has been done in computing, and that for this rea-
son work should be the boring installation and cleanup of existing solutions;
and (2) even if there's room for innovation, we here at XYZ company are proba-
bly going to be laid off, we don't have a clue, and it's best not to take any risks.
Because of the Dilbert factor, you need to present your idea carefully, you need
to do your homework, and above all, you need to be sensitive to the feelings of
the designers of the bad solution if one exists. (But since I'm an insensitive clod
who treads on other people's feelings when I am not otherwise engaged in
walking on water or shooting myself in the foot, my advice in this area is some-
what limited.)
Logic As Data
As you know, client server and Web systems should be organized into two or more
tiers. The simplest design separates the GUI from the logic/data side. This allows
the GUI to be developed in Visual Basic when it runs on Windows and in HTML,
ASP, JavaScript, and other technologies when it runs on the Web.
The next step is to divide the logic from the data, and when this is done, the
logic consists of business rules. Normally, business rules are code in Visual Basic,
C#, and other compiled languages. In some cases, business rules linked directly
to data appear in Transact-SQL procedures. There is nothing wrong with this, as
long as the business rules are completely specified by the end user and are rea-
sonably static.
When the rules change, problems arise. This is because changes to the rules,
caused either by users changing their minds or by changes in business needs, cause
a lot of work in the typical Windows development environment. Programmers must
obtain the current version from the source code library. They must then determine
where to change the code and, of course, make the changes. The changes must
then be tested, and a new version built and installed.
The code changes are simple in many scenarios and might involve tweaking
a few operators or changing the value of a constant, but the interactions with the
source code library and the installation process constitute a large, fixed invest-
ment of programmer time. That's why it makes sense to represent much of the
logic in such a way that authorized users can change the logic without bothering
the programmers. It makes sense because the programmers are freed to concen-
trate on new problems and to work on the data and presentation tiers, where
their skills are best leveraged. It also makes sense because end users can under-
stand rules presented as data.
Logic as data also means less exposure to multiple versions of source code
with different business rules, a source of bugs. Of course, if the logic consists of
text business rules stored as SQL or Oracle fields, there can be multiple copies of
the database. But security procedures normally protect the end user from using
the wrong version of a database, whereas no such protection exists against using
multiple versions of compiled code, except internal programming procedures.
In some environments, these procedures may be more than adequate. In many
other situations, the user will prefer to see a layer of business rules in the data.
In his book, What Not How: The Business Rules Approach to Application
Development (Addison-Wesley, 2000), C. J. Date (a relational database pioneer)
makes the case for the use of pure predicate logic for the control of the business
organization. Here, complex expressions-including but not restricted to mathe-
matical, logical, and relational expressions as seen in ordinary programming
languages-would be used to implement the mission, goals, and constraints of
the business such as a formalized version of "For all customers, their satisfaction
level must never be less than 5; if it is, call the customer to find out what's wrong."
Although this level of control is what many users want, they also find that it
locks them into a dependence on a vendor. Therefore, my humbler and tactical
(as opposed to strategic) approach has the ordinary programmer using the logic
as data approach in specific situations, rather than across the board, and from
the view of the executive suite. Furthermore, this book is about what it takes to
develop a processor for business rules, while management texts typically assume
they are already available.
There are many areas in which representing logic as data makes sense. I've
mentioned a customer engineering application and a combined business and
engineering application earlier in this chapter. Legal applications and "expert"
diagnostic/strategic systems also generate rule sets of a complexity and rapid-
ity of change that strain the typical software change cycle.
The case study we'll look at in this section is taken from the credit industry,
presenting a part of a credit-rating solution.
3. Cultural and religious factors play a role in credit; for example, Islamic law forbids high
interest rates and advises the credit grantor that he is taking just as much a risk of nonpayment
as is the debtor and is therefore bound by sha'aria to investigate the borrower's ability to repay
and his own risk aversion.
The advantages of the open system are twofold: the company can do
a detailed analysis, and the open system is more easily internationalized. For
example, a detailed analysis may find, by a complex application of the rules to
existing payment records, that people who rent in a particular Louisiana parish
and whose income is less than $15,000 a year always pay their debts on time,
but people in the same parish with incomes higher than $15,000 never pay their
debts on time. The single number may or may not reflect this research. And, of
course, the open system is more easily used in international markets where
there is no preexisting credit scoring database.
A disadvantage of the open system is that borrowers can circumvent the rules
if they know the rules. This isn't true with the closed system, since you cannot
question the finality of the single number.
Many companies combine the new single-number, closed system with their
own rules.
In our example, we want loan officers to have a "calculator" to evaluate the
credit-worthiness of applicants for consumer credit, first-time home loans, home
equity loans, and so on. But in addition, the calculator should be itself changeable:
programmable when the loan supervisor wants to change the rules. Our
tool will use the open system, in that it will make clear the rules, indeed in such
a way that the calculator itself will explain the rules.
4. Note that the latter varies somewhat from common practice, but I would like to show how
the flexibility of logic as data makes it possible to recognize either a reality we've discovered
or the populist tendency of the boss.
This solution can use the QuickBasic engine as a nonvisual object to compile
and run a set of rules transliterated by us from a user-oriented notation into
strict Basic. The transliteration from the user's notation to Basic can be done
with the qbScanner object that was used to construct the QuickBasic engine.
The user's format is condition, action: comments. When the condition is
true, the action is taken. The condition can use the same operators as are used
in QuickBasic, since they will be familiar as simple math to the end user. It can
use the data names annualIncome, bankrupt, thirtyDayPastDue, sixtyDayPastDue,
owns, rents, other, and otherDescription. The only annoyance is the fact that
the user cannot put spaces between word breaks in data names. This can be
overcome by careful language design, but it's probably not important enough
to warrant the effort.
The action will be either decline or a number interpreted as the loan's
annual percentage rate. Note that the action is represented as a weakly typed
.NET Object, because it is either a single-precision annual percentage rate
or the string "decline".5
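To show how little is needed to take such a rule apart, here is a rough sketch of a parser for the condition, action: comments format. It is not the creditEvaluation code, which uses qbScanner. It assumes the condition contains no comma, and the sample rule text is hypothetical, since only the data names are given here.

```vbnet
Imports System
Imports System.Globalization

Module RuleParser
    Public Structure CreditRule
        Public Condition As String
        Public Action As Object      ' a Single APR or the String "decline"
        Public Comments As String
    End Structure

    ' Split "condition, action: comments" at the first comma and at the
    ' first colon that follows it. Assumes the condition has no comma.
    Public Function ParseRule(ByVal strRule As String) As CreditRule
        Dim intComma As Integer = strRule.IndexOf(","c)
        If intComma < 0 Then Throw New FormatException("No comma in rule: " & strRule)
        Dim intColon As Integer = strRule.IndexOf(":"c, intComma + 1)
        If intColon < 0 Then Throw New FormatException("No colon in rule: " & strRule)

        Dim usrRule As New CreditRule()
        usrRule.Condition = strRule.Substring(0, intComma).Trim()
        usrRule.Comments = strRule.Substring(intColon + 1).Trim()

        Dim strAction As String = _
            strRule.Substring(intComma + 1, intColon - intComma - 1).Trim()
        Dim sngRate As Single
        If String.Equals(strAction, "decline", StringComparison.OrdinalIgnoreCase) Then
            usrRule.Action = "decline"
        ElseIf Single.TryParse(strAction, NumberStyles.Float, _
                               CultureInfo.InvariantCulture, sngRate) Then
            usrRule.Action = sngRate
        Else
            Throw New FormatException("Action must be a rate or decline: " & strAction)
        End If
        Return usrRule
    End Function

    Sub Main()
        ' Hypothetical rule text in the user's notation.
        Dim usrRule As CreditRule = ParseRule( _
            "annualIncome >= 15000 And annualIncome <= 25000, decline: company policy")
        Console.WriteLine(usrRule.Condition)
        Console.WriteLine(usrRule.Action)
        Console.WriteLine(usrRule.Comments)
    End Sub
End Module
```

Storing the action in an Object field mirrors the weak typing described above: the caller tests whether it holds a Single or the string "decline".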
5. The word decline is used by credit grantors to remind the borrower that a loan in our society
is a contract into which both parties freely enter.
• We have discovered that people in the target demographic area with incomes
between $15,000 and $25,000 don't pay their bills; therefore, we decline to do
business with them. In order to avoid the appearance of discrimination,
we need to give a bona fide business reason for doing so, and it is because
they don't pay their bills.
These rules in particular, and any set of rules in general, may or may not be
logically sealed, or airtight, in the sense that one or more rules exist for each logical
possibility. The set of rules used by the calculator is not logically sealed. Consider,
for example, what happens when the applicant makes $25,000 or more but has
an undischarged bankruptcy. Since this set of rules is not logically sealed, we need
a default decision, which is decline here. The Default Policy button in the Credit
scoring rules section allows you to define the default policy. Later in this chapter,
in the "Handling Contradictory and Redundant Rules" section, I will discuss an
application of compiler and symbolic interpretation technology that can analyze
complex sets of rules to see whether they are sealed, and if multiple rules apply
to one case.
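The effect of a default policy, and the two ways a rule set can misbehave, can be illustrated with a short sketch. This is not the calculator's code: the rules are hard-wired as lambda expressions, paraphrasing three rules of the sample set, where the real program compiles them from text. The Applicant and Rule types are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module RuleEvaluator
    Public Structure Applicant
        Public AnnualIncome As Integer
        Public Bankrupt As Boolean
        Public ThirtyDayPastDue As Integer
        Public SixtyDayPastDue As Integer
    End Structure

    Public Class Rule
        Public Condition As Func(Of Applicant, Boolean)
        Public Action As Object          ' a Single APR or the String "decline"
        Public Sub New(ByVal objCondition As Func(Of Applicant, Boolean), _
                       ByVal objAction As Object)
            Condition = objCondition
            Action = objAction
        End Sub
    End Class

    ' Three rules paraphrased from the sample set (illustrative only).
    Private ReadOnly colRules As New List(Of Rule) From { _
        New Rule(Function(a As Applicant) a.AnnualIncome >= 5000 AndAlso _
                     a.AnnualIncome <= 15000 AndAlso Not a.Bankrupt AndAlso _
                     a.ThirtyDayPastDue < 2 AndAlso a.SixtyDayPastDue = 0, 0.1F), _
        New Rule(Function(a As Applicant) a.AnnualIncome >= 15000 AndAlso _
                     a.AnnualIncome <= 25000, "decline"), _
        New Rule(Function(a As Applicant) a.AnnualIncome > 25000 AndAlso _
                     Not a.Bankrupt, 0.15F)}

    ' Zero-based indexes of every rule whose condition holds.
    Public Function MatchingRules(ByVal usrApplicant As Applicant) As List(Of Integer)
        Dim colMatches As New List(Of Integer)
        For intIndex As Integer = 0 To colRules.Count - 1
            If colRules(intIndex).Condition.Invoke(usrApplicant) Then
                colMatches.Add(intIndex)
            End If
        Next
        Return colMatches
    End Function

    ' The first matching rule decides; with no match, the default policy applies.
    Public Function Evaluate(ByVal usrApplicant As Applicant, _
                             ByVal objDefault As Object) As Object
        Dim colMatches As List(Of Integer) = MatchingRules(usrApplicant)
        If colMatches.Count = 0 Then Return objDefault
        Return colRules(colMatches(0)).Action
    End Function

    Sub Main()
        Dim usrGap As New Applicant()
        usrGap.AnnualIncome = 30000
        usrGap.Bankrupt = True
        Console.WriteLine(MatchingRules(usrGap).Count)     ' 0: not sealed
        Console.WriteLine(Evaluate(usrGap, "decline"))     ' decline (default)

        Dim usrOverlap As New Applicant()
        usrOverlap.AnnualIncome = 15000
        Console.WriteLine(MatchingRules(usrOverlap).Count) ' 2: rules overlap
    End Sub
End Module
```

Testing individual applicants this way only finds gaps and overlaps by luck. Proving that a rule set is sealed requires analyzing the conditions symbolically.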
A requirement is that we explain the rules to the credit analyst and the appli-
cant in an understandable way. This is why the rules contain a comment field.
Also, code inside creditEvaluation uses an instance of the qbScanner object to
transliterate the rules from source form to lengthier explanations. To see a spe-
cific example, click the third rule in the list box, which excludes the income range
from 15000 to 25000. Then click the button labeled Explain this rule, on the right
side of the form, to see an explanation of the rule:
Since we want to make sure the user knows that coding < (less than) versus
>= (greater than or equal to) has a different result, we use words to clearly state
the effect of the rule. Since we're using the QuickBasic engine to compile rules,
this transliteration can be applied to any changes in the rules.
You can document any set of rules completely by clicking the button labeled
Explain (in the Credit scoring rules section). You should see the report illustrated in
Figure 10-2. You can copy and paste this report into documentation.
2. If annual income is greater than or equal to 5000 and annual income is less
than or equal to 15000 and not undischarged bankruptcy and 30-day overdue
reports is less than 2 and 60-day overdue reports equals 0, the application will
be accepted with an Annual Percentage Rate of 0.1 (In the midrange group we
accept most applicants at a very favorable rate)
3. If annual income is greater than or equal to 15000 and annual income is less
than or equal to 25000, the application will be declined (This income range is
declined by this firm as company policy)
4. If annual income is greater than 25000 and not undischarged bankruptcy, the
application will be accepted with an Annual Percentage Rate of 0.15 (The high
income client will pay a higher interest rate at our firm)
Applying Rules
Let's try applying the rules to the starter applicant standing. Take a look at the
applicant information, which provides the applicant's credit history.
The applicant has an income of $20,000, pays her bills on time, has no undis-
charged bankruptcy, and rents her home.
Click the large Evaluate Credit button in the upper-left corner of the screen,
and then wait. This prototype software is fully scanning, parsing, and interpret-
ing the code, and its progress appears at the bottom of the screen. It takes about
30 seconds on my Pentium 4 (much of this time is spent in making the progress
report). Later in this chapter, in the "Improving the Credit Evaluation Calculator"
section, I will suggest some ways to make this prototype faster.
Since a system requirement is that we explain the result to the applicant, the
screen in Figure 10-3 should appear. This screen provides a statement, similar to
the preliminary explanations shown in Figure 10-2, which can be provided to
the user.
[Figure 10-3: the Applicant Standing form (annual income 20000, thirty days past
due 0, sixty days past due 0, housing: Rents) with the result "Rule 3: Because
annual income is greater than or equal to 15000 and annual income is less than or
equal to 25000, the application has been declined (This income range is declined
by this firm as company policy)"]
Let's examine how the rules (which are not, after all, in QuickBasic on the
screen) are transliterated into QuickBasic source code using the independent
lexical analyzer. Any one rule can be viewed as its corresponding QuickBasic
code. Highlight the third rule (the rule that was used in the previous example)
and click the button labeled Show Basic code to see the basic form of the busi-
ness rule:
You can see how the code is executed in the QuickBasic GUI. The button labeled
All Basic code displays the complete program transliterated from the rules. Click
this button and copy and paste the code into the code section of the qbGUI pro-
gram. Then click Run in qbGUI to see the results, as shown in Figure 10-4. If the
More screen is displayed, you will see the detailed, step-by-step execution of the
business rules.
Figure 10-4. Running the business rule code in the qbGUI application
In qbGUI, click the Storage Zoom button to see the business rule data storage,
as shown in Figure 10-5. Note that each data point has been compiled as a Variant,
with the narrowest type for its current value; therefore, annualIncome and rents
have the Integer type and the values 20000 and -1, respectively, while the other
data points are Bytes with the value zero. The rents variable is -1, not Boolean
and True, owing to the way in which the value of the Rents radio button is
converted to the value of the rents qbVariable.
When the Rents radio button's Checked status is passed to the valueSet method
of qbVariable (discussed in Chapter 6), it is a .NET object, and valueSet tries to
assign it to a series of widening QuickBasic data types, starting with the byte. If it
started with Boolean, the integer value -1 would be converted to Boolean, and in
many cases, this would be wrong.
Visual Basic .NET converts True to the integer -1, and this fails to convert to
a Byte but can be converted to a QuickBasic Integer (represented by a .NET Short).
Therefore, the selected type is Integer. Since quickBasicEngine is not as fussy as
is .NET Visual Basic with Option Strict, the Integer value -1 can still be used in
Boolean rules such as "rents or owns."
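
The widening search described above can be sketched in a few lines. This is a Python illustration rather than the Visual Basic source of valueSet; the function name narrowest_type and the type table are assumptions made for the example, and only whole-number values are handled.

```python
# Hypothetical sketch of a "narrowest type" search; not the actual valueSet code.
# Ranges follow the usual QuickBasic definitions (an assumption of this sketch).
QB_INTEGER_TYPES = [
    ("Byte", 0, 255),
    ("Integer", -32768, 32767),          # 16-bit, a .NET Short
    ("Long", -2147483648, 2147483647),   # 32-bit
]

def narrowest_type(value):
    """Return the name of the first (narrowest) type whose range holds value."""
    # A checked radio button arrives as True; Visual Basic converts True to -1.
    if value is True:
        value = -1
    elif value is False:
        value = 0
    for name, low, high in QB_INTEGER_TYPES:
        if low <= value <= high:
            return name
    return "Double"  # fallback for anything wider than a Long

print(narrowest_type(20000))  # Integer
print(narrowest_type(True))   # Integer, because -1 does not fit in a Byte
print(narrowest_type(0))      # Byte
```

Starting the search at Byte rather than Boolean is what keeps an ordinary -1 from being silently reinterpreted as True.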
In qbGUI, click the RPN Zoom button to see the generated Nutty Professor
code. Figure 10-6 shows what this code looks like when it contains comments.
When you selected a rule and clicked the Show Basic code button in the
creditEvaluation application, you saw an End statement in the generated code,
which brings us to another feature of that application. Scroll to the bottom of
the RPN list box in the main form of qbGUI to see it at location 155.
45 opRem 0: ***** If annualIncome >= 5000 And annualIncome <= 15000 And Not Bankrupt And ThirtyDay < 2 And SixtyDay = 0 Then
46 opNop 0: Push lValue annualIncome contents of memory location
47 opPushLiteral 1: Push indirect address
48 opPushIndirect : Push contents of memory location
49 opPushLiteral 5000: Push numeric constant
50 opPushGE : Replace stack(top) by opPushGE(stack(top-1), stack(top))
51 opNop 0: Push lValue annualIncome contents of memory location
52 opPushLiteral 1: Push indirect address
53 opPushIndirect : Push contents of memory location
54 opPushLiteral 15000: Push numeric constant
55 opPushLE : Replace stack(top) by opPushLE(stack(top-1), stack(top))
56 opAnd : Replace stack(top) by opAnd(stack(top-1), stack(top))
57 opNop 0: Push lValue Bankrupt contents of memory location
58 opPushLiteral 4: Push indirect address
Figure 10-6. Assembly code for a business rule
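
To make the listing in Figure 10-6 concrete, here is a minimal stack machine that executes instructions 46 through 56. It is a Python sketch under simplifying assumptions, not the Nutty Professor interpreter: the opcode names come from the figure, while the run function, the tuple encoding of instructions, and the memory layout (location 1 holding annualIncome) are invented for illustration.

```python
# Hypothetical mini stack machine; opcode names follow Figure 10-6,
# but the semantics here are a simplified assumption.
def run(code, memory):
    stack = []
    for op, operand in code:
        if op == "opPushLiteral":
            stack.append(operand)
        elif op == "opPushIndirect":
            address = stack.pop()            # the address pushed just before
            stack.append(memory[address])    # replace it with the stored value
        elif op == "opPushGE":
            right, left = stack.pop(), stack.pop()
            stack.append(left >= right)
        elif op == "opPushLE":
            right, left = stack.pop(), stack.pop()
            stack.append(left <= right)
        elif op == "opAnd":
            right, left = stack.pop(), stack.pop()
            stack.append(bool(left) and bool(right))
        elif op in ("opRem", "opNop"):
            pass                             # comments and no-ops do nothing
        else:
            raise ValueError("unknown opcode: " + op)
    return stack.pop()

# annualIncome >= 5000 And annualIncome <= 15000
code = [
    ("opPushLiteral", 1), ("opPushIndirect", None),
    ("opPushLiteral", 5000), ("opPushGE", None),
    ("opPushLiteral", 1), ("opPushIndirect", None),
    ("opPushLiteral", 15000), ("opPushLE", None),
    ("opAnd", None),
]

print(run(code, {1: 20000}))  # False: 20000 is above 15000
print(run(code, {1: 10000}))  # True
```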
By default, when the check box on the calculator's main form labeled Thorough
rule application is not selected, each successful rule causes the rule evaluation to
be terminated. We need to allow the user to thoroughly apply all rules to detect
contradictory and redundant rules. This is not because "users aren't programmers"
(in fact, many are). It's needed because programmers and users make mistakes
when managing large sets of rules.
Three types of contradictory situations may exist in this application:
• A benign contradiction (from our point of view and not that of the cus-
tomer) occurs when more than one rule indicates that the customer should
be declined. A benign contradiction is resolved by declining the user and
providing him all the reasons, so that he can recover his creditworthiness
with us.
• An APR contradiction occurs when two or more rules indicate that the
customer should be accepted at different annual percentage rates. An APR
contradiction is resolved by giving the customer the lowest interest rate;
otherwise, she will complain if another customer in a similar situation
receives a better rate.
• A fatal contradiction exists when one or more rules indicate that the cus-
tomer should be accepted and other rules indicate a decline. Here, the rule
set is broken, and a decision cannot be made based on our system; the
analyst needs to fix the erroneous collection of rules.
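
The three cases can be summarized in a short sketch. This is a Python illustration, not the creditEvaluation source; the function name resolve and the representation of a decision as a (rule number, APR) pair, with None standing for a decline, are assumptions made for the example.

```python
# Hypothetical sketch of contradiction handling; not the creditEvaluation code.
def resolve(decisions):
    """decisions: list of (rule_number, apr) pairs; apr is None for a decline.

    Returns (kind_of_contradiction, outcome).
    """
    declines = [rule for rule, apr in decisions if apr is None]
    accepts = [apr for rule, apr in decisions if apr is not None]
    if declines and accepts:
        return ("fatal", None)               # the rule set itself is broken
    if declines:
        kind = "benign" if len(declines) > 1 else "none"
        return (kind, "decline")             # report every reason to the applicant
    if accepts:
        kind = "APR" if len(set(accepts)) > 1 else "none"
        return (kind, min(accepts))          # give the lowest applicable rate
    return ("none", None)                    # no rule fired

print(resolve([(3, None), (5, None)]))            # ('benign', 'decline')
print(resolve([(2, 0.1), (5, 0.05)]))             # ('APR', 0.05)
print(resolve([(2, 0.1), (5, 0.05), (7, None)]))  # ('fatal', None)
```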
Let's see how each type of contradiction is handled. This will also let you see
how rules can be added and edited.
Benign Contradictions
First, let's add a benign contradiction. Make sure that the Thorough rule appli-
cation check box on the main calculator form is checked, indicating that you
want all of the rules to be applied. Click the Add Rule button above the rule list
to see the ruleEntry form. Double-click bankrupt in the Data Names list to get
a condition of bankrupt, and enter the explanation Cannot lend when there is
an undischarged bankruptcy, as shown in Figure 10-7. Make sure the Decline
radio button is selected in the Policy section, and then close the form.
[Figure 10-7: the ruleEntry form, with the condition bankrupt, the Decline policy
selected, and the explanation "Cannot lend when there is an undischarged
bankruptcy"; the form also shows the Data Names and Example Rules lists]
[Figure: the main form after evaluation, with the new rule "bankrupt, DECLINE:
Cannot lend when there is an undischarged bankruptcy" in the rule list and an
explanation beginning "Rule 3: Because annual income is greater than or equal to
15000 and annual income is less than or equal to 25000, the application has been
declined (This income range is declined by this firm as company policy)"]
Click the button labeled All Basic code to see the code that is generated
when the rule application is thorough, as shown in Figure 10-9.
The thorough code moves the default action to the bottom and uses the
decisionMade flag to indicate whether the default rule needs to fire.
The interpreterPrintEvent handler in creditEvaluation intercepts two events in
this scenario. The first is the detection that the applicant's annualIncome is
within the excluded range of 15000 to 25000, and here the event handler receives
a decision of decline and a rule number of 3. The second is the detection of an
undischarged bankruptcy, with a decision of decline and a rule number of 5.
Each time the event handler code receives a new decision, it executes the
following logic:
APR Contradictions
Now, let's try an APR contradiction, where multiple APRs are specified. Delete
the rule you just added by clicking the rule bankrupt, DECLINE, and then clicking
the Delete Rule button. Add the rule shown in Figure 10-10. Here, we've decided
to accept people at the high end of the favored range (whose annual income is
between $14,000 and $15,000) with a promotional APR of 5%.
[Figure 10-10: the ruleEntry form, with the condition annualIncome > 14000 And
annualIncome <= 15000, the explanation "Promotional rate", and the Accept at this
APR policy selected]
Return to the main form, and then set the applicant's income to 14500. Turn
off the Bankruptcy indicator. Click Evaluate Credit to see the screen in Figure 10-11.
The application has been accepted: the annual percentage rate shall be 0.05
Rule 2: Because annual income is greater than or equal to 5000 and annual income
is less than or equal to 15000 and not undischarged bankruptcy and 30-day
overdue reports is less than 2 and 60-day overdue reports equals 0, the
application has been accepted with an Annual Percentage Rate of 0.1 (In the
midrange group we accept most applicants at a very favorable rate)
Rule 5: Because annual income is greater than 14000 and annual income is less
than or equal to 15000, the application has been accepted with an Annual
Percentage Rate of 0.05 (Promotional rate)
Note that rule 5 confirms a preceding acceptance: selecting lowest applicable APR
The customer is given the best APR in a clear and documented fashion,
because we've treated logic as data.
Fatal Contradictions
Now, let's see what happens when we insert a "fatal" contradiction. We'll try to
decline any applicant who is a renter.
Go to the ruleEntry form and enter the rule shown in Figure 10-12. (In general,
creditEvaluation requires that each rule be rather verbose and specify all
the conditions that apply, which, for many applications, is a good thing.)
[Figure 10-12: the ruleEntry form, with the condition rents and the Decline
policy selected]
Close the ruleEntry form and click Evaluate Credit again, to see the report
shown in Figure 10-13.
The decision cannot be performed because the rules are not consistent
Rule 2: Because annual income is greater than or equal to 5000 and annual income
is less than or equal to 15000 and not undischarged bankruptcy and 30-day
overdue reports is less than 2 and 60-day overdue reports equals 0, the
application has been accepted with an Annual Percentage Rate of 0.1 (In the
midrange group we accept most applicants at a very favorable rate)
Rule 5: Because annual income is greater than 14000 and annual income is less
than or equal to 15000, the application has been accepted with an Annual
Percentage Rate of 0.05 (Promotional rate)
Rule 7: Because rents home, the application has been declined (Decline all
renters! My tenant is a bum.)
The rule was a bad rule, since it logically contradicts two other rules. When
business rules of this or greater complexity are encoded in a programming lan-
guage, the bad rule would be at best dead code (where the renter test follows the
other tests); at worst, it would be live code that prevents code that contradicts its
effect from executing.
• A more advanced method would be to create a set object and make each
rule the defining rule of each set. Then set operations, including
intersection, union, and complement, could determine the existence of error
sets, including, in the example, the set of all people making exactly 5000,
who are wrongly declined.
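
As a much simpler stand-in for a full set object, each rule's income condition can be treated as a closed range of whole-dollar incomes and the ranges intersected. The sketch below is Python and purely illustrative: the function name intersect is an assumption, and the two bands are the income conditions of rules 2 and 3 as they appear in this chapter's reports, with rule 2's other conditions ignored.

```python
# Hypothetical sketch: rule conditions reduced to closed integer ranges.
def intersect(a, b):
    """Intersect two closed integer ranges (low, high); None if they are disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None

accept_band = (5000, 15000)    # rule 2: 5000 <= annualIncome <= 15000 (accept)
decline_band = (15000, 25000)  # rule 3: 15000 <= annualIncome <= 25000 (decline)

print(intersect(accept_band, decline_band))  # (15000, 15000)
print(intersect((0, 4999), accept_band))     # None
```

A nonempty intersection between an accepting rule and a declining rule marks a candidate error set: here, an applicant making exactly 15000 can satisfy both income conditions.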
For the more advanced set solution, you would need the ability, probably
inside the Nutty Professor interpreter, to do symbolic calculation with unknown
variables. This would be a version of the interpreter that when given (let's say)
the And operation, a stack value of False, and another stack value of unknown,
would push False, transforming the unknown value back into a known value.
The symbolic version of the Nutty Professor interpreter would need to use
fairly advanced math in cases where it had to compute with values known as
ranges of values. For example, to symbolically "add" 5 to a value known to be in
the range 10.. 20 is to get the range 15 .. 25. The object-oriented approach empow-
ers this type of development since it can define sets, ranges, and unknown
values as objects and value types.
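
Both ideas fit in a few lines. The following Python sketch is an illustration only, not part of the Nutty Professor interpreter; the names UNKNOWN, symbolic_and, and add_to_range are assumptions, and a range is represented as a simple (low, high) pair.

```python
# Hypothetical sketch of symbolic evaluation with unknowns and ranges.
UNKNOWN = None  # stands for a value the interpreter does not yet know

def symbolic_and(left, right):
    """And over True, False, and UNKNOWN."""
    if left is False or right is False:
        return False          # False decides the result whatever the other side is
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN        # otherwise an unknown operand keeps the result unknown
    return True

def add_to_range(value_range, constant):
    """Add a known constant to a value known only as a (low, high) range."""
    low, high = value_range
    return (low + constant, high + constant)

print(symbolic_and(False, UNKNOWN))  # False
print(symbolic_and(True, UNKNOWN))   # None, that is, still unknown
print(add_to_range((10, 20), 5))     # (15, 25)
```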
Summary
This chapter has shown you that, with a performance penalty, treating logic as
data provides a new level of flexibility and control for real applications. And note
that as long as you have an effective tokenizing tool for transforming the exter-
nal representation of business rules on a form or in a database and a compiler/
interpreter, it's not necessary to write lexical analyzers and compilers to get to
this level of flexibility.
Consider applying the techniques in the preceding chapters to get any user
with a business rule problem to a level that genuinely eliminates "programming."
The elimination of programming has long been a Philosopher's Stone. Many pro-
grammers claim it is not possible. However, this chapter shows that if pure,
declarative, nonprocedural logic isn't programming, it is indeed possible to
eliminate, for classes of applications, traditional procedural programming, and
that it has benefits even in the area of mere documentation, as our automated
explanations show.
In the next and final chapter, we need to address the issue of language
design as it occurs in crafting a notation for an end user, as in our simple
condition,action:comment notation, or in creating a traditional language.
Challenge Exercise
Refine the rules in this chapter, by taking into account a new fact: homeowners
in the 5000 .. 15000 band are better risks than renters and other housing categories.
Give the homeowners a better rate, and test the new rules.
For a real challenge, find a new industry with which you are familiar and
define a set of business rules. Using Visual Basic .NET, design a calculator using
creditEvaluation as a model and the quickBasicEngine DLL. For example, if you are
selling life insurance to put yourself through school and your sales calls involve
pricing life insurance based on the age of the applicant, his financial standing, and
whether he smokes, you can design a simple GUI, using quickBasicEngine to evaluate
the rules (using quickBasicEngine's eval method) to come up with a yes/no
accept/decline decision, or a rate versus decline. Your code will gather the input
data from the screen and assemble a valid QuickBasic expression, which will then
be passed to quickBasicEngine.
Resources
The following are some resources for more information about handling busi-
ness rules:
The Trouble with Dilbert: How Corporate Culture Gets the Last Laugh, by
Norman Solomon (Common Courage Press, 1997). Dilbert communicates
hopelessness and lack of initiative as a positive virtue that masquerades
as cool. As such, he reminds me of Herman Melville's Bartleby the
Scrivener, who was so burned out that he preferred not to do much of
anything. Norman Solomon shows how this is passive-aggressive, and
while it may manufacture consent to corporate policies, it is a recipe for
nonproductive organizations that serve, at best, only an inner ring of top-
level people. Norman Solomon also observes that the comic strip never
mocks or disrespects top-level executives, only middle managers.
CHAPTER 11
Language Design:
Some Notes
Confucius, hearing this, said, "Don't bother explaining that which has
already been done; don't bother criticizing that which is already gone;
don't bother blaming that which is already past."
- Confucius, The Analects
MASTER KONG Fu (Confucius' real name) might have thought that in this book I
have "explained that which has already been done," "criticized that which has gone,"
and perhaps even "blamed that which is past." But, like most great ones, Kong con-
tradicted himself, for even he was concerned with transmitting the past and scorned
originality for its own sake. I do believe that as so much knowledge is increasingly
encapsulated in products, we forget how much work the basics represent.
You have learned techniques for specifying a language precisely in Chapter 4,
and for building a lexical analyzer in Chapter 5. You learned how to parse and
interpret this language in Chapters 6 through 9, and saw how to apply this tool
to a practical, real-world problem in Chapter 10.
This chapter addresses, broadly and generally, how to design a language.
This in itself warrants an entire book and a thorough knowledge of software his-
tory, in order to avoid repeating mistakes. Here, I will simply talk about four
important issues:
It is unlikely but possible that you may have some ideas for more productive
general programming of .NET applications. In this case, you need to keep your
audience in mind. It's not enough for you to be more productive with a unique
language; you need to convince your prospective audience that they, too, will be
more productive with your solution. This is an almost impossible task. Managers
call it a "people" task. I call it social engineering.
However, you may well discover that you are more productive using your own
notation in general, procedural programming. For example, highly skilled but
dyslexic programmers have been known to secretly use system facilities to
create a comfortable environment, without desiring to make their different
abilities the norm.1
The non-Dvorak keyboard, and the prevalence of operating systems like
Windows (which cause purists to shudder and gag) are both examples of the
network externality, in which success is reinforced as long as your product fits
into an existing technical infrastructure. As far as the "ideal computer system"
is concerned, be well advised that your typical user might be a poor Southern
gentlewoman in reduced circumstances forced to put up not only with your
ideal but also with you. The acceptance of something new in any one case is
going to take into account far more than the bits and the bytes. It will also be
based on the way in which the new paradigm fits in the existing network.
1. Microsoft provides comprehensive facilities for the differently abled (just as the blind "see"
that which is hidden from the view of the sighted, the handicapped are differently abled in
that their difference should be added to the sum total of their insight) to use computers, but
not as many to program computers. The SIGCAPH (Special Interest Group for Computers and
the Physically Handicapped) of the Association for Computing Machinery, and
similar organizations worldwide, address programming for the deaf and other groups, but
mostly when the different ability makes the candidate attractive as a programmer of existing
systems.
2. Program maintainability is another issue that needs to be addressed in relation to differently
abled programmers, because we want to maintain the code written by the differently abled.
But in terms of corporate needs, maintainability is often exaggerated by programmers in
search of job security.
But on the whole, it is unlikely that your goal will be to save the world with
a new programming language. Typically, your goal will be humbler, like my goal
to adequately bill switch users for complex calls, described in Chapter 10.
Far from creating a general-purpose language, you may wish to create, for
a user, a language that is deliberately limited, so the user doesn't get into trouble.
The creditEvaluation software described in Chapter 10 is an example of a program
designed to meet this goal. The user doesn't need to think step-by-step or
procedurally. The system orders the default rule so it appears last, and it watches
for collisions of the benign form (where an applicant is declined because of
multiple rules), the APR form (where multiple annual percentage rates are entailed
by multiple rules and we select the lowest), and the fatal form (where the rules
are contradictory).
Such languages have a tendency to disappear into the woodwork; that is, the
business rules are stored as data. An example would be where you discover that
a SQL column's value is sent to a transaction center and parsed, and used to drive
a SELECT statement strangely akin to the interpreter method of quickBasicEngine.
In fact, logic as data may be said to occur when a data field drives a process.
A customer name field does not drive a process-it's just data. On the other
hand, a customer request code field used to select from a large number of
options does drive a process.
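
The distinction can be shown with a small dispatch table. This Python sketch is illustrative only; the record layout, the request codes, and the handler names are all hypothetical.

```python
# Hypothetical sketch: a request code field that drives a process.
def open_account(customer_name):
    return "opened account for " + customer_name

def close_account(customer_name):
    return "closed account for " + customer_name

# The code field selects the behavior; the name field is just carried along.
HANDLERS = {"OPEN": open_account, "CLOSE": close_account}

def process(record):
    handler = HANDLERS.get(record["request_code"])
    if handler is None:
        raise ValueError("unknown request code: " + record["request_code"])
    return handler(record["customer_name"])

print(process({"customer_name": "Smith", "request_code": "OPEN"}))
# opened account for Smith
```

Changing customer_name changes only the output text, while changing request_code changes which code runs; the second field is logic stored as data.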
• Object-oriented or traditional
• Backdoor problems
• Data typing
Object-Oriented or Traditional
Your first instinct, in all probability, will be to keep it simple for the end user and
not implement object orientation, because the end user wouldn't understand it.
Well, you need to look deep within yourself, for many times we say the end user
won't understand it, when the truth is that we don't understand it.
In fact, the experience of the early designer (the late Ole-Johan Dahl) of
the prototype object-oriented language, Simula, was that end users found the
object-oriented paradigm far more understandable than the procedural para-
digms of Fortran and Algol, because code was tightly coupled to objects familiar
in the industrial shop floors where the end users worked. Dahl did not babble
on in computerese about new processes and new files, but instead about Simula
proto-objects with a clear relationship to the daily work of the shop floor.
One problem, which I haven't addressed in this book, is the question of how
to develop a compiler and/or interpreter for object-oriented code. In fact, the
object-oriented paradigm itself comes to the rescue. An investigation into the
System namespace of CLR will make it clear that object-oriented approaches are
closed in the benign sense; a closed system is one whose objects combine to form
new members of the same system. An Object is an Object is an object, and in par-
ticular, within an object-oriented compiler within an object-oriented language,
an Object can be represented by an object. Contrast this with traditional develop-
ment, whether of complex MIS programs or compilers.
Entities multiply within MIS and compiler development. For each user
object, a table or file is typically created, and the designers focus on its care
and feeding-often in excess of what the user wants. On the other hand, object-
oriented development tries to ensure a one-to-one mapping between the nouns
that the user wants and what we are working on.
Within traditional compilers, there was often a one-to-many, or even many-
to-many, relationship between what the user (here, the application programmer)
wanted and the entities of the compiler. For example, the IBM Fortran compiler
I encountered in 1972 (described in Chapter 1) was divided into 99 phases. These
phases had nothing to do with Fortran per se and instead were necessitated by
the small storage of the machine. The designers had to write special code to
manage the transition between phases.
In an implementation of an object-oriented system like .NET, whether the
proprietary implementation created by Microsoft or the open implementation
created by the Mono organization (see http://www.go-mono.org), many, many
objects need to be created. Using tables would create many more relationships
between the target and the implementation; therefore, the only sensible way to
develop an object-oriented system seems to be with an object-oriented approach.
Interpreted or Compiled
Traditional procedural languages fell into two broad categories: interpreted or
compiled. Languages like Fortran, Algol, and C were meant to be compiled to
efficient object code. But soon after the introduction of these tools, a need was
seen for fast compile times, even at the expense of efficiency, especially in
one-time prototyping in industrial settings and student coding in universities.
An early effort was PUFFT, described in Chapter 1, which compiled to a sort
of bytecode in order to provide Purdue students the ability to get their assign-
ments done on time in a mainframe environment. Another early effort was
Basic, whose first compilers compiled to generally undocumented bytecodes.
NOTE An old joke: How do you debug a C program? Answer: Change your
major. The attraction of CompSci 101 for Boneheads is that if your programs
work, you get an A, while in Literarily Theorizing Jane Austen and Relating It
to Women and Their Lives in the Post-Colonial Era, you need to behave
yourself. The attraction is also the downside: if your code doesn't work, you get an
E. Therefore, students demand good turnaround, which PUFFT was the first
to provide.
If you want, through your language, to provide fast turnaround and
comprehensive debugging, the language should be designed with this goal in mind.
Often, but not universally, such languages use a single, weak type to represent
all data. This way, the debug messages can present data easily. A popular weak
type is the string, which is the least narrow value object in .NET. But if you
desire efficiency, the language will need strong data types, as described in the
"Data Typing" section coming up soon, because you need to avoid runtime
conversions.
However, the issue of security, if it is one of your issues, complicates this dis-
tinction. Interpreted, weakly typed languages like VBA have been found to host
crude viruses. These aren't anything like the very vicious, industrial-strength
viruses (like SoBig in the summer of 2003), which are coded by extremely knowl-
edgeable, if evil, people. Rather, they are like the Outlook viruses, which used an
innocent and helpful feature to run macros. The Outlook viruses were run by
unsuspecting users in the late 1990s when they opened certain e-mail. These
viruses executed in a pesky and self-replicating fashion. This is why the CLR for
.NET is strongly typed: so that remote platforms can determine what a remote
executable will do, as far as possible.
If your language is interpreted and weakly typed, you need to determine
whether its use will create exposure in the form of crude Trojans, viruses, and
worms, and whether, in its intended environment, this risk is acceptable.
Backdoor Problems
This section describes some business exposures that can be unintentionally cre-
ated by overeager compiler developers. I call these "backdoor" problems.
One little-known problem can be described as unintentionally (or inten-
tionally) giving away the store. Consider the eval, evaluate, and run methods of
quickBasicEngine (described in Chapter 7). The eval method evaluates an expres-
sion in source form and returns its value. The evaluate method does the same job
using the current settings of quickBasicEngine. The run method acts as if a string
contained a QuickBasic program, and it compiles and interprets the string.
If you plan to sell a language for money, you need to know that providing
this level of functionality will mean that for common use, your users don't need
extra copies of the compiler. Instead, because the compiler runs at interpretation
time, the user simply can build a GUI around your compiler to get a new copy.
In open source, university, and older large-corporate environments, this
anxiety about giving away the code doesn't exist. For example, the Rexx language
for running interpreted code contained a function that executed Rexx source
code. However, at the time I encountered Rexx, its use was restricted to universi-
ties and large companies running the Conversational Monitor System on IBM
mainframes.
Backdoor problems exist whenever language facilities are of one class such that
any object can be explicitly processed (as in Reflection) by another object. They can
have unpredictable effects when programmers discover unintended uses for the
facility, and you should assess their impact if you want to make money. If, on the
other hand, you don't want to make money, this is not a concern.
Data Typing
quickBasicEngine provides a selection of data types current when QuickBasic was
in vogue: Boolean, Byte, Integer, Long, Single, Double, String, Variant, and Array.
The native types supported in the CLR are Boolean, Byte, (16-bit) Short,
(32-bit) Integer, (64-bit) Long, Single, Double, String, and Object. A strongly typed
language designed for .NET should probably support these data types. The CLR
also supports, on behalf of C# and C++, the unsigned integer types Unsigned Short,
Unsigned Integer, and Unsigned Long (which will also be supported natively in the
next version of Visual Basic .NET).
Beyond this starter set, you may decide that the user's needs demand a new
primitive type. For example, the hash table Collection of traditional Visual Basic
genuinely replaces the need to create special-purpose code for fast access to
tables.
However, one lesson from the PL/I language is still germane: A language can
become bloated with a variety of cool primitive types to the point where there are
so many features that the programmer doesn't know which ones are optimum.
So, the programmer finds a suboptimum subset and cultivates her own personal
style based on the subset, making her code hard to understand and debug.
In fact, one of the charges in the anti-Linux lawsuit filed by SCO, a company
that owns a commercial Unix, is that Linux's runtime libraries for C programs
were not scaled up to industrial strength until an abortive partnership between
IBM and SCO in 2000. At that time, IBM was able to look at C libraries that had
benefited from 15 years of testing and improvement within AT&T, Bell Labs,
and Lucent. SCO maintains that a C-written system can be very different,
depending on the libraries.
The situation in kernel operating system design is parallel. This design focuses
on the basic job of any operating system, which is apportioning resources, such
as computing time and I/O facilities. We ordinarily think of an operating system
as something like Windows 2000, a vast empire of device drivers, DLLs, APIs, and
fun games. However, in kernel design of the operating system, developers focus,
like the hedgehog of the proverb and not the fox, on one thing. A kernel doesn't
drive devices or expose tools for programmers; instead, it gives processes time
slices and access to resources. Around the kernel, various drivers and GUIs (also
known as skins) provide the final computing experience to the end user. But all
of these extras must go through the kernel to get work done. The kernel approach
thus restricts the operating system to the basics and allows itself to be retrofit
with different layers of functionality.
RISC design, the C language's use of libraries, and kernel operating systems
demonstrate the power of keeping things simple and focused, and argue strongly
for a language that provides users with definitional capabilities in place of
facilities (like the Collection) that they could code, copy, or buy from others.
3. This departs from an exact implementation of QuickBasic, but at this writing, the compiler
isn't standard in all respects, anyway.
Notice that line 2 pushes False, and line 3 pushes the string "False" for
evaluation by the opEval opcode in line 4, despite the fact that the eval is always
unnecessary. Of course, the False value that is pushed in line 2 could be a vari-
able, a function, or a subexpression. Likewise, the eval could be far more complex
and time-consuming; however, it is always evaluated.
Next, change the And operator to AndAlso, and compile and/or run the code.
Then click the Zoom button of the RPN box to see the Nutty Professor assembler
language that appears in Figure 11-2.
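
The difference between the two operators can be reproduced outside qbGUI. The sketch below is Python rather than Basic and is only an analogy: eager_and is a hypothetical helper that, like And, receives both operands already evaluated, while Python's built-in and operator short-circuits in the way AndAlso does.

```python
# Hypothetical sketch contrasting eager and short-circuit evaluation.
calls = []

def expensive_check():
    calls.append("evaluated")   # record that the right-hand side actually ran
    return False

def eager_and(left, right):
    # Both arguments are evaluated before the call, as with Basic's And.
    return left and right

result_eager = eager_and(False, expensive_check())  # right side runs anyway
eager_calls = len(calls)

calls.clear()
result_lazy = False and expensive_check()           # right side is skipped
lazy_calls = len(calls)

print(result_eager, eager_calls)  # False 1
print(result_lazy, lazy_calls)    # False 0
```

The results are identical; only the amount of work differs, which matters when the right-hand operand is slow or has side effects.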
Floating-Point Math
String Handling
4. I use quotes because it's news to me that programs should be always easy to write. What's
worth doing well is worth doing slowly, and Ada and Eiffel impose constraints that have been
shown to create better software with less programmer self-abuse.
5. Although I will admit that the ?: operator has its own gnomic charm once you start dweeb-
ing out with it.
Limit = 10
For I = 1 To Limit
    Print I
    Limit = 5
Next I
It will print 1 through 5, because the limit can be changed in the loop (which is
a nearly universally unsafe practice), and because the second expression in the
semicolon-separated list of expressions in the for loop header is evaluated by
reference in such a way that it reads refreshed values of its operands.
But, if you run the preceding code in qbGUI, Visual Basic 6, or Visual Basic
.NET, the For will print a list of numbers from 1 to 10. It will ignore the change to
the limit.
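The two behaviors can be imitated side by side. This is a minimal Python sketch under the assumption that a while loop is an acceptable stand-in for both loop forms; the function names basic_style and c_style are hypothetical.

```python
def basic_style():
    """The limit is copied once when the loop starts (the by-value rule)."""
    printed = []
    limit = 10
    final_value = limit          # snapshot taken by the For header
    i = 1
    while i <= final_value:
        printed.append(i)        # Print I
        limit = 5                # changes the variable, not the snapshot
        i += 1
    return printed

def c_style():
    """The limit is re-read on every test, as in a C for loop header."""
    printed = []
    limit = 10
    i = 1
    while i <= limit:
        printed.append(i)
        limit = 5                # the next test sees the new limit
        i += 1
    return printed

print(basic_style())
print(c_style())
```

The only difference between the two functions is whether the loop test reads the snapshot or the live variable, which is the whole of the by-value versus by-reference distinction for the limit.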
C's approach is flawed because the Do construct of C already provides this
capability, and what is missing is the ability to provide a checkable for loop header.
The qbGUI implementation of the rule that for is by value is shown by zoom-
ing and examining the commented assembly language code for the preceding
example, as shown in Figure 11-4.
Notice in line 13 that, with some pain, we create a stack frame for the For, as
illustrated in Figure 11-5.
[Figure 11-5: stack frame for the For loop; recoverable labels: stepValue, finalValue]
Take a look at finalValue. Its value is pushed on the stack in steps 5 and 10,
and this is why the loop will execute ten, not five, times.
Notice that ctlVariable (I) is pushed on the stack in step 5. Note that it is 2,
which is not I's value but its location. This is because most dialects of Basic
(including our quickBasicEngine, Visual Basic .NET, and Visual Basic 6) allow
change to the control variable, although this is terrible practice.6
Change the code in Figure 11-3 as shown here:
Limit = 10
For I = 1 To Limit
    Print I
    I = 10
Next I
In qbGUI, as well as in Visual Basic 6, Visual Basic .NET, and most versions of
Basic, this code will print 1 and stop. This is because, as Figure 11-3 shows, the
control variable will be referenced on the stack, not placed on the stack. Of course,
this is what we want if we need to use the variable in the loop, although to change
it is poor practice.
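Why the loop stops after one pass is easier to see when the control variable is modeled as a storage location rather than a value. The following Python sketch is an illustration only; the list named memory, the slot number I_LOCATION, and the function run_for are hypothetical and do not reflect the actual Nutty Professor stack layout.

```python
def run_for(memory, ctl_location, start, final_value, body):
    """Run a Basic-style For whose control variable lives at memory[ctl_location]."""
    memory[ctl_location] = start
    while memory[ctl_location] <= final_value:   # final_value is a fixed copy
        body(memory)
        memory[ctl_location] += 1                # Next I re-reads the shared slot

I_LOCATION = 2        # hypothetical slot: the loop holds I's location, not its value
memory = [0, 0, 0]
printed = []

def body(mem):
    printed.append(mem[I_LOCATION])   # Print I
    mem[I_LOCATION] = 10              # I = 10

run_for(memory, I_LOCATION, 1, 10, body)
print(printed, memory[I_LOCATION])
```

Because the loop and the body share one location, the assignment inside the body is visible to Next I, which then steps the variable past the final value.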
From the standpoint of syntax, we should, as this example shows, stay as close
as possible to the user's natural expectations of the semantics.
A final syntax consideration, seen in the credit evaluation application in
Chapter 10, is whether syntax is important if the users have a GUI that enters
rules. Generally speaking, it remains a good idea to have a documented "serial-
ization" standard for the business rules, to allow both power users and support
personnel to modify the rules in XML or as straight text files.
6. Unlike MIS programmers, compiler writers cannot be dissing code; we need to compile
pathological, if not psycho, programs.
Summary
This chapter addressed four important issues regarding language design: the
goals, the semantics, the syntax, and the documentation. The bottom line of all
these considerations is that you need to write the language reference manual
before writing the compiler. If you have a specific target audience in mind, host
a tea party, bun fight, or conference to get them to buy into your goals. Or, more
sensibly, you can just create the new language, unleash it on the Internet, and be
damned; in fact, this is how many useful new languages were created.
The lesson of the failure of Esperanto, an attempt to design a global language,
is applicable. In practice, programming languages behave like real languages,
with dialects, extensions, and pidgins proliferating. The Algol team attempted to
do it right according to the Eurocentric and social-engineering notions popular
in Europe and in American universities in the 1950s. They hosted any number of
international bun fights and meetings, only to discover (as have social reformers
throughout history) that actually getting people to "do it my way" is hard, if not
impossible.
The founder of modern Colombia and Venezuela, Simon Bolivar, compared
revolution to plowing the sea. Many programming managers find that managing
Challenge Exercise
Repeat the experiment we did in the "Lazy vs. Busy And and Or" section of this
chapter with lazy and busy Or. Compile a Or eval("b") to determine what will
happen when a is True and confirm that this will evaluate the eval. Then compile
a OrElse eval("b") to confirm that this will not unnecessarily evaluate the eval.
Conclusion
I never blame myself when I'm not hitting. I just blame the bat, and if it
keeps up, I change bats. After all, if I know it isn't my fault that I'm not
hitting, how can I get mad at myself?
-Yogi Berra
Many programmers, having learned on the job, are curious about computer "sci-
ence." This book, I hope, has motivated you to use .NET to investigate an area of
computer science unexplored by many programmers.
On September 11, I was appalled by the unprecedented loss of life. I was
also saddened a few months later when one of the FBI field agents assigned to
tracking the hijackers reported in Congressional testimony that she had no
way to enter simple Boolean queries of the form terroristAssociation AndAlso
attendsFlightSchool. The separate queries were possible, but their Boolean
combination was not, according to the FBI whistleblower, Coleen Rowley.
Had the system been any one of a large number of mainframe or network-based
systems, it would be, as far as I can tell, simple for a programmer to
develop such queries by defining the BNF of the additional queries using the
techniques in Chapter 4, developing a scanner for identifiers and operators
using the techniques in Chapter 5, and developing a recursive-descent parser as
described in Chapter 7.
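To give a sense of scale, the three steps named above (a grammar, a scanner, and a recursive-descent parser) fit in a few dozen lines. What follows is a minimal Python sketch under stated assumptions, not the book's VB.NET code: the grammar in the comment, the names scan, QueryParser, and matches, and the idea of evaluating a query against a dictionary of Boolean fields are all hypothetical.

```python
import re

# Assumed grammar for this sketch (not the book's BNF):
#   query  := term { OrElse term }
#   term   := factor { AndAlso factor }
#   factor := Not factor | "(" query ")" | identifier
TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z_][A-Za-z0-9_]*)")

def scan(text):
    """Scanner: split the query into parentheses and words."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unrecognizable character at index %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class QueryParser:
    """Recursive-descent parser that evaluates the query against one record."""

    def __init__(self, tokens, record):
        self.tokens, self.pos, self.record = tokens, 0, record

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos].lower()
        return None

    def take(self):
        self.pos += 1
        return self.tokens[self.pos - 1]

    def query(self):
        value = self.term()
        while self.peek() == "orelse":
            self.take()
            right = self.term()      # parsed in full so the syntax is still checked
            value = value or right
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "andalso":
            self.take()
            right = self.factor()
            value = value and right
        return value

    def factor(self):
        token = self.peek()
        if token is None or token in ("andalso", "orelse", ")"):
            raise ValueError("expected an identifier, Not, or (")
        token = self.take()
        if token.lower() == "not":
            return not self.factor()
        if token == "(":
            value = self.query()
            if self.peek() != ")":
                raise ValueError("missing )")
            self.take()
            return value
        return bool(self.record.get(token, False))   # unknown fields count as False

def matches(text, record):
    parser = QueryParser(scan(text), record)
    value = parser.query()
    if parser.peek() is not None:
        raise ValueError("unexpected token %r" % parser.tokens[parser.pos])
    return value

record = {"terroristAssociation": True, "attendsFlightSchool": True}
print(matches("terroristAssociation AndAlso attendsFlightSchool", record))
```

Each grammar rule becomes one method, which is the property that makes recursive descent approachable for a working programmer.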
However, the attitude that such techniques are rocket science seems to have
been a minor contributing factor in a tragedy, and if at a minimum, I can show
a proactive approach, I am more than satisfied.
On a more positive note, I feel confident that your new knowledge of the
DNA of computer science, indeed, the way it propagates, gives you a better sense
of how your source code actually runs and illuminates some of the darker corners
of the CLR.
If you decide to write a production .NET compiler, I urge you to get your
hands on a copy of Aho, Sethi, and Ullman's "dragon book" (Compilers: Principles,
Techniques, and Tools), to which I have referred more than once in this book.
That's because I've only scratched the surface and got you started, in the way we
programmers get started: hands-on examples.
When I started out, developing compilers was rocket science. In 1970, com-
piler developers were not in all cases fully aware of how choices made by the
coders of compilers (such as how to evaluate a Boolean operator) were not mere
crotchets and conveniences, but became part of the reality of the compiler. But
many years of intense development in the Unix world under the long-gone
corporate sponsorship of the former AT&T monopoly taught a generation of
programmers how compilers work and are best constructed. I have meant little
disrespect by characterizing these characters as gnomes of Unix (I meant some
disrespect, because that is healthy).
.NET developers have, in their own quiet way, absorbed the lessons learned,
most especially the value of open standards in particular and glasnost in general.
You can find, for example, a large amount of useful source code in the .NET
releases, including a full C compiler. Partly due to the surprising success of
Linux, more and more products are available as source code, and this trend
will make compiler and parser development a growth field in the future.
In this book, I've shown you an extremely Basic approach towards compiler
design theory. I do not want to give the impression that this is all you need to
know. However, I have seen the power of a low-level, grassroots parser in simpli-
fying a genuine user problem, and this motivated me to write this book.
The technology we use every day should not be a sort of mystery accessible
only to a temple priesthood; this has always tended to retard and even reverse
progress. Although we need to use each other's production, it is nevertheless
good to know how things work. I demur from the Dilbert philosophy, that we
should not worry our pretty, little heads about what goes on under the hood of
society or its technology, and instead take our anger out on hard-working middle
managers for doing their rather thankless job (of herding polecats and losing
golf games with the CEO). In fact (and as Krishna admonishes Arjuna in the
qbGUI Easter egg), knowledge is freedom, for the man or woman who knows the
relations between the forces of nature is no longer their slave. A compiler,
although a mathematical artifact, is part of nature. The rest is television.
APPENDIX A
quickBasicEngine
Language Manual
Then anyone who leaves behind him a written manual, and likewise anyone
who receives it, in the belief that such writing will be clear and certain, must
be exceedingly simple-minded.
-Plato
Plato was wrong. The attitude expressed has caused a lot of mystification and a lot
of damage. French philosopher Jacques Derrida has shown how Plato's preference
for speech over writing (which includes as a subcase the automatic preference for
tutorials "for dummies" over reference manuals) runs through our culture as a
presumption that results in prejudice against a well-meaning reference manual.
But because real programmers (whom Plato might consider Sophists) prefer
reference manuals for many purposes, this appendix forms the comprehensive
reference manual for the programming language that is actually supported by
quickBasicEngine. This appendix describes the low-level lexical syntax supported
by quickBasicEngine, the keywords and system functions of this language, and
the parser syntax in Backus-Naur Form (BNF). It then identifies each of the built-
in functions supported by quickBasicEngine.
NOTE The language of quickBasicEngine supports only a subset of the
QuickBasic language, with extensions including the AndAlso and OrElse
operators. Also, QuickBasic remains, as a name and as a product, the
intellectual property of Microsoft. QuickBasic, in other words, refers to
the language that was supported by Microsoft's QuickBASIC for MS-DOS
and Windows. quickBasicEngine (expressed in camelCase) refers to the
.NET object that supports a dialect of QuickBasic, where a dialect of a
language is a language that overlaps it, containing most of its features
(but not necessarily all) and extensions.
Lexical Syntax
Input for quickBasicEngine consists of the string containing either an executable
program or expression. This string may contain blanks, tabs, and tokens. Outside
strings and comments, blanks and tabs are ignored. This string may consist of
multiple lines, and line breaks (see the newline token in Table A-1) are
significant. There is no limit on the length of a line.
Table A-1 lists the supported token types.
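As a rough illustration of the whitespace rules just described, here is a minimal Python sketch of a scanner outline. It is not the qbScanner implementation: the function name scan_outline and the four token kinds it reports (word, string, comment, newline) are simplifying assumptions, far coarser than the real token types.

```python
def scan_outline(source):
    """Return (kind, text) pairs for a source string."""
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch in " \t":
            i += 1                                    # blanks and tabs are ignored
        elif ch == "\n":
            tokens.append(("newline", ch))            # line breaks are significant
            i += 1
        elif ch == '"':
            end = source.find('"', i + 1)
            if end < 0:
                raise ValueError("unterminated string at index %d" % i)
            tokens.append(("string", source[i:end + 1]))   # blanks inside survive
            i = end + 1
        elif ch == "'":
            end = source.find("\n", i)
            end = len(source) if end < 0 else end
            tokens.append(("comment", source[i:end]))      # runs to end of line
            i = end
        else:
            start = i
            while i < len(source) and source[i] not in " \t\n\"'":
                i += 1
            tokens.append(("word", source[start:i]))
    return tokens

sample = 'Print  "a  b"\t\' note\nStop'
for kind, text in scan_outline(sample):
    print(kind, repr(text))
```

Note that the doubled blanks between Print and the string disappear, while the doubled blanks inside the string and the comment are preserved.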
• You may actually be able to get away with using these names in certain con-
texts because these names are checked in certain contexts and not others.
• Some names might be problematic even though they do not appear in this
list. This applies to names that are not listed but that, like Option, perform
a syntactic role.
TIP The best policy is to use Hungarian names that start with an abbrevia-
tion (normally three characters long) for all data. Languages, including Basic,
that rely on keywords with identifier syntax have a slight inherent ambiguity
because the identifier syntax overlaps that of the keyword.
end |
exit |
goSub |
goto |
input |
print |
randomize |
read |
return |
screen |
stop |
trace
' --- The statements
' Assignment
assignmentStmt := explicitAssignment | implicitAssignment
explicitAssignment := Let implicitAssignment
implicitAssignment := lValue "=" expression
lValue := typedIdentifier [ "(" subscriptList ")" ]
subscriptList := expression [ Comma subscriptList ]
' Circle
circle := Circle ( expression Comma expression ) Comma expression
' Comment: note: NoNewLine is text that does not contain a newline
comment := Rem NoNewLine
comment := Apostrophe NoNewLine
comment := EmptyLine
' Data statement
data := Data constantList
constantList := constantValue [ Comma constantList ]
constantValue := number | string
number := [ sign ] unsignedNumber
sign := "+" | "-"
' Do condition
doCondition := While | Until expression
' else
else := Else
' End statement
end := End ' (followed immediately by newline)
' endIf
endIf := End If
endIf := EndIf
exit := Exit [ Do | For | While ]
' For header
forHeader := For lValue "=" expression To expression [ Step expression ]
' For next
forNext := Next lValue
' GoSub
goSub := GoSub ( UnsignedInteger | identifier | expression )
' GoTo
goto := GoTo ( UnsignedInteger | identifier | expression )
goto := UnsignedInteger
' If
if := If expression [ Then ] unconditionalStatementBody
if := If expression Then
' Input
input := Input lValueList
lValueList := lValue [ Comma lValue ]
' Loop or Wend
loopOrWend := Wend | ( Loop [ whileUntilClause ] )
whileUntilClause := ( WHILE | UNTIL ) expression
' Print
print := Print expressionList [ ";" ]
expressionList := expression [ Comma expressionList ]
' Randomize
randomize := Randomize
' Read data
read := Read lValueList
' Return from a GoSub
return := Return
' SCREEN n command (does nothing)
screen := Screen UnsignedInteger
' Stop
stop := Stop
' Trace
trace := "Trace Push"
trace := "Trace Off"
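Each production in this grammar maps naturally onto one parsing routine. As a hedged illustration, the following Python sketch handles only the Data statement rules (data, constantList, constantValue, number, and sign). It assumes the input has already been split into a list of token strings, and the function names are hypothetical; the actual parser is written in VB.NET and works from qbScanner tokens.

```python
def parse_data(tokens):
    """data := Data constantList"""
    if not tokens or tokens[0].lower() != "data":
        raise SyntaxError("expected Data")
    values, pos = parse_constant_list(tokens, 1)
    if pos != len(tokens):
        raise SyntaxError("unexpected token %r" % tokens[pos])
    return values

def parse_constant_list(tokens, pos):
    """constantList := constantValue [ Comma constantList ]"""
    value, pos = parse_constant_value(tokens, pos)
    if pos < len(tokens) and tokens[pos] == ",":
        rest, pos = parse_constant_list(tokens, pos + 1)   # right recursion, as in the rule
        return [value] + rest, pos
    return [value], pos

def parse_constant_value(tokens, pos):
    """constantValue := number | string; number := [ sign ] unsignedNumber"""
    if pos >= len(tokens):
        raise SyntaxError("expected a constant")
    token = tokens[pos]
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1], pos + 1                        # string constant
    sign = 1
    if token in ("+", "-"):                                # sign := "+" | "-"
        sign = -1 if token == "-" else 1
        pos += 1
        if pos >= len(tokens):
            raise SyntaxError("expected a number after the sign")
        token = tokens[pos]
    if not (token[:1].isdigit() or token.startswith(".")):
        raise SyntaxError("expected a number, found %r" % token)
    try:
        return sign * float(token), pos + 1
    except ValueError:
        raise SyntaxError("expected a number, found %r" % token)

print(parse_data(["Data", "-", "3", ",", '"x"', ",", "+", "2.5"]))
```

Representing every number as a float is a shortcut taken for this sketch; it says nothing about how quickBasicEngine types its constants.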
Built-In Functions
Table A-3 lists the quickBasicEngine built-in functions and describes their use.
Function      Description
Eval(s)       Evaluates the string s considered as an expression that is
              acceptable to quickBasicEngine. The evaluation is performed
              using the default properties of the quickBasicEngine class.
Evaluate(s)   Evaluates the string s considered as an expression that is
              acceptable to quickBasicEngine. The evaluation is performed
              using the properties of the quickBasicEngine instance
              performing the Evaluate function.
Floor(n)      Returns the largest integer that is less than or equal to n.
              Note that when n is negative, this will still return the
              largest integer less than or equal to n; for example, while
              Floor(2.05) is 2, Floor(-2.05) is -3.
Iif(a,b,c)    Evaluates the expression in a. If the value is True (any number
              other than zero), returns the value of the expression in b. If
              the value is False (0), returns the value of the expression in
              c. Note that whether a is True or False, this function will
              fully evaluate the b and the c expressions.
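Two of these entries reward a quick experiment. The following Python sketch is an analogy rather than quickBasicEngine code: math.floor is Python's own floor function, and the helpers traced and iif are hypothetical names used to show that a function-style Iif has both branches evaluated before it is called.

```python
import math

# floor rounds toward negative infinity; int() truncates toward zero.
floor_positive = math.floor(2.05)
floor_negative = math.floor(-2.05)
truncated_negative = int(-2.05)

evaluated = []

def traced(label, value):
    """Record that an argument expression was evaluated, then return its value."""
    evaluated.append(label)
    return value

def iif(a, b, c):
    """Function-style Iif: b and c are already evaluated when the call starts."""
    return b if a != 0 else c

result = iif(1, traced("b", "yes"), traced("c", "no"))
print(floor_positive, floor_negative, truncated_negative, result, evaluated)
```

The trace shows both b and c being evaluated even though only b is returned, which is why Iif should not be used to guard an expression that is expensive or unsafe to evaluate.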
APPENDIX B
quickBasicEngine
Reference Manual
We have to be simple simply for lack of time
-Jacques Derrida
This appendix documents the properties and methods (known jointly as the pro-
cedures) exposed by quickBasicEngine, as well as its references, with the exception
of the utility DLLs: utility, windowsUtilities, collectionUtilities, and zoom. Full
documentation of the utility DLLs is available in the source code for these tools.
We should be as simple as possible, but no simpler (as Al Einstein said) in
the time available, as the French philosopher Jacques Derrida implies.
This is the original design document for the compiler. When I sit down to
code, I first write a design document. It was kept up to date while coding, and
even after flooding my laptop with a Starbucks venti.
This document describes the standards followed by each class and the pro-
cedures exposed by each class. The following classes are described:
• qbOp
• qbPolish
• qbScanner
• qbToken
• qbTokenType
• qbVariable
• qbVariableType
• quickBasicEngine
Class Standards
Properties of each class start with an uppercase letter; methods start with a
lowercase letter. Any method that does not otherwise return a value returns
True on success or False on failure.
All classes except qbOp and qbTokenType have state in the form of variables in
General Declarations that persist between procedures but go away when
the class is destroyed. Stateful (as opposed to stateless) classes can be usable or
not usable.
During the execution of the constructor procedure for the stateful class, it is
unusable. On successful completion of the constructor, the class object instance
becomes usable, and it remains usable until the class is disposed (or otherwise
terminated) or a serious internal error is found. Serious internal errors include
bugs in the code of the object, whether from errors in the original code or through
modification, and "object abuse" (the use of the object after a serious error has
been discovered and reported). When the object is not usable, most Public prop-
erties and methods will report an error when called and return a suitable default
value.
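The usable/unusable life cycle can be summarized in a few lines. This is a minimal Python sketch of the convention, not the VB.NET code; the class StatefulObject, its error_log list, and the describe method are hypothetical, while mk_unusable and usable loosely mirror the mkUnusable method and Usable property documented later in this appendix.

```python
class StatefulObject:
    """Hypothetical stateful class following the usable/unusable rules above."""

    def __init__(self, name):
        self._usable = False        # unusable while the constructor runs
        self.name = name
        self.error_log = []
        self._usable = True         # constructor completed successfully

    @property
    def usable(self):
        return self._usable

    def mk_unusable(self):
        """Force the unusable state; always returns True."""
        self._usable = False
        return True

    def _check_usable(self, procedure):
        if self._usable:
            return True
        self.error_log.append("%s: %s called on unusable object" % (self.name, procedure))
        return False

    def describe(self):
        if not self._check_usable("describe"):
            return ""               # report the error and return a suitable default
        return "object " + self.name

    def dispose(self):
        if not self._check_usable("dispose"):
            return False
        self._usable = False        # a disposed object is no longer usable
        return True

obj = StatefulObject("demo")
before = obj.describe()
obj.dispose()
after = obj.describe()
print(repr(before), repr(after), obj.error_log)
```

The guard at the top of each public procedure is the whole mechanism: one Boolean in the state, checked consistently.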
All classes implement an informal interface known as the core methodology.
It is informal because classes don't implement a file containing procedures in the
methodology; instead, they tend to implement the core procedures shown in
Table B-1 consistently.
The stateless object qbOp is of necessity fully threadable; multiple copies may
run in multiple threads, and all procedures are Shared (Static in C# terms). Other
than the qbOp object, the other objects are serially threadable. Multiple copies
may coexist in parallel threads, but the same copy cannot run more than one
non-Shared method at the same time in different threads. quickBasicEngine is
stateful but fully threadable.
Each serially threadable object organizes its state into a structure with the
name TYPstate and an instance of the TYPstate called USRstate. quickBasicEngine,
because it is fully threadable, organizes its state into an OBJstate object, which
contains the USRstate. This makes it much easier to lock the state using Synclock.
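The benefit of gathering all state into one lockable object can be shown with a small experiment. The following is a Python sketch under stated assumptions, using threading.Lock in the role that SyncLock plays in Visual Basic .NET; the class FullyThreadableCounter and its members are hypothetical and far simpler than quickBasicEngine.

```python
import threading

class FullyThreadableCounter:
    """Hypothetical stand-in for an object whose state is guarded by one lock."""

    def __init__(self):
        self._state_lock = threading.Lock()   # plays the role of SyncLock
        self._usr_state = {"count": 0}        # plays the role of USRstate

    def increment(self):
        with self._state_lock:                # one thread at a time touches the state
            self._usr_state["count"] += 1
        return True                           # Boolean result, per the class standards

    def count(self):
        with self._state_lock:
            return self._usr_state["count"]

def hammer(counter, times):
    for _ in range(times):
        counter.increment()

counter = FullyThreadableCounter()
threads = [threading.Thread(target=hammer, args=(counter, 1000)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(counter.count())
```

Because every access to the state goes through the same lock, the same instance can safely be shared by all four threads.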
Note that each method that doesn't otherwise need to return a value is none-
theless coded as a Boolean function, and returns True on success and False on
an error. Although this standard produces, at times, some strange code (such as
functions that always return True), it is maintained for consistency.
qbOp
The qbOp stateless class identifies the operators supported by the non-CLR Nutty
Professor machine as a large enumerator, and it provides Shared conversion
tools for enumerator values.
qbOp includes references to utilities.DLL.
qbOp is stateless and is fully threadable. Multiple instances can run simulta-
neously in multiple threads, and multiple procedures may be executed in the same
instance in multiple threads.
Stack Template
The template describes what an opcode requires on the stack. The template is
a string containing the comma-separated list of expected stack values, from
lower down in the stack to the top of the stack. The template is defined inside
the op description statement in the opCodeToDescription method.
Each stack value must be one of the following:
• <name>: Where name is the name of one of the values of the ENUvarType enu-
merator, this specifies that the stack value is restricted to the varType.
qbPolish
The qbPolish class represents one instruction to our non-CLR Nutty Professor
machine.
References of qbPolish are qbOp.DLL, qbVariable.DLL, and utilities.DLL.
qbPolish is serially threadable. Multiple instances can run simultaneously in
multiple threads, but errors will result if one object's procedures run in multiple
threads and in parallel.
• enuOpCode: Operation code (see Chapter 8 for a list of the supported opcodes)
• intStartIndex: Start index of the source code responsible for this instruction
• The start index corresponding to the operation in the source code must be
1 or greater.
An internal inspection is carried out in the constructor (after the object con-
struction steps are complete) and in the dispose method (before the reference
objects in the state are disposed of).
Public Function mkUnusable As Boolean
    Method that forces the object instance into the unusable state; it always
    returns True.

Public Property Name() As String
    Read-write property that returns and can set the name of the object
    instance, which will identify the object in error messages and on the XML
    tag that is returned by object2XML. The name defaults to qbPolishnnnn
    date time, where nnnn is a sequence number.

Public Overloads Function object2XML() As String
    Method that converts the state of the object to XML.

Public Overloads Function object2XML(ByVal booHeaderComment As Boolean)
As String
    Optional overload of object2XML that controls the commenting of the XML
    strings that are returned: object2XML(False) returns XML with no header
    comment.

Public Overloads Function object2XML(ByVal booHeaderComment As Boolean,
ByVal booLineComments As Boolean) As String
    Optional overload of object2XML that controls the commenting of the XML
    strings that are returned. The booHeaderComment parameter controls the
    generation of the block header comment. The booLineComments parameter
    controls the generation of a line of explanatory comment for each XML
    element.
qbScanner
The qbScanner class scans input source code for the quickBasicEngine and pro-
vides, on demand, scanned source tokens and scanned lines of source code. This
class uses "lazy" evaluation, scanning the source code only when necessary and
when an unparsed token is requested.
References of qbScanner include collectionUtilities.DLL, qbToken.DLL,
qbTokenType.DLL, and utilities.DLL.
• intLast: Index of the last token parsed or zero when no tokens have
been parsed
• Each token in both the array of scanned tokens and the array of pending
tokens must pass the inspect procedure of qbToken.
• The tokens in the scanned array must be in ascending order; gaps are
acceptable but not overlaps.
• No token's end index may point beyond the end of the source code in
either the scanned array or the pending array.
• The format of the line number index collection must be valid. This is a
collection of three-item subcollections. Item(1) must be a string containing
the key of the index entry. Item(2) and Item(3) must be positive integers.
Item(2) cannot be zero.
• If the (nonnull) code is fully scanned, the first token's start index should be
the same as the position of the first nonblank character in the source
code. The last token's end index should be the same as the position of the
last nonblank character.
• If the code is null and indicated as fully scanned, the scan count must
be empty.
An internal inspection is carried out in the constructor (after the object con-
struction steps are complete) and the dispose method (before the reference objects
in the state are disposed). Note that the dispose inspection may be suppressed
using the overload dispose(False).
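Several of the inspection rules above are simple enough to state as code. This Python sketch checks only the ordering, bounds, and fully-scanned rules; it is an illustration, not the qbScanner inspect method. The function name inspect_tokens, the representation of a token as a (start index, length) pair counted from 1, and the wording of the report lines are all assumptions.

```python
def inspect_tokens(source, tokens, fully_scanned=False):
    """Return (ok, report). tokens holds (start_index, length) pairs, counted from 1."""
    report = []
    previous_end = 0
    for number, (start, length) in enumerate(tokens, 1):
        end = start + length - 1
        if start <= previous_end:                 # gaps are fine, overlaps are not
            report.append("token %d overlaps or is out of order" % number)
        if end > len(source):
            report.append("token %d ends beyond the source code" % number)
        previous_end = max(previous_end, end)
    if fully_scanned and tokens and source.strip(" \t"):
        first_nonblank = len(source) - len(source.lstrip(" \t")) + 1
        last_nonblank = len(source.rstrip(" \t"))
        last_start, last_length = tokens[-1]
        if tokens[0][0] != first_nonblank:
            report.append("first token does not start at the first nonblank character")
        if last_start + last_length - 1 != last_nonblank:
            report.append("last token does not end at the last nonblank character")
    return (not report, report)

ok, report = inspect_tokens("  Print I", [(3, 5), (9, 1)], fully_scanned=True)
print(ok, report)
```

Returning the report alongside the Boolean loosely follows the shape of the documented inspect method, which assigns a report string passed by reference.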
Public Overloads Function checkToken(ByRef intIndex As Integer, ByVal
strValueExpected As String, Optional ByVal intEndIndex As Integer = 0)
As Boolean
    Method that checks the scanned tokens for strValueExpected. If it finds
    the expected value, it increments a token index. intIndex should be an
    Integer, passed by reference. The scan token at this index is checked. On
    success, this integer is incremented; on failure, it is unchanged.
    strValueExpected is compared to the source code, disregarding case
    differences. The optional parameter intEndIndex can be used to restrict
    the check to all tokens up to and including the token at the specified
    end index. See also checkTokenByTypeName.

Public Overloads Function checkToken(ByRef intIndex As Integer, ByVal
enuTypeExpected As qbTokenType.qbTokenType.ENUtokenType, Optional ByVal
intEndIndex As Integer = 0) As Boolean
    Method that checks the scanned tokens for the type in enuTypeExpected. If
    it finds the expected token type, it increments a token index. intIndex
    should be an Integer, passed by reference. The scan token type at this
    index is checked. On success, this integer will be incremented. The
    optional parameter intEndIndex can be used to restrict the check to all
    tokens up to and including the token at the specified end index. See also
    checkTokenByTypeName.

Public Overloads Function checkTokenByTypeName(ByRef intIndex As Integer,
ByVal strTypeExpected As String, Optional ByVal intEndIndex As Integer = 0)
As Boolean
    Method that checks the scanned tokens for the type named in
    strTypeExpected. If it finds the expected token type (identified using its
    name), it increments a token index. intIndex should be an Integer, passed
    by reference. The scan token type at this index is checked. On success,
    this integer is incremented. The optional parameter intEndIndex can be
    used to restrict the check to all tokens up to and including the token at
    the specified end index. See also checkToken.

Public Function clear As Boolean
    Method that clears the source code and resets the scan.
Public Function compareTo(ByVal objScanner As qbScanner) As Boolean
    Method that compares the object instance with the scanner object passed in
    objScanner, returning True when the source code in the instance is
    identical, after tokenization, to the source code in objScanner. The
    source code in the instance may have a different white space pattern from
    the source code in objScanner. The qbScanner clone always produces an
    object that returns True when compared to the source. (All objects that
    compare to a given object are token-identical, but not all are clones,
    because a clone will be white-space-identical in addition to being
    token-identical.) This method implements IComparable.

Public Overloads Function dispose As String
    Method that disposes of the object and cleans up any reference objects in
    the heap. This method marks the object as unusable. This overload will
    always conduct an internal inspection of the object instance (using the
    inspect method), and an error is thrown if the inspection failed. For best
    results, use this method when you are finished using the object in code.
    See the next method for an overload that allows inspection to be skipped.

Public Overloads Function dispose(ByVal booInspect As Boolean) As String
    Method that disposes of the object and cleans up any reference objects in
    the heap. This method marks the object as unusable. This overload inspects
    the object instance, unless dispose(False) is used. For best results, use
    this method when you are finished using the object in code.
Public Overloads Function findToken(ByVal intIndex As Integer, ByVal
strValueExpected As String, Optional ByVal intEndIndex As Integer = 0)
As Integer
    Method that searches the scanned tokens left to right, starting at
    intIndex, for the expected value in strValueExpected, ignoring case
    differences. If it finds the expected value, it returns the scan index of
    the token. If it does not find the expected value, it returns 0. The
    optional parameter intEndIndex can be used to restrict the search to the
    token up to and including the token at the specified end index.

Public Overloads Function findToken(ByVal intIndex As Integer, ByVal
enuTypeExpected As qbTokenType.qbTokenType.ENUtokenType, Optional ByVal
intEndIndex As Integer = 0) As Integer
    Method that searches the scanned tokens left to right, starting at
    intIndex, for the expected token type. If it finds the expected type, it
    returns the scan index of the token. If it does not find the expected
    type, it returns 0. The optional parameter intEndIndex can be used to
    restrict the search to the token up to and including the token at the
    specified end index.

Public Overloads Function findTokenByTypeName(ByVal intIndex As Integer,
ByVal strTypeExpected As String, Optional ByVal intEndIndex As Integer = 0)
As Integer
    Method that searches the scanned tokens left to right, starting at
    intIndex, for the expected type named in strTypeExpected, ignoring case
    differences. If it finds the expected type, it returns the scan index of
    the token. If it does not find the expected type, it returns 0. The
    optional parameter intEndIndex can be used to restrict the search to the
    token up to and including the token at the specified end index.

Public Function inspect(ByRef strReport As String) As Boolean
    Method that inspects the object. The report parameter should be a string,
    passed by reference; it is assigned an inspection report. See the
    "qbScanner Inspection Rules" section preceding this table.
Public ReadOnly Property LineStartIndex(ByVal intLine As Integer) As String
    Indexed, read-only property that returns the character starting index,
    from 1, of the source code contained in the line numbered intLine
    (numbering starts at 1). Use of this property forces a complete scan.
    Continuation lines count as distinct lines.

Public Function mkUnusable As Boolean
    Method that forces the object instance into the unusable state; it always
    returns True.

Public Property Name() As String
    Read-write property that returns and can set the name of the object
    instance, which identifies the object in error messages and on the XML tag
    that is returned by object2XML. The name defaults to qbScannernnnn date
    time, where nnnn is a sequence number.
Public Property SourceCode() As String
    Read-write property that returns and may be set to the source code for
    scanning. Assigning source code clears the array of tokens in the object
    state, but does not result in an immediate scan of the source code.
    Scanning occurs when the QBToken property is called and the token is not
    available.

Public Overloads Function sourceMid(ByVal intStartIndex As Integer) As String
    Method that returns the source code that commences at the token at
    intStartIndex (a token index, not a character index).
2. At this writing, the only error detected occurs when unrecognizable characters are found.
Public Function tokenTypeAsString(ByVal intIndex As Integer) As String
    Method that returns the type of the token at intIndex as a string. See
    qbTokenType for the possible values of ENUtokenType enumerators, which
    convert directly to string values.
3. At this writing, this method will result in a full scan of the input source code.
qbToken
The qbToken class defines one scan token as used in quickBasicEngine, including
its start index, length, type, and its line number.
References of qbToken include qbTokenType.DLL and utilities.DLL.
qbToken is serially threadable. Multiple instances can run simultaneously in
multiple threads, but errors will result if one object's procedures run in multiple
threads and in parallel.
The qbToken class is ICloneable: see its clone method.
code. Instead, the user code is expected to use the start index and the length to
get the raw source code.
4. In the specific case of qbTokens, at this writing, there are no reference objects in the state for
cleanup. The dispose is provided for consistency, to allow for future growth and to mark the
object as unusable.
Public Function typeToString(ByVal enuType As ENUtokenType) As Integer
    Method that returns the string value of the type assigned to the current
    instance.

Public ReadOnly Property Usable() As Boolean
    Read-only property that returns True if the object instance is usable;
    False otherwise.
qbTokenType
The qbTokenType class merely defines the token types recognized by the qbScanner
and qbToken classes.
Token Types
Table B-6 defines the token types in the ENUtokenType enumerator that is exposed
by the qbTokenType class.
tokenTypeColon Colon
tokenTypeComma Comma
tokenTypeIdentifier Identifier
tokenTypeNewline Newline
tokenTypeOperator Operator
tokenTypeSemicolon Semicolon
tokenTypeString String
qbVariable
The qbVariable class represents the type, structure, and value of a QuickBasic
scalar, an n-dimensional QuickBasic array, or a user data type.
References of qbVariable include collectionUtilities.DLL, qbScanner.DLL,
qbTokenType.DLL, qbVariableType.DLL, and utilities.DLL.
This class implements IDisposable, ICloneable, and IComparable.
qbVariable is serially threadable. Multiple instances can run simultaneously
in multiple threads, but errors will result if one object's procedures run in multi-
ple threads and in parallel.
Scalar: Scalars are of type Boolean, Byte, Integer, Long, Single, Double, or
String. The structure of a scalar is just its type. Data is the data associated
with the variable. For a scalar, the data is represented by the corresponding
.NET type with two important exceptions: QuickBasic Integers are
represented by .NET Short integers, because .NET Integers are 32-bit,
while QuickBasic Integers are 16-bit. QuickBasic Longs are represented
by .NET Integers, because .NET Longs are 64-bit, while QuickBasic
Longs are 32-bit.
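The type correspondence just described can be summarized in a small lookup table. The following is only an illustrative sketch, written in Python rather than Visual Basic; the names QB_TO_NET, QB_INTEGRAL_RANGE, and fits_qb_type are hypothetical and are not part of qbVariable.

```python
# Assumed mapping from QuickBasic scalar types to the .NET (Visual Basic) types
# that hold their values. Only the Integer and Long rows change names.
QB_TO_NET = {
    "Boolean": "Boolean",
    "Byte": "Byte",
    "Integer": "Short",    # QuickBasic Integer is 16-bit
    "Long": "Integer",     # QuickBasic Long is 32-bit
    "Single": "Single",
    "Double": "Double",
    "String": "String",
}

# Inclusive value ranges of the QuickBasic integral types.
QB_INTEGRAL_RANGE = {
    "Byte": (0, 255),
    "Integer": (-2**15, 2**15 - 1),
    "Long": (-2**31, 2**31 - 1),
}


def fits_qb_type(qb_type, value):
    """Return True when an integer value fits the named QuickBasic integral type."""
    low, high = QB_INTEGRAL_RANGE[qb_type]
    return low <= value <= high
```

Under these assumptions, 32767 fits a QuickBasic Integer, while 32768 needs a QuickBasic Long, which is stored in a .NET Integer.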
NOTE The Array structure is sometimes referred to as a dope vector. Here, the
dope concept is generalized to use dope as a synonym for the structure of any
variable.
5. The only semi-useful ability to start an array at a lower bound, other than one, was dropped
by .NET.
6. These are orthogonal in the sense that each subcollection has an identical number of
members.
quoted string, the type is the narrowest scalar QuickBasic type that can contain
the value. If the value string is in parentheses, contains a comma-separated list,
or both, the type is Array, and the array's entry type is determined by examining
the values in the array. If they all convert to a single type, the array's type is this
type. If they convert to more than one type, the array's type is Variant. The
type may not be omitted when a UDT is specified.
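The idea of "the narrowest scalar QuickBasic type that can contain the value" can be sketched as follows. This is a minimal illustration in Python, not the qbVariable code. The function name narrowest_qb_type is hypothetical, and two rules are assumptions of the sketch: a float counts as Single when it survives a round trip through 32-bit storage, and integers outside the Long range fall back to Double.

```python
import struct


def narrowest_qb_type(value):
    """Return the name of the narrowest QuickBasic scalar type that holds value."""
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        if 0 <= value <= 255:
            return "Byte"
        if -2**15 <= value <= 2**15 - 1:
            return "Integer"             # 16-bit QuickBasic Integer
        if -2**31 <= value <= 2**31 - 1:
            return "Long"                # 32-bit QuickBasic Long
        return "Double"                  # assumption: out-of-range integers widen to Double
    if isinstance(value, float):
        try:
            # Assumption: Single suffices when 32-bit storage loses no information.
            if struct.unpack("f", struct.pack("f", value))[0] == value:
                return "Single"
        except OverflowError:
            pass                         # too large for 32-bit storage
        return "Double"
    if isinstance(value, str):
        return "String"
    return "Unknown"
```

With these rules, 32767 yields Integer and 32768 yields Long.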
fromString Types
The type should be the variable type in the syntax supported by
qbVariableType.fromString and one of the following values, depending on the
overall type:
• For a scalar type, the type should be one of Boolean, Byte, Integer, Long,
Single, Double, or String.
• For an Array, the type should be Array, type, bounds, where type is the name
of a scalar type, the keyword Variant, or a parenthesized UDT definition.
The type of variant arrays is specified "abstractly" and with no associated
scalar type.
• For a UDT, the type should be UDT, memberlist, where the member list
consists of one or more comma-separated and parenthesized member
definitions. Each definition in the member list has the parenthesized form
(name, type), where name is the member name and type is its type. The type
must be scalar, abstract Variant, or Array.
fromString Values
The fromString expression value should specify the variable value(s). If the vari-
able is a scalar or a Variant that does not contain an array, the value may be the
scalar's value (compatible with its type) as True, False, a number, or a string,
quoted using Visual Basic's conventions.
NOTE When the variable is not otherwise known to be an array (when, for
instance, the variable type is omitted from the fromString expression), the use
of a repeat count will make the variable into an array.
For a UDT, the value should be the comma-separated list of member values.
Each member that is a scalar or the scalar value of a Variant member should be
its value in string form or in the decorated form type(value). Each member that
is an array should be the array's value, represented orthogonally (as described in
the previous paragraph) and enclosed in parentheses. Each member that is a UDT
should be the nested UDT specification, in parentheses.
For Unknown and Null types, the value (and its preceding colon) should not be
specified.
The syntax :value (colon and value without a type) may be used to change
the value of the variable without altering the type. The value must be compatible
with the existing type, unless the existing type is Unknown; in this case, the type will
be changed to the narrowest QuickBasic type capable of containing the value.
TIP You can run the qbVariableTest executable (provided with the sample
code) and try each example. Type it in the text box at the top of the screen and
click Create to make sure the example creates the qbVariable object. Then
click the toString button to verify that the fromString expression converts to
the variable and type specified in the examples.
Integer:4
Variant,Integer:4
specifies a variant that contains a 16-bit integer containing the value four.
Array,Integer,0,3:1,2,3,4
Array,Integer,1,2,1,2:(1,2),(1,2)
Array,Variant,1,2:Integer(1),Long(2)
32768
:32767
assigns 32767 to a prespecified type. When set after the previous example, :32767
will preserve the type of Long integer. When assigned to an uninitialized variable,
:32767 creates a 16-bit integer.
Array,Byte,0,1
specifies a Byte array that contains the Byte default values of 0. The toString will
be Array,Byte,0,1:*.
Array,Byte,0,1:*,1
specifies a Byte array that contains the Byte default value of 0 followed by 1. The
toString will be Array,Byte,0,1:*,1.
Array,Variant,0,1,1,2:(32767,"B"),(32768,1)
• With 20% probability, the fromString will represent a scalar, and with equal
subprobability this will be any of the types Boolean, Byte, Integer, Long,
Single, Double, or String.
• With 20% probability, it will be an array, and this array will contain a vari-
able that has 50% probability of being a Variant and will otherwise be
a random scalar.
• With 20% probability, it will be a UDT, and this UDT will randomly contain
1..10 scalars, Arrays, Variants, and UDTs. Each type will have 25% probability.
• With 20% probability, it will be a Variant, and this Variant will contain a
variable that has these type probabilities, with one exception: there is a 70%
probability that the variable will be a scalar, and no probability that the
variable will be a Variant.
fromString := fromStringType
fromString := fromStringValue
fromString := fromStringWithValue
fromString := fromStringType COLON fromStringValue
fromString := COLON fromStringValue
fromStringType := baseType | udt
baseType := simpleType | variantType | arrayType
simpleType := [VT] typeName
typeName := BOOLEAN | BYTE | INTEGER | LONG | SINGLE | DOUBLE | STRING |
            UNKNOWN | NULL
variantType := abstractVariantType COMMA varType
varType := simpleType | (arrayType)
arrayType := [VT] ARRAY,arrType,boundList
arrType := simpleType | abstractVariantType | parUDT
parUDT := LEFTPARENTHESIS udt RIGHTPARENTHESIS
udt := [VT] UDT,typeList
typeList := parMemberType [ COMMA typeList ]
parMemberType := LEFTPAR MEMBERNAME,baseType RIGHTPAR
abstractVariantType := [VT] VARIANT
boundList := boundListEntry | boundListEntry COMMA boundList
boundListEntry := BOUNDINTEGER,BOUNDINTEGER
simpleType := [VT] typeName
typeName := BOOLEAN | BYTE | INTEGER | LONG | SINGLE | DOUBLE | STRING |
            UNKNOWN | NULL
variantType := abstractVariantType,varType
varType := simpleType | (arrayType)
arrayType := [VT] ARRAY,arrType,boundList
arrType := simpleType | abstractVariantType
abstractVariantType := [VT] VARIANT
boundList := boundListEntry | boundListEntry,boundList
boundListEntry := BOUNDINTEGER,BOUNDINTEGER
fromStringValue := ASTERISK | fromStringNondefault
fromStringNondefault := arraySlice [ COMMA fromStringValue ] *
arraySlice := elementExpression | ( fromStringNondefault )
elementExpression := element [ repeater ]
element := scalar | decoValue
scalar := NUMBER | VBQUOTEDSTRING | ASTERISK | TRUE | FALSE
decoValue := quickBasicDecoValue | netDecoValue
quickBasicDecoValue := QUICKBASICTYPE ( scalar )
netDecoValue := [ SYSTEM PERIOD ] IDENTIFIER
                LEFTPARENTHESIS ANYTHING RIGHTPARENTHESIS
repeater := LEFTPAR ( INTEGER | ASTERISK ) RIGHTPAR
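To show how the type productions read in practice, here is a small recognizer for three of them: simpleType, arrayType (without UDT entries), and variantType. It is a sketch in Python under stated assumptions, not the parser used by qbVariable. All function names are hypothetical, keywords are assumed to be case-insensitive, and the optional VT prefix is assumed to be the literal letters "vt".

```python
import re

# Terminal names taken from the typeName production.
SIMPLE_TYPES = {"BOOLEAN", "BYTE", "INTEGER", "LONG", "SINGLE", "DOUBLE",
                "STRING", "UNKNOWN", "NULL"}


def strip_vt(name):
    """Uppercase a keyword and drop the optional VT prefix (vtInteger -> INTEGER)."""
    name = name.strip().upper()
    return name[2:] if name.startswith("VT") else name


def is_simple_type(text):
    """simpleType := [VT] typeName"""
    return strip_vt(text) in SIMPLE_TYPES


def is_bound_list(tokens):
    """boundList: one or more pairs of integers (lower bound, upper bound)."""
    return (len(tokens) >= 2 and len(tokens) % 2 == 0
            and all(re.fullmatch(r"-?\d+", t.strip()) for t in tokens))


def is_array_type(text):
    """arrayType := [VT] ARRAY,arrType,boundList (UDT entries not handled here)."""
    parts = text.split(",")
    if len(parts) < 4 or strip_vt(parts[0]) != "ARRAY":
        return False
    entry = strip_vt(parts[1])
    return (entry in SIMPLE_TYPES or entry == "VARIANT") and is_bound_list(parts[2:])


def is_variant_type(text):
    """variantType := abstractVariantType COMMA varType"""
    head, comma, rest = text.partition(",")
    if not comma or strip_vt(head) != "VARIANT":
        return False
    rest = rest.strip()
    if rest.startswith("(") and rest.endswith(")"):
        return is_array_type(rest[1:-1])     # varType := (arrayType)
    return is_simple_type(rest)              # varType := simpleType
```

For instance, is_array_type("Array,Integer,1,2,1,2") accepts a two-dimensional array type, while "Array,Integer,0" is rejected because the bound list is not made of complete pairs.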
• The variable type object objDope must pass its own inspection procedure.
It must be Unknown or an array type. If the dope is Unknown, the objValue
must be Nothing and the following tests are skipped.
7. At this writing, qbVariable does not support variants that contain arrays, although the
fromString syntax allows their specification. This rule should be changed to allow variants
that contain arrays when code is added to fully support this feature.
• The toString serialization of the variable must create a clone of the variable
when used with fromString. However, Variants, Arrays, and UDTs are
not subject to this rule.
• The empirical dope of the variable must be consistent with its recorded
type. The empirical dope (the type as determined by examination of the
value) must be either the same as or contained in the type. Only scalars
are subject to this rule.
• If the variable is a Variant, its Variant type must match the type of its entry
as seen in the decorated value when the variable is serialized using
toString. For example, Variant,Byte:Integer(256) is not valid.
Public Overloads Function containedVariable(ByVal objVariable2 As qbVariable, ByRef strExplanation As String) As Boolean
    Method that returns True when the qbVariable object in objVariable2 is contained
    in the instance as described in the preceding section. If the object instance is a
    UDT or objVariable2 is a UDT, this method returns False. The strExplanation
    parameter is set to an explanation of why the containment relation is True or False.
Public Function derefMemberName(ByVal strName As String) As qbVariable
    Method that is valid only for variables that are UDTs. strName should be the name
    of a UDT member, and this method returns the qbVariable object, contained directly
    or indirectly in the overall instance, identified by strName. strName may be a
    simple member name. If it selects a member that is a UDT, strName may be simple,
    in which case, it returns the UDT. strName may also select submembers when periods
    separate names. For example, if a UDT contains udt01, and udt01 contains intVal,
    then this method returns the object corresponding to intVal when strName is
    udt01.intVal.
Public Sub dispose()
    Method that disposes of the heap storage associated with the object (if any) and
    marks the object as not usable. For best results, use this method (or
    disposeInspect) when you are finished with the object.
Public Function disposeInspect() As Boolean
    Method that disposes of the heap storage associated with the object (if any) and
    marks the object as not usable. For best results, use this method (or dispose)
    when you are finished with the object. This dispose method conducts a final object
    inspection. See "qbVariable Inspection Rules" preceding this table.
Public Function inspect(ByRef strReport As String) As Boolean
    Method that inspects the object instance for errors resulting from bugs in the
    original code, bugs in the code as changed, or object abuse in the form of using
    the object after a serious error has already occurred. An internal inspection is
    carried out when the object is constructed and inside the disposeInspect method.
    If the inspection fails, the object is marked unusable. See the preceding section
    "qbVariable Inspection Rules."
8. Since a valid instance contains type information, this method is primarily a curio, for internal
use and to clarify the concept of deriving a type from data only, which we need when changing
the data of an array without, unnecessarily, changing its structure.
Public Function isScalar() As Boolean
    Method that returns True when the object instance represents a scalar variable or
    a Variant that contains a scalar value.
Public Shared Function mkRandomVariable() As String
    Shared method that returns a random variable with random type and value, as an
    expression that is valid input for the fromString method. See the preceding
    section "The fromString Expression Supported by qbVariable."
Public Function mkUnusable As Boolean
    Method that forces the object instance into the unusable state. It always returns
    True.
Public Shared Function mkVariableFromValue(ByVal objValue As Object) As qbVariable
    Shared method that creates and returns a new qbVariable object with the specified
    scalar value. The value operand may be any .NET scalar value of the type Boolean,
    Byte, Short, Integer, Long, Single, Double, or String. When the value is a string,
    it should not be quoted. This method cannot create an array. For example,
    mkVariableFromValue("0,1") creates a string. It also cannot create a variant; the
    qbVariable will instead have the narrowest possible scalar, nonvariant type. For
    example, mkVariableFromValue(32768) creates a Long integer. Also see mkVariable.
Public Event msgEvent(ByVal strMsg As String, ByVal intLevel As Integer)
    Event that provides general information. It exposes the strMsg and intLevel
    parameters. strMsg is a general information message. intLevel should contain a
    nesting level starting at 0 and is useful in indenting displays. To obtain the
    msgEvent, declare the qbVariable object WithEvents and write the event handler.
Public Property Name() As String
    Read-write property that returns and can set the name of the object instance,
    which identifies the object in error messages and on the XML tag that is returned
    by object2XML. The name defaults to qbVariablennnn date time, where nnnn is a
    sequence number. This property identifies the object instance. The VariableName
    property identifies its data.
Public Sub new
    Object constructor that creates the qbVariable and inspects its initial state.
Public Sub new(ByVal strFromString As String)
    Overloaded object constructor that creates the qbVariable and inspects its initial
    state. It sets the type and the value of the new qbVariable to the strFromString.
    For example, objQBvariable = New qbVariable("Integer:4") creates the variable with
    type Integer and value 4.
Public Overloads Function object2XML() As String
    Method that converts the state of the object to XML.
Public Event progressEvent(ByVal strActivity As String, ByVal strEntity As String, ByVal intEntityNumber As Integer, ByVal intEntityCount As Integer, ByVal intLevel As Integer, ByVal strComments As String)
    Event that indicates progress through a loop inside one of the stateful procedures
    of qbVariable. strActivity describes the activity or goal of the loop. strEntity
    identifies the entity being processed. intEntityNumber is the entity sequence
    number from 1. intEntityCount is the number of entities. intLevel is the nesting
    level of the loop (starting at 0). strComments may supply additional information
    about the processing in the loop. To obtain the progressEvent, declare the
    qbVariable object WithEvents and write the event handler. See also
    progressEventShared.
Public Overloads Function value() As Object
    Method that returns the value of the qbVariable as long as the object instance
    represents a scalar value, Unknown, or Null or a Variant that contains a scalar
    value, Unknown, or Null.
Public Overloads Function value(ByVal intIndex As Integer) As Object
    Method that returns the value of the qbVariable when it is a one-dimensional
    array, at the entry indexed by intIndex.
Public Overloads Function value(ByVal intIndex1 As Integer, ByVal intIndex2 As Integer) As Object
    Method that returns the value of the qbVariable when it is a two-dimensional
    array, at the entry indexed by intIndex1 and intIndex2.
Public Overloads Function valueSet(ByVal objValue As Object) As Boolean
    Method that assigns the .NET value objValue to the qbVariable as long as the
    object instance represents a scalar value, Unknown, or Null or a Variant that
    contains a scalar value, Unknown, or Null.
Public Overloads Function valueSet(ByVal objValue As Object, ByVal intIndex As Integer) As Boolean
    Method that assigns the .NET value objValue to the qbVariable when the object
    instance represents a one-dimensional array, at the entry indexed by intIndex.
Public Overloads Function valueSet(ByVal objValue As Object, ByVal intIndex1 As Integer, ByVal intIndex2 As Integer) As Boolean
    Method that assigns the .NET value objValue to the qbVariable when the object
    instance represents a two-dimensional array, at the entry indexed by intIndex1
    and intIndex2.
Public Overloads Function valueSet(ByVal objValue As Object, ByVal strID As String) As Boolean
    Method that assigns the .NET value objValue to the qbVariable. If strID is a null
    string, the object instance must represent a scalar value, Unknown, or Null or a
    Variant with a value that is a scalar, Unknown, or Null. If strID is not a null
    string, it must be a comma-separated list of array indexes to access an array
    value or a UDT member name. If strID is a UDT member name, it may be a
    period-separated series of member names to modify UDT submembers.
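The three forms of strID accepted by the last valueSet overload (null string, comma-separated indexes, period-separated member names) can be told apart mechanically. The sketch below is a Python illustration only. The functions parse_id and set_member are hypothetical, and a UDT is modeled as a nested dictionary rather than as a qbVariable.

```python
import re


def parse_id(str_id):
    """Classify an identifier string as scalar access, array indexes, or a member path."""
    if str_id == "":
        return ("scalar", [])
    parts = [p.strip() for p in str_id.split(",")]
    if all(re.fullmatch(r"-?\d+", p) for p in parts):
        return ("indexes", [int(p) for p in parts])
    return ("members", str_id.split("."))


def set_member(udt, path, value):
    """Assign value at a member path inside a UDT modeled as nested dictionaries."""
    *parents, last = path
    node = udt
    for name in parents:
        node = node[name]        # descend one submember per name
    node[last] = value
```

For example, parse_id("udt01.intVal") produces the member path udt01, intVal, which set_member follows to reach the submember in the same way that the derefMemberName description resolves period-separated names.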
qbVariableType
The qbVariableType class represents the type of a quickBasicEngine variable, including
support for an unknown type and Shared methods for relating .NET types to
QuickBasic types.
References of qbVariableType include collectionUtilities.DLL, qbScanner.DLL,
qbTokenType.DLL, and utilities.DLL.
qbVariableType is serially threadable. Multiple instances can run simultane-
ously in multiple threads, but errors will result if one object's procedures run in
multiple threads and in parallel.
Note that this class implements IDisposable, ICloneable, and IComparable.
• Scalars: Ordinary values with no structure. They can have the type
Boolean, Byte, Integer, Long, Single, Double, or String.
• User Data Types (UDTs): Variables that contain 1..n members, which may
be a mix of scalars, Variants, or Arrays but cannot be nested UDTs.
• Unknown: As its name implies, the type we don't know. In this implemen-
tation, Unknown is assigned to the variable type in the constructor. 9
9. In a planned future EGN implementation of a language for symbolic computation (which will
probably be called FOG), this will be used to actually calculate with mystery values.
• Type a and type b are scalars (Boolean, Byte, Integer, Long, Single, Double, or
String) and all possible values of type a convert without error to type b.
• Type a and type b are arrays, and each dimension of type b contains the
same number of elements as the corresponding dimension of a, or more
elements. The array entry type of a is contained in the array entry type
of b according to this overall definition. Note that lowerBounds of a and b
may differ.
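The scalar and array containment rules above can be sketched as follows. This is a Python illustration under explicit assumptions, not the qbVariableType implementation. The numeric types are assumed to form a single widening chain, every scalar is assumed to be contained in String, Boolean is assumed to be contained only in itself and String, and arrays are assumed to need the same number of dimensions. An array type is modeled as a pair of entry type name and a list of (lower, upper) bounds; all names are hypothetical.

```python
# Assumed widening order of the numeric scalar types.
NUMERIC_CHAIN = ["Byte", "Integer", "Long", "Single", "Double"]


def scalar_contained(type_a, type_b):
    """Return True when every value of scalar type_a is assumed to convert to type_b."""
    if type_a == type_b or type_b == "String":
        return True
    if type_a in NUMERIC_CHAIN and type_b in NUMERIC_CHAIN:
        return NUMERIC_CHAIN.index(type_a) <= NUMERIC_CHAIN.index(type_b)
    return False


def array_contained(array_a, array_b):
    """Return True when array type a is contained in array type b."""
    entry_a, bounds_a = array_a
    entry_b, bounds_b = array_b
    if len(bounds_a) != len(bounds_b):
        return False
    for (low_a, high_a), (low_b, high_b) in zip(bounds_a, bounds_b):
        # Compare element counts only; the lower bounds themselves may differ.
        if high_b - low_b < high_a - low_a:
            return False
    return scalar_contained(entry_a, entry_b)
```

Under these assumptions, a Byte array with bounds 0..3 is contained in an Integer array with bounds 1..5, even though the lower bounds differ.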
• For a scalar type, the type should be one of Boolean, Byte, Integer, Long,
Single, Double, or String.
• For an Array, the type should be Array, type, bounds, where type is the name
of a scalar type, the keyword Variant, or a parenthesized UDT definition.
The type of variant arrays is specified "abstractly" and with no associated
scalar type.
• For a UDT, the type should be UDT, memberlist, where the member list
consists of one or more comma-separated and parenthesized member
definitions. Each definition in the member list has the parenthesized form
(name, type), where name is the member name and type is its type. The
type must be scalar, abstract Variant, or Array.
The following are some examples of fromString expressions with various types.
TIP You can run the qbVariableTypeTester executable (provided with the sample
code) and try each example. Type it in the text box at the top of the screen and
click Create Variable Type to make sure the example creates the qbVariableType
object. Then click the Describe button to verify that the fromString expression
converts to the variable type specified with the examples.
Integer
Variant,Integer
Array,Integer,0,3
• The type that is contained in the Variant or Array must pass its own
inspection; each type in a UDT must likewise pass its own inspection.
• When the object is cloned, the clone must return the same toString value
as the original object.
• When the fromString value of the object is used to set the value of a new
instance, the compareTo method must indicate that the original instance
and the new instance are identical.
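The last two rules describe a serialization round trip: a type that is written out with toString and read back with fromString must compare equal to the original. The following Python sketch illustrates that check for array types only. The ArrayType class and its method names are hypothetical stand-ins, not the qbVariableType API, and the clone shown here works by serializing and reparsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical stand-in for an array type: entry type name plus bound pairs."""
    entry: str
    bounds: tuple    # ((lower, upper), ...) with one pair per dimension

    def to_string(self):
        flat = ",".join(f"{low},{high}" for low, high in self.bounds)
        return f"Array,{self.entry},{flat}"

    @classmethod
    def from_string(cls, text):
        parts = [p.strip() for p in text.split(",")]
        if len(parts) < 4 or parts[0] != "Array" or len(parts) % 2 != 0:
            raise ValueError(f"not an array type expression: {text!r}")
        numbers = [int(p) for p in parts[2:]]
        return cls(parts[1], tuple(zip(numbers[0::2], numbers[1::2])))

    def clone(self):
        # Clone by serializing and reparsing, so a clone exercises both directions.
        return ArrayType.from_string(self.to_string())
```

An inspection in this style would assert that a clone equals the original and that both return the same serialized string.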
An internal inspection is carried out when the object is constructed and inside
the disposeInspect method. If the inspection fails, the object is marked as unusable.
qbVariableType has the capability, supported by the optional parameter
booBasic of the inspect method, to carry out the default, extended inspection or
a basic inspection. If basic inspection is in effect, only the first three inspection
rules are applied.
Public Shared ReadOnly Property ClassName
    Shared read-only property that returns the class name qbVariableType.
Public Shared Function clone As qbVariableType
    Method that implements ICloneable. It creates a new qbVariableType with identical
    type information, returning it as the function value. 11
Public Overloads Function compareTo(ByVal objQBvariableType2 As qbVariableType, ByRef strExplanation As String) As Boolean
    Method that compares the object instance to objQBvariableType2 and returns True
    when the types in both are the same; False otherwise. This method is a wrapper for
    the private compareTo_ method, which implements IComparable. This overload places
    an explanation of why the types are identical or different in its strExplanation
    parameter.
Public Shared Function containedType(ByVal enuType1 As ENUvarType, ByVal enuType2 As ENUvarType) As Boolean
    Shared method that returns True when the type identified by enuType1 is contained
    in the type identified in enuType2; False otherwise. See the preceding section
    "Containment and Isomorphism of Types."
Public Shared Function containedType(ByVal enuType1 As ENUvarType, ByVal objType2 As qbVariableType) As Boolean
    Shared method that returns True when the type identified by enuType1 is contained
    in the type identified in objType2; False otherwise. See the preceding section
    "Containment and Isomorphism of Types."
11. At this writing, the clone method runs slowly because it (1) serializes the type information using
toString into a fromString expression for the type, and (2) uses fromString on the new object to
parse and set the type. A Friend variant of clone is used internally. It copies the state directly, but
this has not been fully tested and is not ready for prime time. It should be fully tested as a
replacement for the original clone to make qbVariableType applications run faster.
Public Function containedTypeWithState(ByVal enuType1 As ENUvarType, ByVal objType2 As qbVariableType) As Boolean
    Method that returns True when the type identified by enuType1 is contained in the
    type identified in objType2; False otherwise. This method works the same way as
    the corresponding overload of containedType, but it creates the reusable
    containment matrix in the state of the object using it, which will result in
    faster processing.
Public Function containedTypeWithState(ByVal objType1 As qbVariableType, ByVal enuType2 As ENUvarType) As Boolean
    Method that returns True when the type identified in objType1 is contained in the
    type identified by enuType2; False otherwise. This method works the same way as
    the corresponding overload of containedType, but it creates the reusable
    containment matrix in the state of the object using it, which will result in
    faster processing.
Public Function containedTypeWithState(ByVal objType1 As qbVariableType, ByVal objType2 As qbVariableType) As Boolean
    Method that returns True when the type identified in objType1 is contained in the
    type identified in objType2; False otherwise. This method works the same way as
    the corresponding overload of containedType, with the difference that it creates
    the reusable containment matrix in the state of the object using it, which will
    result in faster processing.
Public Overloads Function defaultValue As Object
    Method that returns the default value associated with the type in the object
    instance. For all scalars, this method returns False (for the type Boolean), String
    (for the type String) or 0 (for the numeric scalar types). For Variants, Null types,
    and UDTs, this method returns Nothing. For nonvariant arrays, this method returns
    the default type of the array's entry.
Public Overloads Shared Function defaultValue(ByVal enuType As ENUvarType) As Object
    Shared method that returns the default value associated with the type specified in
    enuType. For all scalars, this method returns False (for the type Boolean), String
    (for the type String) or 0 (for the numeric scalar types). For Variants, Null types,
    and UDTs, this method returns Nothing. enuType may not specify an array.
Public ReadOnly Property Dimensions() As Integer
    Read-only property that returns the number of dimensions associated with an Array.
    This property returns 0 with no other error indication when the variable is not an
    Array.
Public Sub dispose()
    Method that disposes of the heap storage associated with the object (if any) and
    marks the object as not usable. For best results, use this method (or
    disposeInspect) when you are finished with the object.
Public Function disposeInspect() As Boolean
    Method that disposes of the heap storage associated with the object (if any) and
    marks the object as not usable. For best results, use this method (or dispose)
    when you are finished with the object. This method conducts a final object
    inspection. See the preceding "qbVariable Inspection Rules" section.
Public Function fromString(ByVal strFromstring As String) As Boolean
    Method that sets the variable type to the value serialized in strFromstring. See
    the preceding section "The fromString Expression Supported by qbVariableType" for
    the syntax requirements of strFromstring.
Public Function isScalar() As Boolean
    Method that returns True when the type is scalar; False otherwise. This method
    returns True only when the type is one of the scalar types Boolean, Byte, Integer,
    Long, Single, Double, or String. This method returns False for a variant with a
    scalar contained type.
Public Shared Function isScalarType(ByVal enuType As ENUvarType) As Boolean
    Shared method that returns True when the type identified in enuType is scalar;
    False otherwise. This method returns True only when enuType is one of the scalar
    types Boolean, Byte, Integer, Long, Single, Double, or String.
Public Shared Function isScalarType(ByVal strType As String) As Boolean
    Shared method that returns True when the type identified by name in strType is
    scalar; False otherwise. This method returns True only when strType is one of the
    scalar types Boolean, Byte, Integer, Long, Single, Double, or String.
Public Function isUDT() As Boolean
    Method that returns True when the type is a UDT; False otherwise.
Public Function isUnknown() As Boolean
    Method that returns True when the type is Unknown; False otherwise.
Public Function isVariant() As Boolean
    Method that returns True when the type is Variant; False otherwise.
Public Property LowerBound(ByVal intDimension As Integer) As Integer
    Indexed, read-write property that returns and can change the lower bound of an
    array at the dimension intDimension, which starts at 1 for the major dimension. If
    the qbVariableType isn't an array or intDimension is invalid, an error occurs. The
    lower bound may not be changed to a value that is greater than the upper bound,
    because this would leave the qbVariableType object in an invalid state. See
    redimension for a method that can change the lower and upper bounds of an array in
    one statement.
Public Overloads Shared Function mkRandomScalarValue() As Object
    Shared method that creates a random scalar value in .NET form as one of Boolean,
    Byte, (16-bit Short) Integer, (32-bit) Long, Single, Double, or String.
Public Property Name() As String
    Read-write property that returns and can set the name of the object instance,
    which identifies the object in error messages and on the XML tag that is returned
    by object2XML. The name defaults to qbVariableTypennnn date time, where nnnn is a
    sequence number.
Public Shared Function name2NetType(ByVal strSystemName As String) As String
    Shared method that converts the system's name for a .NET type (such as
    System.Int32) to the generic name of one of the .NET types used to support
    QuickBasic variables (such as Integer). name2NetType will convert System.Int64 to
    Integer.
Public Shared Function netType2Name(ByVal strNetType As String) As String
    Shared method that converts the generic name for a .NET type (such as Integer) to
    the system name of the .NET type (such as System.Int32).
Public Shared Function netValue2QBdomain(ByVal objValue As Object) As ENUvarType
    Shared method that returns the narrowest QuickBasic type to which the .NET object
    in objValue converts without error as an ENUvarType enumerator. This method returns
    ENUvarType.vtUnknown when the .NET object does not convert to any QuickBasic type.
    It does not return Boolean, Array, or Variant. It returns one of Byte, Integer,
    Long, Single, Double, or String. It returns Null when the .NET value is Nothing. It
    returns Unknown (with no other error indication) when the .NET value converts to no
    other values. See also netValueInQBdomain.
Public Shared Function netValue2QBvalue(ByVal objValue As Object) As Object
    Shared method that converts the .NET value in objValue to a .NET value in the
    narrowest .NET type that corresponds to a QuickBasic type.
Public Shared Function netValue2QBvalue(ByVal objValue As Object, ByVal enuType As ENUvarType) As Object
    Shared method that converts the .NET value in objValue to the .NET type that
    corresponds to the QuickBasic type specified in enuType.
Public Shared Function netValue2QBvalue(ByVal objValue As Object, ByVal strType As String) As Object
    Shared method that converts the .NET value in objValue to the .NET type that
    corresponds to the QuickBasic type identified by strType.
Public Overloads Function netValueInQBdomain(ByVal objValue As Object) As Boolean
    Method that returns True when the .NET value in objValue may be converted without
    error to the variable type in the object instance.
Public Overloads Shared Function netValueInQBdomain(ByVal enuType As ENUvarType, ByVal objValue As Object) As Boolean
    Shared method that returns True when the .NET value in objValue may be converted
    without error to the variable type identified by enuType.
Public Shared Function netValueIsScalar(ByVal objValue As Object) As Boolean
    Shared method that returns True when the .NET value in objValue is one that can
    represent one of the QuickBasic scalar values.
Public Sub new
    Object constructor that creates the qbVariableType and inspects its initial state.
Public Shared Function object2Type(ByVal objValue As Object) As String
    Method that returns, for any .NET object, the fromString expression that
    corresponds to its type. If objValue is a scalar with a .NET scalar type that
    corresponds to a QuickBasic type (Boolean, Byte, Short, Integer, Single, Double,
    or String), that type is returned. If objValue is a .NET Long but in the range
    -2^31..2^31-1, the Long type is returned. If objValue is any other value,
    Unknown is returned.
Public Overloads Function object2XML(Optional ByVal booIncludeCache As Boolean = True) As String
    Method that converts the state of the object to XML. By default, information
    concerning the variable type cache won't be included in the output XML, but the
    optional parameter booIncludeCache may be passed as True to include the
    serialized cache information. See the previous section "Cache Considerations."
Public Shared Function qbDomain2NetType(ByVal enuDomain As ENUvarType) As String
    Shared method that converts the type identified by enuDomain to the .NET type
    that is used to contain values of this type. The enuDomain cannot be Unknown,
    Null, or Array. The .NET type is returned as the string name of the type; it
    will be in the form System.type.
Public Function redimension(ByVal intDimension As Integer, ByVal intLowerBound As Integer, ByVal intUpperBound As Integer) As Boolean
    Method that redimensions an array type to the lower bound and upper bound
    specified. It doesn't allow the lower bound to be greater than the upper bound.
    Because it sets both bounds in one call, it avoids the problem that occurs when
    you use the LowerBound and UpperBound properties to change the bounds one at a
    time: for example, when the lower bound must first be raised to a value that is
    higher than the current upper bound, and only then the upper bound set to its
    new valid value, or vice versa.
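The problem that redimension avoids can be seen in a small model. The Python sketch below is not the qbVariableType implementation: the class ArrayBounds and its members are hypothetical, and the choice to raise an exception from the property setters and to return False from redimension is an assumption made for illustration.

```python
class ArrayBounds:
    """Hypothetical model of one array dimension with validated bounds."""

    def __init__(self, lower, upper):
        if lower > upper:
            raise ValueError("lower bound exceeds upper bound")
        self._lower = lower
        self._upper = upper

    @property
    def lower(self):
        return self._lower

    @lower.setter
    def lower(self, value):
        # A one-at-a-time change is checked against the *current* upper bound.
        if value > self._upper:
            raise ValueError("lower bound exceeds upper bound")
        self._lower = value

    @property
    def upper(self):
        return self._upper

    @upper.setter
    def upper(self, value):
        if value < self._lower:
            raise ValueError("upper bound is below lower bound")
        self._upper = value

    def redimension(self, lower, upper):
        """Set both bounds together; return False and change nothing if invalid."""
        if lower > upper:
            return False
        self._lower, self._upper = lower, upper
        return True


bounds = ArrayBounds(0, 5)
try:
    bounds.lower = 10              # rejected: 10 exceeds the current upper bound 5
    sequential_ok = True
except ValueError:
    sequential_ok = False
combined_ok = bounds.redimension(10, 20)   # accepted: both bounds change together
```

Moving the range from 0..5 to 10..20 fails if the lower bound is set first, yet succeeds when both bounds are supplied in one call, because only the final pair is validated.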
Public Function scalarDefault() As Object
    Method that returns the default value applicable to the type. If the type is
    Null or Unknown, it returns Nothing. If the type is scalar, it returns the
    default for the scalar type. If the type is array, it returns the default for
    the array entry. If the type is concrete variant (a Variant with a known
    embedded type), it returns the default for the embedded type. If the type is
    abstract variant (a Variant with an unknown embedded type), it returns Nothing.
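The case analysis in scalarDefault reduces to a short recursive lookup. The following Python sketch is a model under stated assumptions: the VarType record and scalar_default function are hypothetical names, Python's None stands in for Nothing, and the default values themselves (False, zero, and the empty string) are assumed from ordinary Basic conventions because this entry does not list them.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed defaults; the reference entry does not spell them out.
SCALAR_DEFAULTS = {
    "Boolean": False,
    "Byte": 0,
    "Integer": 0,
    "Long": 0,
    "Single": 0.0,
    "Double": 0.0,
    "String": "",
}


@dataclass
class VarType:
    """Hypothetical stand-in for a variable type."""
    kind: str                               # scalar name, "Array", "Variant", "Null", "Unknown"
    contained: Optional["VarType"] = None   # entry type (Array) or embedded type (Variant)


def scalar_default(var_type):
    """Return the default for var_type, or None where the manual says Nothing."""
    if var_type.kind in SCALAR_DEFAULTS:
        return SCALAR_DEFAULTS[var_type.kind]
    if var_type.kind in ("Array", "Variant") and var_type.contained is not None:
        # Arrays use the entry type; concrete variants use the embedded type.
        return scalar_default(var_type.contained)
    # Null, Unknown, and abstract variants have no default.
    return None
```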
Public Shared Function string2enuVarType(ByVal strInstring As String) As ENUvarType
    Shared method that converts a string type name to an ENUvarType enumerator.
    strInstring is one of vtBoolean, vtByte, vtInteger, vtLong, vtSingle, vtDouble,
    vtString, vtVariant, vtArray, vtUDT, vtNull, or vtUnknown. The prefix vt may be
    omitted, and the name is case-insensitive.
Public Property Tag() As Object
    Read-write property that returns and can be set to user data that needs to be
    associated with the qbVariableType instance. Tag can be a reference object. If
    so, the Tag object isn't destroyed when the object is destroyed.
Public Overloads Function test(ByRef strReport As String) As Boolean
    Method that runs tests on the object and returns True to indicate success, or
    False otherwise. The strReport reference parameter is set to a test report. The
    test consists of four phases: a series of random fromString expressions is
    built and used to create new qbVariableType objects [12]; the defaultValue
    method is tested to make sure it provides valid defaults; the type containment
    methods are run against a series of known results; and finally, the various
    "domain-mapping" methods (which convert .NET values and types to QuickBasic
    values and types) are tested. If the test fails, the object instance is marked
    as not usable. A test method will be exposed by this object, unless the
    compile-time symbol QBVARIABLETYPE_NOTEST is set explicitly to True in the
    project properties for qbVariableType [13].
Public Shared ReadOnly Property TestAvailable() As Boolean
    Shared, read-only property that returns True if the running version of
    qbVariableType was compiled with the compile-time symbol QBVARIABLETYPE_NOTEST
    either omitted or set to False.
12. Of course, these new qbVariableType objects are inspected for validity when created.
13. Set the compile-time symbol QBVARIABLETYPE_NOTEST to True in the project to suppress the
generation of the test method.
Public Function toName() As String
    Method that returns the name of the variable type in the object instance.
    Unlike toString, it returns only Variant, Array, or UDT.
Public Overrides Function toString() As String
    Method that returns the type in the qbVariableType, in the format described for
    fromString in the preceding section "The fromString Expression Supported by
    qbVariableType." If the variable is an array, the representation returned is
    packed, condensing series of identical elements using parenthesized repetition
    counts. The representation returns default values as asterisks if the value of
    a scalar variable contains the default value appropriate to its type, or each
    member in an array value contains the default. The variables in the output
    string are decorated (using type(value) syntax) when the variable type is
    either Variant or Variant Array.
Public Shared ReadOnly Property VariableTypeList() As String
    Shared, read-only property that returns a space-delimited list of all the
    variable types supported: Boolean Byte Integer Long Single Double String
    Variant Array Unknown Null.
Public ReadOnly Property VarType() As ENUvarType
    Read-only property that returns the type contained inside the object instance
    type. For Variants, this is the type of the Variant's value. For Arrays, it is
    the entry type. For everything else, this property returns
    ENUvarType.vtUnknown, with no other error indication. VarType is the contained
    variable type. See also VariableType.
Public Shared Function vtPrefixAdd(ByVal strName As String) As String
    Shared method that adds the vt prefix to strName, unless it is already present,
    and returns the result.
Public Shared Function vtPrefixRemove(ByVal strName As String) As String
    Shared method that removes the vt prefix from strName, when it is present, and
    returns the result.
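The prefix helpers and the name lookup performed by string2enuVarType are simple enough to model directly. The Python sketch below is illustrative only: the function names are hypothetical, enumerators are represented as plain strings, matching the prefix without regard to case is an assumption, and raising ValueError for an unrecognized name is also an assumption, since the manual doesn't say what the real method does in that case.

```python
# Type names as listed for string2enuVarType, without the vt prefix.
VAR_TYPE_NAMES = [
    "Boolean", "Byte", "Integer", "Long", "Single", "Double",
    "String", "Variant", "Array", "UDT", "Null", "Unknown",
]


def vt_prefix_add(name):
    """Add the vt prefix unless it is already present."""
    return name if name.lower().startswith("vt") else "vt" + name


def vt_prefix_remove(name):
    """Remove the vt prefix when it is present."""
    return name[2:] if name.lower().startswith("vt") else name


def string_to_var_type(name):
    """Map a type name (prefix optional, any case) to its canonical vt name."""
    bare = vt_prefix_remove(name.strip()).lower()
    for candidate in VAR_TYPE_NAMES:
        if candidate.lower() == bare:
            return "vt" + candidate
    # Assumed behavior for names that are not in the list.
    raise ValueError("unrecognized variable type name: " + name)
```

None of the listed type names begins with the letters vt, so stripping the prefix before the comparison cannot damage a bare name.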
quickBasicEngine
The quickBasicEngine class does all scanning, parsing, and interpretation for this
version of QuickBasic. It may be dropped into a .NET application, and it will
provide the ability to evaluate immediate Basic expressions, as well as compile
and run Basic programs.
References of quickBasicEngine include collectionUtilities.DLL, qbOp.DLL,
qbPolish, qbScanner.DLL, qbToken, qbTokenType.DLL, qbVariable, qbVariableType,
and utilities.DLL.
NOTE No explicit parse tree is built, because this is unnecessary. Parse information
is available just-in-time through the parseEvent, which is fired for
each distinct grammar symbol.
At any time, instances of this class are usable or nonusable, and in one of the
following states:
• Ready to run: When the object is fully initialized, and after normal procedures
have terminated, it is Ready to run.
• Stopping: When the user has requested a stop, through the stopQBE
method, but threads are still running, the object is Stopping.
• Stopped: When the user has requested a stop, through the stopQBE
method, and no threads are still running, the object is Stopped.
The usability and running states, and the number of running threads, are
available through methods and properties.
At the end of a successful New constructor, the instance becomes usable and
Ready to run. A successful or failed execution of dispose makes the object unusable.
A serious internal error, such as failure to create a resource, or using the object
after a serious error is reported, makes the object not usable. The mkUnusable method
may also be used to force the object into the unusable state. The Usable property
tells the caller whether the object is usable.
Usability makes run status moot because an unusable instance won't run. It
should be disposed, a new instance should be created, and the processing that
created the error should not be repeated.
The stopQBE method places the object in a Stopping state immediately, and it
puts the object in a Stopped state when all running threads have terminated. While
an unusable object cannot be made usable, a Stopped object can be restored to
active duty using the resumeQBE method.
When a stop is requested for the object instance, the state of the engine
immediately becomes Stopping, as an atomic operation. Then the following occurs:
• Any executing For or Do loop in the engine, which issues loop events, is
exited as soon as the loop event is issued.
• No Public procedure will execute, and all Public procedures will return
default values, until Stopped is set to False.
The resumeQBE method places the object in the Ready state if the object is
Stopped. The resumeQBE method has no effect when the object is already Ready or is
in the Running or Stopping states.
The getThreadStatus method returns the run status as one of ready, running,
stopping, or stopped.
When the quickBasicEngine is Running, the runningThreads method returns
the number of threads that are running methods and properties as a number
between 1 and n. When quickBasicEngine is Stopped or Ready, runningThreads
returns 0.
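The run states described above can be summarized as a small state model. The Python sketch below is an interpretation of the prose, not the engine's code: the class EngineState and its members are hypothetical, the status is derived from a stop flag plus a thread count (an assumed design), and locking, which a real multithreaded implementation would need to make the transitions atomic, is left out.

```python
class EngineState:
    """Hypothetical model of the ready/running/stopping/stopped states."""

    def __init__(self):
        self.usable = True
        self.stop_requested = False
        self.running_threads = 0

    def status(self):
        """Return one of ready, running, stopping, or stopped."""
        if self.stop_requested:
            return "stopping" if self.running_threads > 0 else "stopped"
        return "running" if self.running_threads > 0 else "ready"

    def enter(self):
        """A thread begins a public procedure; refused after a stop request."""
        if not self.usable or self.stop_requested:
            return False
        self.running_threads += 1
        return True

    def leave(self):
        """A thread finishes its public procedure."""
        self.running_threads -= 1

    def stop(self):
        """Request a stop; Stopping until the last thread leaves, then Stopped."""
        self.stop_requested = True

    def resume(self):
        """Restore a Stopped engine to Ready; no effect in any other state."""
        if self.status() == "stopped":
            self.stop_requested = False
```

In this model a resume issued while threads are still winding down is ignored, which matches the statement that resumeQBE has no effect in the Stopping state.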
• The subroutine and function index must conform to its expected struc-
ture; see the source code for details.
• The constant expression index must conform to its expected structure; see
the source code for details.
Public Property AssemblyRemovesCode() As Boolean
    Read-write property that returns and may be set to True to remove remarks
    inserted by the compiler and label statements inserted during assembly, or
    False to suppress this removal. By default, this removal occurs. Setting this
    property to False does not change the effect of QuickBasic code, only its
    efficiency.
Public Shared ReadOnly Property ClassName
    Shared read-only property that returns the class name: quickBasicEngine.
Public Function clear() As Boolean
    Method that resets the engine to a start state by ensuring all reference
    variables are cleared. You don't need to execute it in the normal case, as long
    as you use dispose to responsibly clean up the compiler.
Public Shared Function clone As quickBasicEngine
    Method that implements ICloneable and returns a clone of the instance object.
    The clone consists of identical code (including comments and white space
    patterns) and identical run mode options, including optimization, but it may
    not be in the same state as the cloned object. The clone is always in the
    initial, unexecuted state. When the clone is passed to the compareTo method,
    compareTo returns True.
Public Event codeRemoveEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal intOpIndex As Integer)
    Event that is triggered whenever code is removed from the compiled set of
    qbPolish tokens. objQBsender identifies the quickBasicEngine, and intOpIndex
    identifies the index of the operation removed.
Public Shared Function codeType(ByVal strCode As String) As String
    Shared method that returns the type of strCode as a string. immediateCommand
    is returned when the code is a valid expression. program is returned when the
    code is a valid executable program. Otherwise, invalid is returned.
Public Function compareTo(ByVal objQBE As quickBasicEngine.qbQuickBasicEngine) As Boolean
    Method that compares the instance object to the quickBasicEngine identified in
    objQBE, and returns True when objQBE clones the instance. objQBE clones the
    instance when the source code of objQBE and that of the instance are identical,
    except for white space, and all global options such as the Constant Folding
    property are the same. The compilation and assembly of the two objects and
    their storage values may differ. The compareTo method implements the
    IComparable interface.
Public Function compile() As Boolean
    Method that compiles the source code to unassembled interpretive code. This
    method won't proceed to assembly on a successful compile, but it will scan code
    that has not been scanned already.
Public Function compiled() As Boolean
    Method that returns True if the current source code has been compiled already;
    False otherwise.
16. This can speed up runtime. For example, in a+1+1 when Constant Folding is True, the sub-
expression 1+1 is evaluated by the compiler. This example is contrived (as in stupid) but
many code and business rule generators may yield such contrived, stupid examples.
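To make the idea in note 16 concrete, here is a minimal constant-folding sketch in Python. It is not the quickBasicEngine's folding code, which is not shown in this reference. It assumes the expression has already been translated to a reverse-Polish token list, that numeric constants appear as Python numbers while variable names and operators appear as strings, and that the expression is grouped as a+(1+1). Folding the left-associated form (a+1)+1 would need an extra reassociation step that this sketch doesn't attempt. The name fold_constants is hypothetical.

```python
import operator

# Operators the sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(rpn):
    """Fold constant subexpressions in a reverse-Polish token list.

    When an operator arrives and the two most recent output tokens are
    both numeric constants, they are its operands, so the three tokens
    are replaced by the computed value.
    """
    out = []
    for token in rpn:
        is_op = isinstance(token, str) and token in OPS
        if is_op and len(out) >= 2 and all(
            isinstance(t, (int, float)) for t in out[-2:]
        ):
            right = out.pop()
            left = out.pop()
            out.append(OPS[token](left, right))
        else:
            out.append(token)
    return out


# a + (1 + 1) in reverse-Polish form is: a 1 1 + +
folded = fold_constants(["a", 1, 1, "+", "+"])
```

After folding, the token list is a 2 +, so the addition of the two constants happens once at compile time instead of on every execution.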
Public Overloads Shared Function eval(ByVal strExpression As String) As qbVariable.qbVariable
    Shared, "lightweight" method that evaluates the string strExpression as a
    single expression in QuickBasic notation or as a series of statements
    (separated by colons), followed by a colon and then a final expression. For
    example, strExpression may be a series of Let assignment statements that set
    variable values, followed by an expression. The value of the expression is
    returned as a qbVariable object. This method creates a new quickBasicEngine
    with all of the default values and default properties, and the evaluated string
    is executed using default values and default properties. See also evaluate.
Public Overloads Shared Function eval(ByVal strExpression As String, ByRef strLog As String) As qbVariable.qbVariable
    Shared, "lightweight" method that evaluates the string strExpression as a
    single expression in QuickBasic notation or as a series of statements
    (separated by colons), followed by a colon and then a final expression. This
    overload works like the previous overload, but it places the evaluation event
    log in the strLog string, passed by reference. See the EventLog property for
    details on the format of event logs. See also evaluate.
Public Overloads Shared Function eval(ByVal strExpression As String, ByRef booError As Boolean) As String
    Shared, "lightweight" method that evaluates the string strExpression as a
    single expression in QuickBasic notation or as a series of statements
    (separated by colons), followed by a colon and then a final expression. The
    value of the expression is returned as a string. The reference parameter
    booError is set to True on any error (and a null string is usually returned).
    booError is set to False on an error-free evaluation. The eval method creates a
    new quickBasicEngine with all of the default values and default properties, and
    the evaluated string is executed using default values and default properties.
    See the EventLog property for details on the format of event logs. See also
    evaluate.
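The input shape that eval accepts, zero or more statements separated by colons and then a final expression, can be illustrated by a splitter. The Python sketch below is only a model of that shape. The real engine scans and parses the text rather than splitting it, the function name split_statements is hypothetical, and the handling of colons inside double-quoted strings is an assumption about how such input ought to be treated.

```python
def split_statements(source):
    """Split 'stmt: stmt: expr' into (list of statements, final expression).

    Colons inside double-quoted strings are not treated as separators.
    """
    parts = []
    current = []
    in_string = False
    for ch in source:
        if ch == '"':
            in_string = not in_string      # toggle on each quote character
        if ch == ":" and not in_string:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Everything before the last colon is a statement; the rest is the expression.
    return parts[:-1], parts[-1]


statements, expression = split_statements('Let a = 1: Let s = "x:y": a + 1')
```

For this input the statements are the two Let assignments and the final expression is a + 1. A string with no colon at all is returned as an expression with an empty statement list.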
Public ReadOnly Property Evaluation() As qbVariable.qbVariable
    Read-only property that returns the result of the most recent evaluate method,
    or Nothing when no such result exists.
Public Function evaluationValue() As qbVariable.qbVariable
    Method that returns the result of the most recent evaluate method, or Nothing
    when no such result exists.
Public Overloads Shared Function eventLog2ErrorList(ByVal colEventLog As Collection) As String
    Shared method that returns a list, suitable for display in a monospaced font
    (such as Courier New), of all compiler and interpreter error events in the
    object instance event log passed as colEventLog. colEventLog must be in the
    format described for the EventLog property.
Public Overloads Function eventLogFormat() As String
    Method that formats the event log in the object instance in a way best viewed
    in a monospace font such as Courier New, and returns the formatted log as a
    string.
Public Overloads Function eventLogFormat(ByVal intStartIndex As Integer) As String
    Method that formats the event log in the object instance in a way best viewed
    in a monospace font such as Courier New, and returns the formatted log as a
    string. The returned log starts at intStartIndex.
Public Overloads Function eventLogFormat(ByVal intStartIndex As Integer, ByVal intCount As Integer) As String
    Method that formats the event log in the object instance in a way best viewed
    in a monospace font such as Courier New, and returns the formatted log as a
    string. The returned log starts at intStartIndex and contains at most intCount
    entries.
Public Overloads Shared Function eventLogFormat(ByVal colEventLog As Collection) As String
    Shared method that formats the event log passed as colEventLog in a way best
    viewed in a monospace font such as Courier New, and returns the formatted log
    as a string.
Public ReadOnly Property EventLogging() As Boolean
    Read-write property that returns and may be set to True or False to control the
    generation of event logs.
Public Property InspectCompilerObjects() As Boolean
    Read-write property that returns and may be set to True when objects created by
    the compiler need to be inspected when disposed. Its default is False. Set this
    option to True when testing the compiler and modifications as a way to be sure
    that objects don't include buggy code. Setting this option will slow the
    compiler down. When this option is True, the following compiler object types
    will be inspected when they are disposed: the scanner, each variable that is
    created during compilation and interpretation (including its type), and the
    quickBasicEngine.
Public Function interpret() As Object
    Method that interprets the compiled code (it will scan, compile, and assemble
    the source code as needed). This method will return an Object. If the stack is
    empty at the end of interpretation, this method returns True. If the stack
    contains one entry at the end of interpretation, it returns that entry, which
    will be a qbVariable. If the stack contains multiple entries at the end of
    interpretation, it returns False. This method does QuickBasic input and output
    by means of events. See interpretInputEvent and interpretPrintEvent for
    details.
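Because interpret returns an Object whose meaning depends on the final stack depth, a caller has to test the kind of value it gets back. The Python sketch below models the rule stated in the entry above. The function name interpret_result is hypothetical, a plain list stands in for the interpreter stack, and strings stand in for qbVariable objects.

```python
def interpret_result(stack):
    """Model the return value of interpret from the final stack contents."""
    if len(stack) == 0:
        return True          # empty stack: success with no value
    if len(stack) == 1:
        return stack[0]      # exactly one entry: that entry is the result
    return False             # several entries: the program left the stack unbalanced


def describe(result):
    """Show how a caller might tell the three outcomes apart."""
    if result is True:
        return "ran to completion, no value"
    if result is False:
        return "stack held more than one entry"
    return "value: " + str(result)
```

The caller checks for the two Boolean outcomes first and treats anything else as the returned variable, which is why the sketch compares with `is` and not with equality.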
17. For this reason, getThreadStatus should be used for entertainment purposes only; for example,
to display the nondeterministic status in a GUI.
Public Event interpretPrintEvent(ByVal objQBsender As QuickBasicEngine, ByVal strOutstring As String)
    Event that is triggered when a Print statement is executed. The event handler
    should usually display the output string as-is, or the Print statement may be
    in use to return results to a business rules interface.
Public Event interpretTraceEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal intIndex As Integer, ByVal objStack As Stack, ByVal colStorage As Collection)
    Event that is triggered prior to each interpreter execution of each Polish
    opcode. objQBsender identifies the quickBasicEngine. intIndex is the index of
    the Polish opcode. objStack is the stack prior to executing the opcode.
    colStorage is the variable collection prior to executing the opcode. The Shared
    stack2String method is available for serializing the stack, and the Shared
    storage2String method is available for serializing variable storage.
Public Event loopEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strActivity As String, ByVal strEntity As String, ByVal intNumber As Integer, ByVal intCount As Integer, ByVal intLevel As Integer, ByVal strComment As String)
    Event that is triggered inside loops inside the quickBasicEngine. objQBsender
    identifies the quickBasicEngine. strActivity identifies the loop activity.
    strEntity identifies the entity being processed. intNumber identifies the
    number of the current entity. intCount identifies the total number of entities.
    intLevel identifies the nesting level starting at 0. strComment may provide
    additional information about the loop.
Public Function mkUnusable As Boolean
    Method that forces the object instance into the unusable state. It always
    returns True.
Public Event msgEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strMessage As String)
    Event that is triggered by general messages inside the quickBasicEngine.
    objQBsender identifies the quickBasicEngine. strMessage is the message.
18. At this writing, only a few Polish opcodes are translatable to MSIL.
Public Function resumeQBE() As Boolean
    Method that puts the quickBasicEngine in the Ready state when it is in the
    Stopped state. If the object is in any other state, resumeQBE has no effect and
    results in no error. For best results, clear the quickBasicEngine after
    resuming it.
Public Overloads Function run() As Boolean
    Method that runs the immediate command or program in the quickBasicEngine. The
    code will be scanned, compiled, and/or assembled as needed.
Public Overloads Function run(ByVal strRunType As String) As Boolean
    Method that runs the immediate command or program in the quickBasicEngine. The
    run type of immediateCommand or program may be specified in strRunType.
Public Function runningThreads() As Integer
    Method that returns the number of threads that are running procedures inside
    the quickBasicEngine as a number between 0 and n. This method includes its own
    thread. The value returned is nondeterministic, because the status may change
    while it is executing. The value will always be one or greater because
    runningThreads includes its own thread. See also getThreadStatus.
Public Function scan() As Boolean
    Method that scans the source code.
Public Event scanEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal objToken As qbToken.qbToken)
    Event that is triggered when the scanner has found the next token. objQBsender
    identifies the quickBasicEngine. objToken identifies the token.
Public Function scanned() As Boolean
    Method that returns True when the current source code has been scanned; False
    otherwise.
Public Shared Function stack2String(ByVal objStack As Stack) As String
    Shared method that formats a stack of qbVariables, such as the stack passed by
    the interpretTraceEvent. The formatted stack is best viewed in a monospace font
    such as Courier New.
Public Function stopQBE As Boolean
    Method that puts the quickBasicEngine in the Stopped state when it is in the
    Ready or Running state. If the object is in the Stopped state already, it has
    no effect and results in no error. If the object is in the Running state:
    (1) any executing For or Do loop is exited as soon as the loop event is issued;
    (2) if the parser is running, the compiler is exited when the next grammar
    category is recognized; and (3) if the interpreter is running, the interpreter
    is exited as soon as the interpreter's loop event is issued.
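The stopQBE entry describes cooperative cancellation: running loops are not interrupted from outside, but exit at the next point where a loop event is issued. The Python sketch below models that behavior in a single thread. The class LoopingEngine, its members, and the handler signature are hypothetical, and a plain list of callables stands in for the .NET event mechanism.

```python
class LoopingEngine:
    """Hypothetical engine whose loops check for a stop request at each loop event."""

    def __init__(self):
        self.stop_requested = False
        self.loop_handlers = []      # callables standing in for loopEvent handlers

    def stop(self):
        """Request a stop; running loops notice it at their next loop event."""
        self.stop_requested = True

    def run_loop(self, count):
        """Process up to count entities; return how many were completed."""
        completed = 0
        for number in range(1, count + 1):
            # Issue the loop event first, so handlers can report progress
            # or request a stop.
            for handler in self.loop_handlers:
                handler(self, number, count)
            # Exit as soon as the loop event has been issued and a stop is pending.
            if self.stop_requested:
                break
            completed += 1
        return completed


def stop_at_three(engine, number, count):
    """Example handler: ask the engine to stop when entity 3 is reached."""
    if number == 3:
        engine.stop()


engine = LoopingEngine()
engine.loop_handlers.append(stop_at_three)
completed = engine.run_loop(10)
```

Here the loop completes two entities and exits on the third, the first point at which a loop event is issued after the stop request. A design of this kind never leaves an entity half-processed, at the cost of a delay between the request and the actual stop.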
Public Shared Function Storage2String(ByVal colStorage As Collection) As String
    Shared method that formats a collection of qbVariables, such as the collection
    passed by the interpretTraceEvent as the interpreter storage. The formatted
    storage is best viewed in a monospace font such as Courier New.
Public Property Tag() As Object
    Read-write property that returns and can be set to user data that needs to be
    associated with the quickBasicEngine instance. It's a kind of post-it note. The
    Tag can be a reference object. If so, the Tag object is not destroyed when the
    object is destroyed.
Public Overloads Function test(ByRef strReport As String) As Boolean
    Method that runs tests on the object. It returns True to indicate success or
    False to indicate failure. The strReport reference parameter is set to a test
    report.
Public Overloads Function test(ByRef strReport As String, ByVal booEventLog As Boolean) As Boolean
    Method that runs tests on the object. It returns True to indicate success or
    False to indicate failure. The strReport reference parameter is set to a test
    report. The booEventLog parameter may be specified as True to get an event log
    inside the report.
Public Event testProgressEvent(ByVal strDesc As String, ByVal strEntity As String, ByVal intEntityNumber As Integer, ByVal intEntityCount As Integer)
    Event that is fired during the execution of the test method. It reports
    progress inside loops. strDesc describes the loop goal. strEntity describes the
    entity being processed in the loop. intEntityNumber is the number of the
    entity. intEntityCount is the total number of entities. To obtain this event,
    the quickBasicEngine instance must be declared WithEvents, a handler for the
    testProgressEvent must be supplied, and the compile-time symbol
    QBVARIABLETEST_NOTEST must be omitted or set to False.
Public Event threadStatusChangeEvent(ByVal objQBsender As qbQuickBasicEngine)
    Event that is raised when the number of threads running quickBasicEngine code
    changes or the quickBasicEngine is stopped. objQBsender is the handle of the
    sender quickBasicEngine.
Public Event userErrorEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strDescription As String, ByVal strHelp As String)
    Event that is triggered when there is an error in using the procedures of this
    object, as opposed to an error in the QuickBasic source code. objQBsender
    identifies the quickBasicEngine. strDescription identifies the error (and it
    may contain more than one line). strHelp identifies additional help
    information.
END START
-Fragment of IBM 1401 assembler code
In my end is my beginning.
-T. S. Eliot
374
Index
Symbols assemble method, quickBasicEngine , 214
assemblers, 205-218
.NET See under N history, 205-212
09/11, system implications, 284-285
machine language and, 206-207
macro assemblers, 210
A quickBasicEngine, 212
for statement, C language, 280-281
abstractmachdnes, 98-101 lazy evaluation example, 277
abstract variant types, 136, 143 removing comments and labels, 217
access rule enforcement, 43 assertions tested by
action object, CreditEvaluation, 250 qbScannerTest.inspect: 123-:126 .
addFactor, IntegerCalc, 38 assignment statements, BasIC, QuiCkBasIC
addFactorRHS procedure, 175-179 compilerdefirrltion, 71-72
addition operations, subtraction associative operators, 178-179
regarded as, 38-39 asterisks
ADO (Active Data Objects), 16 in regular expressions, 33, 95
Aho, Alfred et. al., Compilers: Principles, qbVariable defaults, 146
Techniques and Thols, 49, 90, 106, auto manufacturer example, 243-244
203,285
Algol programming language, 2, 12,
283-284 B
Algorithm Design Manua~ the, by backslashes
Stephen S Skiena, 137, 170 escaping metacharacters, 33, 35
algorithms in regular expressions, 96-97
Alan Thring on , 205 backtracking problem, 75
hash algorithms, 208 Backus, John, programming language
recursive-descent algorithms, 172 pioneer, 2, 12
aliasing and the C language, 5 Backus-Naur Form. See BNF
alternation stroke operator, 55 bankruptcy rule, benign contradiction
in regular expressions, 96 illustration, 258
ambiguity banks, Ogden Nash on, 247
in regular expressions, 98,131 Basic programming language, 5
intersecting nonterminals, 75-76 alleged deficiencies, 51-52
ampersand character, QuickBasic, 112 compilers for, 8, 9
Anatomy of a Compiler, by John A.N. suitability ofVB.NET for, 10
Lee,3,12 need for formal definitions, 69
And operator, lazy vs. busy And, benign contradictions, 257, 258-261
275-278 binary files, .NET, 24
AndAlso operator, 48, 79, 276-277 bison tool, 81, 113, 181
anticipatory scans, qbScanner, 116, 120 See also yacc
Appleman, Dan, 130,131 . blanks between tokens, 115
APR contradictions, credit evaluation block structured languages, 2, 4-5
application,258,261 Blunden, Bill, Software Exorcism:
Arithmetic, .NET and QuickBasic A Handbook for Debugging and
differences, 185 Optimizing Legacy Code, 211
array variables, QuickBasic, 138 BNF (Backus-Naur Form)
convertibility, 140 analysis using the bnfAnalyzer tool,
justification for qbVariable storage, 52-67
163 avoiding looping, 175
mapping to .NET objects, 150-151 Basic language and, 52
qbVariable. toString/fromString, 146 capabilities, 54
qbVariableType serialization, 143
375
Index
377
Index
containment, QuickBasic data types, Date, C.]., What Not How: The Business
139,322,341 Rules Approach to Application
context ignored by regular expressions, Development, 247, 266
103 debugging
continuation lines, BNF, 56 C preprocessor example, 211
contradictory situations, credit linkage to source code and, 174-175,
evaluation application, 257-263 183
control structures and Thring- phone billing problem, 197
completeness, 99 regular expressions, 32, 105
convertibility, QuickBasic data types, stack code, 48
140-142 decorated notation, 85, 146
core object design approach, 121 default policy, credit evaluation
credit evaluation case study, 247-264 application, 252
credit evaluation application, 250-263 defaultValueO method, qbVariableType,
assessing an applicant's standing, 162
253-254 degenerate operations, 92
possible enhancements, 263-264 delegation, choice between inheritance
qbGUI view, 255, 257 and, 151-152
Show Basic code button, 254 device drivers, use with DLLs, 16
creditworthiness assessment, 248-249 differently abled programmers, 269
cross-platform operation and the CLR, digital switch usage example, 244-245
22-23 Dijkstra, Edsger W.
crs (Common Type Specification), 19,26 articles by, 12
curly braces in regular expressions, 95 on Cobol and Basic, 51
Customer Engineering Zone, on computer science as applied
quickBasicEngine, 198 math, 127
on GoTo statements, 2
on program evolution, 124
D programming "a radical novelty", 1
dashes in regular expressions, 97 Dilbert Factor, 245-246, 285
data, treating logic as , 246-247, 261, disjoint handles, 75-76, 81
263-264 dispatcher routine, quickBasicEngine,
data-driven processes, 270 232-234
data modeling dispose methods
See also object modeling exposing, 21-22
left to right scanning, 112 qbPolish object, 215
qbPolish class, 302 self-inspection and, 125
qbScanner object, 306 division operations, unsymmetrical
qbToken object, 316 operator problem, 75
qbVariable class, 321-322 DLLs (Dynamic linking Ubraries)
quickBasicEngine class, 360-361 code reuse and, 15
resources on, 170 DIl..Hell,17-19
data types Eleven Commandments of, 17
abuse of, CLR and, 23 documentation and language design, 283
QuickBasic variables, 134 .NET See under N
containment and convertibility, DotNetAssembly namespace, 235
139-142 due diligence, 245
serialization,142-148 Dvorak keyboards, network externality
data typing and acceptance, 268
debugging ease vs. efficiency, 272
empirical type determination, 146-147 E
importance of, 133
risks associated with new types, Easter egg, qbGUI, 190
273-274 efficiency and execution time formulae,
database tables compared to objects, 169 137
emulation, 219
entity multiplication in MIS programs,
271
378
Index
380
Index
data model and state, 306 'Hello world' program results, 193
event model, 128-130 lexical analyzer for, 91-132
'Hello world' program results, 193 parser and code generator, 171-202
  object2XML method, 126-127
  properties, methods and events, 308-315
  restoring nonstandard programs, 269
  scanner object model, 120-130
  translating business rules to QuickBasic, 250, 252
qbScannerTest tool
  inspect method, 122-126
  test method, 127-128
  verify utility, 114
qbToken object, qbScanner, 120-121
  data model and state, 316
  properties and methods, 317-320
qbVariable objects, 162-166
  data model, 321-322
  evaluate method, quickBasicEngine produces, 186
  fromString expression, 323-327
  inspect method, 163
  objConstantValue parameter, 184
  properties, methods and events, 330-339
  quickBasicEngine interpreter method, 224
  serialization and, 145-148
  testing, 164-166
  valueSet() method, 164, 256
qbVariableTest.exe, 164-166
qbVariableType objects, 152-162
  ENUvarType compared, 140-141
  fromString expression, 342-343
  inspect method, 155-156
  integers as, 141-142
  object2XML method, qbScanner, 153-155
  properties, methods and events, 345-359
  serialization and, 143-148
  shared methods, 161-162
  state, 152-153
  stress testing, 160-161
  testing, 152-162
  types exposed by, 340-341
qbVariableType procedure, qbScanner, 121-122
qbVariableTypeTester.exe, 152-162
quality control, 84
  time-to-market and, 130
QuickBasic compiler (author's)
  analyzing the BNF, 77-81
  architecture, 187-188
  building the BNF, 68-76
  conceptual stages, 91-92
  scanner implementation, 114-117
  syntax, 51-82
  syntax outline derived from bnfAnalyzer, 80
  testing the complete compiler, 227-228
  token types, 111-112
QuickBasic compiler (Microsoft), 9
QuickBasic language
  abstract variable type model, 134-139
  differences from Visual Basic, 196
  quickBasicEngine support for, 287
  translating business rules to, 250, 254
  variable types, qbVariable data model, 321-322
  variables, mapping to .NET objects, 148-151
quickBasicEngine
  assemblers, 212
  built-in functions, 294-296
  class standards and core methodology, 298-299
  CLR generation, 230-242
  collection use, 209
  converting state to XML, 198-200
  credit evaluation application use, 250
  dynamic big picture, 189-190
  error taxonomy, 202
  full BNF listing for, 290-294
  grammar symbols as Booleans, 174
  inspecting the state, 200
  keywords and system functions, 289
  language reference, 287-296
  lexical syntax, 288
  namespace imports, 235
  object overview, 188
  reference manual, 297-373
  simple interface, 191
  strongly typed storage, 226
  testing GUI, 188
  threading capability, 232
  utility DLLs, 297
  viewing parsing and code generation, 194-198
quickBasicEngine class, 363
  properties, methods and events, 362-373
quoted strings, 57
R
random variables, qbVariable fromString values as, 331
ASP Today
ASPToday is a unique solutions library for professional ASP Developers, giving
quick and convenient access to a constantly growing library of over 1000 practical
and relevant articles and case studies. We aim to publish a completely original
professionally written and reviewed article every working day of the year.
Consequently our resource is completely without parallel in the industry. Thousands
of web developers use and recommend this site for real solutions, keeping up to
date with new technologies, or simply increasing their knowledge.
Find it FAST!
Powerful full-text search engine so you can find exactly the solution you need.
Printer-friendly!
Print articles for a bound archive and quick desk reference.
Working Sample Code Solutions!
Many articles include complete downloadable sample code ready to adapt
for your own projects.
ASP.NET 1.x and 2.0
ADO.NET and SQL
XML
Web Services
E-Commerce
Security
Site Design
Site Admin
SMTP and Mail
Classic ASP and ADO
The above FREE two-month subscription offer is good for six months from original copyright date of book this ad appears in.
Each book will require a different promotional code to get this free offer; this code will determine the offer expiry date. Paid
subscribers to ASPToday will receive 50% off selected Apress books with a paid three-month or one-year subscription.
Subscribers will also receive discount offers and promotional email from Apress unless their subscriber preferences indicate
they don't wish this. Offer limited to one FREE two-month subscription offer per person.
JOIN THE APRESS FORUMS AND BE PART OF OUR COMMUNITY. You'll find discussions that cover topics
of interest to IT professionals, programmers, and enthusiasts just like you. If you post a query to one of our
forums, you can expect that some of the best minds in the business, especially Apress authors, who all write
with The Expert's Voice™, will chime in to help you. Why not aim to become one of our most valuable
participants (MVPs) and win cool stuff? Here's a sampling of what you'll find:
JAVA
We've come a long way from the old Oak tree.
Hang out and discuss Java in whatever flavor you choose: J2SE, J2EE, J2ME, Jakarta, and so on.
SECURITY
Lots of bad guys out there; the good guys need help.
Discuss computer and network security issues here. Just don't let anyone else know the answers!
HOW TO PARTICIPATE:
Go to the Apress Forums site at http://forums.apress.com/.
Click the New User link.