Build Your Own

.NET Language
and Compiler
EDWARD G. NILGES

APress Media, LLC


Build Your Own .NET Language and Compiler
© Edward G. Nilges 2004
Originally published by Apress, in 2004
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, recording, or by any information storage
or retrieval system, without the prior written permission of the copyright owner and the publisher.
ISBN 978-1-59059-134-5 ISBN 978-1-4302-0698-9 (eBook)
DOI 10.1007/978-1-4302-0698-9

Trademarked names may appear in this book. Rather than use a trademark symbol with every
occurrence of a trademarked name, we use the names only in an editorial fashion and to the
benefit of the trademark owner, with no intention of infringement of the trademark.
Lead Editor: Dan Appleman
Technical Reviewer: William Steele
Editorial Board: Steve Anglin, Dan Appleman, Ewan Buckingham, Gary Cornell, Tony Davis,
John Franklin, Jason Gilmore, Chris Mills, Steven Rycroft, Dominic Shakeshaft, Jim Sumser,
Karen Watterson, Gavin Wray, John Zukowski
Assistant Publisher: Grace Wong
Project Manager: Beth Christmas
Copy Manager: Nicole LeClerc
Copy Editor: Marilyn Smith
Production Manager: Kari Brooks
Production Editor: Kelly Winquist
Compositor: Linda Weidemann, Wolf Creek Press
Proofreader: Elizabeth Berry
Indexer: Bill Johncocks
Artist: Kinetic Publishing
Cover Designer: Kurt Krames
Manufacturing Manager: Tom Debolski

Distributed to the book trade in the United States by Springer-Verlag New York, Inc., 175 Fifth
Avenue, New York, NY, 10010 and outside the United States by Springer-Verlag GmbH & Co. KG,
Tiergartenstr. 17, 69112 Heidelberg, Germany.
In the United States: phone 1-800-SPRINGER, email orders@springer-ny.com, or visit
http://www.springer-ny.com. Outside the United States: fax +49 6221 345229, email
orders@springer.de, or visit http://www.springer.de.
For information on translations, please contact Apress directly at 2560 Ninth Street, Suite 219,
Berkeley, CA 94710. Phone 510-549-5930, fax 510-549-5939, email info@apress.com, or visit
http://www.apress.com.
The information in this book is distributed on an "as is" basis, without warranty. Although every
precaution has been taken in the preparation of this work, neither the author(s) nor Apress shall
have any liability to any person or entity with respect to any loss or damage caused or alleged to
be caused directly or indirectly by the information contained in this work.

The source code for this book is available to readers at http://www.apress.com in the Downloads
section. You will need to answer questions pertaining to this book in order to successfully
download the code.
Contents at a Glance
About the Author
Acknowledgments
Introduction
Chapter 1 A Brief History of Compiler Technology
Chapter 2 A Brief Introduction to the .NET Framework
Chapter 3 A Compiler Flyover
Chapter 4 The Syntax for the QuickBasic Compiler
Chapter 5 The Lexical Analyzer for the QuickBasic Compiler
Chapter 6 QuickBasic Object Modeling
Chapter 7 The Parser and Code Generator for the QuickBasic Compiler
Chapter 8 Developing Assemblers and Interpreters
Chapter 9 Code Generation to the Common Language Runtime
Chapter 10 Implementing Business Rules
Chapter 11 Language Design: Some Notes
Appendix A quickBasicEngine Language Manual
Appendix B quickBasicEngine Reference Manual
Index

Contents
About the Author
Acknowledgments
Introduction

Chapter 1 A Brief History of Compiler Technology
The Mainstream: From Fortran to C
Basic Compilers
Summary
Challenge Exercise
Resources

Chapter 2 A Brief Introduction to the .NET Framework
Code Reuse and DLLs
DLL Heaven and DLL Hell
.NET-Beyond DLLs
Inside a .NET Binary
Summary
Challenge Exercise
Resources

Chapter 3 A Compiler Flyover
The Phases of a Compiler
Three Theories, and Keeping It Real
Lexical Analysis with Regular Expressions
Parsing and BNF
Interpreters and RPN
Summary
Challenge Exercise
Resources

Chapter 4 The Syntax for the QuickBasic Compiler
A Tool for Analyzing BNF
Analyzing and Coding BNF
Building the BNF for Our QuickBasic
Analyzing the BNF of Our QuickBasic
Eight Guidelines for Effective BNF
bnfAnalyzer Technical Notes
Summary
Challenge Exercise
Resources

Chapter 5 The Lexical Analyzer for the QuickBasic Compiler
The Compiler Big Picture
Lexical Analysis Theory
A Regular Expression Laboratory
The qbScanner Object
Scanner Object Design Considerations
Summary
Challenge Exercise
Resources

Chapter 6 QuickBasic Object Modeling
The Abstract QuickBasic Variable Type Model
QuickBasic Variables Mapped to .NET Objects
The qbVariableType Object
The qbVariable Object
Under the CLR Lies the Beach!
Summary
Challenge Exercise
Resources

Chapter 7 The Parser and Code Generator for the QuickBasic Compiler
The Recursive-Descent Algorithms
The qbPolish Object
Code Optimization
The Architecture of the Compiler
The Dynamic Big Picture
Error Taxonomy
Summary
Challenge Exercise
Resources

Chapter 8 Developing Assemblers and Interpreters
Assemblers
Interpreters
Summary

Chapter 9 Code Generation to the Common Language Runtime
CLR Generation in the quickBasicEngine
Towards a Complete Object Code Generator
Challenge Exercise
Summary

Chapter 10 Implementing Business Rules
Business Rule Solutions
Logic As Data
Case Study: Credit Granting
Summary
Challenge Exercise
Resources

Chapter 11 Language Design: Some Notes
Determining Your Goals
Deciding on the Semantics of Your Language
Deciding on the Syntax of Your Language
Documenting Your Language
Summary
Challenge Exercise
Conclusion

Appendix A quickBasicEngine Language Manual
Lexical Syntax
Keywords and System Functions
Parser Syntax (Backus-Naur Form)
Built-In Functions

Appendix B quickBasicEngine Reference Manual
Class Standards
qbOp
qbPolish
qbScanner
qbToken
qbTokenType
qbVariable
qbVariableType
quickBasicEngine

Index

About the Author
Edward G. Nilges has programmed since 1970, when he learned machine language
for an 8KB IBM 1401 as part of an elaborate draft-dodging scheme that appears to
have gotten out of hand.
Early on, Edward discovered the power of languages and their translation in
"ordinary" management information systems (MIS) applications. He consoli-
dated several applications into one by creating a specifications language within
the 1401's constraints, and he also provided his university with a working Fortran
compiler. Some of his early adventures are relevant to today's challenges, and
this book contains some of Edward's unexpurgated war stories.
Edward has developed millions of lines of code for MIS, telecommunications,
naval architecture, and education applications. He has developed several compil-
ers, including internal compilers for telecommunications applications at Nortel,
the QuickBasic compiler of this book, and a compiler for the Mouse language that
fits in 1KB of storage. He has taught at Roosevelt University in Chicago and DeVry
University, and delivered training classes at Princeton University.
At Princeton, Edward was honored to assist the real-life protagonist of the
recent film A Beautiful Mind, John Nash, with a bug in the old Microsoft C com-
piler. Edward was also privileged to meet Cornel West, the noted American
philosopher, and Ralph Nader. He took classes in philosophy and computer sci-
ence, and gained access to Firestone Library (and has since paid fines accrued).
Currently, Edward is working in China on methodologies for transferring
client/server applications to the Web, while also studying written and spoken
Chinese.
Edward has two grown children and a former wife whom he honors as the
mother of those children. Indeed, he calls himself Edward G. Nilges to disambiguate
himself from his eldest son, Edward A. Nilges, who is studying philosophy at
the University of Illinois and has contributed errata to Bjarne Stroustrup's book,
The C++ Programming Language. His other son, Peter "Chauncey" "Zeit-Bug"
Nilges, recently graduated cum laude from DePaul.
Edward has published material on computer and general topics since 1976,
when he suggested in Computerworld that it was possible to write structured code
in assembly language, and got yelled at by Ed Yourdon. Recent articles include util-
ities for string conversion and display in Visual Studio, and a critical assessment of
the language we use in speaking about database theory, published in the Austrian
journal Labyrinths.
Current interests include .NET, art, running, reading, China, and world
philosophy.

Acknowledgments
THIS BOOK IS DUE to David Treadwell of Princeton and Microsoft, who
suggested a list of potential languages for .NET implementation,
including QuickBasic. Initial impetus was provided by Josef Finsel, author of The
Handbook for Reluctant Database Administrators, and I am in Mr. Finsel's debt
for this reason.
Dan Appleman's support and patience during the excitements of its development
is most appreciated, as is that of Marilyn Smith, Beth Christmas, Grace Wong,
Nicole LeClerc, Kari Brooks, Bill Johncocks, Kurt Krames, and Kelly Winquist, as well
as the accounting team at Apress.
Dan Appleman in fact volunteered his valuable time for a developmental
edit and put up with some of my deeper nonsense with a great deal of patience.
I need to thank the gang at the Evanston YMCA, as well as the operators of
various executive stay places around the world for providing working space at
various times, for my day jobs have taken me to the far corners of the world.
The Silicon Valley "out-to-lunch bunch," including Rick, Ragu, William, Bill,
and Jason, are also owed a debt of gratitude for their assistance, including Rick's
wireless card, Ragu's thoughtfulness, William's unfailing kindness, and Jason's
laptop, which is toast, I'm afraid.
Helmut Epp of DePaul University, Max Plager of Roosevelt University, the
late E. D. Klemke of Roosevelt University and Iowa State University, and Gilbert
Harman of Princeton University are all academics from whom I have learned
a prior dedication to the truth of the matter.
Long-suffering managers to whom this book is dedicated include Rita Saltz,
Robert Geiger (a Visual Basic authority in his own right from whom I learned
much), Jeff Burtenshaw, and Monsieur Hugh Levaux.
Tim Tyler was and remains a source of spiritual guidance before, during, and
after the writing of this book.
Lee Thé at Fawcette assisted with an earlier release of part of the software in
an article on Visual Studio and was a most patient and learned editor.
My strange friend Alex, "Sasha Alexandrovich" Gaydasch, is also owed a debt
of gratitude for his support and advice over a period of many years.
Of course, we all owe Edsger Wybe Dijkstra a debt for showing how integrity
goes a long way.
But the main dedication of this effort is to Darlene Nilges, Eddie Nilges, and
Peter Nilges (junglee Peter), for in dreams begin responsibilities.

Introduction
I mean, if 10 years from now, when you are doing something quick and dirty,
you suddenly visualize that I am looking over your shoulder and say to your-
self, "Dijkstra would not have liked this," well, that would be enough
immortality for me.
-Edsger Wybe Dijkstra

Let us not speak falsely now, the hour is much too late.
-Bob Dylan, "All Along the Watchtower"

DIJKSTRA DIDN'T PLAY for the Chicago Cubs baseball club, to my knowledge, nor
did he play for the Arsenal, Chelsea, Antwerp or Eindhoven football organizations
(nor did Dylan, but you knew that). Instead, Edsger Wybe Dijkstra was a found-
ing computer scientist who was involved in the early Algol language and either
invented or reported the invention of structured programming.
And since Dijkstra passed on in August 2002, he is rolling in his grave. Here
is a book on how to write a halfway decent compiler, using object-oriented tech-
niques (about which Dijkstra was skeptical) to compile Basic (which he felt was
a mental mutilation) that has the gall, the side, the cheek ... to quote the guy!
That is because Dijkstra was also one of the few computing scientists to keep
steadfast in his mind the true proposition that computing science is applied sci-
ence, and that is because Dijkstra refused to divorce theory and practice.
Furthermore, I cannot believe that Dijkstra would dislike the desire to know
how compilers work. I have set myself the task of communicating this, at a basic
level, to a wide audience of "ordinary" (ordinary?) programmers.
These are the numerous hard-working programmers who have written code,
probably, for Visual Basic and C++ COM and now are working, probably, in C#
and in Visual Basic .NET. I would like to show that a responsible compiler can be
written in Visual Basic. I would like to provide the complete, runnable, and mod-
ifiable, source code at the Apress Web site. I would also like world peace and
harmony, but I digress.
Why not C#? I didn't choose C# because of a simple theory of mine. All, or
nearly all, C# programmers know Visual Basic, but not all Visual Basic program-
mers have made the transition to C#. And despite the flash and glamour of C#,
there is nothing doable in C# that cannot be done in Visual Basic.
My goal is to demystify and to deconstruct a skill set that can be of actual
use in .NET and Java. Write-once, run anywhere is a goal that entails a need for
more compilers, and more generally, greater portability and ease of modifica-
tion, not only of code, but also of business rules, stored as data.

To set the scene, Chapter 1 provides an overview of the history of compiler
technology, and Chapter 2 describes the .NET background.
Chapter 3 is a fun and exciting chapter, if I do say so myself. That chapter
builds a simple "flyover" compiler to demonstrate the key concepts in a lightweight
form. This will prepare you for subsequent chapters, which build a compiler of
more than 10,000 lines of code for a significant part of the old QuickBASIC lan-
guage, a Microsoft forerunner of Visual Basic.
In Chapter 4, you will learn about the indispensable formal notation Backus-Naur
Form for designing the syntax of a language in the context of the bnfAnalyzer
tool, for which, as is the case with all software in this book, source code is provided.
Then, in Chapter 5, you'll learn the basic "lexical" level of parsing, in which
we recognize the elements of the language.
Chapter 6 shows a complete, object-oriented approach to storing variables
and their types in the Basic language. This is extensible to other languages. In
fact, it uses parsing to design an internal language for "serializing" data types
and values, thereby showing one way in which this information can be stored.
Chapter 7 is the high point, for in it, I will show you how to build the actual
parsing front end of the compiler, where the source code is translated to an inter-
preted language.
Chapter 8 shows how any computer (given enough time and memory, of
course) can simulate any other computer, based on an important early result of
hero computer scientist Alan Turing. You'll see how we can use this result to test
compiled Basic code while visually displaying, step by step, its execution.
Chapter 9 introduces techniques for taking this process one step further, and
translating compiled code to Microsoft Intermediate Language (MSIL) for execu-
tion in the Common Language Runtime (CLR). This translation makes the formerly
slow, interpreted code run much faster.
One concern of mine is how compilation techniques can make life easier for
the end users who are charged with keeping business rules up-to-date in credit,
banking, law, and other applications. Compiler technology can be used to represent
these rules as ASCII data suitable for storage in a database. Chapter 10 shows
how this could be accomplished for a small loan company, Loans for the Honest
Poor, which uses flexible rules to find good risks among "ordinary" (ordinary?)
working people.
Chapter 11 winds up by discussing some of the issues that can arise in designing
a language, including political and business issues.
Appendix A provides a comprehensive and geeky reference manual for the
language of the representative QuickBasic compiler, and Appendix B provides
another reference manual for the objects used to build the compiler.
The comprehensive source for all software and tools is available to purchasers
of this book at the Downloads area of the Apress Web site (http://www.apress.com),
along with Apress forums in which you can meet authors, ask questions, and,
within reason, schmooze and socialize. You will need Visual Basic 2003 Professional
or Enterprise to run all of the code.


I am also available at [email protected] should you have any questions.


Thanks for buying my book, or, at least, taking it down from the shelf at
Borders, Barnes and Noble, Page One (at Festival Walk in Kowloon Tong), or wherever.
If your bookstore has a cafe, grab a latte, sit down, take a load off Fanny, take
a load for free, and peruse. I hope you will get excited at the prospect of actually
being able to walk through the basics of compiler design theory while running
practical examples on your .NET system.
That's because, if you are like me, and indeed like the mighty Dijkstra, you
don't divorce theory and practice. Dijkstra's pronouncements have a common
theme, which is that he never thought himself above getting down into the
actual code, whether program code or the equally demanding codes of formal,
if applied, mathematics.
Programmers get irritated, on the one hand, by academics who rattle on
about arcane proofs and theorems with no apparent connection to the real world
of schedules and deadlines (as well as sick children and cars that don't start), and
on the other hand, by managers who sketch out vast ambitions that must then be
wearily constructed in reality, by programmers with little say in the end product
(for let us not speak falsely now, the hour is much too late).
But in honor of the late, hero computer scientist Dijkstra, and indeed for the
dear old schoolhouses (in my case, St. Viator High School, Roosevelt University,
DePaul University, and Princeton as a sort of midlife crisis, for both me and dear
old Nassau), I will now attempt to steer a course between the rocks of academia
and management-o-rama, and sail true toward your mastery of something new
that will make you feel like a devil of a fellow or a hell of a gal, at a minimum. For,
the boss, in today's Big Chill, groans at us on the job to only learn the essentials,
and ultimately, this is stunting. Let's learn something almost (but not quite) for
its own sake.
Let's dweeb out.

CHAPTER 1

A Brief History of
Compiler Technology
I would therefore like to posit that computing's central challenge,
viz. 'how not to make a mess of it,' has not been met.
-Edsger Dijkstra

THE LATE, HERO COMPUTER SCIENTIST, Edsger Dijkstra, was rather confident. He
seemed to know that computing's central challenge is not messing up. Some pro-
grammers and their managers might contend that the main challenge is achieving
user satisfaction.
In either case, writing parsers and compilers will be a challenge!
You can learn much about compilers from their history. Therefore, this chap-
ter describes the mainstream history of compilers and follows up with a look at
the sidestream history of Basic compilers.

The Mainstream: From Fortran to C


In the 1950s, the earliest programmers prided themselves on doing their work
without any assistance, and their work was tedious in the extreme.
Of course, older people always like to say that they had to walk to work after
school in the snow uphill both ways to support their aging mother. But it does
appear that below the level of people like John von Neumann or J. Presper Eckert
(two early inventors of modern digital computation), the actual work of creating
programs was simultaneously invisible, brutally tedious, and sexist.¹ However, the
big shots soon found that they and the programmers were up against what hero
computer scientist Edsger Dijkstra shortly thereafter called a "radical novelty."
The difference between writing a proof on a blackboard and writing a program
is that the mathematician is able to appeal to the shared understanding of
a rather close-knit mathematical community, but the programmer must account
for everything. If something goes wrong with his or her program, a programmer
is expected to fix it quickly and accurately.

1. In the 1940s and 1950s, the presumption on the part of the big shots was that some "girl"
could prepare programs for their hardware and perhaps find an up-and-coming graduate
student to wed. (I'm not making any of this up.)

John von Neumann thought that any use of the computer to assist in pro-
gramming was a waste of a valuable resource by lazy programmers. But one
early programmer, Grace M. Hopper,² discovered that the computer itself could
be an aid in preparing bug-free programs. Her work led to the first two major
computer languages: Fortran and Algol.

Fortran, Algol, and Beyond


At first, the programming community resisted the use of computers for program
development, perhaps for the same hard-wired reasons some guys don't ask for
directions when they're lost. However, the benefits of Hopper's "autocoding" were
clear enough. In 1954, IBM³ had a team of mathematician/programmers develop
one of the first compilers and languages: Fortran.
This team was led by IBM Fellow John Backus and did a terrific job with very
limited hardware. A bit later, a European/American group, which included John
Backus, developed a more ambitious language: Algol.
Fortran supported a style of programming in which a program is conceived
as a series of instructions to which control flows by means of the infamous goto
command. Algol, on the other hand, introduced the notion of block structure, in
which the programmer could group lists of statements, effectively creating one
instruction out of a list of instructions. In this structure, control flows much more
readably into and around the blocks.⁴
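
The contrast between the two styles can be sketched in a modern language. The example below is only an illustration, not code from either historical language: Python has no goto, so the jump-driven version simulates labels with a variable, and the function and label names (sum_goto_style, sum_block_style, "LOOP", "DONE") are hypothetical.

```python
def sum_goto_style(n):
    """Sum 1..n the jump-driven way: every transfer of control is an
    explicit 'go to' a named label (simulated here with a variable)."""
    total, i = 0, 1
    label = "LOOP"
    while True:
        if label == "LOOP":
            if i > n:
                label = "DONE"   # roughly: IF (I .GT. N) GO TO DONE
                continue
            total += i
            i += 1
            label = "LOOP"       # roughly: GO TO LOOP
        elif label == "DONE":
            return total


def sum_block_style(n):
    """Sum 1..n the block-structured way: the loop body is one grouped
    block, and control enters and leaves it at well-defined points."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


if __name__ == "__main__":
    print(sum_goto_style(10), sum_block_style(10))
```

Both functions compute the same result. In the first, the reader has to trace the labels to see where control goes; in the second, the grouping of statements shows it directly.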
In general, the Fortran compiler writers and the actual Fortran programmers,
who wrote code for science and engineering, delivered useful results faster. The
Algol compiler writers, and eventually the Algol programmers, delivered more
useful and more accurate results, but more slowly. This, as many programmers
know, can be a serious problem in the real world.
It was easier to write Fortran compilers that generated efficient code, and the
Algol team ran into problems in compiling efficient programs. Programmers of the
1950s were defensive about their skills in translating mathematical and business
requirements into assembler and machine languages, and their requirement was
that a useful compiler generate faster code than a skilled programmer could
produce. Fortran met this requirement, but Algol did not until about 1960, when
the Burroughs Corporation provided a machine whose hardware was able to run
Algol efficiently.

2. At the time Grace M. Hopper began exploring the use of the computer for programming (in
the late 1940s), she was a lieutenant in the United States Navy. She later became an admiral
in recognition of her accomplishments.
3. At the time, IBM was competing with Univac, now Unisys, for dominance of the computer
industry.
4. Edsger Dijkstra was an early Algol programmer. He noticed that Algol programmers could use
block structure to avoid goto statements, and he wrote a famous letter to the editor of a computer
journal, "Go To Considered Harmful."

An Early Fortran Adventure


In 1970, I was a long-haired kid in a computer science class at Roosevelt
University in Chicago, taught by the great Max Plager, who is still teaching
math at Roosevelt. At first, we programmed in basic machine language, and
I wrote my first program in Northwestern University's new library. It worked
fine the first time, except for the fact that it loaded itself on top of Max's
loader, causing the printer to go berserk for reasons too horrible to detail.
Max had us then write in assembler, and thus re-created the entire history of
software, as we realized the key concept that we could use the computer, as did
Grace Hopper, to enhance our productivity. Through Max, we learned the "DNA"
of computer science, not "computer literacy" (whatever that is).
However, Max was stymied by the fact that the Fortran-II compiler in our small
university data center did not work and could not be fixed, because IBM had
abandoned support. He had us use an IBM 1620 at a service bureau for Fortran
programming because of this problem.
Meanwhile, most of the faculty and students at the university went on strike
when Nixon invaded Cambodia and four students were shot at Kent State by
National Guardsmen. Max allowed those of us who joined the strike to submit
programs in lieu of attending his class, but I received only a B in the class, in
these days prior to grade inflation, since I did not do all the work.
Nonetheless, I was enthusiastic about the way in which Max taught us com-
puter science. Shortly thereafter, I got a job in the university computer center
and, after hours, I decided to see what was wrong with the Fortran-II compiler.
The compiler existed as a deck of 2,000 punched cards, and the operator inserted
the punched cards for the source code after an initial loader phase. Then, within
only 8,000 seven-bit (don't ask) bytes, 99 "phases" of the compiler would run to
gradually parse, and change, the source, creating an intermediate language,
a distant ancestor of the intermediate code that .NET's Common Language Runtime (CLR) executes.
In my analysis of the compiler's failure, I used John A. N. Lee's 1969 book, The
Anatomy of a Compiler, as a guide.
One night, I triumphantly discovered the problem. The compiler had been altered
by an IBM customer engineer who had been retained on an hourly basis, since
IBM software support had been ended for this older platform. Thinking that the
university's mainframe did not include optional hardware for multiplication
and division, the customer engineer had inserted a subroutine to do multiplica-
tion and division. However, the compiler was running on a minimum amount
of memory, and this subroutine overlaid needed instructions.


In this era, IBM would agree only to have a customer engineer show up in
a white shirt and tie, and make a "best effort" to solve the problem.5 In this
case, the customer engineer did a credible job without seeing the real prob-
lem and without knowing the configuration of the machine. This was, in fact,
a best effort.
Max had shown us that the machine included the optional hardware, and
I simply removed the subroutine (working completely in machine language)
and replaced its call by instructions to multiply and divide. The machine then
compiled and ran several programs through to completion.
This ranks with my discovery of Visual Basic 3 as a true Eureka moment. Max
nearly fell out of his chair when, at the next meeting of the university computer
committee, I announced the fix. H. Chang Shih of the Physics department
bought me a drink at Jimmy Wong's watering hole, which was then across the
street from Roosevelt.
We used the Fortran compiler continually for teaching and support. Although it
did not generate very fast code, it was a great way to solve problems quickly.
For example, Fortran-II had a very complete Print statement with a format fea-
ture that supported both multiple lines and replacement of control sequences
by data. This made it easy to elegantly format reports. In contrast, coding
reports in assembler was very tedious.

Fortran and Algol were followed by a plethora of compilers, both famous and
infamous, and initially in the tradition of Fortran. For example, early Cobol pro-
grams, like early Fortran programs, were primarily single main procedures with
goto commands for flow control.
Cobol raised and then dashed some management expectations. An Air
Force general was overheard to say, "Now that we have Cobol, can we get rid of
all those beatnik programmers?" (In the early 1960s, "beatnik" meant "slacker.")
But managers have continued to depend on programmers throughout the beat,
hippie, Generation X, and slacker eras.
IBM introduced an ambitious programming language, Programming
Language One (PL/I), in 1964, when it introduced its System/360 line of
mainframes. This language owed much to Algol because it was fully block
structured. However, PL/I's scale and scope exceeded the capabilities of its
designers and compiler writers, and it wasn't until 1974 that truly useful
compilers became available for PL/I.

5. This "best effort" approach has been replaced by the vow to solve the problem no matter
what. The benefit is that, perhaps, more problems are solved. The downside is that many
"solutions" are hacks.


Reacting perhaps to the overly ambitious goals of PL/I, a small group of
programmers, centered around Bell Laboratories and Princeton University, and
including Brian Kernighan and Dennis Ritchie, designed the C language in 1971.
C was terribly important, but it was also rather dangerous.
C used Algol-like block structure but gave the programmer an assembler lan-
guage level of control over the machine. For example, the machine address of
a variable could be accessed using the same facilities as could be used for ordi-
nary arithmetic. This feature was known as aliasing.
Because of aliasing, C was rather dangerous. Efficient but delicate and hard-
to-maintain software could exploit machine peculiarities in undocumented
ways. Indeed, C programmers (like the assembler language programmers who
resisted Fortran in the 1950s) to this day speak of the added efficiencies they are
able to derive by staying "close to the machine." (My suspicion is that close to
the machine is warm and toasty in the winter, and nice and air-conditioned in
the summer.)
Today, C is ever so slightly out-of-date, but it won't be completely out-of-date
until another language manages to dislodge it. C predominates in system pro-
gramming along with C++. However, C doesn't "do" objects, and for this reason,
it doesn't provide the benefits of closer and more logical association of code
and data. Its lack of safety means it's a dangerous choice for mission-critical
and important management information systems (MIS) projects.6
C was the basis of many subsequent languages, including Java, Perl, and
JavaScript, because of its very clean syntax. However, Algol and Algol's block
structure were the real intellectual innovation, not C. Despite the fact that C is
regarded with some awe, Brian Kernighan and Dennis Ritchie simply wanted
a tool. For this reason, they based much of C on Algol and PL/I, while
simplifying the design.

The Origins of Basic


Basic was invented in 1964. By the 1960s, compilers for high-level languages were
in common use, although many programmers still preferred assembler, both for
its speed and for its cachet (pardon my French). It was in 1964 that mathematics
professor John Kemeny of Dartmouth developed the language and the compiler
for what was called Beginner's All-Purpose Symbolic Instruction Code, which was
similar to Fortran but intended for a much wider audience.

6. Of course, many C programmers have a variety of personal standards that prevent the choice
of C from being as dangerous as it could be. The problem is that, in general, they don't have
the organizational clout to enforce these standards over the complete system life cycle.


Both the Fortran and the Algol teams wanted to write compilers that would
generate highly efficient and optimized code. Their motto was "you compile
once, but you run many times." But Kemeny, and a separate group at Purdue
University, noticed that this is not true for student programs, which compile
many times and bomb out often.
Kemeny reasoned that a compiler for nonprofessionals should be fast and
should accurately link runtime errors to source code. His compiler, and the
Purdue University Fast Fortran Translator (PUFFT) system, used an "interpreter"
language and an interpreter to change instructions on the fly to actual machine
code at runtime.
Interpreters, as code that converts special codes to actual machine instruc-
tions every time the object code is run, are slower, by definition, than pure
object code. But because the special codes can be directly tied to source lines,
error reporting in interpreters can be highly accurate and understandable.

Just-In-Time Compilation
Just-in-Time, or JIT, compilation is sometimes confused with the older technique
of interpretation in which the source program was converted to an intermediate
form and then "executed" by the interpreter. The interpreter needed to translate
the intermediate form each time an instruction was executed.
The similarity is that processing of the "object" code representation occurs
after compilation. The difference is that JIT compilation, unlike interpretation,
generates actual machine code for reuse. Interpreters produce machine code
for immediate consumption each time an interpreted instruction is executed,
and this magnifies the effect of any preexisting loops in the interpreted code.
No such magnifying effect appears in JIT compilation.
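The difference can be sketched in a few lines. This is only an illustration, written in Python for brevity, under stated assumptions: the three-instruction intermediate form, the `Translator` class, and the two `run_` functions are all hypothetical, and a Python closure stands in for generated machine code.

```python
# Hypothetical intermediate form: a list of (opcode, operand) pairs.
PROGRAM = [("push", 2), ("push", 3), ("add", None)]


class Translator:
    """Stand-in for a code generator; counts how often it is invoked."""

    def __init__(self):
        self.count = 0

    def translate(self, instruction):
        """Turn one intermediate instruction into a callable ("machine code")."""
        self.count += 1
        opcode, operand = instruction
        if opcode == "push":
            return lambda stack: stack.append(operand)
        if opcode == "add":
            return lambda stack: stack.append(stack.pop() + stack.pop())
        raise ValueError(f"unknown opcode: {opcode}")


def run_interpreted(program, repeat):
    """Interpretation: translate every instruction again each time it executes."""
    translator = Translator()
    result = None
    for _ in range(repeat):
        stack = []
        for instruction in program:
            translator.translate(instruction)(stack)
        result = stack.pop()
    return result, translator.count


def run_jit(program, repeat):
    """JIT style: translate each instruction once, then reuse the saved result."""
    translator = Translator()
    compiled = [translator.translate(instruction) for instruction in program]
    result = None
    for _ in range(repeat):
        stack = []
        for step in compiled:
            step(stack)
        result = stack.pop()
    return result, translator.count
```

Running the program 100 times in a loop, `run_interpreted` performs one translation per instruction per iteration, while `run_jit` performs one translation per instruction in total. That is the magnifying effect of loops in miniature.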
The conventional wisdom is that interpretation is slow. Therefore, the JIT com-
pilation used with .NET and Java creates a perception that such environments
might run code more slowly.
However, the use of object-oriented programming (OOP), as you will see in this
book, means that the "data" (the source code) and the procedures (the JIT com-
piler routines) are close together, and this avoids unnecessary sequential passes
over large amounts of code. The result is that JIT compilation to .NET's CLR cre-
ates code that is often somewhat slower than raw, native COM applications, but
those applications are not as flexible as .NET applications.
Earlier interpreters, because of the cost of storage, had to "pass over" source
code and translate it entirely to the interpreted special codes. Then the inter-
preter needed to modify each special code into machine language repeatedly
throughout the execution of the program, magnifying the effects of loops.


Suppose instead that the interpreter could save its work in the form of the com-
pilation of special code to object code, on the fly. In a procedural language, this
opens a can of worms, in which tables must be efficiently constructed and
searched. OOP provides a straightforward association of code and working data.
OOP's tighter linkage of the compiler's instructions with its data (the source and
interpreted code) means that both compilation and interpretation can be per-
formed incrementally, or "just in time," and the output of the compilation in the
form of binary machine code can be saved with specific instances of executable
objects. This is because the data is, by definition, inside the object instance. For
this reason, there is far less overhead in finding the data.
For example, the compiler that I present in this book stores all the information
about a variable in an object. The procedure responsible for accessing a variable
no longer confronts, on entry, a huge table of variables-by definition, more
than it wants to know. This procedure does not need to search a table because it
is presented with one handle to all the information about the variable, including
its name, value, and type.
Of course, the compiler does have a table of the variables. However, other proce-
dures are responsible for obtaining the variable using its name. It's true that the
basics are the same,7 but overall, in a well-designed OOP solution, there tends to
be less rummaging around, because once the object is found, a rich set of data
is linked to it.
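As a rough illustration of this arrangement (not the compiler presented later in the book), here is a minimal Python sketch. The names `Variable`, `VariableTable`, and `describe` are hypothetical, and a Python dictionary stands in for the hash-based lookup.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """Everything known about one variable, reachable through a single handle."""
    name: str
    type_name: str
    value: object = None


class VariableTable:
    """The one place that maps names to variable objects (a hash-based lookup)."""

    def __init__(self):
        self._by_name = {}

    def declare(self, name, type_name, value=None):
        key = name.upper()  # assumption: identifiers are case-insensitive, as in Basic
        if key in self._by_name:
            raise KeyError(f"variable already declared: {name}")
        variable = Variable(name, type_name, value)
        self._by_name[key] = variable
        return variable

    def find(self, name):
        return self._by_name[name.upper()]


def describe(variable):
    """A procedure that works on a variable never sees the table, only the object."""
    return f"{variable.name} As {variable.type_name} = {variable.value!r}"
```

Only `find` ever touches the table. A procedure such as `describe` receives the object and has the name, type, and value at hand without any further searching.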

Lex and Yacc

In the late 1960s, a number of computer scientists noted that, because computer
languages had to be strict and formally specified (unlike normal human languages),
not only was the process of writing a specific compiler itself the development of an
algorithm, it, in turn, could be algorithmically specified in a compiler generator.
However, this idea strained the capacities of mainframes and the abilities of earlier
programmers. Instead, two spin-offs from the overall effort became widely used.
These were the lex and yacc programs of the Unix operating system. lex accepts
a definition of the low-level syntax of the language (its tokens), and yacc accepts
a definition of its high-level syntax (its grammar). Together, they generate C or
C++ code to parse the language.
The lex and yacc programs dominate good compiler design practice today.
However, effective use of lex and yacc requires knowledge of compiler internals,
since these are "white box" tools.8
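The split between low-level and high-level syntax can be illustrated by hand. The Python sketch below is not generated by any tool. It assumes a toy language of integers, `+`, and `*`, and every name in it is hypothetical. The token table is the kind of definition a scanner generator is given, and the grammar in the comment is the kind of definition a parser generator is given.

```python
import re

# Low-level syntax: the kind of definition a scanner generator would be given.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split the source text into (kind, text) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# High-level syntax: the kind of grammar a parser generator would be given.
#   expr := term (PLUS term)*
#   term := NUMBER (TIMES NUMBER)*
def parse_expr(tokens, i=0):
    value, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i][0] == "PLUS":
        right, i = parse_term(tokens, i + 1)
        value += right
    return value, i


def parse_term(tokens, i):
    value, i = parse_number(tokens, i)
    while i < len(tokens) and tokens[i][0] == "TIMES":
        right, i = parse_number(tokens, i + 1)
        value *= right
    return value, i


def parse_number(tokens, i):
    if i >= len(tokens) or tokens[i][0] != "NUMBER":
        raise SyntaxError("number expected")
    return int(tokens[i][1]), i + 1


def evaluate(text):
    """Scan, parse, and evaluate in one step."""
    tokens = tokenize(text)
    value, i = parse_expr(tokens)
    if i != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return value
```

With these definitions, `evaluate("2 + 3 * 4")` gives 14, because the grammar, not the scanner, decides that multiplication binds more tightly than addition.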

7. As you will see in Chapter 8, the Collection object provides a convenient hash-based search
to map variable names to variable objects.
8. White box tools assume that users know basically what the tools are doing on the users'
behalf.


Today, the Java and .NET "compile once, run anywhere" credo has created
some innovation in compiler development because this portability creates
demand for compilers. In addition, the increasing use of object-oriented devel-
opment and programming has produced compilers of higher quality, since the
tighter coupling of data and software means that the compiler developer no
longer needs to build enormous tables for the entire source.

Basic Compilers
As I've mentioned, the Basic language was invented by a group at Dartmouth
College in the 1960s. It initially targeted General Electric time-sharing machines,
but a few years later, programmers of Digital Equipment Corporation (DEC)
hardware (the progressively more powerful systems DEC PDP-8, DEC 10, and
DEC 20) developed a number of time-shared Basic compilers.

Early Basic

In the early 1970s, in Menlo Park, California, a group of visionaries provided
storefront access to inexpensive DEC computers in the form of The People's
Computer Company, and they found Basic to be of broadest appeal. This might
be why Bill Gates and Paul Allen chose Basic as the first high-level language for
the first microcomputer, the MITS Altair.
Early Basic compilers were "hacks," because of the small amount of memory
available to Basic compiler writers. Most followed the lead of the PUFFT and
IBM Fortran-II compiler, because it saves memory to generate packed code that
is executed by a software interpreter. As noted earlier, using an interpreter takes
more time than running pure object code.
However, in the 1970s, compiler runtime developers discovered a technique
for saving some time inside a commercial interpreter known as threaded code.
With threaded code, each action of the interpreter takes responsibility for
branching to the next action indirectly through a register.9 This technique was
invented for the Forth language developed by Charles Moore and was adopted
by some Basic developers.
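The idea can be imitated in a high-level language, with a caveat: Python cannot jump through a register, so in the sketch below a small trampoline loop stands in for the indirect branch. All names (`VM`, `next_action`, `run`, and the four actions) are hypothetical. What the sketch keeps from threaded code is that there is no central table of opcodes to decode. The program is a list of references to actions, and each action itself fetches its successor.

```python
class VM:
    def __init__(self, code):
        self.code = code   # the "thread": action references with inline operands
        self.ip = 0        # plays the role of the register pointing at the next action
        self.stack = []


def next_action(vm):
    """Fetch the action the instruction pointer refers to, and advance past it."""
    action = vm.code[vm.ip]
    vm.ip += 1
    return action


def push(vm):
    vm.stack.append(vm.code[vm.ip])  # the operand is stored inline after the action
    vm.ip += 1
    return next_action(vm)           # each action hands control to its successor


def add(vm):
    right = vm.stack.pop()
    left = vm.stack.pop()
    vm.stack.append(left + right)
    return next_action(vm)


def mul(vm):
    right = vm.stack.pop()
    left = vm.stack.pop()
    vm.stack.append(left * right)
    return next_action(vm)


def halt(vm):
    return None                      # no successor: execution stops


def run(code):
    vm = VM(code)
    action = next_action(vm)
    while action is not None:
        # Trampoline: real threaded code would branch directly instead of returning.
        action = action(vm)
    return vm.stack.pop()
```

For example, `run([push, 2, push, 3, add, push, 4, mul, halt])` computes (2 + 3) * 4.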
Another issue in early Basic was the representation of the source code in
memory for large programs. As you know, code contains characters that the
compiler and computer do not need, including programmer comments and
white space (blanks, tabs, and other nonprintable characters). While such excess

9. I encountered an elegant implementation of threaded code in a version of the SL/I
compiler's target machine, in use at Nortel Networks to provide a flexible range of private
branch exchange systems. My own SL/I compiler for a 24-bit (sic!) machine compiled to this
environment.


characters were not a problem in toy and demo programs (like Print 'Hello world'),
they made large programs for user solutions impossible to develop because the
Basic compiler was not able to read the entire source.
As you crazed coders out there can imagine, there were workarounds for
handling large programs, including using a disk to save part of the source code.
But consider that waiting for even a modern form of virtual memory to catch up
can be very irritating. Another approach was to reduce the usable symbols of
the Basic source code to the smallest possible code. For example, if there were
only 256 different identifiers-such as PAYRATE, GROSS, NET, and so on-in
a Basic program, you could save a lot of space in the interpreted code by replac-
ing the identifier with its position in a 256-position list. This index would take
only 1 byte.
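A minimal sketch of this trick in Python, assuming the identifiers have already been picked out of the source by a scanner. The function name `encode_identifiers` is hypothetical.

```python
def encode_identifiers(identifiers):
    """Replace each identifier by its position in a table of at most 256 names."""
    table = []      # position in this list is the identifier's 1-byte code
    index_of = {}   # name -> position, to avoid searching the list each time
    codes = bytearray()
    for name in identifiers:
        key = name.upper()  # assumption: identifiers are case-insensitive
        if key not in index_of:
            if len(table) == 256:
                raise ValueError("more than 256 distinct identifiers")
            index_of[key] = len(table)
            table.append(key)
        codes.append(index_of[key])
    return table, bytes(codes)
```

Five occurrences of `PAYRATE`, `GROSS`, and `NET` that would take 27 characters as text take 5 bytes as codes, plus a single copy of each name in the table.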

Basic in Two Kilobytes


One of the most brilliant examples of space saving was authored by Clive Sinclair
(now Sir Clive Sinclair, so there) of the UK, in his Sinclair personal computer. This
device included an implementation of Basic in only 2KB. Its physical design, with
a two-dimensional pad style keyboard, was rather charming. It looked like
a computer you might get in a box of cereal. Its power design was less charming.
It was shipped to American customers with an adapter for American voltages.
When I plugged in the machine, it started a small American fire. After the smoke
cleared, however, I was able to bring this system up using a safer voltage adapter,
and then write small-scale Basic programs. My total investment was about $150.

By the early 1980s, many desktop developers had already used various
hacked Basic compilers to create quite a lot of business and other support for
real users. IBM shipped a solid Basic, GW-Basic, with its very rugged IBM PC in
1981. Microsoft offered the QuickBasic compiler and interpretive runtime, which
was used heavily on MS-DOS systems.

Visual Basic
During the 1980s, as desktops became powerful enough to support bitmap
graphics, developers discovered the graphical user interface (GUI). At that time,
there were two common ways of adding a GUI to a program. The most popular
way was to spend a small amount of time on hacking. However, this created
considerable inflexibility and unneeded complexity. The other way was to spend
a lot of time on the careful design of an underlying reusable engine for the GUI.
With luck, this would result in code that could be reused by different
applications and perhaps plugged into different systems.10
The need to generate custom graphics engines was largely eliminated in the
1980s with the introduction of Microsoft Windows (the notable exception at the
time being games, which demanded better access to the hardware than Windows
could provide). Windows 3.1 provided considerable additional power in the form
of forms and controls for display and entry of data-at a high and somewhat
hidden price. To create the simplest command button, programmers in C and
assembler had to write large amounts of repetitious code.
Perhaps for this reason, Alan Cooper developed and sold an engine for
drawing forms and controls, known as the Ruby form engine, which was usable as
a set of Application Programming Interfaces (APIs) from a variety of languages.
He sold this product to Microsoft, who integrated Ruby with QuickBasic. In 1991,
Microsoft announced the stunningly high-quality product, Visual Basic 1.
I like to code. But I do not like to code the same instructions repeatedly.
Therefore, I was thrilled, when I adopted Visual Basic 3 in 1993, to be able to
summon up simple forms and their controls, even with a language I considered
clunky compared with C.

Why Use Visual Basic to Write a Compiler for Basic?


As you will learn in Chapter 4, the clunky, old-fashioned, and keyword-intensive
surface syntax of Visual Basic-with its Ifs, Thens, End Ifs, and Do Whiles-hides
a rather elegant subsurface. You will discover that it isn't necessary to hack com-
pilers, even for languages with an older (or legacy) syntax.
C# has a classier syntax, fully based on C. So why do the code examples in this
book use Visual Basic .NET rather than C#?
While nearly all C# developers know Visual Basic, not all Visual Basic develop-
ers have made the transition to C#. I wanted the widest possible audience to be
able to read and modify the source. I wanted to write clear code, maintainable
by readers with access to the code at the Apress Web site. I avoided anything
like unmanaged code or void pointers. Therefore, there was no reason for using
C# or C++.

10. My experience is that this type of approach drives hard-working managers crazy. One rea-
son is that it's an investment not justified by most business cases. Also, in practical terms, it
is a license for ambitious programmers to spend too much time on the interesting, fun, and
possibly remunerative development of the GUI as a product they could, in some scenarios,
resell.


From the standpoint of compiler writers, Visual Basic releases 1 through 4
were interpreters. The compiler generated symbols, which were translated into
special codes. These codes were translated on the fly and "just in time" by a pro-
prietary interpreter, written in C++, unique to Visual Basic.
Visual Basic 5 introduced the ability to compile Visual Basic to actual Pentium,
Cyrix, or other machine instructions, and it officially avoided the need for the
Visual Basic interpreter (while, as many Visual Basic programmers can confirm,
not avoiding the need for shipping the Visual Basic runtime). However, although
the Microsoft developers of the Visual Basic 5 compiler did a great job, actual
results in the field were disappointing. At best, the CPU speed improved only by
20% when Visual Basic 5 code was shipped.
Visual Basic 6 was an incremental release, which perhaps lulled Visual Basic
programmers into a false sense of job security. Then came .NET.
.NET, as you will learn in the next chapter, completely changed the ground
rules. And while it actually removed some facilities, it added powerful interoper-
ability and flexibility. This includes the ability to use Visual Basic .NET to write
true compilers. In this book, I will make the business case for giving .NET sys-
tems the flexibility and ease of maintenance that compiler technology adds.

Summary
This chapter provided some historical background. At the bottom of the "dark
chasm and abyss of time," we don't see C. This is because time did not start in
January 1970 (nor will it end in 2038, when 32-bit Unix runs out of bits to track
time since January 1970).
Brian Kernighan, the author of the basic book on C, The C Programming
Language, actually uses Visual Basic to teach introductory computer science to
non-majors at Princeton. Brian reasons that America's "best and brightest," who
will go on from Princeton to run the country in some cases, need to know about
real programming, much of which is MIS programming. MIS programming can
be intellectually challenging, but is thought to be For Dummies, with the result
that the challenges aren't adequately met, or are met by Dummies, who are
Dummies because of low self-esteem. Compiler design in Visual Basic is one
excellent way to master Visual Basic for other challenges.
Algol, not C, was truly groundbreaking, while Fortran showed it was possible
to develop a compiler that could outperform human programmers. Algol's key
concept-that a list of statements can, in turn, be a statement-generated struc-
tured programming, which Visual Basic inherits.
I conclude that programming is a human adventure and only accidentally
about programming languages and computers per se. Indeed, see Chapter 4's
introductory material for an alarming if not gnomic quote from a hero computer
scientist in this regard, which will help us to focus on the right stuff.


Challenge Exercise
Crazed coders like challenges. Using your existing programming experience,
consider tackling the following challenge. Otherwise, return to it after reading
Chapter 10.
Your end user's system is characterized by the need to enter and frequently
change business rules such as:

if income>20000 And homeOwner then give credit with an APR of 8%

Justify developing the code for the business tier in Visual Basic, knowing that you
will need to frequently change the rules. How much time will be spent in main-
taining the code? What will happen if contradictory or conflicting rules exist in
the code, such as the sample rule, plus the following:

if income>20000 and homeOwner then deny

Resources
If you are interested in learning more about compiler history, I suggest the
following resources:

"Revised Report on the Algorithmic Language Algol 60," CACM, Vol. 6,
No. 1, January 1963, page 1; by J. W. Backus et al. This article describes
the Algol language. It is in the public domain and available at a number
of sites on the Web. CACM is the professional journal, Communications
of the Association for Computing Machinery.

"On the Cruelty of Really Teaching Computing Science," CACM, Vol. 32,
No. 12, December 1989, page 1404; by Edsger W. Dijkstra. This article
gives a good idea of Dijkstra's contention that computer science really
is rather different and why it is hard.

"GoTo Considered Harmful," CACM, Vol. 11, No. 3, March 1968, page 147;
by Edsger W. Dijkstra. This article is a bit difficult to read but worthwhile.
It ranks as Dijkstra's invention of structured programming, although he
was too humble to say so.

The Anatomy of a Compiler, by John A. N. Lee (Wadsworth Publishing,
now Thomson-Wadsworth, 1974). Still available through Amazon, this
book is only of historical interest. It is the book I used as a reference for
IBM Fortran.


"The History of Visual Basic and BASIC on the PC," by George Mack (2002),
http://dc37.dawsoncollege.qc.ca/compsci/gmack/info/VBHistory.htm. This
Web site describes the background of early Basic. Bill Gates will be the first
to admit that he did not invent Basic.

The First Computers: History and Architectures, edited by Raúl Rojas
and Ulf Hashagen (MIT Press, 2000). This book describes the early
discovery that software matters.

Programming Systems and Languages, by Saul Rosen (McGraw-Hill,
1967). This book describes early practice (so you can see that I didn't
make stuff up!).

CHAPTER 2

A Brief Introduction
to the .NET Framework
Every few years, the modern-day programmer must be willing to perform
a self-inflicted knowledge transplant to stay current with new technologies.
-Andrew Troelsen

All that is solid melts into air, all that is holy is profaned...
-Karl Marx

WHEN MICROSOFT BROUGHT OUT the .NET Framework, it was a radical shift and
a wake-up call. This chapter describes the basics of this Framework.
According to people I've met at Microsoft, the Framework adds a computer
science level to Visual Basic. However, this doesn't mean that you need to return to
school. Instead, I recommend you refuel in flight. This book will help you to do so.
This chapter is an introduction to some of the issues that arise, in practice,
when code (including, of course, code inside compilers) is reused and how .NET
addresses some of the problems in reuse, including the infamous "DLL hell"
problem. This chapter will explain how the Common Type System (CTS)
and Common Language Specification (CLS) provide write-once, run anywhere
interoperability for code in multiple languages. We'll also take an in-depth look
at the Common Language Runtime (CLR) and identify the base class libraries
that support a large .NET toolkit.
.NET binaries provide a layer of information that avoids DLL hell. At the end of
this chapter, we'll briefly examine their structure to see how they accomplish this.

Code Reuse and DLLs


One thing that C and C++ have always had going for them is code libraries.
However, libraries do have some problems. If there is a bug in the library that is
later fixed, all of the programs that use the formerly buggy module must be
recompiled. The solution for this problem is dynamic link libraries, or DLLs.
Dynamism in programs is nothing new. It was a common way to fit a 1MB
program into 640KB of memory. Consider an accounting package. You might
separate the Accounts Receivable and Accounts Payable sections into their own
dynamic libraries, but they remain a part of the program. When the program
loads, it gets enough memory to load the largest dynamic library. Then when you
want to run the Accounts Receivable functionality, it loads the Accounts Receivable
library and uses it. When you need Accounts Payable, the program unloads the
Accounts Receivable library and loads the Accounts Payable library. These
dynamic libraries, however, are closely tied to the program, providing half of
the solution.
The second half of the solution comes from device drivers. In the past, if
you were lucky enough to have one of those RGB monitors that could do color
graphics, and you wanted to write a program to use it, you either had to write
directly to the hardware (not pretty) or write to a device driver.
Writing directly to the hardware isn't ugly because it is hard to do and requires
knowledge. Rather, it is ugly because successful use by your software depends on
a large number of preconditions, which have nothing to do with the needs of the
user. It is very annoying for the end user to need to keep old hardware alive just to
run needed packages.
Of course, just because your code worked with one company's device driver
was no guarantee it would work with someone else's.1 This led to more dynamic
libraries, less for memory than to load the code that worked with the driver you
specified.
Driver troubles began to be resolved with Windows, which introduced virtual
hardware.2 Rather than writing to the driver for the hardware, Windows allows the
programmer to write a consistent interface and let the operating system handle
writing to the hardware. This means that you need to write the code only once,
and it will work with any monitor, printer, keyboard, and so on.
It wasn't long before this idea spread beyond hardware, and all kinds of DLLs
were being written that provided some consistent interface for doing tasks. This
made many tasks easier. Rather than becoming a data-access expert, you could
use Open Database Connectivity (ODBC) or later, ActiveX Data Objects (ADO).
Learn a few basics of how to connect, how to request data, and how to update
data, and you could access any ODBC-compliant database.
Of course, ODBC and ADO provide new complexities and new issues, and
they do not always make your job easier. However, improvements in technology
improve your programs, without the need to change code. If the ODBC driver is
improved, then upgrading the driver makes your program work more efficiently.
The database system isn't locked to old hardware and operating systems. On the

1. Nonprogramming computer users are often astonished by the need to acquire new drivers
for new hardware. They are not amused by device driver conflicts, a consequence of the
original (1980s) vintage design of PCs.
2. Nonprogramming computer users may be ROTFL (rolling on the floor laughing) because
they still have problems with Windows drivers. However, they may fail to realize that driver
problems are an inevitable consequence of the new stuff they have to play with. Newer
software always tends to be more buggy, even though Plug and Play now works in the vast
majority of cases.


other hand, accessing the new goodies does impose a converse requirement: the
forced upgrade when the new feature requires a current operating system. On
the whole and from a business perspective, however, this is much easier than
operating museums of computing legacy arcana just to support users.

DLL Heaven and DLL Hell


There she stood in the doorway;
I heard the mission bell
And I was thinking to myself,
'This could be Heaven or this could be Hell'
-The Eagles, Hotel California

With the introduction of DLLs and all the derivatives (COM, COM+-the alphabet
soup can be dizzying), programmers became more efficient. But most
managers didn't notice, since the programmers were asked to do more work.3
No longer did programmers need to deal with the underlying plumbing.
Microsoft continued to make their jobs easier by introducing technology like
Microsoft Transaction Server (MTS) to handle transactional processing. Yes, life
was good as long as a few rules were followed.

The 11 Commandments for DLLs

1. Thou shalt create an interface, which shall be a standard, which can be
added to but not taken away from.
2. Thou shalt version thy DLL consistently.
3. Thou shalt maintain backward compatibility.
4. Thou shalt write installation routines correctly, so that older versions
shan't overwrite newer ones.
5. Thou shalt keep all internal functionality private and not expose it to
the world.
6. Thou shalt create properties using GET/LET/SET rather than PUBLIC
variables.

3. In my experience as a programmer, which stretches from the ancient, second-generation
(1959) IBM mainframe 1401 to Web development in the dot.com mania, today's programmers
work much harder than in the past. Offshore developers, in particular, work terribly hard.
I worked with them in Fiji, and they shared a single room with a wheezing air conditioner.
Despite the heat, they worked very hard without coffee or Internet breaks, because the alter-
native in Fiji is cutting sugarcane.


7. Thou shalt put error checking within all thy code and handle any errors
thou can.
8. When thou encounters an error thou cannot handle, thou shall pass it up
with correct and documented error codes.
9. Thou shalt not change thy error codes, though thou may add new ones.
10. Thou shalt generate a complete executable for the entire system, starting
at day one of coding.
11. Thou shalt play nicely with other DLLs.

As long as you follow the rules for DLLs, everything works, and you save
vast amounts of programming resources. And following the rules isn't too hard,
unless you happen to live in the real world of users, operating systems, and
third-party controls. Then you could easily find yourself transported into DLL
hell. Let's take a simple (and all too common) example.
You have a project that uses version 1.5 of a common third-party widget
from the Acme Novelty Company. The widget is used a lot in your program,
which is used consistently by the president of your company to keep her fin-
gers on the pulse of the company.
Your company isn't a software company, and the president, like many other
managers, is focused on her job of running a company. She neither needs nor
wants to understand the minutiae of programming. However, she has enough
technical know-how to be able to download and install software. One day, a peer
recommends a demo program. He has the demo from when he installed it last
year, so he gives it to her, and she installs it.
And that program, during the installation, installs version 1.3 of the widget
your program relies on. Now it shouldn't have, because that violates one of the
11 commandments of DLLs. But it does. And the president doesn't notice any-
thing wrong while she's exploring this new software package. In fact, she doesn't
notice anything wrong until that afternoon, when she decides to run your pro-
gram, to check the pulse of the company; in other words, to do her job. And she
gets an error.
The egg hits the fan. She calls your tech support team because your software
is broken. Time, energy, and resources go into finding the cause. And how do you
explain that someone else broke your software? You sound like a weasel.
This brings us to .NET.


.NET-Beyond DLLs
One of the goals of .NET was getting rid of DLL hell. Another was making coding
and program interaction easier. And thus was born the .NET Framework. To under-
stand the Framework, you need to look at what it's made of, and that would be the
four Cs: CTS, CLS, CLR, and class libraries.
In later chapters, you'll explore the Framework pieces in detail, because the
QuickBasic compiler that we're going to build will use the four Cs. Here, you'll
just get an introduction to how these parts work. However, we'll spend some
time examining the CLR, because that will help you to understand the design
decisions in our compiler.

The CTS

The CTS is the Common Type System. This defines all of the possible data
types and constructs supported by the runtime environment. This means that
a 32-bit integer is a 32-bit integer everywhere. Providing the definitions of the
data types allows everyone to work together. Think of it as the metric scale for
programming languages: a way to standardize.
The CTS is, unfortunately for Visual Basic programmers, based on the C language.
In consequence, an Integer data type in Visual Basic .NET is a 32-bit integer
in the range -2^31 to 2^31-1, not a 16-bit integer in the range -2^15 to 2^15-1.
Arrays in Visual Basic .NET can no longer start at any index other than zero.
When you declare an array, such as strArray(5), you are specifying not the number
of elements, but the upper bound of the array. strArray(5) declares six elements
now numbered in .NET, at all times, from 0 through 5.
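The arithmetic behind these two changes is easy to check. The following is a minimal sketch in Python (used here for brevity, not Visual Basic); the function names signed_range and element_count are made up for illustration.

```python
def signed_range(bits):
    """Return (minimum, maximum) for a two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def element_count(upper_bound):
    """Elements in a zero-based array that is declared by its upper bound."""
    return upper_bound + 1


print(signed_range(16))   # the 16-bit Integer of Visual Basic 6
print(signed_range(32))   # the 32-bit Integer of Visual Basic .NET
print(element_count(5))   # strArray(5) holds elements 0 through 5
```

The 16-bit range works out to -32,768 through 32,767, the 32-bit range to -2,147,483,648 through 2,147,483,647, and an upper bound of 5 gives six elements.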
Strings in the CTS are updated differently than strings in Visual Basic 6. Visual
Basic 6's runtime manages strings in their own region because, traditionally, Basic
languages have treated the string as an independent data type and imposed no
fixed limit on strings.
In .NET, Visual Basic strings are implemented using the .NET String object. In
.NET, when the string is altered, the alteration makes a copy of the original string,
throwing away the old string.

TIP In cases where you need to frequently alter the contents of a string, consider
using the .NET System.Text.StringBuilder object, which allows for
more efficient modification.
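The same trade-off can be sketched outside .NET. Python strings are also immutable, so the analogy below is only an illustration, not .NET code: the first function makes a new string on every alteration, and the second collects the pieces in a buffer (io.StringIO stands in for StringBuilder here). Both function names are hypothetical.

```python
import io


def build_with_concatenation(parts):
    """Each + produces a brand-new string; the old one becomes garbage."""
    text = ""
    for part in parts:
        text = text + part
    return text


def build_with_buffer(parts):
    """Append into a buffer and make the final string only once."""
    buffer = io.StringIO()
    for part in parts:
        buffer.write(part)
    return buffer.getvalue()


pieces = ["Thou ", "shalt ", "version"]
print(build_with_concatenation(pieces) == build_with_buffer(pieces))
```

Both routes yield the same text; the difference is only in how many intermediate strings are created along the way.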


The CLS

The CLS is the Common Language Specification. This part of the framework defines
what every language must implement to be a .NET language. It's a subset of the
CTS, because not all of the types defined in the CTS are in the CLS (for example,
Visual Basic .NET does not have the ability to declare an unsigned number).
However, as long as the code you write sticks to the CLS-defined types, it will
interact with code written in any other language without any problems.

The CLR

The CLR, or Common Language Runtime, is the heart of the .NET Framework. The
CLR handles loading your code, managing variables and objects, and providing
the interconnectivity between all .NET programs.
The CLR is structurally a simple virtual machine definable by an interpreter.
The CLR, however, doesn't have the performance penalty of classic software
interpreters, because on first execution the code is compiled, "just in time," to
native code.
The confusing fact is that while you can and should think of the CLR as
a traditional slow interpreter for a virtual machine, it actually converts to native
code behind the scenes. Individual compilers act as if they were creating purely
interpreted CLR code, but behind the scenes, JIT compilers tailor the instruc-
tions to the platform. In this way, the compiler does not need to be rewritten to
generate code for a new or different machine.

NOTE In this book, you'll see two programs (the runtime of Chapter 3's product
integerCalc and the testing runtime of the quickBasicEngine compiler
itself) that show how to develop a virtual machine, similar to the CLR but
much less efficient.

The Stack and the Heap


In the abstract, the CLR supports a machine with a last-in, first-out (LIFO)
stack and a heap. The stack contains value objects, which are referred to this
way because they are fully described at any time by identifying their value.
Value objects are typically numbers, and they take up fixed amounts of stor-
age. The heap supports objects of variable size and shape, including strings
and user objects. These objects are represented by pointers in the stack.
Figure 2-1 shows the stack and the heap.


[Figure 2-1: a stack holding numbers (such as 123) and pointers to the heap. The
heap contains reference objects of various shapes and sizes, including system
objects like strings ("This is a string") and user-defined objects (a Customer
object).]

Figure 2-1. The CLR stack and heap

As you can see in Figure 2-1, the numbers are represented directly in the stack.
Visual Basic strings, which can have widely varying lengths, are represented by
pointers to the heap. Value objects can also be stored on the heap, using a process
called boxing.
The heap is somewhat like your garage, before your significant other gets
you to clean it up. The garage is where you put objects that don't fit neatly in the
house. The major difference is that the objects in the heap are accessed more
frequently than the foam plastic things that hold electronic equipment, out-of-
date computers, out-of-date computer books, infant carriers for infants who are
now anguished teens, and back issues of National Geographic.
Objects that use the heap are subject to a process known as garbage collection.
Regularly, and at an interval not under your control (unless you force garbage
collection by calling GC.Collect), the Framework sweeps through the heap and deletes
objects that are no longer referenced by your program.

When You Might Need to Expose Dispose


There is one case where garbage collection is a serious problem. That's when
you create an object that declares and creates references to objects that corre-
spond to or tie up limited system resources, such as window handles, system
objects, database connections, and so on. For these objects, you should imple-
ment the IDisposable interface and code a dispose method, which should call
the Dispose methods of any objects it references and release any system
resources used by the object.
Expose dispose only when necessary. If the object's variables in the General
Declarations section are numbers and strings, dispose isn't necessary. You may
wish to expose a dummy dispose in your class if you anticipate the addition of
reference objects at a future date and you need to ensure that the object's users
use dispose.
My own practice is anal, since I want to make sure, as far as is possible, that the
callers of my object dispose of instances. I expose dispose for all objects with
state in the form of variables in the General Declarations section.


The alert reader may have a question here. If, as I said, a string is a reference
object, then if a string appears in the General Declarations section, shouldn't
this force the conscientious programmer to implement dispose? The answer is
that the string object does not and will not itself create "open-ended" reference
objects in the heap. The string will be in the heap, but it will never create any
reference objects on its own. The string is a "closed" object, which, as a string,
can be left on the heap for the garbage collector.
But a true reference object is permitted, now or after being modified, to create
reference objects as part of its state. If you do not implement a dispose for
a true reference object (or fail to call dispose when you're finished), runaway
use of the heap is possible.
It might be the case that you know a reference object is very simple and safe,
and you know that it does not create any further references. However, real
experience in real development teaches you to also know that this situation
might change.
The goal of a team standard dispose is to avoid open-ended situations that
result when a more complex reference object is not destroyed by executing its
own dispose, leaving it and any objects it declares (directly, or indirectly by
creating other open-ended reference objects) in the heap. The heap becomes cluttered with
dead soldiers, which appear, to the CLR, as needed objects. If you do not
implement a dispose, reference objects will clutter the heap until the system
gets around to freeing their storage.
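The dispose convention described in this sidebar can be sketched in a few lines. This is a Python analogy rather than an IDisposable implementation, and the Connection and Report classes are hypothetical stand-ins: Connection plays the part of a limited system resource, and Report is an object that owns two of them and therefore forwards dispose to each.

```python
class Connection:
    """Hypothetical stand-in for a limited system resource."""

    def __init__(self, name):
        self.name = name
        self.is_open = True

    def dispose(self):
        self.is_open = False


class Report:
    """Owns two connections, so it exposes dispose and passes the call on."""

    def __init__(self):
        self.source = Connection("source")
        self.target = Connection("target")
        self.disposed = False

    def dispose(self):
        if self.disposed:        # a second call does nothing
            return
        self.source.dispose()    # release everything this object references
        self.target.dispose()
        self.disposed = True


report = Report()
try:
    pass                         # the caller would use the report here
finally:
    report.dispose()             # the caller's half of the bargain

print(report.disposed, report.source.is_open, report.target.is_open)
```

The try/finally around the caller's work is the point of the team standard: the object can only clean up after itself if its users reliably call dispose.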

Portability

A major design objective of the CLR was to enable "write once, run anywhere."
For example, you may want your software to run on the Web, on different servers,
and even on different hardware platforms (and your manager may want this even
more than you do). There are many cases where the cost to create software that
runs on more than one box can't be justified.
However, software should be designed whenever possible to run on more
than one box. This is natural in the university environment, for example, where
faculty members do not want to submit to the rules of a centralized facility. In
industry, we don't talk as much about this need, but the business reality may cre-
ate it anyway. We don't control, for example, an upper management decision to
change platforms or demand that the software run as a Web service.
The CLR deconstructs the idea that computing power takes a positive amount
of extra thinking time, proportional to the power increment. If you follow the rules,
and in a Microsoft turn of phrase "let go and let the CLR," the code will be
transportable for free.


Of course, we've heard this before. There have been cases where the promise
wasn't fulfilled. But in significant areas, as long as the platform supports the
Framework (which happens to be free), transportability exists to a much higher
degree than it did with COM.

Reliability
A second major design objective of the CLR was to make operations predictable
and to avoid bugs based on creative abuse of data types. A classic example is
using string data for a numeric operation and forcing overflow deliberately for
a result that you "know" will occur. The problem arises when the overflow does
not occur as planned when the software runs on a new machine or in a different
environment.
For example, a poorly written program might read a text field from a SQL
database and immediately try to do arithmetic on the field, creating a crash in
the field. Or the programmer might add one to a number such as 32,767 (the
maximum value of a Short integer) just to transfer control out of a deeply nested
set of procedures.
Respect the CLR, for it encapsulates years of knowledge about how to create
reliable software, on schedule. Lessons learned by Microsoft and incorporated by
the developers of the CLR include the lesson of the stack, the lesson of typing,
and the lesson of just-in-time.
The lesson of the stack is that for solving problems, a machine or paradigm
with a stack is better than a machine or paradigm with a small number of
general-purpose registers. This is because even simple problems can and should
be broken down into simpler subsolutions. Not all managers see this, and this
may be why there has been some resistance to stacks. The stack, despite its inef-
ficiency (the admitted fact that it is a single bottleneck from the standpoint of
multiple parallel threads) represents a problem that has been modularized.
The lesson of data typing is that sometimes your need in C for a void pointer
(a pointer that points to untyped data) or in Visual Basic for a variant represents
poor design. What you really need is code that, if it compiles, will probably work.
If all your variables are of the precise type needed by the solution, you shorten
the time between a clean compile and a correct result.

Class Libraries
The final component of the .NET Framework is the base class libraries. You
can think of these as similar to the operating system DLLs prior to .NET, but they
provide access to all of the basics that you need:


• Data access

• Security

• XML/SOAP

• File I/O

• Debugging

• Threading

• User interface

All of this comes together to create the .NET Framework and to change how
your programs work. In VB.Classic, you might create an executable that uses the
VB*.DLL to run or VC++ to create a stand-alone executable. In the .NET Framework,
you create a binary file that sits and waits until you run it. It's the same binary file,
regardless of whether it was written in Visual Basic .NET, C#, J#, Cobol.NET or Eiffel
.NET. And although they may have a DLL or an EXE extension, they look nothing
like the pre-.NET files of the same type. Instead, they contain a quiet revolution. In
the next section, you'll see what I mean by this.

Inside a .NET Binary


Open a .NET binary file, and you won't find a fortune, but you will find Microsoft
Intermediate Language (MSIL). MSIL is kind of like a tokenized version of your
code that looks like assembler, but it has a very different purpose.
This binary file, called an assembly, contains MSIL code and metadata that
describes the characteristics of all the types and interfaces within the assembly.
When you want to use the assembly in your own programming, this interface
information is read and processed, and then handed up to the calling program.
Then when the assembly actually needs to be run, the CLR takes the MSIL and cre-
ates a compiled program specific to that operating platform. That means that
a .NET assembly can run, without programming changes, on any .NET platform
that supports the four Cs.

Performance Penalties in .NET


One objection Visual Basic 6 and COM programmers in general have about .NET
is, "All this loose talk about storing type information in assemblies means that
things will run slower, and a performance penalty will be incurred."


The quick response is that .NET does the extra work once and not continu-
ously. Its design lends itself to "just-in-time" processes. Missing is the
redundant and behind-the-scenes work traditional interpreters (including the
Visual Basic 6 interpreter) had to do continuously while an application pro-
gram was running.
There is a performance penalty in both .NET and the Java environment,
although it is not nearly as great as that of traditional interpretation. In return,
we get code that can run as a Web service (allowing authorized remote users
and remote software to use code on any platform). Additionally, DLL hell, as
we know it, goes away.

Summary
This chapter explained how modular libraries of code were necessary for reusable
code, but also created problems. Those problems motivated, and indeed explain,
the features of the .NET Framework. My experience as a programmer and mentor
is that the deepest appreciation of a feature comes through awareness of its
absence.
As Andrew Troelsen implies, the Framework is a major change, which deval-
ues deep hacker knowledge of the specifics of things like variants and the details
of the Visual Basic 6 runtime. The Framework is actually an artifact of computer
science. It uses tested but advanced techniques to enable us to write once, run
anywhere.
As Marx (Karl, not Groucho) saw, we make progress by means of "creative
destruction," and this is why we can't get comfortable with the deficiencies of
Visual Basic 6. Marx said, "All that is solid melts into air," which means that job
you had in COM has gone with the wind.
My advice, harsh as it may seem, is this: suck it up. Learning new stuff is
a great way to stay young.
We're now ready, in Chapter 3, to address another computer science artifact:
a simple front-end compiler, as a flyover of terrain we must walk over, starting in
Chapter 4.

Challenge Exercise
What are the 11 commandments for DLLs? Can you suggest any additional
commandments?


Resources
The following are some resources for more in-depth information about the
Framework.

"Common Type System," .NET Framework Developer's Guide,
http://msdn.microsoft.com/library/en-us/cpguide/html/cpconthecommontypesystem.asp.
This is the main CTS documentation.

"What Is the Common Language Specification?" .NET Framework Developer's Guide,
http://msdn.microsoft.com/library/en-us/cpguide/html/cpconwhatiscommonlanguagespecification.asp.
This is the main CLS documentation.

Visual Basic .NET and the .NET Platform: An Advanced Guide, by Andrew
Troelsen (Apress, 2001). This book gives a good overview of the .NET
platform as it relates to Visual Basic .NET. Consider reading it and work-
ing through Andrew's code.

CHAPTER 3

A Compiler
Flyover
For insofar as we understand, we can want nothing except what is necessary,
nor absolutely be satisfied with anything except what is true.
-Baruch Spinoza, Ethics, Of Human Freedom

BEFORE I VISIT A NEW TOWN on business, I often use Microsoft Flight Simulator to
buzz the city to get a general idea of the lay of the land. This chapter will consti-
tute a "flyover" of compiler theory and the hands-on application of the theory.
You will learn about the phases of a compiler and the three formal approaches
used by compiler developers to complete those phases: regular expressions,
Backus-Naur Form, and Reverse Polish Notation with stacks. To "keep it real,"
you will see how these approaches are applied in working code, the integerCalc
application. Understanding integerCalc is an excellent preparation for under-
standing the more complex quickBasicEngine described in Chapters 5 through 8.

The Phases of a Compiler


It is generally agreed that a typical compiler for a standard, procedural language
like Basic should do its work in three phases:

• Scanning: Divide the source code into tokens above the level of a charac-
ter but below the level of statements and expressions.

• Parsing: Use the tokens developed in the first phase to parse identifiers,
expressions, statements, methods, and so on. This phase should create
either object code or an intermediate language.

• Object code optimization: A commercial-grade compiler (as opposed to
a student or free compiler) should optimize the object code or intermediate
language by removing common constructs and making safe rearrangements
of code to reduce the code size or improve its performance.


These three compiler phases are conceptual. Early compilers were often
constructed of separate programs, each of which executed a separate pass. In
these passes, the compiler would read the entire source text to create a table or
file of tokens. Then the compiler would read the token file or table and parse the
tokens to generate object code.
Although the quickBasicEngine compiler we will build in later chapters uses the
serial approach, it is not necessarily the best approach. For a production-quality
compiler, in place of each pass, you can use object-oriented design to fully
encapsulate the result of any phase.
Our goal, in writing a compiler, is not to pass over the source code for the sake
of it. Instead, the phase one goal can be an object, a "scanner server," which keeps
track of its current state and can be called to get the next token. The state of the
scanner server can be an input file and a position in the input file.
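To make the scanner server idea concrete, here is a minimal sketch in Python (not the quickBasicEngine code, and not Visual Basic). The class name ScannerServer, the method next_token, and the two token types are assumptions made for illustration; the state is just the source text and a position in it, as described above.

```python
class ScannerServer:
    """Hands out one token per call; its state is the input and a position."""

    def __init__(self, source):
        self.source = source
        self.position = 0

    def next_token(self):
        """Return (type, text) for the next token, or None at end of input."""
        src = self.source
        # Skip blanks between tokens.
        while self.position < len(src) and src[self.position] == " ":
            self.position += 1
        if self.position >= len(src):
            return None
        start = self.position
        if src[start].isdigit():
            # Consume a run of digits as one number token.
            while self.position < len(src) and src[self.position].isdigit():
                self.position += 1
            return ("number", src[start:self.position])
        # Anything else is treated as a one-character symbol.
        self.position += 1
        return ("symbol", src[start])


scanner = ScannerServer("12 + 345")
token = scanner.next_token()
while token is not None:
    print(token)
    token = scanner.next_token()
```

A parser built on this object never sees a token file; it simply asks for the next token whenever it needs one.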
However, for now, you can think of each phase as reading a large file. In the
scanning phase, this is raw source code (in some instances, preprocessed, per-
haps by the C++ preprocessor or the more limited pound-sign statements found
in Visual Basic). In the parsing phase, this is a token file. In the optimization
phase, this is a file of object code.
As a final phase, many compilers (such as Visual Basic releases 1 through 6)
include an interpreter, which executes the compiled code in an intermediate
form. Although interpreters can be slow, their interactive nature and ability to
modify code and values on the fly make them useful for quick results and offer
easy debugging.

NOTE The .NET CLR is not an interpreter. The intermediate code produced by
.NET compilers is further compiled into native code, as noted in Chapter 2, by
a separate step called JIT compilation.

Three Theories, and Keeping It Real


In order to understand how a simple compiler is constructed, we need to turn to
three theories and three formal notations: regular expressions, Backus-Naur Form,
and Reverse Polish Notation.

• Regular expressions are associated with phase one of a compiler: the scan-
ning (lexical analysis) of source code.

• Backus-Naur Form (BNF) is associated with parsing and the initial object
code creation.

• Reverse Polish Notation (RPN) helps with a compiler's phase three, creating,
without making hardware, a computer on which to run your code.
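As a taste of the third notation, the sketch below evaluates an RPN token list with a stack. It is written in Python for brevity and is not the integerCalc interpreter; the function name evaluate_rpn is made up, and the RPN string was translated by hand from the infix expression (256-1)/3+((23-5)+45)*8. Division is assumed to be integer division (Python's // operator, which rounds toward negative infinity for negative operands).

```python
def evaluate_rpn(tokens):
    """Evaluate a list of RPN tokens (integers and + - * /) using a stack."""
    stack = []
    for token in tokens:
        if token in ("+", "-", "*", "/"):
            right = stack.pop()      # operands come off in reverse order
            left = stack.pop()
            if token == "+":
                stack.append(left + right)
            elif token == "-":
                stack.append(left - right)
            elif token == "*":
                stack.append(left * right)
            else:
                stack.append(left // right)   # integers only
        else:
            stack.append(int(token))
    return stack.pop()


rpn = "256 1 - 3 / 23 5 - 45 + 8 * +".split()
print(evaluate_rpn(rpn))   # prints 589
```

Operands are pushed as they arrive, and each operator pops two values and pushes one result, so the stack itself does the job that parentheses do in the infix form.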


To keep it real, let's take a look at a very small compiler, integerCalc, which uses
the theories in code to do simple integer calculations. The source code and object
code for integerCalc are available from the Downloads section of the Apress Web
site (http://www.apress.com). You will find the code in the egnsf/apress/integerCalc
folder.
Open integerCalc/bin/integerCalc.exe and run it. Type a math expression,
calculating with integers only, and click the Evaluate button to see its calculated
value. Your screen will look something like Figure 3-1.

[Figure 3-1: the integerCalc window, showing the expression
(256-1)/3+((23-5)+45)*8, its value 589, and More and Close buttons.]

Figure 3-1. The integerCalc application is a simple compiler for integer
expressions.

NOTE If you check integerCalc's work with your Windows calculator, you will
find that it works with integers only. For the expression shown in Figure 3-1,
your Windows calculator would return 36.8235....

The source code of this application, in integerCalc/form1.VB and related
files, uses lexical analysis with regular expressions, parsing based on BNF
specifications, and an RPN interpreter. To understand how it works, read the code
along with the rest of this chapter.

Lexical Analysis with Regular Expressions


Both phase one and phase two of a classic compiler do essentially the same
thing-parsing. However, the parsing of raw character data is significantly differ-
ent from the parsing of higher-level constructs, in much the same way as the
spelling checker and grammar checker are different levels in Microsoft Word.
In Word, many users spell-check their documents. A more advanced check
is done with the grammar checker, which looks for common blunders in putting


words together, such as dangling constructs, passive voice, and run-on sentences.
The grammar checker must use the spell checker's ability to form individual words.
Job one in a good compiler, or grammar checker for that matter, is lexical
analysis. The input of lexical analysis consists of the raw stream of characters,
including newline characters. The output consists of a stream of tokens. Each
token is a small data structure, indicating the start, length, end, type, and value
of a meaningful "word" in the text or programming language.
Consider, for example, the Visual Basic statements in Figure 3-2, over which
I have placed column numbers.

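[Figure 3-2: two Visual Basic statements printed under a ruler of column numbers
(5 through 110), with each token boxed. The first line reads
    intIndex1 = intCount Mod 4      ' Calculate the remainder
and ends with a newline (\n); the second line begins with intIndex2 and ends
with an ' Increment comment.]

Figure 3-2. Scanning, also known as lexical analysis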

This code can be tokenized by a lexical analyzer. Let's go through some of it
to see how this works.

• intIndex1 is an identifier. It starts at column 5 and is nine characters long.

• One blank white space character starts at column 14. (White space is
actually shaded as gray space.)1

• One operator (the equal sign) starts at column 15.

• One blank white space character starts at column 16.

• intCount is an identifier starting at column 17 for a length of eight characters.

• From the point of view of lexical analysis, Mod is an identifier (that later will
be classified by the parser as an operator). It starts at column 26 and is
three characters.

• One blank white space character starts at column 29.

• A number starts and ends at column 30.

1. White space is a C language word that refers to the characters from 0 to blank (ASCII 32).


• Six blank white space characters start at column 31.

• A comment starts at column 37 and is 25 characters in length.

• On Windows systems, a newline starts at 62 and is two characters long
(it is represented as \n).2

• Four blank white space characters start at column 64.

• intIndex2 is an identifier starting at column 68 and is nine characters long.
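The column arithmetic in the list above can be reproduced with a small tokenizer. This sketch is in Python and uses its re module, so it is an illustration of the idea rather than the book's Visual Basic code; the token type names and the tokenize function are assumptions, and only the recoverable part of the example (the first statement and the start of the second line) is used as input.

```python
import re

# One alternative per token type; the name of the group that matched is the type.
TOKEN_PATTERN = re.compile(r"""
    (?P<newline>\r\n|\n)
  | (?P<whitespace>[ \t]+)
  | (?P<comment>'[^\r\n]*)
  | (?P<identifier>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<number>[0-9]+)
  | (?P<operator>[=+\-*/])
""", re.VERBOSE)


def tokenize(source):
    """Return (type, start_column, length) tuples; columns are 1-based."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise ValueError("Unexpected character at column %d" % (position + 1))
        tokens.append((match.lastgroup, position + 1, match.end() - position))
        position = match.end()
    return tokens


# Built piece by piece so that the blank counts are explicit.
SOURCE = (" " * 4 + "intIndex1 = intCount Mod 4" + " " * 6
          + "' Calculate the remainder" + "\r\n"
          + " " * 4 + "intIndex2")

for token in tokenize(SOURCE):
    print(token)
```

Note that Mod comes out as an identifier, exactly as described above: deciding that it is really an operator is left to the parser.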

You can watch the integerCalc application tokenize its source code by run-
ning integerCalc.exe and clicking its More button to see an additional display, as
shown in Figure 3-3. Enter an expression and click Evaluate to scan, parse, and
evaluate the expression. The expanded display identifies (by type, start index, and
length) the position of each token in the source code on its left side.

[Figure 3-3: the expanded integerCalc window, listing each token by type and
position:]

leftParenthesis at 1-1
number at 2-4
operator at 5-5
number at 6-6
rightParenthesis at 7-7
operator at 8-8
number at 9-9
operator at 10-10
leftParenthesis at 11-11
leftParenthesis at 12-12
number at 13-14
operator at 15-15
number at 16-16
rightParenthesis at 17-17
operator at 18-18
number at 19-20
rightParenthesis at 21-21
operator at 22-22
number at 23-23

Figure 3-3. The More display of integerCalc shows how the expression is tokenized.

2. Windows newlines consist of the carriage return (hex D) followed by a newline (hex A). On
the Internet, this would be a single newline character.


Lexical Analyzer Construction


A lexical analyzer can be built in three ways:

• Brute-force coding

• Using regular expressions

• Using a lexical analyzer generator

Before deciding to use brute-force coding (as is used in integerCalc), it is
important to be very clear on what you want to accomplish! In integerCalc,
I wanted to partition the source code into mutually exclusive tokens and identify
each token's start index, length, and type.
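For comparison, here is what a brute-force scanner for integer expressions might look like. It is a Python sketch written for this discussion, not the integerCalc source; the function name scan is hypothetical, the token type names are borrowed from integerCalc's More display, and skipping blanks without reporting them is an assumption.

```python
def scan(expression):
    """Return (type, start, end) tuples, with 1-based inclusive positions."""
    tokens = []
    index = 0
    while index < len(expression):
        char = expression[index]
        start = index
        if char == " ":
            index += 1               # blanks separate tokens and are not reported
            continue
        if char.isdigit():
            # A number is the longest run of digits starting here.
            while index < len(expression) and expression[index].isdigit():
                index += 1
            token_type = "number"
        elif char == "(":
            index += 1
            token_type = "leftParenthesis"
        elif char == ")":
            index += 1
            token_type = "rightParenthesis"
        elif char in "+-*/":
            index += 1
            token_type = "operator"
        else:
            raise ValueError("Unexpected character at position %d" % (index + 1))
        tokens.append((token_type, start + 1, index))
    return tokens


for token_type, start, end in scan("(256-1)/3+((23-5)+45)*8"):
    print("%s at %d-%d" % (token_type, start, end))
```

Every character of the input ends up in exactly one token (or is a skipped blank), which is what "mutually exclusive" asks for.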
You can use regular expressions to specify the low-level syntax of your language
and as a basis for writing brute-force code. Therefore, let's start with regular expres-
sions. You may have already played with them, since Microsoft supplies a regular
expression object (Regex) in the .NET Framework.

Components of Regular Expressions


Regular expressions are not so much a programming language as a mathemati-
cal and logical notation. In fact, this notation was devised by mathematicians
and logicians before computers were in general use.
As a programming language, regular expressions are deficient, as I think
you'll see. For one thing, they look like explosions in a gnome factory; some reg-
ular expressions can look like messages from foul trolls from the Dark Ages. They
can be difficult for programmers to debug and to maintain. However, they are
invaluable for the understanding and formal specification of the goals of the code.3
A regular expression is a rule for specifying a simple formal language, and it
defines the set of strings that are part of the language. It consists of a series of data
characters and metacharacters.
Most ordinary characters are data characters, and their appearance in the
regular expression means "at this location, these characters must appear." A reg-
ular expression that contains no metacharacters whatsoever specifies that the
language defined consists of one string. For example, the regular expression
Authors press specifies the language consisting of only the string "Authors press."

3. Don't inflict regular expressions on end users, but keep them handy for a fully precise
specification, and use them in code when you are confident that your maintenance programmers
will be comfortable with their use.


The power of regular expressions lies in metacharacters, a rather large collection
of special characters such as the asterisk, plus sign, and slash. (Don't blame
the gnomes of Unix for the use of a large number of metacharacters; mathematicians
used these to save space on blackboards.)
The asterisk specifies a repetition of a string or subexpression zero, one, or
more times. 4 For example, a* (lowercase a, asterisk) specifies that "the valid strings
in my language consist of zero, one, or more copies of the letter a." Using the
asterisk provides the capability of specifying languages with an infinite number of
strings. The regular expression (Author's press)* represents the set of strings ""
(the null string), "Author's press," "Author's pressAuthor's press," and so on. This
set includes the empty (null) string, because the asterisk specifies zero or more.
The plus sign in a regular expression specifies one or more repetitions.
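You can try the asterisk and the plus sign in any regular expression engine. The sketch below uses Python's re module rather than the .NET Regex object, so treat it as an illustration of the notation only; fullmatch asks whether the whole string belongs to the language.

```python
import re

star = re.compile(r"a*")                     # zero, one, or more copies of a
plus = re.compile(r"a+")                     # one or more copies of a
group_star = re.compile(r"(Author's press)*")


def in_language(pattern, text):
    """True when the entire text is a member of the pattern's language."""
    return pattern.fullmatch(text) is not None


print(in_language(star, ""))                  # the null string is in a*
print(in_language(plus, ""))                  # but it is not in a+
print(in_language(star, "aaa"))
print(in_language(group_star, "Author's pressAuthor's press"))
print(in_language(group_star, "Author's"))    # a partial copy is not a member
```

The only difference between the first two patterns is whether the null string is a member of the language.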

Why Worry About Null Strings?


The null string is mystery meat. There is only one null string, and it contains no
characters. In business, it seems useless, and it's one of those marvelous math-
ematical constructions that gives geeks a bad name. But it is useful for the
same reason that zero is handy.
Zero is handy in business when you are broke; the null string is useful in busi-
ness when you have nothing constructive to say. Zero was, in fact, introduced
from Arab countries into Europe at that point where European merchants
needed to borrow capital for long voyages of trade, conquest, and discovery.
They needed zero and negative numbers to track the inevitable losses.
A compiler should be able to compile null strings and even null programs.

Any metacharacter, including the asterisk, can be preceded by a backslash
escape character, which changes the metacharacter into an ordinary character.
For example, \* represents the language consisting of a single asterisk, and \**
represents zero, one, or more asterisks. This latter regular expression should use
parentheses for clarity: (\*)*.
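The escaped forms behave as described when tried in an engine. Again this is a sketch using Python's re module, not the .NET Regex object, and the variable names are made up.

```python
import re

single = re.compile(r"\*")         # exactly one asterisk
many = re.compile(r"\**")          # zero, one, or more asterisks
many_clear = re.compile(r"(\*)*")  # the same language, easier to read

print(single.fullmatch("*") is not None)      # one asterisk is a member
print(single.fullmatch("**") is not None)     # two asterisks are not
print(many.fullmatch("") is not None)         # the null string is a member
print(many.fullmatch("***") is not None)
print(many_clear.fullmatch("***") is not None)
print(re.escape("*"))                         # the library can add the backslash for you
```

The last line shows a convenience: re.escape inserts the backslashes, which helps when the text to be matched comes from data rather than from the programmer.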

4. You have seen a form of this in file identifiers such as a*.txt, which defines a "language"
consisting of every file beginning with the letter a and ending with the extension .txt. However,
MS-DOS never really supported regular expressions, aside from a limited form apparently
based on a misunderstanding of the Unix grep command.


Regular expressions have a number of limitations. Without extensions, they
are unable to parse even balanced parentheses properly, and they cannot completely
parse most real programming languages. Parsing based on BNF is needed
for full-scale compiling of real languages.
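The trouble with balanced parentheses is that checking them requires counting to an unbounded depth, which a classical regular expression cannot do. A few lines of ordinary code can. The sketch below is in Python, and the function name is_balanced is hypothetical.

```python
def is_balanced(text):
    """Check parentheses by counting nesting depth, ignoring other characters."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a right parenthesis with no partner
                return False
    return depth == 0            # every left parenthesis was closed

print(is_balanced("((1+1)*3)/4"))
print(is_balanced("(()"))
```

The counter can grow without limit, and that is exactly the memory a pure regular expression lacks.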

Regular Expression Tools


There are many more things to learn about regular expressions, and complete
documentation is available from your Visual Basic .NET CD-ROMs. At this point,
however, you can get started learning about regular expressions using the relab
tool. The code for relab is available from the Downloads section of the Apress
Web site (http://www.apress.com), in the egnsf/relab/bin folder.
Open relab/bin/relab.exe and run it to see the screen shown in Figure 3-4.

[Screenshot: the reLab window, with a list of available regular expressions, a list of available test data strings, and Test, Save Settings, Restore Settings, About, and Close buttons.]

Figure 3-4. The relab.exe program provides a tool for evaluating a set of prewritten regular
expressions, entering your own regular expressions, and converting them to Visual Basic
declarations.

This tool provides several canned regular expressions for your use; it docu-
ments the most popular symbols; it allows you to create, document, and save
America's Funniest Regular Expressions; and it can convert regular expressions
with special characters (such as newlines) into readable Visual Basic expressions.
Most important, it allows you to test regular expressions, which prevents bugs in
the code where you use them.

Another tool in the source code is located in egnsf\utilities\bin\utilities.dll.
This DLL is a collection of shared, stateless utilities for common tasks. The
utilities.commonRegularExpressions file provides a series of regular expressions as
strings, which you can use to drive the Regex object.

A Sample Regular Expression


For an example of a regular expression, take a look at the code in integerCalc
(integerCalc/form1.VB). Find the private function named scanExpression, in the
Scanner region.
Although I did not use the Regex object, I was thinking hard about the following
regular expression in writing the code of scanExpression:

\+|\-|\*|\/|\(|\)|[0-9]+

Note the following about this regular expression:

• The vertical bar separates alternative possibilities.

• Each operator is preceded by an escape backslash to avoid treating it as
a metacharacter.

• Square brackets in regular expressions surround character sets. These represent
any one of an alternative set of characters. They can be listed by
brute force (for example, [abc] specifies the possible appearance of a, b,
or c) or as ranges separated by dashes (as in the example, [0-9]).
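The regular expression above can drive a scanner directly. The following is a minimal sketch in Python, not the scanExpression code of integerCalc (which does not use a regular expression object at all). The name scan_expression and the decision to skip white space are assumptions made for this illustration.

```python
import re

# The token pattern from the text: one operator, one parenthesis, or an integer.
TOKEN = re.compile(r"\+|\-|\*|\/|\(|\)|[0-9]+")

def scan_expression(source):
    """Split source into tokens, left to right, without overlap."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():          # assumption: blanks separate tokens
            pos += 1
            continue
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens

print(scan_expression("1+2-3*4"))
print(scan_expression("(12 + 3)/4"))
```

Because [0-9]+ is greedy, the digits of 12 are collected into one token rather than two.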

The regular expression is confusing, certainly more so than well-written code.
However, it specifies precisely what is parsed, whereas code specifies nothing but
unverified behavior. While your users won't understand the regular expression, it
can form a basis for a discussion of what is expected.

NOTE My experimentation has found that code is somewhat faster than
using the Regex object provided with Visual Studio .NET, but your mileage
may differ.

Parsing and BNF


Parsing takes the chunks produced by the lexical analyzer, which you can consider
to be individual words of your language, and builds sentences and paragraphs.


Parsing is usually based on the representation of the language in BNF. The BNF
of a language is a series of productions, of the form nonterminal := grammarSymbols.
In each production, nonterminal is normally a single word that identifies a grammar
category, such as "noun" in a grammar for English or "statement" in a grammar for
Visual Basic.

Grammar Categories
There are two types of grammar categories: nonterminals and terminals.
Nonterminals are like sentences in the grammar for English, because a grammar
for English must provide productions that explain what a sentence is
(such as subject verb object). Terminals need no further explanation, and they
are detected not by the parser, but by the lexical analyzer. In English, they
would be words. In Visual Basic, they would be identifiers or numbers.
In a massively oversimplified grammar for English, we might have the fol-
lowing productions:

sentence := noun verb
sentence := noun verb noun
noun := John
noun := Mary
verb := likes
verb := sees

In this grammar, the nonterminals are sentence, noun, and verb. The terminals
are John, Mary, likes, and sees. The power of BNF is that this oversimplified
grammar for a Dick-and-Jane level of English nonetheless allows us to parse
a number of different sentences, such as "John sees" and "Mary likes John."
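The six productions of this toy grammar translate almost line for line into code. Here is a minimal sketch in Python, assuming that words are separated by blanks; the names NOUNS, VERBS, and is_sentence are hypothetical.

```python
NOUNS = {"John", "Mary"}     # noun := John, noun := Mary
VERBS = {"likes", "sees"}    # verb := likes, verb := sees

def is_sentence(text):
    """Recognize sentence := noun verb and sentence := noun verb noun."""
    words = text.split()
    if len(words) == 2:
        return words[0] in NOUNS and words[1] in VERBS
    if len(words) == 3:
        return words[0] in NOUNS and words[1] in VERBS and words[2] in NOUNS
    return False

print(is_sentence("John sees"))
print(is_sentence("Mary likes John"))
print(is_sentence("likes Mary"))
```

The first two calls succeed and the third fails, because no production lets a sentence begin with a verb.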
In a grammar for Visual Basic, the nonterminal statement might look like this.

statement := assignmentStatement
assignmentStatement := lValue = expression

Note that lValue is either a simple variable or a reference to an array in Visual Basic.
I call it an lValue as shorthand for location value. This is because in Visual Basic, the
left side of an assignment must refer to a storage location.

BNF Tools
Unlike for regular expressions, no object is shipped with .NET to transform the
BNF to code or interpret the BNF. However, such software exists for a variety of
platforms. The best-known early example is the Unix program yacc. The problem with these
tools is that their effective use demands that you have written some parser code
by hand to get a feel for its complexities.

Yacc: Not Another Compiler-Compiler


Yacc stands for Yet Another Compiler-Compiler. This title is both apologetic and
inaccurate. At the time yacc was written, a number of researchers had written
programs to create compilers from their language specification.
The name apologizes for reinventing the wheel, but in fact, yacc is not a com-
piler generator; it is just a parser generator. It does not address either lexical
analysis or code generation. Calling yacc "yet another compiler-compiler" is
lamentable and like a caveman calling a wheel "yet another Ford Fairlane."
Nonetheless, as is the case with many Unix commands, the gnomic name
yacc stuck.

Many Visual Basic .NET authors encourage you to create .vb files with a simple
text editor outside the GUI to get a feel for the way in which forms and classes
are constructed. Similarly, I recommend that you write a parser that implements
the BNF to understand how to construct a BNF for a language.

BNF Design
Take another look at the source code in form1.VB for integerCalc, in the Parser
region. The parser is a series of recognizers for grammar categories of a simple
math grammar. Here is that complete grammar:

expression := addFactor [ expressionRHS ]
expressionRHS := addOp addFactor [ expressionRHS ]
addOp := + | -
addFactor := term [ addFactorRHS ]
addFactorRHS := mulOp term [ addFactorRHS ]
mulOp := * | /
term := INTEGER | ( expression )

Note the following about this grammar:

• Unlike parentheses, which are terminals and correspond to the appearance
of parentheses in the source text, square brackets in BNF mean that
the bracketed grammar symbols are optional.

• An informal regular expression notation is used to explain nonterminals;
see the addOp and mulOp definitions. This is because all regular expressions
can be translated to BNF (but not the other way around).

• In typical BNF, a name in uppercase represents a self-explanatory terminal,
which, being self-explanatory, does not need to be further defined, such as
INTEGER in the example. The lexical analyzer has already isolated tokens.

The first line declares our goal, which is to parse an expression, such as 1+2-3*4.

expression := addFactor [ expressionRHS ]

It divides the expression in this example between 1 (the addFactor) and the right
side (the expressionRHS), which is +2-3*4.
An addFactor is anything that can be a part of an addition operation. The
expressionRHS is anything that starts with a plus or a minus sign and appears on
the right-hand side (RHS) of an expression.
Now consider this BNF line:

addFactor := term [ addFactorRHS ]

The brackets mean that an add factor can be, but does not have to be, terminated
by an "add factor right-hand side." But let's set that aside for now, because
it looks like 1 does not have a right-hand side (the plus sign after 1 starts an
expressionRHS).
And, if you look at the BNF rule that defines term, you can see that a term is
an integer or an expression in parentheses.

term := INTEGER | ( expression )

1 is an integer; therefore, we have a term, which is also (going back up the tree)
an add factor.
Let's return to the top of the BNF. Since we have found an integer, which is
a term and which is a full-bodied addFactor, we can move to the right in the
expression.
Is the string "+2-3*4" an expressionRHS? It does begin with an addOp, because
a plus sign is an addOp (see the rule addOp := + | -). The addOp seems to be followed by
an addFactor, the number 2. Therefore, it looks like it starts with the expressionRHS +2.
We then see another call for an optional expressionRHS in brackets inside the
rule for expressionRHS, which means that any expressionRHS can embed one or
more smaller expressionRHS instances at its end.
So, we look for the addOp, which must start any expressionRHS according to
our rules, and find a minus sign. We find in the definition for addOp that a minus
sign, like a plus sign, is considered an addition operation.5 Then we look for the
smallest add factor and find the number 3.
However, take a look at the definition of addFactor. In the expression, 3 is not
followed by a plus sign or minus sign; it is followed by an asterisk. In the definition
of an addFactor, a term (such as the integer 3) may be followed by a different type of
RHS construct. This is the addFactorRHS, the right side of an add factor, which may
start with a multiplication or division symbol (see the rule mulOp := * | /). Since 3 is
followed by an asterisk and the integer 4, the *4 constitutes an addFactorRHS, and 3*4
is the addFactor that follows the minus sign.
In applying the BNF, we are in effect making a tree of strings with longer and
more comprehensive strings at the top, and smaller and less comprehensive strings
at the bottom. Figure 3-5 illustrates this tree of strings.

[Tree diagram of the parse of the expression 1+2-3*4]

expression '1+2-3*4' at 1-7
    addFactor '1' at 1-1
        term '1' at 1-1
    expressionRHS '+2-3*4' at 2-7
        addOp '+' at 2-2
        addFactor '2' at 3-3
            term '2' at 3-3
        expressionRHS '-3*4' at 4-7
            addOp '-' at 4-4
            addFactor '3*4' at 5-7
                term '3' at 5-5
                addFactorRHS '*4' at 6-7
                    mulOp '*' at 6-6
                    term '4' at 7-7

Figure 3-5. The tree and the outline representation of the parse of 1+2-3*4

Figure 3-5 also shows that the tree can be alternatively represented in outline
form. This outline view of integer expressions is provided when you click the More
button on the integerCalc screen, in the Parse Outline box, and this box can be
zoomed (see Figure 3-6, later in this chapter).
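An outline like the one in Figure 3-5 falls naturally out of a recursive parser: each recognizer writes one line for itself and indents the lines of the recognizers it calls. The following is a sketch in Python under my own assumptions, not the integerCalc source. The names parse_outline, rule, and literal are hypothetical, token positions are reported starting from 1 as in the figure, and parentheses are handled by plain recursion rather than by findRightParenthesis.

```python
import re

def parse_outline(source):
    """Return the parse of source as indented outline lines."""
    # Assumption: characters that are not digits, operators, or parentheses are ignored.
    tokens = re.findall(r"[0-9]+|[-+*/()]", source)
    lines = []

    def rule(name, body, start, depth):
        slot = len(lines)
        lines.append("")                    # reserve this rule's line before its children
        end = body(start, depth + 1)
        if end is None:
            del lines[slot:]                # failed: discard the line and any children
            return None
        text = "".join(tokens[start:end])
        lines[slot] = f"{'  ' * depth}{name} '{text}' at {start + 1}-{end}"
        return end

    def literal(choices):
        def body(i, depth):
            return i + 1 if i < len(tokens) and tokens[i] in choices else None
        return body

    def expression(i, depth):               # expression := addFactor [ expressionRHS ]
        i = rule("addFactor", add_factor, i, depth)
        if i is None:
            return None
        rest = rule("expressionRHS", expression_rhs, i, depth)
        return i if rest is None else rest

    def expression_rhs(i, depth):           # addOp addFactor [ expressionRHS ]
        i = rule("addOp", literal(("+", "-")), i, depth)
        if i is None:
            return None
        i = rule("addFactor", add_factor, i, depth)
        if i is None:
            return None
        rest = rule("expressionRHS", expression_rhs, i, depth)
        return i if rest is None else rest

    def add_factor(i, depth):               # addFactor := term [ addFactorRHS ]
        i = rule("term", term, i, depth)
        if i is None:
            return None
        rest = rule("addFactorRHS", add_factor_rhs, i, depth)
        return i if rest is None else rest

    def add_factor_rhs(i, depth):           # mulOp term [ addFactorRHS ]
        i = rule("mulOp", literal(("*", "/")), i, depth)
        if i is None:
            return None
        i = rule("term", term, i, depth)
        if i is None:
            return None
        rest = rule("addFactorRHS", add_factor_rhs, i, depth)
        return i if rest is None else rest

    def term(i, depth):                     # term := INTEGER | ( expression )
        if i < len(tokens) and tokens[i].isdigit():
            return i + 1
        if i < len(tokens) and tokens[i] == "(":
            inner = rule("expression", expression, i + 1, depth)
            if inner is not None and inner < len(tokens) and tokens[inner] == ")":
                return inner + 1
        return None

    if rule("expression", expression, 0, 0) != len(tokens):
        raise ValueError("not a valid expression")
    return lines

print("\n".join(parse_outline("1+2-3*4")))
```

For 1+2-3*4 this prints fourteen lines, beginning with the whole expression at 1-7 and ending with the term 4 at 7-7.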
The application of BNF described here may seem a little imprecise. This is
because BNF, in general, does not show you how to write code. The BNF must be
5. It makes sense, when you think about it, to treat subtraction as syntactically like addition. As
you know, in common programming languages, subtraction and addition have the same
precedence and are evaluated left to right.


designed with care. The problem is that more than one BNF specification for
a language can be valid. Some BNFs make it easy to parse; other mathematically
valid BNFs create parsers with loops and ambiguity.
The ideal approach is to find a BNF specification for a language you would
like to convert to CLR code. If you must design a BNF specification, it is wise to
keep it simple. Try to find nonterminals that start with a small number of possible
symbols, as do both expressionRHS and addFactorRHS in the example shown here.
Find nonterminals that reduce to a single, smaller nonterminal, with an optional
trailer, such as expression and addFactor.
Another key concept in the BNF is the way in which we support parentheses.
It does leave one hole in the BNF considered as a design for coding, which
we do need to address. Parentheses, fortunately, are parsed at only one point.
To support nested expressions such as ((1+1)*3)/4, we declare, in the last production
term := INTEGER | ( expression ), that a term can be either an integer
or a complete expression surrounded by parentheses. While this nicely plugs in
recursion to any level, it does have a flaw: the right parenthesis inside the production
must not be confused with a right parenthesis inside the expression. In
term := INTEGER | ( expression ), expression can contain right parentheses.
We need to find the right parenthesis that balances the left parenthesis of the term. We handle
this with straightforward code (see the procedure findRightParenthesis). For
example, in ((2+1)*3)-5, the leftmost parenthesis is not balanced by the first right
parenthesis, but by the second right parenthesis. Now, in a production compiler
it may not be feasible to do this, because it would involve a lookahead, which
would cause multiple reads of the input code. This can be handled, however, by using
a buffer or cache, and in this example, this consideration is not important.
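The balancing search is short. The following is my own sketch in Python of the idea behind findRightParenthesis, not the routine from integerCalc; the signature, the zero-based indexes, and the return value of -1 for "not found" are all assumptions.

```python
def find_right_parenthesis(tokens, left_index):
    """Return the index of the ')' that balances the '(' at left_index, or -1."""
    depth = 0
    for i in range(left_index, len(tokens)):
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:       # back at the level of the opening parenthesis
                return i
    return -1

tokens = list("((2+1)*3)-5")     # one character per token in this small example
print(find_right_parenthesis(tokens, 0))
print(find_right_parenthesis(tokens, 1))
```

The first call skips the inner right parenthesis at index 5 and returns 8; the second, starting from the inner left parenthesis, returns 5.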
So, you've seen that designing a usable BNF can be a little bit tricky, but you
can use the technique described here to avoid headaches. The key is to write some
sample parsers, as you'll see in the next section.

Transformation of BNF to Code


In order to make the code as readable as possible, we assign a private method,
returning True or False, to each nonterminal.
These methods are provided with four parameters: the current index (not in
the raw source code, of course, but to our table of scanned tokens), the table of
tokens, the data structure (usrRPN) for the output object code, and an end index.
The end index is normally the number of tokens in the source code, but, in this
case, will be the position of a right parenthesis when we are parsing a term sur-
rounded by parentheses.


Extending Scanning and Parsing to Threads


Because of the simplicity of our requirements, the table of scanned tokens is
a simple, static data structure, not in the Visual Basic sense of being in static
memory, but in the sense that we know, at parse time, that this structure is
complete.
For a larger project, consider making both the index to scanned source and
scanned tokens properties of an object that hides the details of how tokens are
retrieved. This object may very well be "lazy," using the scanner logic described
here not in a loop, but one token at a time. Such an object can be used with the
.NET threading model to run the scanner and the parser side by side, with the
parser chasing the scanner in its own, separate thread.
This is very cool, but it's overkill for simple languages, including the language
of integerCalc, the example used in this chapter. Solid design that avoids cool-
ness forms a basis for extension to coolness.

The mission of each nonterminal recognizer is to move left to right through
tokens, trying to parse lower-level nonterminals, reporting True on success and
False on failure. By design, we always expect each recognizer to leave its
intIndex parameter one token beyond the last token of the source code that
corresponds to the nonterminal.
Take a look at the simple expression method:

Private Function expression(ByRef intIndex As Integer, _
                            ByRef usrScanned() As TYPscanned, _
                            ByRef usrRPN() As TYPrpn, _
                            ByVal intEndIndex As Integer) As Boolean
    ' An expression must begin with an addFactor.
    If Not addFactor(intIndex, usrScanned, usrRPN, intEndIndex) Then Return(False)
    ' The right-hand side is optional, so its result is ignored.
    expressionRHS(intIndex, usrScanned, usrRPN, intEndIndex)
    Return(True)
End Function

Since an expression must start with an addFactor, if expression fails to parse
an addFactor in its first step, it returns False. But it then can call expressionRHS,
which returns a Boolean value, as a subroutine and ignore the result. As you can
see in the BNF rule for this procedure, we do not care if the RHS expression fails
to appear; it is "gravy."
Suppose the expression in this example is passed something simple like 1+1.
intIndex is 1; usrScanned contains the three tokens 1, plus sign, and 1; usrRPN is
empty; and intEndIndex is 4. The first line of code checks for an addFactor at
intIndex (at 1) and increments intIndex past the largest addFactor it finds at
index 1, which is the number 1. It cannot go any further because an addFactor is
either a term or a term followed by an addFactorRHS, which starts with a multiplication
operator, and a multiplication operator is not found. Therefore, addFactor
returns to the second line, which calls expressionRHS. This is successful at finding
the expressionRHS +1, and as a result, it increments intIndex past the end of the
expressionRHS, setting it to 4, the end of the expression.
Two lines of code do all this work (take a look at the source), because they
both call routines that call other routines in a rather deep nest.
An obvious question at this point would be, "Well, there might not be an
expressionRHS after the addFactor, but what if there is garbage and line noise?"
Take a look in the source, above the expression method, at the method
parseExpression. It contains a check of the index (named intIndex1 here) passed
by reference among the compiler procedures when the top-level recognizer
expression claims to be done. We report an error if that index is less than the
length (token count) of the source code.
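The pattern of one Boolean recognizer per nonterminal, plus a final check of the index, can be sketched compactly. The code below is Python written for this illustration, not the Visual Basic of integerCalc. The class name Parser is hypothetical, the index is zero-based and kept as a field because Python has no ByRef parameters, generation of RPN is left out, and parentheses are handled by recursion rather than by an end index.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.index = 0                     # plays the role of the ByRef intIndex

    def check_token(self, expected=None):
        """Step past the current token if it is expected (any integer when None)."""
        if self.index >= len(self.tokens):
            return False
        token = self.tokens[self.index]
        ok = token.isdigit() if expected is None else token == expected
        if ok:
            self.index += 1
        return ok

    def expression(self):                  # expression := addFactor [ expressionRHS ]
        if not self.add_factor():
            return False
        self.expression_rhs()              # optional: the result is ignored
        return True

    def expression_rhs(self):              # addOp addFactor [ expressionRHS ]
        start = self.index
        if (self.check_token("+") or self.check_token("-")) and self.add_factor():
            self.expression_rhs()
            return True
        self.index = start                 # on failure, leave the index untouched
        return False

    def add_factor(self):                  # addFactor := term [ addFactorRHS ]
        if not self.term():
            return False
        self.add_factor_rhs()
        return True

    def add_factor_rhs(self):              # mulOp term [ addFactorRHS ]
        start = self.index
        if (self.check_token("*") or self.check_token("/")) and self.term():
            self.add_factor_rhs()
            return True
        self.index = start
        return False

    def term(self):                        # term := INTEGER | ( expression )
        if self.check_token():
            return True
        start = self.index
        if self.check_token("(") and self.expression() and self.check_token(")"):
            return True
        self.index = start
        return False


def parse_expression(tokens):
    """Succeed only if expression consumes every token: no trailing garbage."""
    parser = Parser(tokens)
    return parser.expression() and parser.index == len(tokens)

print(parse_expression(["1", "+", "1"]))
print(parse_expression(["1", "1"]))
```

The second call fails even though expression itself reports True, because the index stops after the first 1 and the final comparison catches the leftover token.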
Read the remaining source code to confirm that this code works. Also, note
the support routines, not only findRightParenthesis, described already, but also
checkToken and genCode.
checkToken is our workhorse interface to the scanner data structure. It is
overloaded because it has two jobs. The first is to check for any integer and
return its value by reference to the caller, and the second is to check for a specified
string operator or a string parenthesis. checkToken enforces an important
rule, where a grammar category, like the expression, is something capable of
appearing on the left side of a production:

• On recognition of a grammar category, the parse index must always point
one token past the last token corresponding to the grammar category.

This is because a grammar category always ends with some specific token.
Once checkToken has confirmed that the expected token (any integer, a specified
operator, or a parenthesis) occurs at the current position of usrScanned as
indexed by intIndex, it can increment intIndex. This enforces the general rule.
In the much larger parser for the quickBasicEngine compiler we will build in
later chapters, this rule takes on added importance, because we want to report
precise error and other information that pinpoints offending, or merely interesting,
source code.
Therefore, here and in quickBasicEngine, a single interface is always used to
the scanner data structure, and all code makes a blood oath never to look at the
scanner data structure without going through checkToken.


Enforcing the Rules


Rules about access are easy to follow for a single programmer responsible for
an entire compiler. They are harder to enforce on a team. And a common fact
of life in programming is the programmer, working on deadline, who uses her
"own" code to access a data structure in her "own" way, causing her "own" bugs.
Unfortunately, the technical leaders most familiar with the damage that this
can cause are most often soft-spoken philosopher kings, who don't want to lec-
ture the team on checkToken-style routines that are supposed to be used to
access their associated data structure.
But as we will see in the next chapter, use of object-oriented design makes it
much easier to naturally enforce these types of rules.

Finally, genCode generates the code. For now, think of it as a black box,
because understanding genCode requires us to move on to our third and final the-
ory, in the next section.
But first take a break from slogging through the theory and run integerCalc
again. Click More and evaluate a complex expression. Look at the list box in the
middle of Figure 3-6. It presents the parser results as the list of nonterminals
parsed, which, as shown earlier in Figure 3-5, is just another way of represent-
ing a tree.

[Screenshot: the Parse Outline list box of integerCalc, showing nested entries such as addFactorRHS, expression in parentheses, expression, expressionRHS, addOp, addFactor, term, and mulOp, each with its source text and token positions.]

Figure 3-6. The list box shows the nonterminals as a nested outline, because unlike
scanning (which scans for nonoverlapping tokens), parsing looks for nonterminals
and terminals that nest and overlap each other.


The list box is an indented outline because, unlike the tokens produced by
the scanner, the grammar symbols nest. Everything in our simple language is an
expression. Expressions consist of addOp, addFactor, expressionRHS, and other elements.
Therefore, a tree-like display is best.

Interpreters and RPN


If we were compiling for a chip in the embedded systems world, or for the
Reflection object and the CLR, we would be finished. However, for later chap-
ters, we need to understand some of the ideas behind the CLR and software
emulation and interpretation of code.
Since the CLR is not a classic interpreter, but rather a large set of object-
oriented JIT compilers, it is not slow. The CLR is independent of underlying
hardware, and it meets its goal of write once, run anywhere. Reverse Polish
Notation (RPN) underlies many of the ideas implemented in the CLR.

RPN Construction
Most logicians and mathematicians use, instinctively, the language that is com-
piled by integerCalc, where operations appear between operands. However, in
Poland before World War II (a period in which that country enjoyed brief free-
dom from Russians and Germans, and flourished as a result, as it has flourished
since the end of communism), a group of logicians discovered a more elegant
notation, which is named in their honor. 6 In this notation, you just write the
operands before their operator. This means that 1+1 becomes 1,1,+, where the
comma separates the operands.
You never need parentheses! 3*(4+1) is 3 (write operand), 4 (go to the next
operand, just noting the presence of multiplication), 1 (go to next, again remem-
ber add), + (the add is next, obviously since 4 and 1 use it), and * (we're finished).
Early computer designers realized that evaluating Polish expressions is much
simpler than evaluating non-Polish, or infix, expressions. Infix expressions (unless
you use a careful BNF structure, as designed in the previous section) can result in
complex code, which needs to move back and forth in source to finish its job.
Suppose you have a table, accessible only on one end by means of only two
operations: push will add an entry to the top of the table, and pop will remove the

6. One reason guys like Jan Lukasiewicz (the inventor of Polish notation) and Alfred Tarski (the
leading Polish logician before WWII) should be honored is that some of them were interned
and suffered during the war, along with other smart people, who totalitarian governments
don't like. Nazis appear to have been, among other things, guys who flunked math.


most recently pushed entry. 7 Perhaps surprisingly, this simple gimmick handles
parentheses logic well, because parentheses in infix expressions essentially pri-
oritize the contained operators, making operators of lower precedence wait until
the parenthesized operators complete.
Take a look at genCode, in the Parser region of integerCalc. This builds an
RPN expression in the usrRPN data structure. It is passed an enumerator of type
ENUoperator, which can have the values add, subtract, multiply, divide, and push.
It is also passed an operand that is zero for all operators, except push.
When the parser recognizes a term that is an integer, in the method named
term, it calls genCode to append a push operator and the integer value of the term
to the end of the usrRPN data structure. usrRPN is not a stack, but rather an array
that represents the RPN of the expression to be evaluated.
When the parser recognizes the right side of an arithmetic operation, the
operands have already been "pushed" by the lower-level methods that recognize
the nonterminals in the operation. So, all the parser needs to do at recognition
time is call genCode to append the right operator (with a dummy and unused
operand) to the end of usrRPN.
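The order of these calls is what turns infix into RPN. As a sketch in Python, not the genCode of integerCalc, the calls a parser would make for 3*(4+1) can be written out by hand. The name gen_code, the use of strings for operators, and the list of pairs standing in for usrRPN are assumptions of this example.

```python
OPERATORS = ("add", "subtract", "multiply", "divide", "push")

def gen_code(rpn, operator, operand=0):
    """Append one (operator, operand) pair; the operand matters only for push."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator {operator!r}")
    rpn.append((operator, operand))

# The calls, in the order a parser would make them for 3*(4+1).
rpn = []
gen_code(rpn, "push", 3)     # term 3 recognized
gen_code(rpn, "push", 4)     # term 4 recognized
gen_code(rpn, "push", 1)     # term 1 recognized
gen_code(rpn, "add")         # right side of 4+1 recognized
gen_code(rpn, "multiply")    # right side of 3*(4+1) recognized

print(rpn)
```

The operands appear in source order and each operator follows its operands, so no parentheses survive into the output.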

Stack Use
Now, take a look in the Interpreter region of integerCalc, at the interpretExpression
method. It declares a stack as a local variable of type Stack, and then uses a very
simple For loop to index left to right through the usrRPN data structure.
When the interpreter "sees" an operator of type Push, it uses the push method
of the stack object to place the value on the stack. It does so in the pushStack
method using a Try...Catch block, because at this point, we are leaving our code
and asking a system facility to accomplish something for us. We need to make
sure the external facility succeeds.
In the code for this book, I will always use this rule:

• When using a facility whose code is outside your code, check its result.

Just because you're paranoid doesn't mean you aren't being followed.
When interpretExpression sees an opcode,8 interpretExpression must "pop"
the two operands and, using the facilities of Visual Basic .NET, perform the oper-
ation. The only complexities here are that we need to check for stack underflow,

7. Many Visual Basic .NET and C# developers will be familiar with the Stack collection now
available, but it was quite simple to make stack-like arrays in older Visual Basic versions, or to
use collections as stacks.
8. This opcode is known as a zero-address opcode in some contexts, because the opcode, unlike
a typical Pentium opcode, does not need to find anything in memory. Its operands are
already in the stack.


and we need to properly order the operands. For example, stack underflow
occurs in the (invalid) sequence of commands push 1, add, which doesn't spec-
ify what to add 1 to.
If the parser phase has no bugs, it will always generate a correct RPN expres-
sion in usrRPN, and no valid expression will cause stack underflow (1,+ is invalid).
However, we are paranoid and being followed, as noted earlier, so another rule is
as follows:

• Never be reluctant to add checks, in major subsections of code and objects,
on the work of your other sections, even if you wrote them yourself (you
might be following yourself).

Therefore, in the same manner as pushStack, popStack is in a Try...Catch
block, which mostly will catch stack underflow, if the parser is changed in
a buggy fashion.
We also need to remember that operands in the stack for division and subtraction
(but not addition and multiplication) will be out of order, and we cannot
code popStack-popStack or popStack/popStack. Since 1-2 will translate to the RPN
1,2,-, popStack-popStack will actually calculate 2-1!
This is an annoyance if, like me, you do not like temporary variables and
prefer to call functions for values directly. But since the Stack object fails to pro-
vide a way of exchanging stack entries, we bite the bullet on this one and use
temporary variables in interpretExpression for the nonsymmetrical operators,
subtraction and division.
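Both the underflow check and the operand order fit in a short loop. This is a sketch in Python, not the interpretExpression of integerCalc. The list of (operator, operand) pairs stands in for usrRPN, a Python list stands in for the Stack object, and integer division is done with Python's // operator, which is an assumption about how the quotient should be rounded.

```python
def interpret_expression(rpn):
    """Evaluate a list of (operator, operand) pairs with a stack."""
    stack = []
    for operator, operand in rpn:
        if operator == "push":
            stack.append(operand)
            continue
        if len(stack) < 2:
            raise ValueError(f"stack underflow at {operator}")
        right = stack.pop()      # pushed last, so it is the right-hand operand
        left = stack.pop()       # temporary variables keep the user's order
        if operator == "add":
            stack.append(left + right)
        elif operator == "subtract":
            stack.append(left - right)
        elif operator == "multiply":
            stack.append(left * right)
        elif operator == "divide":
            stack.append(left // right)
        else:
            raise ValueError(f"unknown operator {operator!r}")
    if len(stack) != 1:
        raise ValueError("malformed RPN")
    return stack[0]

print(interpret_expression([("push", 1), ("push", 2), ("subtract", 0)]))
```

This prints -1, the value of 1-2. Popping straight into the subtraction, without the two temporary variables, would have produced 2-1.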
Also note that interpretExpression contains a virtual or potential bug in the
way it performs addition and multiplication. A virtual bug is a bug that might be
activated by a probable modification to code. Here, the virtual bug is the fact
that in using function calls of popStack without temporary variables, we execute
addition and multiplication in the wrong order: the user codes 1+2, but we execute
2+1; the user codes 3*4, but we execute 4*3. This is a virtual bug because it
won't happen unless the code is changed to operate with real numbers with
signs. Then certain addition operations will be subtraction, and the result may
differ if either operand is small. Therefore, in a production compiler, addition
and multiplication should use temporary variables and execute the operation in
the user-specified order.
Run integerCalc again and click More. Type a reasonably complex expres-
sion and click Evaluate. You'll see the expression in RPN, as shown in Figure 3-7.
The More display also shows how a stack is used, as shown in Figure 3-8.

TIP If the display is too fast, you can check the box labeled Replay, click
Evaluate again, and then use the Step and Back buttons to review the steps.


push 4        Push term to stack
push 15       Push term to stack
add 0         Factor
push 33       Push term to stack
multiply 0    Term
push 1        Push term to stack
divide 0      Term
subtract 0    Factor
push 26       Push term to stack
push 9        Push term to stack
subtract 0    Factor
divide 0      Term

Figure 3-7. The RPN box of the More display shows how the parsed input
expression has been converted to RPN.

push 23       Push term to stack
push 3        Push term to stack
push 1        Push term to stack
subtract 0    Factor
multiply 0    Term

Figure 3-8. The right side of the More display shows how a stack is used to evaluate
an arithmetic expression in RPN.

Stacks: Great, Wonderful, or What?


Stacks are great but not wonderful, or perhaps vice versa.
Whereas hardware designs exist based on stacks, especially for programmable
calculators, reduced instruction set computing (RISC) designers tend to dislike
stacks, because it is harder to optimize stack code.
The most famous example is lazy Or and lazy And.
In RPN, a And b becomes a, b, And. The problem is that an RPN-based machine
cannot easily recognize, without more code, a False value of the variable a, which
means that there is no need to evaluate b. A corresponding problem exists with
the Or operator. In a Or b when a is True, there is no need to evaluate b.


Perhaps because of this limitation in simpler interpreters, many Basic compilers
have enforced the rule to evaluate both a and b in both And expressions and
Or expressions. This can be an important consideration whenever, in legacy
code, the b expression is a function call that has side effects, such as actually
updating a database.
This problem has been elegantly addressed both in the Visual Basic .NET lan-
guage and in the CLR by the new operators AndAlso and OrElse. a AndAlso b
is False when a is False, and b is not evaluated in this case. c OrElse d is
True when c is True, and d is not evaluated here. Support exists in the CLR,
and I urge you to abandon the old operators And and Or, because code using
AndAlso and OrElse is more reliable and efficient.
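The difference shows up as soon as the right-hand operand has a side effect. Here is a sketch of the same contrast in Python, not Visual Basic: Python's "and" skips its right operand much as AndAlso does, while the & operator on Booleans always evaluates both sides, as the old And does. The function update_database is hypothetical and only records that it was called.

```python
calls = []

def update_database():
    """Stand-in for a function with a side effect."""
    calls.append("updated")
    return True

a = False
result_lazy = a and update_database()    # right side skipped because a is False
result_eager = a & update_database()     # both sides are always evaluated

print(result_lazy, result_eager, calls)
```

Both results are False, but the list records exactly one call, the one made by the eager form.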
Stack code as encountered in the CLR can be tricky to debug. Sometimes the
debugger must build a complex mental model of stack structures of a great
deal of depth. The more step-by-step nature of register machines makes their
object code somewhat easier to follow. But it does appear from the software
history record that stacks make for rather reliable architecture, not prone to the
security flaws of register machines. This is because stacks avoid unnecessary
temporary variables, which are a security exposure.
The Tandem architecture used stacks to provide a notably highly reliable mini-
computer still in use in banking. In general, Algol and later C programs were
more reliable than Fortran code, all other things being equal, due to the use of
stacks in the runtime of both Algol and C.

Summary
This chapter has introduced three core theories that you can apply to developing
compilers for the CLR: scanning and regular expressions, parsing and BNF, and
interpreters and RPN.
We've had an almost complete flyover of the entire process of crafting a small
compiler. Although this compiler is of limited practical use, it could be expanded
to parse and evaluate business rules.
At this point, we haven't addressed compiling to the CLR, because witness-
ing the internals of a very small runtime is excellent preparation for Chapter 9,
where we send some object code for QuickBasic to the CLR.
The next chapter describes the front-end scanning and parsing of the com-
plete quickBasicEngine example, which scale up from the methods demonstrated
in this chapter.

Challenge Exercise
As supplied, integerCalc calculates only with integers. In order to understand
how it works, consider updating it to calculate with real numbers.

Your most important task will be to change the lexical analyzer of integerCalc
to scan real numbers. This is the method named scanExpression in the code.
For best results, construct a regular expression that correctly scans all real
numbers, where a real number consists of the following parts:

• An optional plus or minus sign

• An optional sequence of decimal digits

• An optional decimal point

• An optional second sequence of decimal digits to the right of the decimal
point

• The letter e (uppercase or lowercase). For mad scientists and disturbed
engineers, we need to support floating-point expressions for very large
and very small numbers, such as 1,000,000 (.1e7) or .0000001 (.1e-6).

• After the letter e, an optional plus or minus sign

• Then the value of the e exponent, which you can think of as the number
of positions an implied decimal point, located to the left of the leftmost
nonzero decimal digit at the beginning of the floating-point number,
should move. By default, and if the letter e is followed by a plus sign, the
implied decimal point moves right. If the letter e is followed by a minus
sign, the implied decimal point moves left.
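
As a starting point for the exercise, here is one possible regular expression, sketched in Python rather than in the Visual Basic of integerCalc. It makes two assumptions that go beyond the list above: at least one digit must appear somewhere in the mantissa (otherwise the empty string would count as a real number), and the exponent digits are required whenever the letter e is present. The names REAL_NUMBER and is_real are invented for this sketch.

```python
import re

REAL_NUMBER = re.compile(
    r"""
    [+-]?                   # optional plus or minus sign
    (?: \d+ \.? \d*         # digits, optional point, optional fraction digits
      | \. \d+ )            # or a leading point followed by digits
    (?: [eE] [+-]? \d+ )?   # optional exponent with its own optional sign
    """,
    re.VERBOSE,
)

def is_real(text):
    """Return True when the whole of text is a real number."""
    return REAL_NUMBER.fullmatch(text) is not None
```

With this pattern, strings such as 42, -3.14, .1e7, and .1e-6 are accepted, while a lone sign, a lone decimal point, and e5 are rejected. The same pattern should carry over to the .NET Regex class with little change, but verify that against your own scanner before relying on it.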

Since the lexical analyzer represents the source code in the USRscanner data
structure, and since scanned values are represented in an object, you probably
do not need to alter anything other than comments in the regions named Scanner
and Parser. The interpretExpression procedure needs some modification in the
way it performs arithmetic, since, as delivered, it uses integers.
When you complete this challenge, you will have experienced modifying
a simple front-end compiler in the .NET platform, and you will have a full calcu-
lator for numeric expressions.

Resources
For more information about compiler theory, refer to the book Compilers:
Principles, Techniques, and Tools, by Alfred Aho, Ravi Sethi, and Jeffrey Ullman
(Addison-Wesley, 1985). This is the famous "dragon" book, which shows the
compiler developer, armed with theory, defeating the dragon of complexity.
While academic in its tone, it does constitute an excellent reference. In particular,
it contains information on the use of lex and yacc.

CHAPTER 4

The Syntax for the QuickBasic Compiler
The use of Cobol cripples the mind, and its teaching should be regarded as
a criminal offense.
-Edsger Dijkstra

It is practically impossible to teach good programming style to students that
have had prior exposure to Basic; as potential programmers, they are mentally
mutilated beyond all recognition.
-Edsger Dijkstra (in a foul mood)

THE LATE, HERO computer scientist was just wrong about Basic. Dijkstra's comment
is academic sociology at its worst. It creates the illusion that programming
skill derives from the use of politically correct platforms and languages.¹ Dijkstra
was wrong because Visual Basic is Turing-complete, and it has a formal and sensible
syntax. Visual Basic is Turing-complete because you can use it to write any
program, as long as you disregard resource consumption.
I would revise Professor Dijkstra's aphorism. The use of Basic or Cobol as
representative of a good programming language in and of itself cripples the
mind and rots the teeth because Cobol and Visual Basic preserved (until Visual
Basic .NET) some standards and practices created in the Fortran era, which sim-
ply did not allow for effective problem breakdown. This was less a scientific fact
and more a result of a management illusion that programmers should merely
code specifications provided by the "real" experts, and not factor the problem
into subroutines, functions, and objects. Much sloppy programming results from
this false view of the field and the low self-esteem it creates in programmers, who
believe that a disreputable language permits mindless coding. In actuality, the
reverse is true: we should compensate for the deficiencies of the language by
mindful coding.

1. In an interview, Peter Neumann, long the hard-working moderator of the comp.risks
newsgroup, told me that Dijkstra struggled with depression most of his life. Many bright people
are depressed because they are powerless to stop other people from making mistakes. Dijkstra,
unlike many successful corporate MIS types, never restrained himself from speaking his
mind. His attempts at constructive criticism sometimes bothered people who had heavily
invested in a paradigm Dijkstra did not like. Paradoxically, all who knew Dijkstra personally
said he was easy to get along with.

In the 1970s and after Dijkstra made these comments, Basic compiler devel-
opers added structured constructs to the language, and commencing with Visual
Basic 4, Microsoft has been adding object-oriented tools.
Dijkstra said and wrote a lot of things, all of which are thought-provoking,
but not all of his views have stood the test of time. One strange aphorism that
still holds is that "computing science is no more about computers than astron-
omy is about telescopes."
Dijkstra meant that it's a mistake to focus on tools rather than the job at hand.
Don't forget that the computer and the programming language are your telescope,
and the stars are the user's problem and your solutions. The programmer's job is to
bring the stars down to earth.
Basic has a reputation as being vague in syntax and not formalizable, as are
more recent languages like Java and C++. As you'll see, this isn't so. Underneath
its clunky, wordy, and keyword-intensive syntax, Basic can be completely and
formally specified using Backus-Naur Form (BNF), and a tool for analyzing BNF
can be written in Visual Basic itself. This brings the stars down to earth, as long as
we cease fussing about the deficiencies of the telescope, stop hacking at it, and
grow up and get to work. I will also show you how to convert BNF to Extensible
Markup Language (XML), which gives another sensible view of BNF syntax.
In this chapter, I will discuss the rules for coding BNF by actually applying
an analyzer, the bnfAnalyzer program, to the syntax for BNF itself. Then I will
describe the construction of the BNF for our version of QuickBasic. We will then
run the syntax of our QuickBasic through an analyzer and examine the output.
We need to verify that it is analyzed without error and forms a solid basis for the
compiler we will start to build in Chapter 5. As a summary, we'll look at eight
guidelines for effectively developing BNF. This chapter will conclude with a special
section on bnfAnalyzer internals, best read after you've read Chapters 5
through 8.
BNF is a valuable tool for specifying sensible .NET languages. Don't skip this
chapter. If you propose to design a language for business rules, text processing,
or making home movies, always create a solid BNF as your detailed require-
ments analysis.

A Tool for Analyzing BNF


In this chapter, we'll use a program called bnfAnalyzer to load and analyze the
syntax of QuickBasic, expressed in BNF. The bnfAnalyzer executable program is
available from the Downloads section of the Apress Web site (http://www.apress.com).
You'll find the code in the egnsf/bnfAnalyzer folder.


The best way to understand the examples in this chapter is to run bnfAnalyzer
while you're reading it. As long as you have the Visual Basic 6 runtime on your
machine, you'll be able to run the program. To compile the source provided at
the Apress Web site, you'll need Visual Basic 6 Enterprise or Professional, or if
you have the Learning edition, you can organize the two projects shipped into
a single project.
bnfAnalyzer reads a text file containing the BNF grammar of a language. It
analyzes the BNF specification, finding many errors that would prevent you
from using the specification to write, or automatically generate, a compiler or
that might cause serious bugs. This tool produces a language reference manual,
which includes the following:

• A list of the nonterminals, which are the categories of the language (such
as expression in QuickBasic) that need to be defined in terms of smaller
sequences

• A list of the terminals, which are the categories of the language (such as
string in QuickBasic) that do not need to be broken down further

• The rules for forming language constructs as a numbered outline or as
an XML tag

You can use this tool to analyze your own .NET language. bnfAnalyzer uses
many of the compiler techniques discussed in Chapters 5 through 8 of this book.
A final section of this chapter, "bnfAnalyzer Technical Notes," will discuss this topic,
but I suggest you read that section after you've studied Chapters 5 through 8.

Why Is bnfAnalyzer a COM Executable?


You can run the bnfAnalyzer.exe file on any platform that has Visual Basic
COM or most versions of Office. You can compile it with your edition of Visual
Basic 6.
bnfAnalyzer is a COM executable because my .NET development laptop was
resident at the pawnshop for a brief time while I was writing this chapter. I used
Visual Basic 6 on my old system to write a somewhat nonreusable product (for
which you can get the free source at the Apress Web site) that bases its objects
primarily on complex collections, as described in the last section of this chapter.
This product will run and compile on modern systems, since it uses no
special or legacy features.


Analyzing and Coding BNF


BNF can be specified as a formal language, although it is not a programming
language. As you saw in the compiler flyover in Chapter 3, BNF does not show
the computer how to parse. Instead, it simply declares, formally, the valid con-
structs in the language. Those constructs might be the familiar sentence, noun,
and verb of English or the statement, expression, and identifier of Visual Basic.
BNF is a powerful notation. For example, any regular expression can be recast
in BNF, although the reverse is not true, and BNF is far more readable than regular
expression notation.
BNF is specifiable in BNF. This means it is one of those somewhat confusing
closed systems: a mathematical concept that can be applied to itself (just as a database
can record all databases in a shop's inventory).

BNF Syntax
Figure 4-1 shows the syntax of BNF. It is available in the file named bnfAnalyzer
test 6 (BNF of BNF).txt that comes along with the downloaded code for the
analyzer.

bnfGrammar := production +
production := [ nonTerminal ":=" productionRHS ]
     ( NEWLINE | EOF )
production := NEWLINE ' Allows for empty lines
nonTerminal := IDENTIFIER
productionRHS := sequenceFactor [ sequenceFactor ]
sequenceFactor := mockRegularExpression
     [ alternationFactorRHS ]
mockRegularExpression := mreFactor [ mrePostfix ]
mreFactor := nonTerminal |
     UPPERCASESTRING |
     STRING |
     "(" productionRHS ")" |
     "[" productionRHS "]"
mrePostfix := "*" | "+"
alternationFactorRHS := "|" mockRegularExpression
     [ alternationFactorRHS ]

Figure 4-1. The syntax of BNF

Now, this is confusing. Figure 4-1 shows the syntax of BNF, a formal syntax
for programming languages, although I just told you that BNF is not a programming
language. To make things worse, I am presenting the rules of BNF in BNF,
as if you knew BNF all along or in a former life.


BNF isn't a programming language, but all programming languages are formal
languages (but not the reverse). The syntax of all formal languages, by definition,
can be specified in a formal notation like BNF.
Let's walk through some of the rules to get a feel for the use of BNF. The first
line says, when read properly from left to right, that a bnfGrammar is one or more
productions, where a production specifies possible components of a grammar
category. (The plus sign in production + means one or more repetitions, just as
it does in regular expressions.)
Okay, cool. What's a production?
Glad you asked. Go to the second line. A production is normally a nonterminal,
followed by a colon and an equal sign (:=), followed by a production on the
right-hand side (RHS), followed by either a newline character or end of file (EOF).
Note that the nonterminal, :=, and RHS are optional, which means that blank
lines are allowed.
A nonterminal is an identifier as seen in Visual Basic. An RHS is more com-
plex. It is a sequence factor, perhaps followed by another sequence factor.
Okay, what's a sequence factor?
A sequence factor is a mock regular expression, optionally followed by an alternation
factor RHS. A mock regular expression (so-called because it isn't a full-scale regular
expression) is a simplified regular expression that consists of a mock regular expression
factor (mreFactor) optionally followed by a mock regular expression postfix (mrePostfix).
An mreFactor can be one of several things: a nonterminal, a completely uppercase
string (which, in our language, represents a terminal symbolically), a quoted string
using Visual Basic conventions, a parenthesized production RHS, or a production RHS
enclosed in square brackets.
As this example shows, you read BNF by following branches of a tree (and if
you examine the code of bnfAnalyzer, you'll see that it represents the source BNF
in a tree data structure). Each branch is less an instruction than a timeless law,
which is always true no matter if the cows come home or not, as we say on the
farm. The comfort of this is that the rules never change.
But let's step back a bit.
Like a programming language, BNF has operators. These include the alternation
stroke | and the mock (as in not complete) support for the regular expression
operators asterisk (*) and plus (+). Oddly, when white space occurs on the RHS of
a production, between two grammar categories, it is an operator that specifies
that the material on its left is followed by the material on its right. Also, a grouping
of square brackets is an operator, which specifies that the material it contains
is optional.
As in a programming language, these operators have precedence. The
sequence operator (consisting of blanks or white space) has lowest precedence,
followed by the alternation stroke, and then the mock regular expression operators.
However, parentheses can be used to group operations and change this
precedence. For example, if in your language, an a nonterminal consists of a b or
the sequence c d, the production is a := b | ( c d ). The square brackets change
precedence in the same way, while also specifying that the bracketed material is
optional; a := b | [ c d ] specifies that an a is a required b or optionally c and d.
Note that this production allows a to be a null string.
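
The effect of grouping on precedence can be checked with a small sketch. The following Python is not how bnfAnalyzer works internally; it is a hypothetical illustration in which each right-hand side is written by hand as a nested tuple, and the invented functions ends and accepts report whether a list of tokens satisfies it. The ungrouped case assumes the precedence just described, in which the sequence operator binds loosest, so b | c d reads as ( b | c ) d.

```python
def ends(node, tokens, start):
    """Return the set of positions where node can stop matching,
    beginning at index start of tokens."""
    kind = node[0]
    if kind == "term":                        # a single grammar symbol
        if start < len(tokens) and tokens[start] == node[1]:
            return {start + 1}
        return set()
    if kind == "seq":                         # left followed by right
        result = set()
        for mid in ends(node[1], tokens, start):
            result |= ends(node[2], tokens, mid)
        return result
    if kind == "alt":                         # left or right
        return ends(node[1], tokens, start) | ends(node[2], tokens, start)
    if kind == "opt":                         # bracketed, so it may be skipped
        return {start} | ends(node[1], tokens, start)
    raise ValueError("unknown node kind: " + kind)

def accepts(node, tokens):
    """True when node matches the entire token list."""
    return len(tokens) in ends(node, tokens, 0)

b, c, d = ("term", "b"), ("term", "c"), ("term", "d")
ungrouped = ("seq", ("alt", b, c), d)         # a := b | c d
grouped = ("alt", b, ("seq", c, d))           # a := b | ( c d )
optional = ("alt", b, ("opt", ("seq", c, d))) # a := b | [ c d ]
```

Under these assumptions, the grouped form accepts b alone or c d, the ungrouped form accepts b d or c d, and only the bracketed form accepts an empty token list.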

Rules for Coding BNF


The following lexical rules apply specifically to coding formal BNF as far as the
analyzer is concerned. Note that different rules will apply to BNF as supported
by other tools, including yacc on Unix.

Comments: Completely blank lines and lines that commence with an
apostrophe are treated as comments. Lines may also end in comments;
any characters after the leftmost unquoted apostrophe, including the
apostrophe, are treated as comments.

Continued lines and newlines: Lines that contain individual productions
(definitions) may be continued, simply by making sure that the
first character of the continuation line is a blank. A newline suitable for
the environment in which the analyzer is run (carriage return and linefeed
on Windows; linefeed on the Web) is a "real" newline only when it
is followed by a nonblank character or end of file. Suppose a Windows
newline is the NEWLINE terminal of your language. In your lexical analyzer,
this would correspond to a small routine that checks for the
proper newline at the current position and advances a scan pointer.

Identifiers: Identifiers follow Visual Basic 6 conventions: starting with
a letter, they should contain letters, digits, and the underscore exclusively.
There is no limit to the length of identifiers, except common
sense. However, unlike Visual Basic identifiers, identifiers are completely
case-sensitive, as in the case of C++. The case of the first letter
of the identifier shows its type.

• Identifiers that start with a lowercase letter are assumed to be nonterminals
of the grammar. These identifiers must appear as defined on
the left side of at least one production.

• Identifiers that start with an uppercase letter are assumed to be symbolic
grammar terminals (note that strings may also be terminals).
These identifiers may not appear on the left side of a production.

• For best results, symbolic grammar terminals should be exclusively
UPPERCASE, but they may also be Proper case, with only the first
character in uppercase.


Nonterminal definition: The := (colon and equal sign) operator is the
preferred production operator to separate a defined nonterminal from
its definition. The plain equal sign may also be used for this purpose,
but := is less apt to confuse the reader. Note that you should not expect
to be able to enter special characters to represent themselves in the
right side of a production. assignmentStmt := rhs = value should be
coded as assignmentStmt := rhs "=" value, with the second equal sign
in quotes.

Quoted strings: Quoted strings follow Visual Basic conventions (double
quotes are delimiters; internal double quotes must be doubled) and are
used to specify exact character sequences as terminals. Sadly, Visual
Basic's limitation applies: nonprinting characters cannot be specified in
strings. You should make nonprinting sequences into terminals, named
in proper case or uppercase, and your lexical analyzer should take care
of their recognition.
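
Several of these lexical rules are mechanical enough to sketch in a few lines. The following Python is a hypothetical illustration, not the analyzer's own code (which is written in Visual Basic 6), and all three function names are invented. It assumes the rules exactly as stated above: a comment starts at the leftmost apostrophe outside double quotes, a continuation line is one whose first character is a blank, and the case of an identifier's first letter decides whether it is a nonterminal or a symbolic terminal.

```python
def strip_comment(line):
    """Drop everything from the leftmost unquoted apostrophe onward."""
    in_string = False
    for index, char in enumerate(line):
        if char == '"':
            # A doubled "" inside a string toggles twice, leaving the state unchanged.
            in_string = not in_string
        elif char == "'" and not in_string:
            return line[:index]
    return line

def logical_lines(text):
    """Join continuation lines (first character blank) onto the previous
    production, skipping blank and comment-only lines."""
    result = []
    for raw in text.splitlines():
        line = strip_comment(raw).rstrip()
        if not line.strip():
            continue
        if line[0] == " " and result:
            result[-1] += " " + line.strip()
        else:
            result.append(line.strip())
    return result

def classify(identifier):
    """Lowercase first letter: nonterminal; uppercase first letter: terminal."""
    return "nonterminal" if identifier[0].islower() else "terminal"
```

Given a production split across two lines with a trailing comment on the first, logical_lines returns it as one line with the comment removed, which is the form a parser would prefer to see.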

A Grammar Test

We'll now test (if you are following along with downloaded software) the
grammar for BNF using the bnfAnalyzer program. Open the bnfAnalyzer.vbp
file using Visual Basic 6 Enterprise or Professional and compile this project,
producing bnfAnalyzer.exe. Run bnfAnalyzer.exe.
The first screen presented when you run bnfAnalyzer will be a general
announcement screen, as shown in Figure 4-2. Most of the software in this
book will include these "about" screens, which appear the first time the soft-
ware is run. Subsequently, they are available using a button and/or a menu
item labeled About.


[Screenshot: bnfAnalyzer's About dialog. Its text explains that the application
parses files containing BNF and produces an analysis of the BNF definition,
including a list of terminal symbols, a list of nonterminals, and at least the
start of a reference manual for the language defined by the BNF. It also notes
that development started on 4/17/2003 by Edward G. Nilges. A Continue button
closes the dialog.]

Figure 4-2. bnfAnalyzer's About screen

Then you will see the main screen of bnfAnalyzer. In the list of directories
on the left side of the main screen, find and double-click Test Files, as shown in
Figure 4-3.

[Screenshot: the Backus-Naur Form Analyzer main window, with File, Tools, and
Help menus; Create Reference Manual, Save Settings, and Restore Settings
buttons; a Status Reports log; a Progress reports check box; Nonterminals and
Terminals panes; Parse Status options (No report, Complete report); a Close
button; and a file list of test files such as BNFanalyzer test 0 (zero length
file) and BNFanalyzer test 1 (blank file).]

Figure 4-3. bnfAnalyzer's main screen

In the list of files in the lower-left corner, find and select the file named
BNFanalyzer test 6 (BNF of BNF).txt. Click the Create Reference Manual button. After
a sequence of progress reports, you'll see the Reference Manual Options form.
Enter BNF as the language name, as shown in Figure 4-4.

[Screenshot: the Reference Manual Options form. It contains a Language Name
box (set to BNF); Text format and XML format radio buttons; the text option
Place sections of the manual in boxed comments (note: not suitable for large
files); four XML options; check boxes to include the nonterminal index, the
terminal index, the syntax, and formal Backus-Naur definitions; a check box to
indent the language reference manual; a check box to always show this form
before producing the reference manual; and the buttons Close (and create
manual), Save Settings, Restore Settings, and Cancel.]

Figure 4-4. Reference manual options

This form allows you to tailor the reference manual. bnfAnalyzer supports
two formats for the reference manual. The default text format uses the mono-
spaced Courier New font to format the manual. An option is also provided to
create an XML reference manual.
Click the button labeled Close (and create manual) to see the reference
manual, as shown in Figure 4-5. This report can be selected, copied, and pasted
into a Notepad or Word file.


[Screenshot: the start of the text-format reference manual, a boxed banner
reading REFERENCE MANUAL FOR THE BNF LANGUAGE, followed by the beginning of
the list of nonterminal symbols and where they are used.]

Figure 4-5. Start of the language reference manual produced by the BNF analyzer

Nonterminals
Scroll down the reference manual screen to examine the nonterminal symbols,
as shown in Figure 4-6. These are the grammar categories of BNF that have further
expansions.


NONTERMINAL SYMBOLS

The following are the nonterminal symbols of the language

Nonterminal              Where Used

alternationFactorRHS     sequenceFactor and alternationFactorRHS
bnfGrammar               Start symbol
mockRegularExpression    sequenceFactor and alternationFactorRHS
mreFactor                mockRegularExpression
mrePostfix               mockRegularExpression
nonTerminal              production and mreFactor
production               bnfGrammar
productionRHS            production and mreFactor
production (2)
sequenceFactor           productionRHS

Figure 4-6. List of non terminals

The first column identifies the nonterminals in alphabetical sequence. The
second column identifies the nonterminals that use the nonterminal in the first
column. The third column will identify undefined terminals (not shown) that are
used but do not appear in the left side of any production, as well as start sym-
bols that are not used on the right side of any production. These are typically
the most important constructs of your grammar. For English, a start symbol
might be a sentence; a start symbol in a programming language might be
a complete program.

NOTE If a nonterminal has null in the Where Used list, it is a start symbol.
The reverse is not true. If your language defines the major construct recursively
as smaller instances of itself, the start symbol column will be blank. For
this reason, I suggest you do not define the start symbol recursively.
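
The Where Used column and the start symbol rule are simple to compute once the grammar is in memory. Here is a minimal Python sketch of the idea, not the analyzer's actual implementation: the grammar is a dictionary, typed in by hand from Figure 4-1, that maps each nonterminal to the symbols on its right-hand side with operators and quoted strings left out. The function names where_used and start_symbols are invented for this illustration.

```python
def where_used(grammar):
    """Map each nonterminal to the sorted list of nonterminals whose
    right-hand sides mention it."""
    used = {name: set() for name in grammar}
    for name, rhs_symbols in grammar.items():
        for symbol in rhs_symbols:
            if symbol in used:            # ignore terminals such as EOF
                used[symbol].add(name)
    return {name: sorted(users) for name, users in used.items()}

def start_symbols(grammar):
    """Nonterminals that never appear on any right-hand side."""
    return [name for name, users in where_used(grammar).items() if not users]

# Right-hand-side symbols of the BNF of BNF, flattened by hand.
bnf_of_bnf = {
    "bnfGrammar": ["production"],
    "production": ["nonTerminal", "productionRHS", "NEWLINE", "EOF"],
    "nonTerminal": ["IDENTIFIER"],
    "productionRHS": ["sequenceFactor", "sequenceFactor"],
    "sequenceFactor": ["mockRegularExpression", "alternationFactorRHS"],
    "mockRegularExpression": ["mreFactor", "mrePostfix"],
    "mreFactor": ["nonTerminal", "UPPERCASESTRING", "STRING",
                  "productionRHS", "productionRHS"],
    "mrePostfix": [],
    "alternationFactorRHS": ["mockRegularExpression", "alternationFactorRHS"],
}

# A grammar whose major construct is defined in terms of itself.
recursive = {"program": ["statement", "program"], "statement": []}
```

For bnf_of_bnf, the only nonterminal with an empty Where Used list is bnfGrammar. For the recursive grammar, program uses itself, so no start symbol is found, which is the situation the note warns about.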

Terminals
Scroll down further to see the list of terminals, as shown in Figure 4-7. Terminals
in our BNF are of two types: strings and symbols that at least start with an uppercase
character.


TERMINAL SYMBOLS

The following are the terminal elements of the language

Terminal           Where Used

"("                mreFactor
")"                mreFactor
"*"                mrePostfix
"+"                mrePostfix
":="               production
"["                mreFactor
"]"                mreFactor
"|"                alternationFactorRHS
EOF                production
IDENTIFIER         nonTerminal
NEWLINE            production
STRING             mreFactor
UPPERCASESTRING    mreFactor

Figure 4-7. List of terminals

Like the nonterminal list, the first column in this list identifies the terminals
in alphabetical order. Note that several strings are terminals and operators of
BNF. We've also identified some terminals as symbols.
Because they start with uppercase, the identifiers EOF, IDENTIFIER, NEWLINE,
STRING, and UPPERCASESTRING are treated as symbols in the grammar
that will be understood by the lexical analyzer. These follow the convention that
symbolic terminals should be in all uppercase characters, for maximum visibility
in a medium or large BNF.
These symbolic terminals can be used to express the fact that a terminal that
will be recognized by the lexical analyzer is a potentially infinite number of symbols,
such as IDENTIFIER, STRING, or UPPERCASESTRING; or a nonprintable
string not expressible as a Visual Basic string; or a condition that is not a string
at all, such as EOF (at end of file). In fact, NEWLINE, in the parser for BNF, is
both a string and a condition. When the BNF lexical analyzer (in the procedure
BNFcompile_scanner_findNewline) finds a newline, it then checks to see if the newline
is followed by a space and an underscore (the continuation indicator), and
if so, it cancels the newline.
The lexical analyzer's mission in life, which we will revisit in the next chapter,
is to make life easy for the parser in any way it can. Here, it does this by replacing
funky terminals by simple symbols.


Recall that regular expressions represent the conditions "start of input line or
string" and "end of input line or string" using the characters caret and dollar sign.
In general, conditions and sequences of characters can be usefully considered by
the lexical analyzer as simple characters and abstracted as simple tokens.

Syntax Outline

Scroll down to see the (pardon my French) pièce de résistance, or reference manual
outline of BNF (or any other language expressible in valid BNF), as shown in
Figure 4-8.

The following are the rules of the language

1. A Bnf Grammar can consist of the following:
   1.1. This (repeated one or more times):
      1.1.1 A Production
   A Bnf Grammar is a start symbol
2. A Production can consist of the following:
   2.1. This sequence:
      2.1.1 This optional sequence:
         2.1.1.1 This sequence:
            2.1.1.1.1. A Non Terminal
            2.1.1.1.2. This sequence:
               2.1.1.1.2.1. The string ":="
               2.1.1.1.2.2. A Production R H S
      2.1.2 This set of alternatives:
         2.1.2.1 A NEWLINE
         2.1.2.2 An EOF
   A Production can appear in a Bnf Grammar
3. A Non Terminal can consist of the following:
   3.1. An IDENTIFIER
   A Non Terminal can appear in a Production and a Mre Factor
4. A Production R H S can consist of the following:
   4.1. This sequence:
      4.1.1 A Sequence Factor
      4.1.2 This optional sequence:
         4.1.2.1 A Sequence Factor
   A Production R H S can appear in a Production and a Mre Factor

Figure 4-8. Language syntax outline

Any valid BNF can be transformed into an outline of the language, suitable
as a basis for a complete reference manual. This overcomes a major, and quite
valid, managerial objection to the very idea of forming our own language for
business rules, by providing accurate documentation of the language.


Reference Manual Display


One potential problem with the reference manual's nonterminal and terminal
lists and the outline is that they are in a fixed format, which is suitable only for
a monospaced font such as Courier New, because the format uses blanks for for-
matting. In fact, an option in the Reference Manual Options form, Place sections
of the manual in boxed comments (see Figure 4-4), will transform the material
into a boxed comment, suitable for inclusion in program text. This option works
if you prepend each line with the comment symbol of the language (such as
apostrophe in Visual Basic) or surround the comment with balanced comment
tags (such as /* and */ in C or the XML comment tags).
If word wrapping in any of your BNF files creates an unreadable reference
manual in the display, copy and paste the reference manual into a Word docu-
ment. Set Page Setup to landscape printing and the font to Courier New. Make
the font size small enough for you to see the output properly formatted.

XML Reference Manual

We need a less proprietary and more flexible way to format the output, so we can
use Visio and other tools to create documents based on our reference manual.
XML is the best choice.
Click Continue on the reference manual outline screen, and then click Create
Reference Manual again on the main screen. This time, the Reference Manual
Options form will appear immediately, since the manual has already been parsed.
Click the XML format radio button. Make sure the first XML format option, labeled
The XML should include the BNF source as a comment, is unchecked. The other
three options (Individual tags should appear on separate lines, Add BNF source
code as an attribute to production tags, and Comment end tags with the associate
name in the start tags) should be checked. These choices will avoid replicating
the BNF source in a leading XML comment, place newlines between XML tags,
and enhance the tags with source BNF. Click the button labeled Close (and create
manual) to see the screen shown in Figure 4-9.


[Screenshot: the raw XML reference manual as displayed by bnfAnalyzer. It opens
with a boxed comment identifying the Backus-Naur Form definition of the
language, followed by the nonterminals element (bnfGrammar, production,
nonTerminal, productionRHS, sequenceFactor, mockRegularExpression,
alternationFactorRHS, mreFactor, mrePostfix) and the start of the terminals
element. A Continue button closes the screen.]
Figure 4-9. The XML reference manual screen

You can format this XML. Using Visual Studio .NET, create a new Windows
application and choose to add a new item. At the prompt, select XML file and
paste in the XML, commencing with the comment tag <!--. Click Browse with,
save the file, and select Internet Explorer. You will see the formatted XML, as
shown in Figure 4-10.


<?xml version="1.0" encoding="utf-8" ?>
<!--
  Backus-Naur Form definition of the language BNF
-->
<BNF>
  <nonterminals>
    <bnfGrammar />
    <production />
    <nonTerminal />
    <productionRHS />
    <sequenceFactor />
    <mockRegularExpression />
    <alternationFactorRHS />
    <mreFactor />
    <mrePostfix />
  </nonterminals>
  <terminals>
    <X_colon_equals_>:=</X_colon_equals_>
    <NEWLINE>NEWLINE</NEWLINE>
    <EOF>EOF</EOF>
    <IDENTIFIER>IDENTIFIER</IDENTIFIER>
    <UPPERCASESTRING>UPPERCASESTRING</UPPERCASESTRING>
    <STRING>STRING</STRING>
    <X_leftParenthesis_>(</X_leftParenthesis_>
    <X_rightParenthesis_>)</X_rightParenthesis_>
    <X_leftBracket_>[</X_leftBracket_>
    <X_rightBracket_>]</X_rightBracket_>
    ...

Figure 4-10. The formatted XML reference manual

Figure 4-10 shows the beginning of the XML reference manual, which lists
the nonterminals and then the terminals of your language. Scroll down to see
the actual BNF productions, as shown in Figure 4-11.


<bnfProductions>
  <GS name="bnfGrammar">
    <OP name="production">
      <OP name="oneTripRepeat">
        <production />
      </OP>
      <!-- End oneTripRepeat -->
    </OP>
    <!-- End production -->
  </GS>
  <!-- End bnfGrammar -->
  <GS name="production">
    <OP name="production">
      <OP name="sequence">
        <OP name="optionalSequence">
          <OP name="sequence">
            <nonTerminal />
            <OP name="sequence">
              <X_doubleQuote_colon_equals_doubleQuote_ />
              <productionRHS />
            </OP>
            <!-- End sequence -->
          </OP>
          <!-- End sequence -->
        </OP>
        <!-- End optionalSequence -->
        <OP name="alternatives">
          <NEWLINE />
          <EOF />
        </OP>
        <!-- End alternatives -->
      </OP>
      <!-- End sequence -->

Figure 4-11. The BNF productions in the XML reference manual

Note that each production will start with a GS (grammar symbol) tag and
contain one or more OP (operator) tags. Each GS tag will name the grammar sym-
bol and show its BNF (if the appropriate Reference Manual Options setting is in
effect). Each OP tag will identify the operator. For example, the first OP tag identi-
fies the production operator :=. The ending OP tag will identify the start tag in
a comment if the corresponding Reference Manual Options setting is in effect.
You can use the XML format with a large variety of formatting tools to view the
language reference. The basic text format is more suitable in simpler documents.
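
Any general-purpose XML library can read this format. The following Python sketch is not part of the software supplied with this book. It assumes a trimmed, hand-written fragment shaped like Figure 4-11 (the SAMPLE string and the function name list_productions are illustrative assumptions), and it lists each GS tag together with the OP tags nested inside it.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the shape of Figure 4-11 (an assumption for
# illustration; real bnfAnalyzer output is much larger).
SAMPLE = """
<bnfProductions>
  <GS name="bnfGrammar">
    <OP name="production">
      <OP name="oneTripRepeat">
        <production />
      </OP>
    </OP>
  </GS>
</bnfProductions>
"""


def list_productions(xml_text):
    """Map each grammar symbol (GS) to its operator (OP) names, in document order."""
    root = ET.fromstring(xml_text)
    result = {}
    for gs in root.iter("GS"):
        result[gs.get("name")] = [op.get("name") for op in gs.iter("OP")]
    return result


print(list_productions(SAMPLE))
```

A dictionary of this kind could feed whatever formatting tool you prefer. The sketch ignores the end-tag comments, which exist only for human readers.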
In this section, you've learned quite a lot about BNF, in the way many
programmers prefer to learn: by getting your hands dirty. You've learned how to
use the free tools provided with this book to get started with language design.
In the next section, you'll see how the much larger grammar for our
QuickBasic was built.


Building the BNF for Our QuickBasic


Using bnfAnalyzer, we can analyze the BNF of our QuickBasic compiler. Figure 4-12
shows the beginning of this larger syntax, available as BNF analyzer test 5 (quick
basic).txt in the Test Files directory. Since the complete file is available for
download, in this section, I will restrict the detailed forced march to just three
areas, to give you a feel for what it is like to construct a large BNF. The three
areas we will look at are the start of the definition (the big picture, so to speak),
the important definition of the assignment statement, and the equally important
definition of the expression.

' --- Immediate commands
immediateCommand := singleImmediateCommand ( ":" singleImmediateCommand )*
singleImmediateCommand := expression | explicitAssignment
' --- Source programs
sourceProgram := optionStmt
sourceProgram := sourceProgramBody
sourceProgram := optionStmt logicalNewline sourceProgramBody
optionStmt := Option ( "Base" ( "0" | "1" ) ) | Explicit | Extension
sourceProgramBody := ( openCode | moduleDefinition ) +
openCode := statement [ logicalNewline sourceProgram ]
logicalNewline := Newline | Colon
statement := [ UnsignedInteger | identifier Colon ] statementBody
statementBody := ctlStatementBody | unconditionalStatementBody |
                 assignmentStmt
ctlStatementBody := dim |
                    doHeader |
                    else |
                    endIf |
                    forHeader |
                    forNext |
                    if |
                    whileHeader |
                    loopOrWend
unconditionalStatementBody := Circle |
                              comment |
                              data |
                              end |
                              exit |
                              goSub |
                              goto |
                              input |
                              print |
                              randomize |
                              read |
                              return |
                              screen |
                              stop |
                              trace

Figure 4-12. Start of the BNF for our QuickBasic

2. For the same reason a film can't show all the action and lack of action in a person's life,
a metaprogram discussion cannot reproduce, at base, the writing of each line of code, its
debugging, and its modification. Before "extreme programming," this was endured in isola-
tion by the programmer. Today it is, in extreme programming, a sort of MTV Real World or
Survivor show, with the boring parts left in, and fewer hotties overall.


Are Formal Definitions Necessary?


It is an urban legend that languages like Java and C++ have a clear syntax, while
languages like Visual Basic cannot be formalized at all. This is not true, because
were it not possible to formalize Basic, then it could not be compiled.
Indeed, the Java programmer's cubicle is decorated with beautiful syntax charts,
typically produced from formal BNF of the entire language. The Visual Basic
programmer's cubicle is, we suppose, littered with Jolt Cola empties, pawn tick-
ets, and back issues of Motor Trend. There is no law of nature that this needs to
be so.
Many compilers for older Basics were, in fact, coded in the 1970s by noble sav-
ages who were innocent of academic computer science. This was true for other
old languages as well, including the old Fortran compiler I mentioned in Chapter 1.
But for the same reason different cultures do the same abstract math with the
same results in wildly different notations (from the abacus to knotted strings to
Visual Basic and C), the noble savages were able in most commercially viable
instances (including a small company operated out of a motel in New Mexico)
to produce in actuality what were formal definitions of Basic, whether on
diskettes or on paper tape.3
I have near-zero tolerance for compilers hacked above and beyond the lan-
guage definition, as you will learn in Chapter 5, where each parser procedure
must quote the BNF justifying its code. Compilers hacked beyond the formal
definition of the language are very dangerous tools. Especially in the case of an
"inside job" (a compiler written for internal use), the developers and users rely-
ing on the compiler may develop programs and business rules that exploit
undocumented extensions and compiler bugs. This means that the compiler,
and not its formal specification, becomes the law, and the compiler cannot be
repaired!

The Big Picture

Designing a BNF is a top-down process. We start with a goal: to write a compiler
that can evaluate immediate expressions for calculator-style solutions and more
complex programs with state, just as the original Basic authors wanted to sup-
port engineering and scientific needs for a handy calculator and more extensive
modeling.
Referring to the start of the BNF analysis (see Figure 4-12), take a look at the
definition of an immediateCommand:

3. Microsoft operates a Microsoft Museum in Redmond, and while waiting for my ride after the
2001 author's event, I was able to see the paper tape Bill and Paul created for their Altair Basic
compiler.


immediateCommand := singleImmediateCommand (":" immediateCommand)*

This declares the "what" and not the "how." It says in English, "An immediate
command is a single immediate command, followed by zero, one, or more occur-
rences of a colon, followed by an immediate command." It is a recursive but not
circular definition, because we can tell, by just looking at it, that the immediate
command on the right side of the production is shorter than the immediate com-
mand being defined, by at least one character: the colon.
We then define a single immediate command:

singleImmediateCommand := explicitAssignment | expression

This declares that a single immediate command can be either an explicit
assignment or an expression. An expression will support our responding to the
entry of formulae with the value of the formula, but we also, in the immediate
command mode, need to allow the user to enter variables, as in the case of the
entry Let a=4: a*32.
If you look up the productions for explicit and implicit assignment, you'll
discover that they differ only in that explicit assignment starts with the keyword
Let. As it happens, we cannot support implicit assignment in immediate com-
mand mode, since it would not be possible to tell the difference between "assign
4 to a" and "return true if a equals 4." Therefore, a single immediate assignment
must start with Let.
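
To make the recursion and the Let rule concrete, here is a minimal Python sketch. It is not the parser developed in this book. The function names split_immediate and classify are illustrative assumptions, and the sketch assumes that no colon appears inside a string literal.

```python
def split_immediate(command):
    """immediateCommand := singleImmediateCommand ( ":" immediateCommand )*

    Assumes no colon occurs inside a string literal.
    """
    head, colon, rest = command.partition(":")
    singles = [head.strip()]
    if colon:
        # Recursive call: 'rest' is shorter than 'command' by at least the colon,
        # so this is recursion and not a loop.
        singles += split_immediate(rest)
    return singles


def classify(single):
    """singleImmediateCommand := explicitAssignment | expression"""
    words = single.split(None, 1)
    first_word = words[0].lower() if words else ""
    # Only the keyword Let marks an assignment in immediate mode.
    return "explicitAssignment" if first_word == "let" else "expression"


for part in split_immediate("Let a=4: a*32"):
    print(part, "->", classify(part))
```

Because only Let marks an assignment, the entry a=4 on its own is classified as an expression (a comparison), which is the behavior the paragraph above describes.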
We then need to define a source program. Where there is more than one
production for a given nonterminal, the multiple productions are Or'd, and any
one can be satisfied by a particular input string. a := b | c is equivalent to
a := b followed by a := c.
A source program is a single option statement; a source program body; or an
option statement, followed by a logical newline, followed by a source program
body. This apparently complex definition expresses the fact that you cannot put
Basic's Option statement anywhere but at the beginning of your code. This is only
the first example of how what seems to be a rather ad-hoc rule can be formalized.
Yes, it's still ugly compared with C, which abandoned the very idea of header state-
ments, but it isn't vague.
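The Option statement, in turn, can be easily defined. In its simplest form, it is
the sequence consisting of the terminal Option and the terminal Explicit; the
production in Figure 4-12 also allows Base (followed by 0 or 1) and Extension.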
The definition of a source program body is more complex. This is because we
allow functions and subroutines, in addition to global definitions (which are at
the module level in Visual Basic). Unlike Visual Basic and C, we also allow open
code. Open code consists of global variable declarations (known as module-level
declarations in Visual Basic) and instructions that are part of a main procedure,
which receives control initially.


Therefore, a source program body is a series of one or more occurrences of
either open code (data and instructions) or the definitions of functions and sub-
routines, expanded much later in the BNF file.
Next, we define the logical newline, which is used at several points. A logical
newline is either a real newline or the colon, which allows multiple statements
per line. We make a mental note to use the Visual Basic .NET reserved word
vbNewline for the actual newline, rather than a ChrW function using linefeed,
because the reserved word makes the newline correspond to the runtime envi-
ronment. On Windows, it is carriage return and linefeed, which we use to rile
Unix people, but on the Web, it is linefeed.
Next, in defining a statement, we need to account for another ugly feature
of our language that might be considered unformalizable. This is the presence
of statement numbers and symbolic statement labels, which opens the entire
can of worms around the question of Go To.4
It's a snap to formalize. A statement is an optional unsigned integer or an
optional identifier, followed by a colon, followed by the body of the statement.
A body of a statement can be a control flow statement, an unconditional state-
ment commencing with a keyword, or an assignment statement.
These definitions at the start of the file give you an idea of how the BNF
is built. Let's now move to the important definition of the Basic assignment
statement.

The Assignment Statement


Figure 4-13 shows the assignment statement definition. An assignment state-
ment is either explicit or implicit. As already mentioned, it is explicit if it starts
with Let, and implicit otherwise.

' --- Assignment
assignmentStmt := explicitAssignment | implicitAssignment
explicitAssignment := Let implicitAssignment
implicitAssignment := lValue "=" expression
lValue := typedIdentifier [ "(" subscriptList ")" ]
subscriptList := expression [ Comma subscriptList ]

Figure 4-13. The definition of assignment

4. As early as Microsoft's release of QuickBasic in the 1980s, Go To was useless to skilled pro-
grammers except in error handling. This has been only recently fixed in the provision, in
Visual Basic .NET, of Try...Catch...End Try error handling.


To understand the syntax of an assignment statement, we need to be careful
not to make the grammar ambiguous with respect to the equal sign, because
unlike C, we do not have different terminals for assignment and equality. We need
to avoid allowing expressions to occur on the left side of the assignment; a+1=b+1
is math, not code.
Therefore, I have swiped an important concept from C, that of the LValue, or
location value. An LValue refers to, or can be made to refer to, a named piece of
storage, whether the name is a simple identifier or an identifier along with its
subscript.
All LValues are expressions, but not all expressions are LValues, because the
value of any complex expression like a+1 refers to a nameless quantity that appears
on the stack at runtime only when it is needed. Therefore, an LValue is an identifier
with type, optionally followed by a subscript list. We need to use the annoying
nonterminal typedIdentifier, because this is an identifier (look up its definition in
the file) that is optionally followed by a special character indicating its type. (I do
not use this feature, which produces ugly variables such as uglyString$, where the
dollar sign implicitly says "uglyString is a string," and it is my sad duty to imple-
ment it.)
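
As a rough illustration of the lValue production, the following Python sketch tests whether a piece of text has the shape of an LValue. It is an assumption-laden simplification, not the book's recognizer: the name is_lvalue is hypothetical, the suffix characters %, &, !, #, and $ stand in for the typeSuffix terminals, and the subscripts are checked only for balanced parentheses, not parsed as expressions.

```python
import re

# typedIdentifier := identifier [ typeSuffix ]; the suffix set is an assumption
# standing in for PERCENT, AMPERSAND, EXCLAMATION, POUNDSIGN, and CurrencySymbol.
TYPED_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*[%&!#$]?")


def is_lvalue(text):
    """lValue := typedIdentifier [ "(" subscriptList ")" ] (shape check only)."""
    text = text.strip()
    match = TYPED_IDENTIFIER.match(text)
    if not match:
        return False
    rest = text[match.end():].strip()
    if rest == "":
        return True                      # a simple identifier
    if not rest.startswith("("):
        return False                     # something like a+1: not a location
    depth = 0
    for index, char in enumerate(rest):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                # The first "(" must close at the very end, with content inside.
                return index == len(rest) - 1 and index > 1
    return False                         # unbalanced parentheses


print(is_lvalue("uglyString$"))   # True
print(is_lvalue("a(i, j+1)"))     # True
print(is_lvalue("a+1"))           # False
```

The third call fails for the reason given above: a+1 is an expression whose value has no name, so it cannot appear on the left side of an assignment.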
Finally, let's proceed to the way in which we define expressions.

Expressions
We need to define the precedence of operators from the very low-precedence
operators And and Or to the very high-precedence multiplication, division, and
exponentiation operators, and we need to account for parentheses.
Figure 4-14 shows how we define expressions. There are a lot of "gotchas"
here, so pay attention.


' --- Expressions
expression := orFactor [ orOp expression ]
orOp := Or
orOp := OrElse
orFactor := andFactor [ andOp orFactor ]
andOp := And
andOp := AndAlso
andFactor := [ Not ] notFactor
notFactor := likeFactor [ notFactorRHS ]
notFactorRHS := Like likeFactor [ notFactorRHS ]
likeFactor := concatFactor [ likeFactorRHS ]
likeFactorRHS := "&" concatFactor [ likeFactorRHS ]
concatFactor := relFactor [ concatFactorRHS ]
concatFactorRHS := relOp relFactor [ concatFactorRHS ]
relFactor := addFactor [ relFactorRHS ]
relFactorRHS := relOp relFactor [ relFactorRHS ]
addFactor := mulFactor [ addFactorRHS ]
addFactorRHS := mulOp mulFactor [ addFactorRHS ]
mulFactor := powFactor [ mulFactorRHS ]
mulFactorRHS := powOp powFactor [ mulFactorRHS ]
powFactor := [ "+" | "-" ] term
term := unsignedNumber |
        string |
        lValue |
        True |
        False |
        functionCall |
        "(" expression ")"
functionCall := functionName "(" expressionList ")"
functionName := Abs | Asc | Ceil | Chr | Cos | Eval | Floor | Int |
                Iif | Isnumeric | Lbound | Lcase | Left | Len | Log |
                Max | Min | Mid | Replace | Right | Rnd | Sin |
                Sgn | String | Tab | Trim |
                Ubound | Ucase | Utility
unsignedNumber := ( UnsignedRealNumber | UnsignedInteger )
                  [ numTypeChar ]
typedIdentifier := identifier [ typeSuffix ]
typeSuffix := numTypeChar | CurrencySymbol
numTypeChar := PERCENT | AMPERSAND | EXCLAMATION | POUNDSIGN
identifier := Letter [ LettersNumbersUnderscores ]
string := DoubleQuote AnythingExceptDoubleQuote DoubleQuote
relOp := "<" | ">" | "=" | "<=" | ">=" | "<>"
addOp := "+" | "-"
mulOp := "*" | "/" | "\" | "Mod"
powOp := "**" | "^"

Figure 4-14. Expressions definition

It is one thing to define an abstract BNF that validly expresses the set of pos-
sible sentences in a language, but early compiler designers found that it is quite
another to design BNF from which actual debugged code can be generated,
whether by hand or by using a parser generator. As I mentioned in the previous
chapter, you need to design a real-world BNF with care.


Expression Operators

In defining the typical binary operator of an expression, such as the Or operator,
we must be careful to avoid left recursion. We cannot define an expression in the
obvious way, as expression := expression Or expression, because to get started
with parsing an expression, we would need to parse the lowest precedence pro-
duction (Or), and to do this, we would need to parse an expression. This would
cause an immediate loop, since we would always start parsing an expression in
the same place!

NOTE Remember that looping isn't recursion. Recursion is applying the same
code to a smaller integer or a smaller set of data. Looping is getting stuck so
that no matter what your code does, it re-creates the same state, including the
position in the parse.

Therefore we could define (but we won't) Or as an or factor (an expression
that contains no Or operators), followed by an Or, followed by an expression. This
avoids left recursion because an or factor can be parsed, and by the time we get
to a recursive parse of the expression, we know we have eliminated some tokens:
at a minimum, an Or operator. Because the length of the input string is smaller,
this is recursion and not looping.
Typical code generators will generate valid code for Or operators in this sce-
nario. However, they will generate very bad code using this same approach for
any operator that is not symmetrical, such that a Op b equals b Op a, and fully
associative, such that a Op (b Op c) is the same as (a Op b) Op c. 5 In particular,
they will generate code that evaluates subtraction and division operators to the
right of other operators of the same type, which will give wrong answers.
Therefore, we instead decide on the same pattern for addition and subtrac-
tion operators. They start with their factor (the longest expression type that doesn't
contain the operator or any operator of lower precedence) and end with an RHS
consisting of the operator, followed by a recursive and a smaller instance of the
left-side production.
For example, we need to parse a-b-c-d as a, followed by the subtract RHS
-b-c-d. We need to parse the subtract RHS as the minus operator, followed by

5. Some of my students complain I use math too much. This isn't math. It's symbolic logic. That
is even worse. However, symbolic logic, unlike traditional math, does not require an extensive
background to understand. To understand college calculus, you need to have succeeded at
four years of high school math. To understand these formal notations, you need only read
this book, do the examples on your computer, and, like Billy Crystal's Second Gravedigger in
the Kenneth Branagh film of Hamlet, "cudgel thy brains."


a subtract factor, followed by the smaller subtract RHS -c-d. This is the best
sequence because it simplifies generating subtraction operators left to right, as
you'll see in Chapter 5.
Were we to parse a-b-c-d as a factor, a minus sign, and an expression, some-
thing unexpected would happen. In the first parse, the a would be the or factor,
and the expression would be b-c-d. We could generate code to get the value of a,
but we could not generate code to subtract! That's because we first need to gen-
erate code to calculate the value of b-c-d. But if we generate code to calculate
b-c-d, the two subtractions to the right of the first subtraction will be generated
before the leftmost subtraction, and this is wrong. The second subtraction will
happen after the third subtraction. Suppose a=1, b=2, c=3, and d=4. Properly
evaluated, a-b-c-d is -8. But if the subtractions are executed right to left by an
incorrect parser, c-d is calculated first, giving -1. This value is then subtracted
from b to give 3, since the value of c-d is negative. This is then subtracted from
1, giving -2!
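
The arithmetic is easy to check for yourself. The short Python sketch below (the function names are illustrative, not taken from the compiler) evaluates the same operands in both orders.

```python
from functools import reduce


def subtract_left_to_right(values):
    """((a - b) - c) - d: the order a correct parser must produce."""
    return reduce(lambda accumulated, value: accumulated - value, values)


def subtract_right_to_left(values):
    """a - (b - (c - d)): the order produced by the naive right-recursive parse."""
    result = values[-1]
    for value in reversed(values[:-1]):
        result = value - result
    return result


operands = [1, 2, 3, 4]                      # a, b, c, d
print(subtract_left_to_right(operands))      # -8
print(subtract_right_to_left(operands))      # -2
```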
Similar problems happen in real and integer division. Basically, subtraction
and division are left and not right associative; therefore, for the productions cor-
responding to these, we define the RHS as starting with the operator.
The attractive feature of defining the RHS of a binary operator in this way is
that the production on the right side has a very simple handle, which is the sym-
bol with which it must begin. We only need to look for the symbol to see, moving
from left to right in the source code, whether the entire sequence is present. This
avoids a bane of early compiler developers called backtracking, 6 which is retreat-
ing from right to left in the source text, because the parser has realized that what
it thought it had does not occur. Backtracking is a problem because we want the
parser to be responsible for various tasks, including the generation of object code,
all of which would have to be undone. This gets nasty.
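
The following Python sketch shows the factor-plus-RHS pattern at work on subtraction alone. It is a simplified illustration under stated assumptions, not the Chapter 5 parser: the input is already split into tokens, every operand is a single variable name, and the "object code" is just a postfix list. Note that the RHS routine decides whether to proceed by looking at one token, its handle, and never backs up.

```python
def parse_subtraction(tokens):
    """Parse: factor [ rhs ], where rhs := "-" factor [ rhs ]. Returns postfix code."""
    code = []
    position = parse_factor(tokens, 0, code)
    position = parse_rhs(tokens, position, code)
    if position != len(tokens):
        raise SyntaxError("unexpected token at position %d" % position)
    return code


def parse_factor(tokens, position, code):
    if position >= len(tokens) or tokens[position] == "-":
        raise SyntaxError("operand expected at position %d" % position)
    code.append(tokens[position])            # emit "push operand"
    return position + 1


def parse_rhs(tokens, position, code):
    # Handle check: a single token of lookahead tells us whether an RHS is present.
    if position < len(tokens) and tokens[position] == "-":
        position = parse_factor(tokens, position + 1, code)
        code.append("-")                      # emit the subtraction immediately
        return parse_rhs(tokens, position, code)
    return position


def evaluate_postfix(code, variables):
    stack = []
    for item in code:
        if item == "-":
            right = stack.pop()
            left = stack.pop()
            stack.append(left - right)
        else:
            stack.append(variables[item])
    return stack.pop()


code = parse_subtraction(["a", "-", "b", "-", "c", "-", "d"])
print(code)
print(evaluate_postfix(code, {"a": 1, "b": 2, "c": 3, "d": 4}))   # -8
```

Each subtraction is emitted as soon as its right operand has been parsed, so the generated code subtracts from left to right, and nothing ever has to be undone.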
In general, and in parser theory, the handle is the set of terminal symbols
that can occur validly at the start or end of a nonterminal. The left handle is the
set of terminals that can occur at the beginning. The right handle is the set that
can occur at the end.
In producing a pragmatic production, a rule of thumb is to watch for ambi-
guity, in the form of adjacent nonterminals, whose right and left handles are sets
of symbols that intersect. For example, expression orFactor is an ambiguous
sequence. An expression's right handle happens to be the set consisting of all
identifiers, the right parenthesis, and all numbers. An orFactor's left handle is the
set consisting of all identifiers, the left parenthesis, and all numbers. Since these
two sets have a large intersection, there is no easy way of telling where the or

6. Backtracking was a bane of early compiler developers, both on mainframes of the 1950s
and micros of the 1970s, because whenever the input text was on a serial medium such as
magnetic or even audiocassette tape, the medium had to rewind. The rewind had a high
''fwizz" factor in that it was fun to watch but it wasted time.


factor begins! For example, a+1 and b might be the expression a and the or factor
1 and b, or it might be the expression a+1 and the or factor b. You might object that
we know that and has lower precedence than +, but we cannot use this "knowledge,"
because we're constructing it as the BNF itself.
Compiler generators such as yacc are developed to efficiently determine sets
of handle symbols in order to find out whether the grammar is ambiguous.
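
A toy version of that computation fits in a few lines of Python. This sketch is nothing like yacc's actual algorithm. It assumes a grammar stored as a dictionary, with no empty productions and no left recursion, and the names TOY, left_handle, and overlapping_alternatives are made up for the illustration. It computes left handles and reports alternatives of one nonterminal whose left handles intersect.

```python
# Hypothetical miniature grammar: nonterminal -> list of alternatives,
# each alternative a list of symbols. Anything not a key is a terminal.
TOY = {
    "term": [["NUMBER"], ["STRING"], ["lValue"], ["(", "expression", ")"]],
    "lValue": [["IDENTIFIER"], ["IDENTIFIER", "(", "expression", ")"]],
    "expression": [["term"], ["term", "-", "expression"]],
}


def left_handle(symbol, grammar):
    """Set of terminals that can begin 'symbol' (no empty or left-recursive rules)."""
    if symbol not in grammar:
        return {symbol}
    handle = set()
    for alternative in grammar[symbol]:
        handle |= left_handle(alternative[0], grammar)
    return handle


def overlapping_alternatives(nonterminal, grammar):
    """List (terminal, first alternative, clashing alternative) for each overlap."""
    seen = {}
    clashes = []
    for alternative in grammar[nonterminal]:
        for terminal in left_handle(alternative[0], grammar):
            if terminal in seen:
                clashes.append((terminal, seen[terminal], alternative))
            else:
                seen[terminal] = alternative
    return clashes


print(sorted(left_handle("term", TOY)))
print(overlapping_alternatives("term", TOY))     # no clashes
print(overlapping_alternatives("lValue", TOY))   # both alternatives start with IDENTIFIER
```

The clash reported for lValue is the reason the real production is written with an optional tail, typedIdentifier [ "(" subscriptList ")" ], rather than as two alternatives that begin the same way.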

Parentheses

Finally, let's look at the way in which we support parentheses. This is a topic we
touched on in Chapter 3.
Look at the definition of a term. A term is the smallest component of an expres-
sion, and typically it is an identifier, string, or number. However, what parentheses
do in a language like Visual Basic is make a simple term out of complex expressions;
therefore, a possible term is left parenthesis, expression, and right parenthesis.
The ambiguity here is that working from this definition alone, an expression
such as (a*(b-c)) with nested parentheses would be parsed improperly. This is
because a typical implementation in code of a BNF grammar would find the left-
most parenthesis and conform to the BNF, because BNF, by itself, does not specify
where to stop.
The BNF definition for term shows the alternatives unsignedNumber, string,
LValue, True, False, functionCall, and the parenthesized ( expression ).
If the candidate string for a term starts with a left parenthesis, we know that
the only valid possibility is a parenthesized expression, because, as we can see
by examining the BNF, no other candidate starts with a left parenthesis. The left
handle of an unsignedNumber is plus, minus, and the digits. The left handle of
a string is a double quote, and so on. None of these left handle sets include the
left parenthesis.
The problem, as we've seen in the miniparser of Chapter 3, is that we need
to pass a substring to the expression parser. If we pass the entire source program
one character beyond the left parenthesis, it will be rejected as a valid expression
because it will end with unbalanced material, as in a+1)-b.
Just as we did with integerCalc in Chapter 3, in Chapter 5, we will implement
a simple code workaround as a submethod in the term recognizer, and search
ahead for the balanced right parenthesis.
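
The idea of the search can be sketched in Python as follows. This is an illustration of the technique only, not the code used in integerCalc or in Chapter 5. The function name find_matching_parenthesis is an assumption, and string literals are handled in the simplest possible way, by skipping everything between double quotes.

```python
def find_matching_parenthesis(source, open_index):
    """Return the index of the ")" that balances the "(" at open_index, or -1."""
    if source[open_index] != "(":
        raise ValueError("open_index must point at a left parenthesis")
    depth = 0
    in_string = False
    for index in range(open_index, len(source)):
        char = source[index]
        if char == '"':
            in_string = not in_string        # parentheses inside strings don't count
        elif not in_string:
            if char == "(":
                depth += 1
            elif char == ")":
                depth -= 1
                if depth == 0:
                    return index
    return -1                                # ran out of text: unbalanced


source = "(a*(b-c))-d"
close_index = find_matching_parenthesis(source, 0)
print(close_index)                           # 8
print(source[1:close_index])                 # a*(b-c)
```

The substring between the two indexes is what the term recognizer would hand to the expression parser, so the parser never sees the unbalanced tail.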
We've reviewed the critical parts of the BNF. You've seen that with only one
exception, the use of code as a workaround to balance parentheses (a strategy
that can also be used with languages of the C family), our version of QuickBasic
can be formally specified. Let's now take the complete BNF and run it through
the analyzer to see what happens.


Analyzing the BNF of Our QuickBasic


Let's rerun bnfAnalyzer.exe. Click the file BNF analyzer test 5 (quick basic).txt,
and then click Create Reference Manual. The processing will be more intense
as the application progresses through the large file.
When the Reference Manual Options form appears, set it up as shown in
Figure 4-15. The large amount of output will exceed the capacity of the text box
used to display the results, and the boxed, monospace comments are inappro-
priate for large outputs.

Reference Manual Options

Language Name: Quick Basic

(*) Text format
      [ ] Place sections of the manual in boxed comments
          (note: not suitable for large files)
( ) XML format

Syntax Reference
[x] Include non-terminal index
[x] Include terminal index
[x] Include syntax
[x] Include lists showing where symbols are used, in addition to their
    definitions, in the syntax
[ ] Include formal Backus-Naur definitions
[x] Indent the language reference manual

[x] Always show this form before producing the reference manual

Buttons: Save Settings, Restore Settings, Cancel, Close (and create manual)

Figure 4-15. Options set up for QuickBasic BNF


The output with these options is just too large to fit in a text box. Therefore,
you will see the prompt shown in Figure 4-16. This will allow you to also store
the output in the named text file.

bnfAnalyzer

The reference manual may not fit into the display because it
is 76495 characters. It may be truncated at the end.

Click OK to save the reference manual in the file
"C:\egnsf\bnfAnalyzer\bnf.TXT", change the file id to a
preferred value and click OK, or just click Cancel to proceed.

If you click OK you can then view the file in most
contemporary versions of Notepad.

[OK] [Cancel]

Figure 4-16. Saving the reference manual for QuickBasic as a text file

You will then see the reference manual, commencing with the nonterminals,
as shown in Figure 4-17. Note that to obtain the properly formatted effect, you
will need to copy the text from the (pink!) dialog box and paste it into Notepad,
because it wraps in the dialog box.

REFERENCE MANUAL FOR THE QUICK BASIC LANGUAGE

NONTERMINAL SYMBOLS

The following are the nonterminal symbols of the language

Number of undefined symbols: 0

Nonterminal       Where Used                              Remarks

addFactor         relFactor
addFactorRHS      addFactor and addFactorRHS
addOp                                                     Start symbol
andFactor         orFactor
andOp             orFactor
andOp 2
asClause          dimDefinition and formalParameterDef
assignmentStmt    statementBody

Figure 4-17. Start of nonterminal list for QuickBasic


This lists the grammar categories of QuickBasic. Notice the appearance of
the nonterminal andOp twice, with the second occurrence containing a sequence
number.
This occurs because I could not resist adding Visual Basic .NET's AndAlso as
a new operator. As explained in Chapter 3, AndAlso allows a lazy And, which does
not evaluate the right side of the And when the left side is False. Our implementa-
tion also contains OrElse. Therefore, there are actually two productions for andOp:
andOp := And and andOp := AndAlso. When multiple alternative productions appear,
they will create multiple lines in the nonterminal list, as in Figure 4-17. The Where
Used column information, however, will be provided for only the first line,
because it applies to all forms of that production.
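
The layout of such a list is simple to reproduce. The Python sketch below is not bnfAnalyzer's code. It assumes productions held as (left side, right side) pairs, uses a made-up function name nonterminal_index, and joins multiple users with "and" as a simplification of the tool's wording. It numbers repeated productions and shows the Where Used information on the first line only.

```python
from collections import defaultdict

# A few productions from Figure 4-14, written as (left side, right-side symbols).
PRODUCTIONS = [
    ("orFactor", ["andFactor", "andOp", "orFactor"]),
    ("andOp", ["And"]),
    ("andOp", ["AndAlso"]),
    ("andFactor", ["Not", "notFactor"]),
]


def nonterminal_index(productions):
    """Return (label, where used) rows, one per production, sorted by nonterminal."""
    defined = {left for left, _ in productions}
    used_in = defaultdict(list)
    for left, right in productions:
        for symbol in right:
            if symbol in defined and left not in used_in[symbol]:
                used_in[symbol].append(left)
    rows = []
    occurrences = defaultdict(int)
    for left, _ in sorted(productions, key=lambda production: production[0]):
        occurrences[left] += 1
        if occurrences[left] == 1:
            rows.append((left, " and ".join(used_in[left])))
        else:
            # Later productions for the same nonterminal get a sequence number
            # and no Where Used entry, as in Figure 4-17.
            rows.append(("%s %d" % (left, occurrences[left]), ""))
    return rows


for label, where_used in nonterminal_index(PRODUCTIONS):
    print("%-14s %s" % (label, where_used))
```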
Scroll down to see the terminals, as shown in Figure 4-18. The list starts with
all terminals specified, in the BNF, as strings. Scrolling down in the list, you will
find a complete list of reserved words associated with this language, specified
using proper case.

TERMINAL SYMBOLS

The following are the terminal elements of the language

Terminal      Where Used                                   Remarks

"&"           likeFactorRHS
"("           lValue and functionCall
"()"          formalParameterDef
")"           lValue and functionCall
"*"           mulOp
"**"          powOp
"+"           sign, powFactor and addOp
","           formalParameterListBody
"-"           sign, powFactor and addOp
"/"           mulOp
"0"           optionStmt
"1"           optionStmt
":"           immediateCommand
";"           print
"<"           relOp
"<="          relOp
"<>"          relOp
"="           implicitAssignment, forHeader and relOp
">"           relOp
">="          relOp
"Base"        optionStmt
"End"         subDefinition and functionDefinition
"Function"    functionDefinition
"In?t"        trace
"Line"        trace
"Memory"      trace
"Mod"         mulOp
"NoBox"       trace
"Object"      trace
"Source"      trace
"Stack"       trace
"Sub"         subDefinition

Figure 4-18. Start of terminal list and part of the reserved words for QuickBasic,
including extended reserved words for the trace instruction


Finally, scroll down further to see a complete syntax reference for our version
of QuickBasic, synthesized in English from the BNF alone. Figure 4-19 shows the
beginning of this reference. This is a comprehensive outline for QuickBasic pro-
grams, organized as a sequence of rules. 7 For example, the first rule declares that
an immediate command (used to type an expression for immediate evaluation as
seen in the Immediate window of Visual Basic) is a single immediate command,
followed by zero, one, or more sequences of the form: immediate command.

The following are the rules of the language

1. An Immediate Command can consist of the following:
   1.1. This sequence:
      1.1.1 A Single Immediate Command
      1.1.2 This (repeated zero, one or more times):
         1.1.2.1 This sequence:
            1.1.2.1.1. The string ":"
            1.1.2.1.2. A Single Immediate Command
   An Immediate Command is a start symbol
2. A Single Immediate Command can consist of the following:
   2.1. This set of alternatives:
      2.1.1 An Expression
      2.1.2 An Explicit Assignment
   A Single Immediate Command can appear in an Immediate Command
3. An Expression can consist of the following:
   3.1. This sequence:
      3.1.1 An Or Factor
      3.1.2 This optional sequence:
         3.1.2.1 This sequence:
            3.1.2.1.1. An Or Op
            3.1.2.1.2. An Expression
   An Expression can appear in a Single Immediate Command , an Implicit
   Assignment , a Subscript List , a Circle , a Do Condition , a For Header
   , a Go Sub , a Goto , an If , a While Until Clause , an Expression List
   , a While Header , an Expression and a Term
4. An Explicit Assignment can consist of the following:
Figure 4-19. Start of the reference outline for QuickBasic

Take a look at outline item (3). It claims, truly, that an expression can consist
of an or factor, followed by an optional sequence consisting of one of the or
operators (Or or OrElse) and an expression. It claims, truly, that an expression

7. As a "word" person, who went into computing as part of an elaborate draft-dodging scheme
that got completely out of hand, I have always been underwhelmed, to say the least, by the
absence of truly automatic documentation from source code. This is my two cents.


can appear in many contexts, including single immediate commands such as Let
A=1+1 (where 1+1 is the expression), implicit assignments such as A=1+1, lists of
subscripts as in array(A, B+1), and so forth.
This is useful information, although the raw outline needs to be "decorated"
with tutorials, examples, illustrations, and witty remarks to be an actual reference
manual. If you want to enhance the reference outline extensively, you should use
the option to convert it to XML.

Eight Guidelines for Effective BNF


You've seen how to use BNF in the requirements phase of our language, as
a formal language that is not a programming language, and we have some tools
for documenting the language. Here are some general guidelines for creating
good BNF:

1. Create the valid BNF before you code the compiler.

2. Validate the BNF by making sure that bnfAnalyzer, Bison, or yacc processes
the BNF completely and without any errors, before you code the com-
piler, or expect to use the output of Bison, yacc, or any parser generator.

3. If the BNF tool you are using supports comments, comment the BNF
with descriptive information. Consider accompanying each rule with
a complete explanation in your natural language.

4. In general, nonterminals that are sequenced (as in a b) should have
disjoint sets of handles (possible terminals) in the right handle of a and the
left handle of b. For example, the sequence identifier identifier should
never occur if identifier has Visual Basic syntax, because the right handle
of the first identifier is the set of all letters, numbers, and the underscore,
while the left handle of the second is the set of all letters (for .NET format
identifiers, the left handle also includes the underscore). This means that
unless blank is also defined as having a syntax role in your language, you
literally don't know where the first identifier ends and the second
identifier starts.

5. In general, nonterminals that are alternated (as in a | b) should have
disjoint sets of handles in the left handles of a and b. For example,
number | identifier is probably okay, because number can start with
a digit only, while identifier can start with a letter only.


6. When coding BNF, take your special programmer hat off and wear your
special requirements hat. BNF is a formal language, rather than a
programming language. It specifies the set of possible sentences in your
language, not how to parse.

7. In spite of rule 6, develop the BNF for the programmer of the parser and
not for posterity.

8. Don't show the user the BNF. Instead, use bnfAnalyzer to create the basis
of a rules manual. Don't show the user this manual. Instead, use it as
a basis for a presentation that shows you have done your due diligence
and a reference manual is available for posterity.
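
Guidelines 4 and 5 can be checked mechanically once each nonterminal's handles
are known. The following is a minimal sketch in Python (not the book's Visual
Basic, and not part of bnfAnalyzer); the HANDLES table and both function names
are hypothetical, and the handle sets are restricted to ASCII for simplicity.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

# Hypothetical handle table: name -> (left handle, right handle).
# The left handle is the set of characters that may begin the token;
# the right handle is the set of characters that may end it.
HANDLES = {
    "identifier": (LETTERS, LETTERS | DIGITS | {"_"}),
    "number": (DIGITS, DIGITS),
}

def can_sequence(a, b):
    """Guideline 4: 'a b' is safe when a's right handle and b's left handle are disjoint."""
    return HANDLES[a][1].isdisjoint(HANDLES[b][0])

def can_alternate(a, b):
    """Guideline 5: 'a | b' is safe when the two left handles are disjoint."""
    return HANDLES[a][0].isdisjoint(HANDLES[b][0])

print(can_sequence("identifier", "identifier"))  # False: no way to tell where one ends
print(can_alternate("number", "identifier"))     # True: a digit or a letter decides
```

A check like this only flags the problem; the remedy (for example, giving
blanks a syntax role) is still a requirements decision.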

bnfAnalyzer Technical Notes


As I mentioned earlier in this chapter, this section is best read after you've read
Chapters 5, 6, and 7, since it discusses how bnfAnalyzer was developed as a small
compiler for BNF.
Like our flyover compiler of Chapter 3, and like quickBasicEngine, bnfAnalyzer
begins with a scanner that outputs individual tokens to a table (USRscanned,
defined in the General Declarations section of frmBNFanalyzer.frm). Then a top-down,
recursive descent parsing algorithm is used to convert the scanned code.
We need to scan because the input to bnfAnalyzer is free-form, consisting of
white space, comments, and tokens similar to a Visual Basic or QuickBasic pro-
gram. However, we do not parse to object code as in the case of integerCalc or
quickBasicEngine. Instead, a tree structure is created, showing all the rules.
Recall, however, that my very cool Vaio laptop was briefly resident in the pawn-
shop during the initial development of the software for this chapter alone, in April
2003. 8 For this reason, I had to use Visual Basic 6 on my older Compaq Presario
laptop to develop bnfAnalyzer. I also wanted to develop a solution in a hurry as
exclusively form code.
It is generally better to factor large projects into objects: objects for scanning,
objects for representing variable types, objects for representing variables, and so
on. As is shown in Chapters 5 through 7, this allows us to ruggedize the objects
by creating a comprehensive test methodology and test the GUI for each object.
But often enough, schedule pressures don't permit this. However, you can take
lessons from genuine object-oriented development and apply them to proce-
dural development, and this has been done in bnfAnalyzer.

8. Many pawnshops will no longer accept old laptops and will make loans only for contempo-
rary laptops that can show DVD movies, since their clientele will often want to watch movies.


The tree that represents the compiled BNF (in COLparseTree inside General
Declarations, inside frmBNFanalyzer.frm) is just a Collection. But while most
vanilla collections are simple one-dimensional arrays and hash tables, COLparseTree
exploits the fact that any collection item can be a variant, and in particular, it may
be a collection!
This means that recursive data structures can be, without too much loss of
efficiency, represented as collections that contain subcollections.
For example, Item(1) of COLparseTree is a one-dimensional collection of all
nonterminals found in the BNF. Item(2) is a similar collection of all terminals.
Item(3) is the root of a tree, as described in a comment header placed in the
code before COLparseTree, which represents the compiled BNF.
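
A nested layout of this kind can be sketched in Python, where lists stand in
for Visual Basic 6 Collection objects. Only the three-item arrangement follows
the description above; the sample contents and the depth helper are
illustrative assumptions, not the actual contents of COLparseTree.

```python
# Hypothetical miniature of the three-item layout described for COLparseTree.
parse_tree = [
    ["expression", "orFactor"],                             # item 1: nonterminals
    ["OR", "IDENTIFIER"],                                   # item 2: terminals
    ["expression", ["orFactor", ["IDENTIFIER"]], ["OR"]],   # item 3: root of the tree
]

def depth(node):
    """Return how deeply lists are nested; a non-list value has depth 0."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth(child) for child in node), default=0)

print(depth(parse_tree[0]))  # the flat list of nonterminals
print(depth(parse_tree))     # the whole structure, subcollections included
```

Because any item may itself be a list, one variable carries the whole parse,
which is what makes it easy to pass between routines.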
COLparseTree is, in fact, a virtual object, since, for all intents and purposes, it
encapsulates the complete parse in one place that is easily passed between rou-
tines. When a new input file is selected, it is set to Nothing, and then rebuilt when
the user chooses to create a reference manual. However, I was far more concerned
with the reliability than the efficiency of this approach.
The problem with legacy, plain vanilla collections in Visual Basic 6 is that they
can contain any variant value in any item. The parallel problem in .NET is that
plain collections can contain any object in any item. This means that the lan-
guage and the Framework won't enforce any rules on our behalf, and if we mess
up, incorrect results will occur without warning!

The Inspection Report


Years ago, starting to use more and more complex user defined types (UDTs) in
the C language, I found myself developing two routines rather consistently for
those data types. I needed a routine to print the UDT so I could check it for valid-
ity, and, for advanced applications, I needed a routine to audit or inspect the UDT
for correctness, over and above the rules enforced by the compiler and operating
system. These later became a core object methodology, which I describe in the
next chapter, because objects also need these tools.
This is why you may have noticed, during the parse of your BNF files, a series
of progress messages claiming to "inspect" the parse tree in COLparseTree. After
creating the parse tree, a series of rules is applied to the collection, and the pro-
gram prints an error message if they are violated.
Of course, if I've done my job, these rules, which have strictly to do with the
internal structure and not with user error, will always be satisfied. However, they
are not quite a complete waste of time, since I've provided source code, and they
comprise a check on changes to the source code.
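
The idea of an inspection routine can be sketched as a list of rules applied
in order, each one reported whether it passes or fails. This Python sketch is
an assumption-laden simplification: the function name, the rule wording, and
the flat layout of item 1 are hypothetical and do not reproduce the rules
actually coded in bnfAnalyzer.

```python
def inspect_parse_tree(tree):
    """Apply structural rules to a nested-list parse tree.

    Returns (ok, report): ok is True only if every rule succeeded, and
    report is a list of lines describing each rule and its outcome.
    """
    report = []
    ok = True

    def rule(description, passed):
        nonlocal ok
        report.append("* " + description)
        report.append("* Rule application has " + ("succeeded" if passed else "FAILED"))
        ok = ok and passed
        return passed

    # Later rules depend on earlier ones, so stop at the first structural failure.
    if not rule("The parse collection must be something and not nothing",
                tree is not None):
        return ok, report
    if not rule("The parse collection must contain two entries at minimum",
                isinstance(tree, list) and len(tree) >= 2):
        return ok, report
    rule("Item 1 must be a list that identifies each nonterminal as a string",
         isinstance(tree[0], list) and all(isinstance(n, str) for n in tree[0]))
    return ok, report

ok, report = inspect_parse_tree([["expression"], ["OR"]])
print(ok)
print("\n".join(report))
```

Printing every rule, including the ones that succeed, is what makes the report
long; it also makes it useful as a regression check after the source changes.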
Run bnfAnalyzer.exe on any input file and go to its Tools menu on its main
screen. Choose Inspect the parse tree. Part of the resulting inspection report is
shown in Figure 4-20.


INSPECTION OF PARSE TREE AT 3/1/2004 8:59:59 PM

* The parse collection must be Something and not Nothing
* Rule application has succeeded

* The parse collection must contain two entries at minimum
* Rule application has succeeded
* Count of collection is 754

* Item(1) of the parse collection is a subcollection, containing
* sub-sub-collections in each of its items
* Item(1) of each sub-sub-collection identifies one nonterminal as a String

Figure 4-20. Part of the COLparseTree inspection report

Notice that the inspection report explains in painful detail what it needs to
find, as you will see as you scroll.
Am I going overboard? No, I'm not.
A large object, whether it is a true object or a UDT that dreams of being an
object, is something with a determinate state, and that state is either correct or
bad. A bane of maintaining legacy, non-object programs was the way they, as
large collections of disconnected variables, could easily get into a bad state,
never to return.
The inspection routine is not only accessible from the menu, it is also called
when bnfAnalyzer parses an input file. This provides valuable ongoing quality
control. We are, in other words, doing real quality control on the ground.

Collection to String Conversion


Another way to check the parse tree is to see the data structure as a collection.
The bnfAnalyzer project includes clsUtilities, which is an embedded utilities
class (a more ambitious .NET utilities class, implemented as Shared methods, is
used in our .NET projects). clsUtilities contains a tool for converting any
collection to a readable string called, unimaginatively, collection2String. To
see how this displays COLparseTree, open the Tools menu and select Dump the
parse tree. The result is shown in Figure 4-21.

vbCollection
(
vbCollection
(
vbCollection
(vbString("immediateCommand"), vbLong(1)),
vbCollection
(vbString("singleImmediateCommand"), vbLong(2), vbLong(1)),
vbCollection
(vbString("expression"), vbLong(3), vbLong(2), vbLong(43), vbLong(46), vbLong(28),
vbLong(61), vbLong(23), vbLong(33), vbLong(34), vbLong(25), vbLong(67), vbLong(68),
vbLong(26), vbLong(3), vbLong(97)),
vbCollection
(vbString("explicitAssignment"), vbLong(4), vbLong(2), vbLong(18)),
vbCollection
(vbString("sourceProgram"), vbLong(5), vbLong(11)),
vbCollection
(vbString("optionStmt"), vbLong(6), vbLong(5)),
vbCollection
(vbString("sourceProgramBody"), vbLong(7), vbLong(5)),
vbCollection
(vbString("sourceProgram[2]"), vbLong(8)),
vbCollection
(vbString("logicalNewline"), vbLong(9), vbLong(5), vbLong(11), vbLong(104), vbLong(105)),
vbCollection

Figure 4-21. COLparseTree dump

This shows the collection and its members, using parentheses wherever
a subcollection occurs. This is what I call the "decorated" approach to showing
Visual Basic values in Visual Basic 6 and .NET. The decorated approach
serializes variant types and values in Visual Basic 6, and object types and
values in .NET. As you can see, it explicitly identifies not only the values
of collection items, but also their types.
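
The decorated approach amounts to a small recursive serializer. The Python
sketch below is not the code of collection2String; the function name decorate
is hypothetical, lists stand in for collections, and only the type labels
(vbCollection, vbString, vbLong) are taken from the dump in Figure 4-21.

```python
def decorate(value):
    """Serialize a value together with its type, recursing into lists."""
    if isinstance(value, list):
        # A subcollection: wrap its decorated items in parentheses.
        return "vbCollection(" + ", ".join(decorate(item) for item in value) + ")"
    if isinstance(value, str):
        return 'vbString("' + value + '")'
    if isinstance(value, int):
        return "vbLong(" + str(value) + ")"
    raise TypeError("unsupported type: " + type(value).__name__)

print(decorate([["expression", 3, 2]]))
```

Naming the type of every item is verbose, but it means a string that holds
digits can never be mistaken for a number when you read the dump.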

bnfAnalyzer Tools
There are a number of other options available in bnfAnalyzer, all accessible from
the Tools menu, shown in Figure 4-22.


Create language reference manual
Reference manual options
Create the parse tree
Parse the BNF
Destroy the parse tree
Dump the parse tree
Parse tree to XML
Inspect the parse tree
View the source BNF
List nonterminal symbols...
List terminal symbols...
Dump scanTable...
Inspect scanTable...

Figure 4-22. The bnfAnalyzer Tools menu

These functions are available from the Tools menu:

• Create language reference manual: Performs the same function that the
button performs.

• Reference manual options: Calls up the options screen for the refer-
ence manual.

• Create parse tree: Creates the parse tree structure, without parsing the
BNF or showing the reference manual.

• Parse the BNF: Creates the parse tree structure and parses whatever file
is selected, without showing the reference manual.

• Destroy the parse tree: Destroys the parse tree, by freeing all
subcollections and then setting the tree to Nothing. Freeing all
subcollections is important to avoid COM clutter.

• Dump the parse tree: Creates the collection dump of the parse tree (see
Figure 4-21).

• Parse tree to XML: Converts the parse rules to an XML file, as shown in
the previous sections of this chapter.

• Inspect the parse tree: Inspects the parse tree (see Figure 4-20).

• View the source BNF: Allows you to examine the source code, but not to
change it. (You can use Notepad to change the source BNF.)

• List nonterminal symbols: Provides only the list of nonterminals, which
is also embedded in the reference manual.

• List terminal symbols: Provides only the list of terminals, which is also
embedded in the reference manual.

• Dump scanTable: Prints the scanned BNF in a readable form. This is the
lexical analysis, as shown in Figure 4-23. The lexical analysis starts with
several newlines because the input text file starts with several comment
lines, and comments are ignored by the scanner. The scanner captures the
token type, start index, length, and value of each token. The value is dis-
played using tools in clsUtilities, which serialize unprintable ASCII to
a viewable form.

[Screenshot: SCAN DUMP AS OF 3/1/2004 9:06:03 PM. Each row lists a token's
type, start index, length, and value. The dump opens with a run of newline
tokens of length 2, followed by rows such as the nonterminalIdentifier
"immediateCommand" at start index 333 with length 16.]

Figure 4-23. Scanned BNF dump


• Inspect scanTable: Audits the scanned BNF for internal errors and pro-
duces the long and very boring inspection report shown in Figure 4-24.

NOTE My decision to use a user type for USRscanned instead of a collection
means that I don't need to make certain checks. For example, the use of a type
(or .NET structure) means that the compiler and the runtime enforce conformity
to expected member types. However, the lack of an unsigned type means
I do need to check members, in a full-dress inspection, for positive values. In
addition, as I discuss in the next chapter, the tokens may not overlap, and this
is checked in the inspection.

Message at 3/1/2004 9:09:11 PM from the following component:

frmBNFanalyzer.mnuToolsInspectScanTable_Click

INSPECTION OF THE SCAN TABLE AS OF 3/1/2004 9:08:35 PM

The scan table must be an allocated array
Rule application has succeeded

Each entry must contain a valid token type (excluding comment)
Rule application has succeeded
Type at 1 is NEWLINE

Each entry must contain a valid start index
Rule application has succeeded
Start index at 1 is 40

Each entry must contain a valid length
Rule application has succeeded
Length at 1 is 2

Each entry must contain a valid token type (excluding comment)
Rule application has succeeded
Type at 2 is NEWLINE

Figure 4-24. Scan table inspection report
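
The two checks that a typed record cannot enforce by itself, positive values
and non-overlapping tokens, can be sketched as follows. This is a Python
illustration under stated assumptions: the Token fields mirror the four values
the scanner is said to capture, but the class, the function name, and the
1-based indexing are hypothetical rather than taken from bnfAnalyzer.

```python
from dataclasses import dataclass

@dataclass
class Token:
    token_type: str
    start: int      # assumed 1-based start index in the source text
    length: int
    value: str

def inspect_scan_table(tokens):
    """Return a list of problems; an empty list means every rule succeeded."""
    problems = []
    previous_end = 0
    for i, tok in enumerate(tokens, start=1):
        if tok.start < 1:
            problems.append(f"token {i}: start index must be positive")
        if tok.length < 1:
            problems.append(f"token {i}: length must be positive")
        # A token overlaps when it begins at or before the end of an earlier one.
        if i > 1 and tok.start <= previous_end:
            problems.append(f"token {i}: overlaps the previous token")
        previous_end = max(previous_end, tok.start + tok.length - 1)
    return problems

table = [Token("newline", 40, 2, "<newline>"), Token("newline", 43, 2, "<newline>")]
print(inspect_scan_table(table))
```

The first two start indexes and lengths are the ones shown in the scan dump;
gaps between tokens are allowed because comments and white space are skipped.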

The Status Report


You will have noticed that each time we create a reference manual, a Status box
in the upper-right corner of the main form, and a progress bar under the Status
box, go berserk and show status and progress. However, it's difficult to see what's
being shown in the Status box.

You can select to see a detailed status report from the main bnfAnalyzer form.
To see how this works, run bnfAnalyzer and, on the main form, select a small file,
such as BNFanalyzer test 4, from the list in the lower-left corner of the screen.
Notice that three levels of status report are available in a Parse Status group box
on the main form. Select the highest level of detail: Complete report.

Parse Status
( ) No report
( ) Simple report
(*) Complete report

Click Create Reference Manual, and then cancel the options form to return to
the main form. Click the Zoom box in the upper-right corner of the form. You'll see
another pink dialog box, as shown in Figure 4-25. This one logs the status of scan-
ning, parsing, and all other steps. Scroll through it to see how the BNF is compiled.

3/1/2004 9:14:53 PM Scanning BNF at character 152 of 205
3/1/2004 9:14:53 PM Scanning BNF at character 154 of 205
3/1/2004 9:14:53 PM Scanning BNF at character 166 of 205
3/1/2004 9:14:53 PM Scanning BNF at character 169 of 205
3/1/2004 9:14:53 PM Scanning BNF at character 179 of 205
3/1/2004 9:14:53 PM Scan complete
3/1/2004 9:14:53 PM Inspecting scan table
3/1/2004 9:14:53 PM Inspecting scan table at token 1 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 2 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 3 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 4 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 5 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 6 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 7 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 8 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 9 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 10 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 11 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 12 of 13
3/1/2004 9:14:53 PM Inspecting scan table at token 13 of 13
3/1/2004 9:14:53 PM Inspection complete
3/1/2004 9:14:53 PM Parsing the scanned BNF
3/1/2004 9:14:53 PM Checking for the nonterminal "bnfGrammar" using the
production "bnfGrammar := production [ , production ]": the handle is
"<newline>" at token 1 of 179
3/1/2004 9:14:53 PM Checking for the nonterminal "production" using the
production "production := [ nonTerminal := productionRHS ] NEWLINE": the
handle is "<newline>" at token 1 of 179
3/1/2004 9:14:53 PM Checking for the nonterminal
"nonTerminal" using the production "nonTerminal :=

Figure 4-25. bnfAnalyzer parser, detailed status report


This progress report outlines the top-down recursive descent algorithm used
in bnfAnalyzer. This same approach is used to parse QuickBasic, as explained in
Chapter 7. A goal is set (parse the scanned BNF), then broken down into sub-
goals, and then narrated in this level of detail.
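
The goal and subgoal structure described above can be illustrated with a tiny
recursive descent parser that logs each goal as it is attempted. This Python
sketch makes several assumptions: the two-rule grammar, the function names, and
the message wording are hypothetical, chosen only to echo the style of the
status report, and are not the parser in bnfAnalyzer.

```python
def parse(tokens):
    """Parse: expression := term { "+" term } ; term := NUMBER | "(" expression ")".

    Returns the list of narrated goals; raises SyntaxError on invalid input.
    """
    log = []

    def narrate(goal, index):
        handle = tokens[index] if index < len(tokens) else "<end>"
        log.append(f'Checking for the nonterminal "{goal}": the handle is '
                   f'"{handle}" at token {index + 1} of {len(tokens)}')

    def expression(index):
        narrate("expression", index)
        index = term(index)                      # subgoal: the first term
        while index < len(tokens) and tokens[index] == "+":
            index = term(index + 1)              # subgoal: each further term
        return index

    def term(index):
        narrate("term", index)
        if index < len(tokens) and tokens[index].isdigit():
            return index + 1
        if index < len(tokens) and tokens[index] == "(":
            index = expression(index + 1)        # recursion: a nested expression
            if index < len(tokens) and tokens[index] == ")":
                return index + 1
            raise SyntaxError("expected )")
        raise SyntaxError("expected a number or (")

    if expression(0) != len(tokens):
        raise SyntaxError("unexpected token after expression")
    return log

for line in parse(["1", "+", "(", "2", ")"]):
    print(line)
```

Each function corresponds to one nonterminal, so the log reads as a trace of
which production was tried at which token.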
In summary, bnfAnalyzer is itself a form of compiler, which compiles
documentation rather than code. I did a lot of extra work in constructing it in the form
of inspection and dump, so that I could rely on its output. Starting in Chapter 5,
you'll see how these core methodologies allow you to create solid compiler objects.

Summary
You have seen that BNF can be specified using BNF, and, by processing the BNF
file, you've seen how to use the bnfAnalyzer tool included in the sample code.
We've examined how the large BNF for our QuickBasic was developed and pushed
this file through the analyzer to make sure it is valid. And you've read eight rules
for using BNF as a requirements definition language.
We can now write our compiler, which will consist initially of a lexical ana-
lyzer, a parser, and our own "Nutty Professor" interpreter.

Challenge Exercise
Develop the BNF for a simple language that uses letters as logical variables and
the logical operators And, Or, and Not. Your language must support operator prece-
dence such that Or has low precedence, And has medium precedence, and Not has
high precedence. Your language must support parentheses.
Use bnfAnalyzer to make sure that your specification produces lists of non-
terminals and terminals, as well as the reference outline, without error.
Remember how to support parentheses: define parenthesized groups at the
same level as simple variables.

Resources
As noted in Chapter 3, a good reference for compiler theory is Compilers: Principles,
Techniques, and Tools, by Alfred Aho, Ravi Sethi, and Jeffrey Ullman
(Addison-Wesley, 1985). This book contains an excellent discussion of BNF.

CHAPTER 5

The Lexical Analyzer for the QuickBasic Compiler
The question whether a computer can think is no more interesting than the
question of whether a submarine can swim.
-Edsger Dijkstra

'What do you read, my lord?' 'Words, words, words.'


-William Shakespeare, Hamlet

YOUR BNF DEFINITION of the language, expanded into a reference manual perhaps
using the bnfAnalyzer software described in Chapter 4, is the detailed design, or
requirements document, for your compiler. It explains, in enough usable detail,
the semantic effect at runtime of user statements.
This chapter will enable you to get started with your own .NET compiler. It
describes the big picture, which starts with lexical analysis in support of parsing.
You'll learn some of the theory behind lexical analysis-just enough to help you
see how code implements theory (whether it wants to or not). You'll then see
how a scanner object (qbScanner) produces scanned tokens for lexical analysis.
The final section of the chapter describes object-oriented design principles
as they apply to the scanner object. These principles will be used consistently in
the rest of the QuickBasic compiler project.

The Compiler Big Picture


Our implementation of a classic compiler breaks down into three conceptual stages:

• The lexical analyzer, which reads the raw input text (almost always
a stream of ASCII or Unicode characters) and synthesizes meaningful
lexical units, passing them onward and upward to the parser

• The parser, which synthesizes the meaningful elements of the language
(such as assignment statement, if statement, and so on)

• The code generator, which emits usable object code

Note that software tools for working with source code, other than compilers,
might have this structure but replace the code generator with another form of
generator. For example, the bnfAnalyzer tool used in Chapter 4 scans (lexically
analyzes) BNF and parses it to create an internal representation of its structure.
However, bnfAnalyzer generates documentation instead of object code. The yacc
product generates C source code in place of object code, as do many
preprocessors. This is, in fact, why this book stresses the front end of the
compiler, as opposed to code generation for MSIL. The lexical analyzer and the
parser, which make up the front end, are utilities that allow you to craft, in
any language, tools to make your job easier.
There are other conceptual units in commercial compilers. For example,
optimizers are popular units; they take either the source code or the
object code and improve its performance by transformations that are known to
be valid.
Our QuickBasic subset compiler, for example, notices degenerate operations
in the source code, such as division by one or addition of zero. These operations
are degenerate because their result is known: adding zero to a number always
results in that number without change. 1 Our compiler can, as you will see in
Chapter 6, remove these operations.
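
Removing a degenerate operation is a small tree rewrite. The Python sketch
below is only an illustration of the idea, not the method used by the compiler
in this book: the tuple representation (operator, left, right) and the function
name simplify are assumptions made for the example.

```python
def simplify(node):
    """Remove x+0, 0+x, and x/1 from a tuple-based expression tree."""
    if not isinstance(node, tuple):
        return node                      # a leaf: a variable name or a number
    op, left, right = node
    left, right = simplify(left), simplify(right)   # simplify the operands first
    if op == "+" and right == 0:
        return left                      # x + 0 is x
    if op == "+" and left == 0:
        return right                     # 0 + x is x
    if op == "/" and right == 1:
        return left                      # x / 1 is x
    return (op, left, right)

print(simplify(("/", ("+", 0, "b"), 1)))   # both operations are degenerate
print(simplify(("+", "a", 10)))            # a useful operation is left alone
```

Simplifying the operands before the operator lets one removal expose another,
as in the first call, where removing 0+b leaves b/1.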
Our compiler also contains an assembler that resolves cross-references in
the generated object code. The assembler will be discussed in Chapter 6.

Lexical Analysis Theory


As noted in Chapter 3, lexical analysis and parsing do the same general task.
Literally or conceptually, both apply a formal grammar, expressible using the
BNF notation described in Chapter 4, to the undifferentiated stream of raw
input characters and build a meaningful structure.

1. Degenerate operations are not like staying out late in clubs and having fun. Mathematicians
call operations like a+0 degenerate because, compared with useful operations like a+10, a+0
is a waste of my time and yours.

For example, a Visual Basic identifier in .NET has a formal grammar. In
ordinary training and in books, we state that a .NET identifier must start with
a letter or an underscore, and it must contain at least one character. 2 A .NET
identifier may contain letters, numbers, and underscores up to an arbitrary
length, but keep it short for your sanity's sake.
The rule can be expressed in BNF:

VbIdentifier := ( LETTER | UNDERSCORE ) ( LETTER | UNDERSCORE | DIGIT ) *

However, notice that the right side of this informal BNF is actually using
a notation that you may already be familiar with: the regular expression as
seen in .NET.
In Chapter 3, we briefly touched on the topic of regular expressions. Here, we
will look at regular expressions and their relationship to formal automata,
including Turing machines and a specialized, limited abstract machine called the
finite automaton. This discussion should illuminate not only the tools for
scanner generation, but also the manual writing of a scanner. It will show you
how to think before you code the lexical analyzer for your language.
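
The VbIdentifier rule translates almost symbol for symbol into a regular
expression. This sketch uses Python's re module as a stand-in for the .NET
Regex object; the names VB_IDENTIFIER and is_identifier are hypothetical, and
the character classes are restricted to ASCII, which is an assumption of the
example. It encodes only the rule as stated above, not the full language
specification.

```python
import re

# ( LETTER | UNDERSCORE ) ( LETTER | UNDERSCORE | DIGIT ) *
VB_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

def is_identifier(text):
    """True when the whole of text satisfies the VbIdentifier rule."""
    return VB_IDENTIFIER.fullmatch(text) is not None

print(is_identifier("_count1"))   # starts with an underscore
print(is_identifier("1abc"))      # starts with a digit
```

Using fullmatch rather than search matters here: the rule describes the whole
identifier, not a substring somewhere inside the text.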
Regular expressions specify in a formal notation the rules for a class of strings.
They originally appeared in Unix and are supported in Linux, as well as in objects
shipped with COM and .NET. Regular expressions are a terse, if not gnomic, way
of expressing the format of expected data. They are used to create good lexical
analyzers, and thinking in terms of regular expressions is an important skill for
the compiler developer. As you'll see, understanding a regular expression allows
you to make predictions about what strings will satisfy it, and this is what makes
a regular expression so very ... regular. 3

Core Rules of Regular Expressions


Some core rules of regular expressions are seen in nearly all regular expression
implementations. The rules for regular expressions can be expressed in BNF, as
shown in Figure 5-1.

2. The ability of a .NET identifier to start with an underscore is a new, and somewhat useful,
feature, added to Visual Basic as of .NET to bring Visual Basic in line with C++ practice. I use
this ability in the code of this book. Shared variables in classes, which are not part of the
object instance's state and which are, as their name implies, shared between objects, start in
my code with an underscore. This reminds the reader that "we're not in COMsas anymore,
Toto," and we are using a new .NET feature.
3. Mathematicians call regular expressions regular not because the regular expressions are regu-
lar; indeed, regular expressions appear rather irregular. However, the strings they specify have
a regular and predictable structure once the regular expression is known.


regex := sequenceFactor [ regex ]
sequenceFactor := alternationFactor alternationRHS
alternationFactor := ( postfixFactor [ postfixOp ] ) | zeroOperandOp
alternationRHS := STROKE sequenceFactor
postfixFactor := string | charset | ( regex )
string := logicalChar [ string ]
postfixOp := ASTERISK | PLUS | repeater
repeater := LEFT_BRACE [ INTEGER ] [ COMMA ] [ INTEGER ] RIGHT_BRACE
zeroOperandOp := CARAT | DOLLAR_SIGN
charset := LEFT_BRACKET charsetExpression RIGHT_BRACKET
charsetExpression := charsetRange charsetExpression
charsetRange := logicalChar [ DASH logicalChar ]
logicalChar := ordinaryChar | hexSequence | escapeSequence
ordinaryChar := ORDINARYCHAR ' Where an ORDINARY CHAR is [^\*\+\^\$\\\-\{\}\[\]]
hexSequence := HEXSEQUENCE ' Where a hex sequence is \\x[0123456789ABCDEFabcdef]+
escapeSequence := ESCAPESEQUENCE ' Where an esc sequence is \\[\*\+\^\$\\\-\{\}\[\]]

Figure 5-1. Regular expressions in BNF (generic; may miss some features of actual
processors)

We can use the bnfAnalyzer program to create a reference manual skeleton
for a generic regular expression syntax, as shown in Figure 5-2.

Following are the rules of the language

1. A Regex can consist of the following:
1.1. This sequence:
1.1.1 A Sequence Factor
1.1.2 This optional sequence:
1.1.2.1 A Regex
A Regex can appear in a Regex and a Postfix Factor
2. A Sequence Factor can consist of the following:
2.1. This sequence:
2.1.1 An Alternation Factor
2.1.2 An Alternation R H S
A Sequence Factor can appear in a Regex and an Alternation R H S
3. An Alternation Factor can consist of the following:
3.1. This set of alternatives:
3.1.1 This sequence:
3.1.1.1 A Postfix Factor
3.1.1.2 This optional sequence:
3.1.1.2.1. A Postfix Op
3.1.2 A Zero Operand Op
An Alternation Factor can appear in a Sequence Factor
4. An Alternation R H S can consist of the following:
4.1. This sequence:
4.1.1 A STROKE
4.1.2 A Sequence Factor
An Alternation R H S can appear in a Sequence Factor
5. A Postfix Factor can consist of the following:

Figure 5-2. bnfAnalyzer output for regular expression syntax


NOTE For specifics on the Regex object available in Visual Basic .NET, see your
Help system (you did install it, didn't you?). The lexical analyzer described here
implements regular expressions without using a Regex object. Instead, it
implements an understanding, in code, of the regular expression model of QuickBasic
syntax, at the lexical level.

Metacharacters

Any ordinary string can be a regular expression. For example, the regular
expression A specifies only those strings consisting of the uppercase A. However,
it's the use of special metacharacters that makes regular expressions so powerful.
You saw a few examples of regular expression metacharacters in Chapter 3.

Asterisk

Any regular expression followed by an asterisk specifies all strings that meet the
requirements of that regular expression, repeated zero, one, or more times (some-
times called zero-trip, because zero "trips" are allowed). Note that the asterisk
allows null strings to satisfy its rule and that it allows a potential infinity of strings. 4
For example, A* is satisfied by a null string or any string consisting of uppercase As
only. Also note that this is a recursive definition, because it uses the concept in the
definition ("a regular expression followed by an asterisk"). This is not cheating,
since the inner regular expression is shorter and must meet all the rules of regular
expressions.

Plus Sign

Any regular expression followed by a plus sign specifies all strings that meet its
requirements repeated one or more times (called one-trip). For example, A+ is
satisfied by the letter A and any string of As.
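
The difference between zero-trip and one-trip repetition is easy to see by
testing the null string. This is a small Python sketch that uses the re module
as a stand-in for any regular expression processor; the helper name satisfies
is hypothetical.

```python
import re

def satisfies(pattern, text):
    """True when the whole of text conforms to the regular expression."""
    return re.fullmatch(pattern, text) is not None

print(satisfies("A*", ""))     # zero trips are allowed
print(satisfies("A+", ""))     # at least one trip is required
print(satisfies("A+", "AAA"))  # any string of As
```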

Curly Braces

4. We say a regular expression is satisfied by a string; this means that the string conforms to the
regular expression.

Another way of specifying iteration is to include a specific count of iterations in
curly braces, as a multiple-character postfix operator to the right of the regular
expression being repeated. For example, A{2} specifies only the string AA. Nearly
all regular expression processors allow the count in the braces to be a range, using
the comma to separate the minimum and maximum repetitions; therefore,
A{1,2} specifies the strings A and AA.
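
Both forms of the brace operator can be checked directly. As before, this is
a Python sketch using the re module rather than the .NET Regex object, and the
helper name satisfied_by is hypothetical.

```python
import re

def satisfied_by(pattern, candidates):
    """Return the candidates that fully conform to the regular expression."""
    return [text for text in candidates if re.fullmatch(pattern, text)]

candidates = ["", "A", "AA", "AAA"]
print(satisfied_by("A{2}", candidates))    # an exact count
print(satisfied_by("A{1,2}", candidates))  # a minimum and a maximum
```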

Parentheses

Just as we have already used parentheses to group BNF elements, parentheses are
metacharacters that can be used to group and clarify complex regular expressions,
whether for the sanity of the reader or for correct execution. For example, A(BC)*
is different from the regular expression (AB)C*. The first regular expression is
satisfied only by strings that start with A followed by zero or more repetitions of BC.
The second regular expression is satisfied by strings that start with AB followed
by zero or more repetitions of C.
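
A pair of strings that tells the two groupings apart makes the point concrete.
This Python sketch again borrows the re module; the helper name satisfies is
hypothetical.

```python
import re

def satisfies(pattern, text):
    """True when the whole of text conforms to the regular expression."""
    return re.fullmatch(pattern, text) is not None

# ABCBC repeats the group BC; ABCC repeats only the letter C.
for text in ["ABCBC", "ABCC"]:
    print(text, satisfies("A(BC)*", text), satisfies("(AB)C*", text))
```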

Vertical Stroke

The vertical stroke character (|) may be used to specify that, at the point where
it occurs, the regular expression on its left is alternated or Or'd with the regular
expression on its right. The regular expression (A*)|(B+) uses parentheses (which
are actually unnecessary) to specify the valid "set of all strings," consisting of
the null string (because the left side uses zero-trip iteration), a string of As, or
a string of at least one B.
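
That the parentheses are unnecessary can be confirmed by running the same
strings through both spellings. This is a Python sketch using the re module;
the helper name satisfies is hypothetical, and agreement on a handful of
samples is a spot check, not a proof of equivalence.

```python
import re

def satisfies(pattern, text):
    """True when the whole of text conforms to the regular expression."""
    return re.fullmatch(pattern, text) is not None

for text in ["", "AAA", "B", "AB"]:
    # The postfix operators bind more tightly than the stroke.
    print(repr(text), satisfies("(A*)|(B+)", text), satisfies("A*|B+", text))
```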

Concatenation
To place a regular expression next to another regular expression is actually to use
an invisible or implied operator, that of regular expression concatenation. This
arrangement specifies that the regular expression on the left is followed by the
regular expression on the right, and that correspondingly valid strings must sat-
isfy the regular expression on the left and then that on the right, moving in that
direction. This invisible operator is comparable to concatenation in BNF. For
example, A*B+ specifies zero, one, or more occurrences of the letter A, followed
by one or more Bs.
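
The grouping, alternation, and concatenation examples above can be checked mechanically. This is a small sketch, again assuming Python's re module rather than any particular processor discussed in the book; satisfied_by and the test strings are made up for the illustration.

```python
import re

def satisfied_by(pattern, text):
    """Return True when the whole of text satisfies pattern."""
    return re.fullmatch(pattern, text) is not None

cases = [
    ("A(BC)*", "ABCBC"), ("A(BC)*", "ABCC"),   # the asterisk repeats BC
    ("(AB)C*", "ABCC"), ("(AB)C*", "ABCBC"),   # the asterisk repeats only C
    ("(A*)|(B+)", ""), ("(A*)|(B+)", "AAA"),
    ("(A*)|(B+)", "BB"), ("(A*)|(B+)", "AB"),  # alternation, not concatenation
    ("A*B+", "B"), ("A*B+", "AABBB"), ("A*B+", "AA"),
]
results = {case: satisfied_by(*case) for case in cases}
for (pattern, text), outcome in results.items():
    print(f"{pattern!r} satisfied by {text!r}: {outcome}")
```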

Backslash

Because so many special characters are used in regular expressions to specify
operations, any one of them may be preceded by the backslash character to specify
that it appears as a literal occurrence of that character. In particular, the backslash
character itself may be doubled to represent the literal appearance of the backslash
character itself. For example, \* can be used to represent a real asterisk. A*
represents zero, one, or more occurrences of the letter A, and A\* represents the
letter A once, followed by an asterisk.
This is one of those marvelous Unix rules, an elegant feature in the sense that
it gives the coder a lot of power. But the price is the gnomic character of regular
expressions. \* represents an asterisk. \\* represents a backslash repeated zero,
one, or more times. \\\* represents a backslash followed by an asterisk. \\\\* represents
a backslash repeated one or more times (and it is equivalent to \\+).
In general, any odd number of backslashes followed by an asterisk represents
a string of backslashes equal in length to the original number of backslashes,
minus one, divided by two, and followed by the asterisk. Any even number of
backslashes followed by an asterisk represents the set of strings consisting of n literal
backslashes, where n is the backslash count minus two and divided by two,
followed by zero, one, or more backslashes.
Like many structures invented by our friends, the gnomic developers of
Unix, things tend to come together at the last minute in an elegant fashion.
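
The odd and even backslash rule is easier to believe once it has been run. The sketch below assumes Python's re module, with raw strings so that each backslash in the pattern reaches the processor untouched; the names BACKSLASH and satisfied_by are invented here.

```python
import re

BACKSLASH = "\\"  # a string holding exactly one backslash character

def satisfied_by(pattern, text):
    """Return True when the whole of text satisfies pattern."""
    return re.fullmatch(pattern, text) is not None

# One backslash: a literal asterisk.
print(satisfied_by(r"\*", "*"))
# Two backslashes: zero, one, or more backslashes.
print(satisfied_by(r"\\*", ""), satisfied_by(r"\\*", BACKSLASH * 3))
# Three backslashes: one backslash followed by a literal asterisk.
print(satisfied_by(r"\\\*", BACKSLASH + "*"))
# Four backslashes: one or more backslashes, the same language as \\+.
print(satisfied_by(r"\\\\*", BACKSLASH), satisfied_by(r"\\\\*", ""))
print(satisfied_by(r"\\+", BACKSLASH), satisfied_by(r"\\+", ""))
```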

Square Brackets and the Dash

Another important feature supported by all regular expression processors is the
ability to identify a character set, and to thereby specify that "at this position,
any character from this set is valid." The identification of the character set is performed
in a regular expression by listing the members of the character set in the
square bracket metacharacters. Metacharacters (including the square bracket)
may be included in this list, as long as they are preceded by the backslash. In this
list, the dash (-) metacharacter may be used to specify a range of adjacent characters
in collating sequence. For example, [_A-Za-z] specifies the set of valid
characters at the beginning of a Visual Basic .NET identifier: underscore, uppercase
letters, and lowercase letters.

NOTE A set is not a string. A set is an unordered bag of objects, no two of
which are the same (like my socks). A string is an ordered sequence of characters
that can contain duplicates. All sets can be made into strings without
loss of any information, and this is why we often represent a character set as
a string. See the verify method in the utilities.vb code shipped with this book,
for example. It accepts the character set to be verified in a string. However,
because of order and the possibility of duplicates, not all strings can be converted
to character sets without losing information.
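
To make the set-versus-string point concrete, here is a sketch in Python of a check in the spirit of the one the note describes. It is not the book's verify method, whose signature is not shown here; the function name first_invalid_index and the constant IDENTIFIER_START are assumptions made for the example.

```python
import string

# The set [_A-Za-z] written out as a string, one character per member.
IDENTIFIER_START = "_" + string.ascii_uppercase + string.ascii_lowercase

def first_invalid_index(text, charset):
    """Return the index of the first character of text not in charset, or -1."""
    allowed = set(charset)  # order and duplicates in charset do not matter
    for index, character in enumerate(text):
        if character not in allowed:
            return index
    return -1

print(first_invalid_index("_total", IDENTIFIER_START))  # every character is allowed
print(first_invalid_index("9lives", IDENTIFIER_START))  # the digit is rejected

# Going from a string to a set discards order and duplicates.
print(len("banana"), len(set("banana")))
```

Passing the set as a string is convenient for the caller; converting it to a real set inside the function makes the duplicates and ordering of that string irrelevant.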

Regular Expression Processors


The core rules described here are supported by most regular expression proces-
sors. However, keep in mind that there are subtle, and sometimes dramatic,
differences in the way specific processors work.
Also, it is easy to specify ambiguous regular expressions for the same reason
it is simple to specify ambiguous BNF. The regular expression a*((aa){0,1}b)* is
a simple example whose specific treatment may vary from one processor to
another, with no error indication being typically provided, depending on the way
the processor is implemented.
Consider each regular expression processor a new language, potentially dif-
ferent from the regular expression processor it replaces. This is very powerful
stuff and confusing as hell. In the MIS world, as opposed to the more gnomic
Unix world, you need to document regular expressions and avoid complexity for
its own sake. Avoid ambiguous constructions.
In BNF, two adjacent grammar symbols should not share a right handle and
a left handle, because this ambiguity can result in parse bugs. For example, the
sequence "identifier identifier" is probably not valid, since an identifier may
start with an alphabetic character or an underscore, and this set of characters
overlaps the set of characters that may end an identifier (alpha, underscore, or
digit). Likewise, it is usually not a good idea to concatenate two subexpressions
in a regular expression such that the set of characters that ends the first subexpression
overlaps the set of characters that starts the second subexpression. For
example, ([abc]d*)*([def]g) may behave in unpredictable fashion, because the
d might end the first part or start the second part.
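
The ambiguity can be observed directly by asking a processor which substrings it assigned to each group. The sketch below shows only what Python's backtracking re module does; other processors, including the .NET one, may divide the string differently, which is exactly the hazard described above. The variable names are invented.

```python
import re

# Greedy a* swallows both a's, so the optional (aa) group takes no part.
greedy = re.fullmatch(r"a*((aa){0,1}b)*", "aab")
print(greedy.group(1), greedy.group(2))

# Making a* lazy yields the other legal reading of the same string.
lazy = re.fullmatch(r"a*?((aa){0,1}b)*", "aab")
print(lazy.group(1), lazy.group(2))

# Overlapping handles: the d must be given back by the first part
# before the second part can match.
overlap = re.fullmatch(r"([abc]d*)*([def]g)", "adg")
print(overlap.group(1), overlap.group(2))
```

Both readings of aab accept the string; they differ only in which characters are credited to which subexpression, and that difference matters as soon as code depends on the groups.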
And bear in mind that regular expressions can be used, not as a program-
ming language driving a regular expression engine, but as a way of formally and
in detail specifying a syntax for coding. Indeed, this is what has been done in
quickBasicEngine.

The Relationship of Regular Expressions to Turing Machines and Finite Automata
Regular expressions can, with relative ease, be translated into an equivalent,
abstract, mathematical machine, known in the math racket as an automaton.
These formal, and often simplified, paper machines can prove important theses
about the power and limits of computation, some of which are unfortunately
ignored in the real world.

Turing Machines
The most famous abstract machine was also the earliest. Alan Turing described
his 1936 Turing machine to show the limits of what is computable.
Now, what's a Turing machine? Is it a real computer in a museum, like the
Commodore or Speak and Spell, from long ago? No, the Turing machine is a purely
paper machine, the ultimate "Nutty Professor" computer, my affectionate term
for computers that are described but never built.

As Alan Turing described the Turing machine in a famous paper written in
1936, it is conceptually (1) a state, (2) a read-write head of undetermined technology,
and (3) a long paper tape upon which symbols can be read or erased.
The Turing machine, when fired up, is in a start state, usually state zero. It
examines the symbol on its tape. It then consults a list of quintuples of the form
(oldState, oldSymbol, motion, newState, newSymbol). It locates the quintuple
whose oldState is its current state (initially zero) and whose oldSymbol is the
symbol it sees. Then it stamps the current square with the newSymbol, enters the
newState, and goes left or right, depending on motion, which we can conceptualize
as True for right or False for left.
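
The quintuple cycle just described fits in a few lines of code. The following is a minimal sketch in Python, not code from quickBasicEngine: the names run_turing_machine, BLANK, and INCREMENT are invented, the blank square is assumed to be a space, and the machine is assumed to halt when no quintuple applies. The sample quintuples add one to a binary number.

```python
BLANK = " "

def run_turing_machine(quintuples, tape, state=0, position=0, max_steps=10_000):
    """Run until no quintuple applies; return (final state, tape contents)."""
    rules = {(old_state, old_symbol): (motion, new_state, new_symbol)
             for old_state, old_symbol, motion, new_state, new_symbol in quintuples}
    cells = dict(enumerate(tape))  # sparse tape; missing squares read as blank
    for _ in range(max_steps):
        rule = rules.get((state, cells.get(position, BLANK)))
        if rule is None:
            break  # no matching quintuple: the machine halts
        motion, state, new_symbol = rule
        cells[position] = new_symbol        # stamp the current square
        position += 1 if motion else -1     # True is right, False is left
    else:
        raise RuntimeError("step limit reached; the machine may never halt")
    if not cells:
        return state, ""
    squares = range(min(cells), max(cells) + 1)
    return state, "".join(cells.get(i, BLANK) for i in squares).strip(BLANK)

# (oldState, oldSymbol, motion, newState, newSymbol)
INCREMENT = [
    (0, "0", True, 0, "0"), (0, "1", True, 0, "1"),  # run right to the end
    (0, BLANK, False, 1, BLANK),                     # turn around at the blank
    (1, "1", False, 1, "0"),                         # 1 plus carry: write 0, carry on
    (1, "0", False, 2, "1"),                         # 0 plus carry: write 1, done
    (1, BLANK, False, 2, "1"),                       # carry past the left end
]

print(run_turing_machine(INCREMENT, "1011"))
print(run_turing_machine(INCREMENT, "111"))
```

The step limit is a practical concession: as the next paragraphs explain, there is no general way to decide in advance whether a given set of quintuples will halt.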
Turing (who later became an early programmer) discovered using this for-
malization that, in general, a Turing machine cannot read the suitably encoded
specification of another Turing machine and tell whether that Turing machine
will halt. This formal limit to what is "computable" is why we get to debug our
programs; why, indeed, the programming racket cannot be fully automated.
Turing machines are quite powerful, if you give them a lot of time. They don't
use random access memory, and for this reason, would take, if actually built, an
inordinate amount of time for the simplest problem. Windows 2000 on a Turing
machine is too horrible to contemplate. But if you ignore the important question
of time, and if the Turing machine has enough memory, it can make any calcula-
tion that a real computer can perform.
Turing machines are interesting to the programming language designer
because, in most cases (not all), you would like your language to be Turing-
complete, and as such, capable of expressing any calculation that a Turing machine
can perform. The hard way to do this is to write, in your proposed programming
language, a software simulation of a Turing machine. This is a great way to waste
a lot of cycles on calculating sums of one-digit binary numbers, but it proves
that your language can solve any problem that can be translated into a set of
Turing machine quintuples.
Indeed, Turing's proof was that one application would be the acceptance of
a set of quintuples that resulted in the Turing machine simulating any other Turing
machine. All you need to do, after all, is decide on a way to encode the program,
the quintuples, of the other Turing machine, and once you have done this, it is
quite easy to write quintuples that examine the encoded program and do what
it does.
This is really the fundamental theorem of computer science, because Turing
proved that rule-following is, itself, following rules!
The easy way to prove that your language is at least as powerful as a Turing
machine is to support, in your programming language, each of the structured
control structures with which you're probably familiar: straight-line code including
expression evaluation, If..Then..Else, and looping. This is because it's known
that if a language supports these structures, it is Turing-complete.


Once Upon a Time, There Was This Machine...

The difference between programming and math annoys many programmers.
I can wave my chalk around, not tell you implementation details such as how
the Turing machine finds the right state and symbol in the list of quintuples.
But in the same way BNF as a formal notation and not a programming language
allows us to assume shared knowledge and the shared willingness to
suspend disbelief, Turing machines are a formal notation. The story could begin,
"Once upon a time, there was this goofy machine..." It ends with a proof of
what Turing machines can do (simulate any other Turing machine) and what
Turing machines cannot do (detect in general whether another Turing machine
will halt).
I'm not going to inflict the proofs on you, nor inflict on myself the need to write
an amusing recount of the proofs. If you are interested, check out the numerous
books on Turing written for the general reader. If you are really interested, write
down some sets of Turing machine quintuples for solving some simple problems
(such as binary addition). If you really want to dweeb out, write a visual
simulation of the Turing machine in Visual Basic, but don't show it to your manager.
This is one example of a completely useless program.
The ultimate geek movie, for the Beautiful Mind and Lord of the Rings crowd,
would be about some aliens who use a Turing machine. Then we could see this
gizmo on the silver screen, stamping and shuttling over a paper tape!
See www.turing.org for a discussion of Turing machines and simulators on real
computers.

Finite Automata

Regular expressions can be transformed in a straightforward way to a simplified,
limited abstract machine known as a finite-state automaton. A Turing machine
can simulate any finite automaton. A finite automaton is a Turing machine, with
a state and a memory tape, but it can travel only in one direction and cannot
change the tape.
In a book with the formidable title Formal Languages and Their Relation
to Automata (out of print, published by Addison-Wesley in 1969), John Hopcroft
and Jeffrey Ullman, of Princeton and Bell Labs, showed proofs that finite automatons
and regular expressions parallel each other. Regular expressions and
finite automatons accept the same set of languages, in the sense that they will
detect an error in the identical collection of languages. Both are incapable, in
turn, of carrying out a larger set of tasks.
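
The parallel between the two notations can be seen on a small case. Below is a sketch of a hand-built finite automaton for the regular expression A*B+ used earlier in the chapter; the state numbering, the table layout, and the function name accepts are assumptions made for illustration, and Python's re module serves only as the reference to compare against.

```python
import itertools
import re

# State 0: still reading As. State 1: reading Bs (the only accepting state).
TRANSITIONS = {(0, "A"): 0, (0, "B"): 1, (1, "B"): 1}
ACCEPTING = {1}

def accepts(text):
    """Read text once, left to right, never writing; report acceptance."""
    state = 0
    for character in text:
        state = TRANSITIONS.get((state, character))
        if state is None:
            return False  # no transition: the string is rejected
    return state in ACCEPTING

# Compare the automaton with the regular expression on every short string.
candidates = ["".join(letters)
              for length in range(6)
              for letters in itertools.product("AB", repeat=length)]
agree = all(accepts(s) == (re.fullmatch("A*B+", s) is not None) for s in candidates)
print(agree)
```

The automaton remembers nothing except its current state, which is why it cannot count or keep track of nesting depth.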
A regular expression (at least without extensions) cannot express all the syntax
forms expressible by BNF, and regular expressions are less readable than BNF.
Regular expressions, as such, cannot (again, without extension) support structures
with unlimited nesting, such as parenthesized groupings and the block
structure of a typical programming language.
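
The nesting limit can be felt by trying to write the regular expression. In this sketch (Python, with invented names DEPTH_TWO and balanced), a pattern that handles parentheses nested two deep fails on three, and each extra level would need a longer pattern; a counter, which is memory a finite automaton does not have, handles any depth.

```python
import re

# Handles sequences of parenthesized groups nested at most two deep.
DEPTH_TWO = re.compile(r"(\((\(\))*\))*")

def balanced(text):
    """Check parenthesis nesting of any depth with a simple counter."""
    depth = 0
    for character in text:
        if character == "(":
            depth += 1
        elif character == ")":
            depth -= 1
            if depth < 0:
                return False  # a close with no matching open
    return depth == 0

for sample in ["()", "(())", "(()())", "((()))"]:
    print(sample, DEPTH_TWO.fullmatch(sample) is not None, balanced(sample))
```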
Another more real-world limitation of regular expressions is that they are
less readable than BNF because they express syntax on a single line. For this
reason, regular expressions without extensions, such as those in .NET's Regex
object, do not allow you to modularize and break down a complex parsing
problem; BNF does.
Here is an example of a realistic regular expression for the syntax of the first
line of a Visual Basic COM function or subroutine:

((\x0D\x0A)|\x0A)[ ]*((Public )|(Private )|(Friend )){0,1}((Sub )|(Function )|(Property (Get|Set|Let) ))([A-Za-z_][A-Za-z0-9_]*)
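
To see what this expression finds, it can be run over a scrap of source. The sketch below assumes Python's re module rather than the .NET Regex object, so details may differ; HEADER, procedure_names, and the sample source text are invented for the illustration.

```python
import re

HEADER = re.compile(
    r"((\x0D\x0A)|\x0A)[ ]*((Public )|(Private )|(Friend )){0,1}"
    r"((Sub )|(Function )|(Property (Get|Set|Let) ))"
    r"([A-Za-z_][A-Za-z0-9_]*)"
)

def procedure_names(source):
    """Return the identifier captured by the last group of each header match."""
    return [match.groups()[-1] for match in HEADER.finditer(source)]

SOURCE = (
    "Option Explicit\r\n"
    "Private Sub Form_Load()\r\n"
    "    x = 1\r\n"
    "End Sub\r\n"
    "Public Function Area(w, h)\r\n"
    "End Function\r\n"
    "Property Get Name()\r\n"
    "End Property\r\n"
)

print(procedure_names(SOURCE))
```

Because the expression begins with a newline, a header on the very first line of a file is not found; the sample source starts with an Option Explicit line for that reason.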

I used regular expressions only as a conceptual tool in developing the lexical
analyzer for QuickBasic because mere code, in this complex case, does the work more
efficiently and is easier to understand. I have nothing against regular expressions
and have used them in application programming to save coding time. As an
actual tool, and as a formal requirements notation for detailed geeky design, regular
expressions are just as important as BNF. Therefore, I next present a regular
expression laboratory.

A Regular Expression Laboratory


You were introduced to the regular expression laboratory application, relab, in
Chapter 3. As I noted in that chapter, it helps you learn about regular expressions
for common tasks. This application also allows you to test regular expressions
using the standard .NET regular expression processor, and it converts regular
expressions to their declaration in Visual Basic .NET. The code for relab is available
from the Downloads section of the Apress Web site (http://www.apress.com),
in the egnsf/relab/bin folder.

Regular Expression Testing


Run relab.exe from the code supplied with this book. Dismiss the initial Easter
egg (after studying it, of course, to learn what the code is about). You will then
see the form shown in Figure 5-3, which allows you to view, modify, and enter
regular expressions.

Figure 5-3. Use relab.exe to test, save, and document regular expressions.

Notice the list of regular expressions under the label Regular Expressions
Available. Double-click the last visible entry, which starts with "Simple Visual
Basic identifier (release 6 and before)." Now, drop down to the light-gray area
under the label Test Data (the darker gray area under that label is for storage of
your favorite test strings), and enter the string 12345 _a abce identifier1.
Before you do anything else, ask yourself, "What is the first (leftmost) iden-
tifier in the string?" Write down your answer.
Now click the Test button in the lower-left corner of the form. Since one of
the purposes of the laboratory is to allow you to save your regular expressions
and test cases, you will see a rather poisonously green screen, as shown in
Figure 5-4.

Figure 5-4. Trust me, this screen, which prompts you to identify your test data, is
a yucky green.

The green screen also allows you to control the way in which you save test
data. As you can see, you can tell relab never to prompt for or save test
strings, or you can tell relab to always save, but not prompt for a description.
The Tools menu of the main form will allow you to bring up this screen without
adding test data.
When you are returned to the main form, be sure to click to the left of the test
string and in the light-gray test string area, since the tester will always start at the
location you specify. Click the Test button in the lower-left area of the form.
Oops, why was a in _a highlighted? This is because regular expressions have
no opinion about the strings that surround them, and this is something to keep
in mind. When used to find a string, they simply search for the handle of that
string, consisting of any one of the set of characters that can start a string that
satisfies the regular expression. For the regular expression we've selected, this set
is [A-Za-z] (in regular expression set notation). Therefore, this is where the cursor
has moved. Regular expressions, unlike BNF, do not care about context.
Press the right-arrow key, and then click the Test button to see the next
match, abce. Press the right-arrow key and click Test once more to see the
last match, identifier1.
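
The same walk through the test string can be reproduced without the form. This sketch assumes that the "Simple Visual Basic identifier" entry is [A-Za-z][A-Za-z0-9_]*, which is consistent with the handle [A-Za-z] given above but is not quoted from relab itself, and it uses Python's re module; the names IDENTIFIER, TEST_DATA, and found are invented.

```python
import re

# Assumed form of the release 6 identifier expression (not copied from relab).
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

TEST_DATA = "12345 _a abce identifier1"

# Each search resumes where the previous match ended, much as pressing the
# right-arrow key and clicking Test does in the laboratory.
found = [(match.start(), match.group()) for match in IDENTIFIER.finditer(TEST_DATA)]
print(found)
```

The first match is the lone a at index 7: the digits and the underscore are skipped because neither can start a match, and nothing in the expression objects to the underscore sitting immediately to the left.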
Try a new regular expression. In the text box under Regular Expression at the
top of the screen, enter the regular expression (^[ ]+[ ]+)*. Our goal is to find
a series of blank-delimited words.

NOTE This regular expression has a bug. What is it? If you know what it is,
enter the buggy regular expression anyway, without fixing it, in order to fol-
low the text.


Press the Tab key to exit the text box. You'll see a blue screen that allows you
to add a regular expression description. Enter the description Parse words, as
shown in Figure 5-5, and click OK.

Figure 5-5. The blue screen (but not of death) allows you to describe the purpose of
regular expressions.

Move back to the light-gray Test Data area and enter Moe Larry Curley,
with a few spaces before Moe, perhaps extra spaces between the other Stooges,5 but no spaces
at the end! Click Test to see the green Add Test String screen (Figure 5-4), enter
a description for the test string, and click OK.
Click all the way to the left of the test string in the light-gray box, and then
click Test again.
Strange: only the blanks in front of Moe are highlighted. This makes no
sense, since our goal was to find a sequence of blank-separated words, and we
entered "find a nonblank sequence, find some blanks, and repeat."
But we made a simple clerical error. We entered the caret before the square
bracket. In this position, it matches "the start of the input."
Fix the problem in the black-on-white text box, and tab out to be prompted
for a description of the new regular expression. (The Delete button above the
Regular Expressions Available list box allows you to delete old regular expressions.)
Click at the far left of the light-gray test data box and click Test. Oops,
something is wrong.

5. The Three Stooges were three American comedians of the 1930s who lost title to their films.
As a result, their films were repeatedly shown on American television during the 1950s. Their
name became a byword for cluelessness. They may correspond in Russia to The Five Stupid
Guys, or in India to The Junglee Fools from the Country.

No match was found.

The relab application allows us to "cudgel our brains" in isolation from code
using our regular expressions and to focus on cleaning them up. It's no fun to debug
a regular expression inside the business tier, in the server room, at 3:00 AM.
If you don't see why no match was found, ask yourself (cudgel thy brains)
what is the handle of the leftmost unit. Since the leftmost meaningful unit is
a nonblank, the regular expression doesn't start at the beginning of the string
Moe Larry Curley, which begins with three blanks.
But shouldn't the search for the regular expression find, starting at the first
blank, the regular expression that starts with M? It does not because we're starting
inside characters that are valid inside the regular expression handle.
Furthermore, there is a bug inside the regular expression. It doesn't allow the
input string to start with blanks. Many text strings will start with blanks. And it
actually requires that the input string ends with a blank, which is not true of most
input strings.
In other words, the regular expression is completely broken, showing the
value of relab. It actually needs to be ([ ]*[^ ]+)*.
Strangely, the best way to express the fact that spaces occur between words
is to start with zero-trip spaces, because the one-trip nonspaces required by
[^ ]+ will always parse the word. Placing the zero-trip spaces first defines a word
as "that which is preceded by zero or more spaces."
Enter the correct solution, and document it in the Add Regular Expression
box (Figure 5-5) if you like (or press Cancel to skip this step; note that this will
cause the solution not to be stored). Click again at the left side of the test string
to see the correct answer finally highlighted.
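
The three attempts can be replayed outside the laboratory. The sketch below assumes Python's re module, anchoring each attempt at the left edge of the test string with re.match; relab uses the .NET processor and reports an empty match as "No match was found," so the wording of the results differs even where the behavior agrees. The names are invented.

```python
import re

TEST_DATA = "   Moe Larry Curley"  # three leading blanks, none at the end

CARET_OUTSIDE = r"(^[ ]+[ ]+)*"   # the clerical error: caret before the bracket
CARET_INSIDE = r"([^ ]+[ ]+)*"    # words must come first and end with a blank
CORRECTED = r"([ ]*[^ ]+)*"       # zero-trip blanks, then a one-trip word

def leftmost(pattern, text):
    """Return what the pattern matches when applied at the start of text."""
    return re.match(pattern, text).group(0)

print(repr(leftmost(CARET_OUTSIDE, TEST_DATA)))  # only the leading blanks
print(repr(leftmost(CARET_INSIDE, TEST_DATA)))   # nothing at all
print(repr(leftmost(CORRECTED, TEST_DATA)))      # the whole string
print(repr(leftmost(CARET_INSIDE, "Moe Larry Curley")))  # stops before Curley
```

The last line shows the second defect of the middle attempt: even without leading blanks, the final word is left out because no blank follows it.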

Regular Expression Conversion to Visual Basic .NET


Next, examine the dark-gray text box on the right side of the relab form to see
the regular expression converted to Visual Basic .NET code, which you can copy
and paste. This means that once you've tested a regular expression (one you've
typed into the text box at the top of the screen), you can grab this code (which is
commented with the description you entered in the Add Regular Expression box)
and paste it into your own code.


NOTE The relab program doesn't actually convert the regular expression to
code that doesn't use a regular expression. Instead, it produces the formatted
definition of the regular expression commented with the definition. It is
a more formidable task to convert the regular expression to code, although
you can do that using the lexical analysis and parsing methods described in
this book. However, you also need to know how the regular expression is trans-
lated to a nondeterministic finite automaton and from that to a deterministic
finite automaton. If you're interested, refer to Aho, Sethi, and Ullman's "dragon
book," Compilers: Principles, Techniques and Tools (Addison-Wesley, 1985).

When you leave the regular expression laboratory, it will save all of your test
expressions and test strings in the Registry in a standard location. You will have
your stash of tested and documented regular expression tools-your very own
gnome factory.

TIP To create a laboratory for a programming team, in which you can share
regular expressions and test data, you can convert the source code for relab to
save information in Microsoft Access or SQL Server. See the methods
form2Registry and registry2Form for the code that should be modified.

Regular Expressions for Common Tasks


The relab tool comes with the following common regular expressions:

• The regular expression that defines the characters available on standard
PC keyboards in the US and displayable in most fonts (I call this the
graphical character set in the documentation, but note that it has nothing
to do with graphics)

• The regular expression for a Visual Basic comment that starts with an
apostrophe, extends to the end of the line, and contains no tabs or other
white space characters other than the blank

• The regular expression for a Visual Basic comment including end of line

• The regular expression for a block of contiguous Visual Basic comments

• Visual Basic identifiers (release 6 and before), excluding compound identifiers
of the form object.property

• Visual Basic identifiers (.NET), excluding compound identifiers

• Visual Basic identifiers (release 6 and before), including compound identifiers
of the form object.property

• Visual Basic identifiers (.NET), including compound identifiers

• The newline for Windows (carriage return and linefeed) or the Web (linefeed
only)

• The header of Visual Basic COM procedures (subroutines, functions, and
Get/Set/Let properties), excluding their formal parameter list

• The header of Visual Basic .NET procedures outside event declarations
(subroutines, functions, and properties), excluding their formal parameter
list

• The Visual Basic COM formal parameter definition (a formal parameter is
one that appears in a function, subroutine, or property header, as opposed
to the actual parameter used to call the function, subroutine, or property)
of the form [ByVal|ByRef] identifier As Type

• The Visual Basic .NET formal parameter definition of the form
[ByVal|ByRef] identifier As Type

TIP See the Visual Studio Help system for more application-oriented regular
expressions, including regular expressions for phone numbers and ZIP codes.

Most of the included regular expressions have to do with parsing source code.
However, I don't recommend their use in a full compiler. This is because the com-
mon regular expressions do not take context into account and blindly accept the
next string that meets their rules.
For example, a Visual Basic identifier by itself is a valid formal parameter declaration
when Option Strict is not in effect in Visual Basic .NET, as in Private Sub
A(B). The ByVal|ByRef clause is not required (it defaults to ByRef in COM and to
ByVal in .NET), nor is the As clause, although omitting the As clause is always bad
practice in COM and .NET. This means that if the common regular expression is
used in the middle of arbitrary source code to find the next formal parameter, it
will return a false positive when an identifier occurs to the left of the first formal
parameter. In Private Sub A(B), the identifier A will be mistakenly recognized as
a Visual Basic 6 formal parameter definition using the regular expression supplied
in relab, as shown here.

(((ByVal )|(ByRef )){0,1}([A-Za-z][A-Za-z0-9_]*)(\([,]*\)){0,1}([ ]+As ([A-Za-z][A-Za-z0-9_]*)){0,1})

The regular expression can be used only after a procedure header has been
located, along with an immediately following left parenthesis.
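
The false positive is easy to reproduce. In this sketch the formal parameter expression is applied at three starting points in Private Sub A(B); it assumes Python's re module, and the names FORMAL_PARAMETER, HEADER_TEXT, and found_from are invented for the illustration.

```python
import re

FORMAL_PARAMETER = re.compile(
    r"(((ByVal )|(ByRef )){0,1}([A-Za-z][A-Za-z0-9_]*)"
    r"(\([,]*\)){0,1}([ ]+As ([A-Za-z][A-Za-z0-9_]*)){0,1})"
)

HEADER_TEXT = "Private Sub A(B)"

def found_from(start):
    """Return the first match at or after index start in HEADER_TEXT."""
    return FORMAL_PARAMETER.search(HEADER_TEXT, start).group(0)

print(found_from(0))                           # the keyword Private
print(found_from(HEADER_TEXT.index("A")))      # the procedure name A
print(found_from(HEADER_TEXT.index("(") + 1))  # the real parameter, B
```

Only the third search, started just past the left parenthesis, returns the formal parameter; the first two return an identifier that merely looks like one.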
The common regular expressions were developed for a variety of software
tools that read and examine Visual Basic source as quick solutions to client problems,
including the need to identify all procedures of a certain type. They use an
ad-hoc or "lazy" approach to full-scale parsing of Visual Basic that is interested
only in certain strings.
Full-scale parsing, almost of necessity, involves parsing not characters (as
these regular expressions do) but scanned tokens, such as those produced by an
object like qbScanner, described in the next section.

The Dangers of Using Ad-Hoc Code to Examine Source


It is usually a mistake to sit down and write a software tool that reads Visual Basic
source (or source in another language) and uses ad-hoc code, including ad-hoc
regular expressions, to find structures that need to be examined or changed. That
approach is full of nasty surprises, of which false positives, consisting of com-
mented source code and quoted source code, are only the beginning. It gives
tool-building a very bad name.
Essentially, to build self-reflexive Visual Basic tools (or tools in another language)
that have source code as their input, you need to parse using the full-scale techniques
of this chapter and Chapter 7, or you need to warn the users of the tool
that it may fail, or you need to apply ad-hoc methods very carefully.
It would be great, for each Visual Basic dialect, to have a full-scale parser object
that you could snap in to tool applications. In the add-in model, this object is
available, in a sense, because it gives you access to the object model of the
Microsoft parser. However, this is not something you can pick up and send, as
part of a product, to a customer. Such a parser object may emerge from the Mono
project to reverse-engineer an open-source .NET (visit http://www.go-mono.com).
In fact, you can use the techniques of this chapter and Chapter 7 to develop such
a tool, covering yourself with fame and glory.

The final feature of relab that I would like to show you is its regression test of
the common regular expressions. Although the regular expressions are hard-coded
and inside a Shared (static) class, utilities.dll, I was nervous about getting them
right, so I included the regression test feature. Also, since you have the source,
you might change them.
Click the button labeled Test the common regular expressions, on the bottom-
right side of the form. You will see a success dialog box, followed by a Zoom box,
which provides a text box view of a report, as shown in Figure 5-6. The report
shows a series of test cases applied to the common regular expressions, such
that the application is tested against expected results.

Figure 5-6. Regression testing the common regular expressions

We've revisited regular expressions. The next step is to see how we've wrapped
the hand-coded lexical analysis into an object, which gives us a reusable tool for
scanning QuickBasic and languages with related syntax.

The qbScanner Object


The lexical analysis of quickBasicEngine is implemented as a distinct object model
and stateful object, named qbScanner. It is a stateful object because it contains
variables that occupy memory, in its General Declarations section.
The .NET GUI application, qbScannerTest.exe, available from the Downloads
section of the Apress Web site (http://www.apress.com), in the
egnsf\Apress\QuickBasic\qbScanner\qbScannerTest directory, lets you examine how this
object works. Figure 5-7 shows this program.

Figure 5-7. The qbScannerTest program

The purpose of a lexical analyzer is to get from an undifferentiated stream of
input characters, including line breaks, to a series of tokens, where each token can
be thought of as a word in our language. The set of tokens returned by an acceptable
scanner partitions the input stream of characters into smaller sequences of
characters.
As you will see, this is quite different from the situation in the parser. Even in
a simple statement such as 1 = 1 + 1, the parser will need to recognize several
different constructs. The entire statement is an assignment statement, which
contains an LValue (1), followed by an operator, and so on. The LValue 1 to the left
of the assignment is also an identifier.
Tokens as scanned by a scanner do not overlap in any way, but are instead
a simpler partitioning of the input source code, which makes the parser's life
easier. In our lexical analyzer, "meaningless" blanks (blanks outside comments
and strings) are eliminated, also to make life easier for the parser.


NOTE We could downsize the lexical analyzer out of existence, since what it
actually does is a low-level parse. We could use the techniques described in
Chapter 7 instead. But this would make the parser much too complex.

Token Types

When you start up qbScannerTest.exe, it displays examples of each of the token
types defined in the QuickBasic language:

• Apostrophe

• Ampersand

• Colon

• Comma

• QuickBasic identifier, which has the same syntax as Visual Basic identifiers
prior to .NET

• Arithmetic and other operators (a single token type that excludes the
ampersand)

• Left and right parentheses (a single token type)

• Semicolon

• String

• Unsigned numbers

• Pound sign

• Dollar sign

• Newline (carriage return, and line number)

Note that our scanner does not limit the length of identifiers, as did older
QuickBasic and Visual Basic processors. Such a limit was necessary in older compilers,
where tables had to be carefully allocated in C or even assembler to preserve
scarce memory. In our implementation of quickBasicEngine, we have the nearly
unlimited length String data type, so this limitation is not enforced.


NOTE Several complex strings are provided in qbScannerTest. They test lexical
analysis of strings and the Basic rule that doubled internal double quotes
represent double quotes.

Whether or not we use a regular expression to scan these tokens, any sensi-
ble code will be, in effect, an implementation of a regular expression. This raises
a problem with token schemes, including any possible proposal to make signed
numbers into tokens.
Ideally, each distinct token type should have a different handle: a different
set of characters that may appear at the beginning of the token. In Chapter 4, the
handle of a grammatical class was the symbol, or set of symbols, that could begin
the grammatical class. For example, a Visual Basic expression starts with an
identifier, a number, a plus sign, or a minus sign.
We would like to simply scan left to right for any one of the set of characters
that can start a token, for each token, and take the first token we find left to right.
There are, as I will show, dangers in this simple plan, but basically it's a good idea.
In fact, each token type in this data model has a different handle. The very
simple token types for single characters (apostrophe, ampersand, colon, comma,
semicolon, dollar sign, and pound sign) each has a unique handle: the character
to which it corresponds, which does not appear anywhere else. Parentheses are
restricted to the parenthesis token type. Identifiers start with letters and under-
scores, which appear nowhere else. Here are some examples:

• The apostrophe starts a comment.

• A letter starts an identifier, or, possibly, an operator like Mod, which has the
form of an identifier.

• A plus sign starts and ends an operator.

• A decimal point or a digit starts a number, which in our tokenization is
always unsigned; we treat the sign as a unary prefix operator. The next
section explains why we use unsigned numbers.
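The idea of one handle per token type can be sketched as a lookup from a token's first character to its type. The sketch below is in Python rather than the book's Visual Basic, purely for brevity. The function name `token_type_for_handle`, the exact operator characters, and the type names are assumptions made for illustration; they are not taken from qbScanner.

```python
import string

# Single-character token types, keyed by their handle (assumed type names).
SINGLE_CHAR_TYPES = {
    "'": "Apostrophe",
    "&": "Ampersand",
    ":": "Colon",
    ",": "Comma",
    ";": "Semicolon",
    "#": "Pound",
    "$": "Dollar",
}

# Assumed handle sets for the multi-character token types.
OPERATOR_HANDLES = set("+-*/\\^=<>")
IDENTIFIER_HANDLES = set(string.ascii_letters + "_")
NUMBER_HANDLES = set(string.digits + ".")


def token_type_for_handle(ch):
    """Return the token type whose handle contains ch, or None."""
    if ch in SINGLE_CHAR_TYPES:
        return SINGLE_CHAR_TYPES[ch]
    if ch in ("(", ")"):
        return "Parenthesis"
    if ch == '"':
        return "String"
    if ch in IDENTIFIER_HANDLES:
        return "Identifier"
    if ch in NUMBER_HANDLES:
        return "UnsignedNumber"   # numbers never start with a sign
    if ch in OPERATOR_HANDLES:
        return "Operator"         # a leading + or - is always an operator
    if ch in ("\r", "\n"):
        return "Newline"
    return None
```

Because every character belongs to at most one handle set, a single look at the first character is enough to decide which token type to try.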

NOTE The ampersand is not included with the operators because it does double
duty in QuickBasic as the string concatenation operator and a type suffix
in an identifier. (QuickBasic has a legacy and ugly feature, which was preserved
in Visual Basic through release 6: you can define the type of a variable
using a special character at the end of its name.)


The Consequences of Syntax Changes


At the 2001 Microsoft Author's event, we were sitting around discussing the
new Visual Basic .NET capability of starting an identifier with an underscore.
We agreed that it was a good idea because it gives the programmer the ability
to isolate a class of "special" identifiers by prefixing them with the underscore.
Somebody asked why Microsoft did not go ahead and also allow digits at the
beginning of identifiers, as this would be more "powerful." The problem, of
course, is that then, the Visual Basic .NET token for an identifier would have
an overlapping handle with the number, and the lexical analyzer would be
very complex.
Failure to think through the consequences of "powerful" changes to the syntax
can result in permanent and ineradicable compiler bugs (known as "issues" or
even "features" when the user starts depending on them).
An old example happens to be the Fortran statement DOI=JT03, which could be
a valid Fortran assignment (of the value in JT03 to the identifier DOI), or a Do
statement (equivalent in Basic to a For statement) with no white space. The original
designers, although they were heroes, wanted to allow programmers to leave out
white space between the tokens of the language, but at the time they worked,
language theory was in its infancy. They therefore wrote the rule out in English
and did not spot the problem.
Software tools such as lexx, yacc, and Bison express in what I call a reified form
(converted into a concrete thing) the accumulation of programmer wisdom,
but this wisdom is a two-edged sword. Doing things in a more manual, less
reified form, as we do them in this book, allows us to learn why lexx, yacc,
and Bison enforce the rules they actually enforce.

Why Signs Are Not Included in Number Tokens


The number token defined in the QuickBasic language can contain an integer
part, an optional decimal part, and an exponent (consisting of uppercase or lower-
case e, an exponent sign, and the integer value of the exponent). 6 But why aren't
signs included in numbers in tokens?
If we allowed signed numbers as tokens, then the number would share
a handle with the plus and minus, because signs are also binary operators. There
are ways to code around the problem. The difficulty is that there are many ways
to code this, and nearly all of them are wrong.

6. The exponent represents shifting of the value by powers of ten. Exponents are used by nutty
professors, mad scientists, and disturbed engineers to represent the very large and the very
small.

You could look for a number to the right of the operator or sign, but this
means one of two things: either you have code for a token type of unsigned
number anyway, which is useless to your user but supports your actual number,
or else you merely move to the right and look for a digit. Oops, remember, a valid
unsigned number can start with a decimal point, and your code has to work for
-.1, which is valid. Consider a- -1, which is ugly but valid. If we defined a signed
number as a token, when we encountered the minus or plus sign, we would need
to back up and examine context.
Another consideration is that your scanner would need to violate its com-
mitment to basically act like a finite automaton, and move left to right. Where
did this commitment come from? It came from the fact that our objective is to
broadly define the scanner object's behavior in clear and understandable terms,
as an implementation of a finite automaton, and a finite automaton moves left
to right only. Also, if the scanner can move backward, and undo scanning, this
makes scanner progress reporting all the more complex. As discussed in the "The
Scanner Object Model" section later in this chapter, qbScanner exposes events to
let user code manage progress reporting. If the scanner moves backwards, the
user will be confused by progress reports that back up. More generally, any extra
features of the scanner generate work as the scanner moves from left to right,
and this work would need to be unperformed and undone on backup. Finally, if
you want to use the scanner in multiple threads or as a scanner server that pro-
vides tokens on demand, a scanner that backs up will make the final product
very complex.
Fortunately, there is a simpler solution, and this is to simplify the scanner.
The scanner promises to give the parser unsigned real numbers only. It lets the
parser worry about the difference between a plus sign and a unary positive sign,
and between a minus sign and a unary negative sign. The result is that we can
implement the scanner by scanning forward for the handle of each distinct
token at any instant in scanning.
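
One way a parser could later tell a unary sign from a binary one is to look only at the token that precedes it. The following Python sketch is a hypothetical illustration of that idea and is not the parser developed in this book. The `(type, text)` tuple layout and the name `classify_signs` are assumptions.

```python
def classify_signs(tokens):
    """For each (type_name, text) token, return 'unary' or 'binary' for a
    plus or minus operator, and None for every other token."""
    result = []
    previous = None
    for type_name, text in tokens:
        if type_name == "Operator" and text in ("+", "-"):
            # A sign is binary only when something that can end an operand
            # comes right before it: an identifier, a number, or ")".
            after_operand = previous is not None and (
                previous[0] in ("Identifier", "UnsignedNumber")
                or previous == ("Parenthesis", ")")
            )
            result.append("binary" if after_operand else "unary")
        else:
            result.append(None)
        previous = (type_name, text)
    return result
```

For the ugly but valid `a- -1`, the first minus follows an identifier and is binary, and the second follows an operator and is unary. The scanner never has to back up to find this out.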

Scanner Implementation
Unfortunately, Visual Basic .NET does not provide an easy way (as do C and C++
with strspn and strcspn) to find any one of a set of tokens, other than by using
the regular expression object (Regex), which is overkill for such a simple task.
Instead, if you examine the qbScannerTest project in the source code, you will
find a project and a stateless class (one with no variables that occupy storage at
runtime in General Declarations) called the utilities class. This class exposes
the verify utility, which scans for a set of alternative characters, or for the
complement of this set: the set of all characters not in the specified set.
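
A utility of this kind can be sketched in a few lines. The Python function below is an assumption about what such a helper might look like. Its parameter names, its 1-based indexing, and its return value of 0 for "not found" are chosen to resemble Visual Basic string functions, and are not copied from the utilities class.

```python
def verify(text, start, charset, match=True):
    """Return the 1-based index of the first character at or after `start`
    that is in `charset` (match=True) or not in `charset` (match=False).
    Return 0 when there is no such character."""
    for i in range(max(start, 1) - 1, len(text)):
        if (text[i] in charset) == match:
            return i + 1
    return 0
```

Called with `match=True`, this behaves like a search for any one of a set of characters. Called with `match=False`, it finds the first character outside the set, which is handy for skipping blanks or for finding where an identifier ends.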
The proposed design looks simple in structured pseudocode:

    Do until done
        Find the leftmost token
        Add it to the scanner's collection of tokens
    End Do

However, there are two problems with this initial design. We will need to
have some sort of index to characters in source to keep track of position. This
initial design does not tell us how to manage the index, and this management
is tricky. Also, the pseudocode might result in slow real code, since Find the
leftmost token implies an inner loop.
Here is how we can manage the index:

    Set Index to 1 (first character)
    Do until Index > length(source code)
        Find the leftmost token starting at Index
        Add it to the scanner's collection of tokens
        Add the length of the leftmost token to Index
    End Do

This more refined design neglects the possibility of blanks between tokens.
This is a fairly simple issue to resolve: just place a simple loop (which won't
make the code much slower because, generally speaking, one blank will appear
between tokens in source code) under the Do until and before the Find the
leftmost token.

    Set Index to 1 (first character)
    Do until Index > length(source code)
        For each possible token
            Find its leftmost location starting at Index
        End For
        Add the leftmost of these tokens to the scanner's collection of tokens
        Add the length of the leftmost token to Index
        Skip over blanks and other white space
    End Do

However, the wasteful inner loop (or straight-line sequence of tests, which
has the same constant effect on runtime) remains. It is wasteful because, in the
case of tokens that overlap, it will return false positives. For example, if we are
scanning the string "Identifier" (a string containing quote, Identifier, quote),
the inner loop will find the Identifier one position beyond the start of the
string. It won't return this bogus Identifier as a real token, because when the
string is found to be the real token, Add the length of the leftmost token will
shift the index over the false Identifier. But in a more complex case, such as
the string "*/Identifier", the scanner will scan two bogus arithmetic operators
unnecessarily.


In the case of the string "*/Identifier", the programmer obviously means "a
string containing an asterisk, a slash, and the word Identifier." However, the inner
loop in the preliminary pseudocode will find a string at positions 1..14, an asterisk
at position 2, a slash at 3, and the identifier Identifier at 4.
It would be better to create an "anticipatory" scan, and this was implemented
in qbScanner. In this algorithm, each trip through the major scan loop goes through
all or some tokens. For each token that hasn't already been found to the left of the
index, the scanner locates this token. It doesn't do this in an inner loop. Instead, it
uses InStr. Then it selects the leftmost and widest token. The next loop can simply
ignore false positive tokens, which are inside the leftmost and widest token.
Consider the quoted string "*/Identifier". The first time through, the inner
For loop will find all three token types: a string, an asterisk and a slash, and an
identifier. But it can also note that the string is the leftmost and widest token.
Therefore, the string will be selected as the next token.
Of course, the scanner could also find other tokens beyond the string. Suppose
the string is followed by a number, one beyond the end of the string at position
15. Since the number is a different token type than the asterisk, slash, identi-
fier, or string, it is also recorded in an array in the For loop. In fact, consider
what happens when control returns to the For loop a second time: the main
scan index will have been increased by 14 characters, since this is the length of
the string. The second execution of the For loop will simply move beyond all
preset tokens when they occur to the left of the main scan index, and scan for
the next occurrence starting at the main scan index. Recall that its loop text is
"find its leftmost token starting at index"!

NOTE Don't be overly concerned that "*/Identifier"123 is, in syntactical
terms, complete garbage. Recall that it is the parser's job to worry about garbage
at this level. As far as the scanner is concerned, this string is just fine, and it
consists of two tokens: a string followed by an integer.

The scanner implements an array with one entry for each token type (the
anticipatory array), and each entry in this array has one of the following states:
token unknown, token found, or token does not exist to the end of the string.
Each time through the scanner's inner loop, when a token has unknown state,
the scanner must locate that token.

    Set Index to 1 (first character)
    Set all tokens in an "anticipation" array of possible
        "candidate" tokens to unknown
    Do until Index > length(source code)
        Set the index of the "candidate" past the end of the source code, because the
            "candidate" is the token that might be the next real token
        For each token in the array
            If it is unknown
                Find it (if it doesn't exist then create an entry
                    pointing past the end of the source code)
            End If
            If it is completely to the left of the leftmost token "candidate"
                Select it as the "candidate"
            End If
        End For
        If the "candidate" is beyond the end of the source code, we're done
        Add the candidate to the scanner's collection of tokens
        Set the candidate to unknown
        Add the length of the leftmost token to the scan Index
        Skip white space
    End Do

Note that completely to the left of the leftmost token "candidate" means
that it is not enough to compare the starting index of the candidate with the
starting index of the anticipated token, because tokens can overlap. Instead, the
start index plus the length of the candidate token must be less than the start
index of the anticipated token.
All this seems fairly complex. In a nutshell, the algorithm does the following:

1. Finds the leftmost and widest token, while also finding a set of useless
tokens inside the leftmost and widest token, and more usefully finds
another set of tokens fully to the right of the leftmost and widest token

2. Adds the leftmost and widest token to the output list of tokens, sets the
character index one past the end of the leftmost and widest token, and
repeats until done
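
The two steps above can be sketched compactly. The Python version below is an illustration under stated assumptions and is not a transcription of the scanner_ function. It uses regular expressions where qbScanner uses InStr and verify, it covers only four token types, it uses 0-based indexes, and the names `FINDERS`, `anticipated`, and `scan` are invented for the example.

```python
import re

# One pattern per token type: a simplified, assumed subset of the token types.
FINDERS = {
    "String": re.compile(r'"(?:[^"]|"")*"'),   # doubled quotes stay inside
    "Identifier": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
    "UnsignedNumber": re.compile(r"(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"),
    "Operator": re.compile(r"[+\-*/=<>]"),
}

UNKNOWN = None  # anticipatory state: location not yet looked up


def scan(source):
    """Return a list of (type_name, start, length) tuples, start 0-based."""
    end = len(source)
    anticipated = {name: UNKNOWN for name in FINDERS}
    tokens = []
    index = 0
    while index < end:
        candidate = None  # (start, length, type_name): leftmost, then widest
        for name, pattern in FINDERS.items():
            entry = anticipated[name]
            # Search only when the entry is unknown or stale, that is, when
            # it began inside a token that has already been consumed.
            if entry is UNKNOWN or entry[0] < index:
                match = pattern.search(source, index)
                entry = (match.start(), len(match.group())) if match else (end, 0)
                anticipated[name] = entry
            start, length = entry
            if start >= end:
                continue  # this token type does not occur again
            if (candidate is None or start < candidate[0]
                    or (start == candidate[0] and length > candidate[1])):
                candidate = (start, length, name)
        if candidate is None:
            break  # only white space or unrecognized characters remain
        start, length, name = candidate
        tokens.append((name, start, length))
        anticipated[name] = UNKNOWN
        index = start + length
    return tokens
```

Scanning `"*/Identifier"123` with this sketch finds the number on the first trip through the loop and reuses that entry on the second trip, while the bogus operator and identifier entries inside the string are discarded as stale. White space needs no separate step here because each search simply starts at the scan index and skips over it.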

This algorithm is implemented in the Private function scanner_ in
qbScanner.vb, and it is probably the most complex feature of the scanner. Note
that my coding standard within classes (but not within forms) is to end Private
method names with an underscore.

A Scan Test

Let's try running the scanner for the test tokens. Click the Test button on the
qbScannerTest GUI (Figure 5-7). You will get a Yes/No message box announcing
success. Click Yes to see the output, as shown in Figure 5-8. As you can see, the
output contains a test string, the expected results, and the actual results.


Figure 5-8. Scan test output (zoomed)

The test string exercises qbScanner for each token type to regression test the
scanner when its source code is changed. It also includes marginal tokens to
make sure the scanner works for these cases. The test string uses XML notation
for the nondisplayable characters in a newline.
The expected and the actual results list tokens in a serialized form, where to
serialize an object is to convert it to printable characters. We have converted
newline in the test string to a displayable form given our Windows
internationalization locale (which, for us, is ASCII).
In the scanner data model, each token is an object of type qbToken, and its
toString method creates the view of the token in the actual results.
For each token, the scanner has named its type, identified its starting and
end index, shown its line number (to the right of the colon), and displayed its
value in an expression of the following form:

tokenTypeName@startIndex..endIndex:lineNumber value

We really want the line number, by the way, since this will help display errors.
Programmers can't find character indexes as readily as they can spot line numbers.
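
The serialized form is regular enough to be written and read back mechanically. The Python sketch below shows one way to do that. The function names `serialize_token` and `parse_serialized` are hypothetical, and the sketch does not reproduce the toString method of qbToken.

```python
import re

# typeName@start..end:line value
SERIALIZED = re.compile(r"(\w+)@(\d+)\.\.(\d+):(\d+) (.*)", re.DOTALL)


def serialize_token(type_name, start_index, end_index, line_number, value):
    """Build the one-line display form of a token."""
    return f"{type_name}@{start_index}..{end_index}:{line_number} {value}"


def parse_serialized(text):
    """Split a display form back into its five parts."""
    match = SERIALIZED.fullmatch(text)
    if match is None:
        raise ValueError("not a serialized token: " + repr(text))
    type_name, start, end, line, value = match.groups()
    return type_name, int(start), int(end), int(line), value
```

For example, an identifier occupying characters 9 through 18 of line 1 serializes as `Identifier@9..18:1 identifier`.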
To ensure that nondisplayable ASCII and Unicode characters are displayed
properly, a utility function (string2Object, available in utilities.dll and for which
source code can be found in utilities.vb) converts nonprintable values to their
XML format, which is ampersand and pound, followed by the five-digit decimal
value of the ASCII or Unicode character. I developed string2Object to support
serialization because it drives me nuts when nonprintable characters such
as newline sequences appear in output.
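
The conversion just described is small enough to sketch directly. The Python function below is an approximation and not the source of string2Object. The name `to_displayable` is invented, and treating everything outside ASCII 32 through 126 as nonprintable is an assumption.

```python
def to_displayable(text):
    """Replace each character outside printable ASCII with an ampersand,
    a pound sign, and the character's decimal value padded to five digits."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if 32 <= code <= 126:
            pieces.append(ch)
        else:
            pieces.append(f"&#{code:05d}")
    return "".join(pieces)
```

A carriage return and linefeed pair becomes `&#00013&#00010`, and a left smart quote (decimal 8220) becomes `&#08220`.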
Scroll down to see the complete actual results and how numbers are handled,
as shown in Figure 5-9.


Figure 5-9. Complete results, including number scanning

The pure model of an unsigned number has actually been changed! It's true
that the real number -32767e-12 has been divided into a unary minus followed
by an unsigned real number. However, note that in the case of real numbers
only, we've slightly violated the rule that two distinct token types must have two
distinct sets of leading characters.
Unsigned integers may start with any digit; unsigned real numbers may start
with a decimal point or any digit. The character set "any digit" is shared by two
distinct token types.
The string 32767 was scanned as an unsigned integer, while -32767e-12 is
scanned as an operator, followed by an unsigned real number. We've kept our
promise not to include signs in numbers and let the compiler sort them out; how-
ever, we return two different, and apparently overlapping types, integer and real,
which of course will have common handles. This is readily explained.

We could have simplified the scanner to just parse integers and let the
parser synthesize real numbers. Real numbers have a sensible BNF syntax. But
overall, it is the scanner's job to make life easy for the parser.
Instead, the token type Unsigned Real is a synthetic type that is based on
Unsigned Integer. When it's time to find a real number in the inner, anticipatory
scanner loop, the real number finder looks for an integer, followed optionally by
a decimal point, followed optionally by another integer, e, exponent sign, and
a third integer. The result is that in the anticipatory table described earlier in
pseudocode, there will be overlapping entries. To select the correct entry, the
scanner not only takes the leftmost token, but it also takes the widest token.
I have rather slyly postponed this discussion because I wanted to show how to
stick to an ideal as long as possible, and then compromise. This issue could be
a bug, but it isn't as long as the anticipatory loop selects the widest token. And it is
an example of the kind of low-level and painful issues that arise in lexical analysis.
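
The widest-token rule for numbers can be shown in isolation. In the Python sketch below, the two regular expressions are assumptions that approximate the number syntax described in this chapter, and `classify_number` is a hypothetical helper. qbScanner itself does not use regular expressions for this.

```python
import re

UNSIGNED_INTEGER = re.compile(r"\d+")
# Integer part with optional fraction, or a leading decimal point,
# followed by an optional exponent with an optional sign.
UNSIGNED_REAL = re.compile(r"(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def classify_number(source, index=0):
    """Return (type_name, text) for the number starting at `index`,
    or None when no unsigned number starts there."""
    as_integer = UNSIGNED_INTEGER.match(source, index)
    as_real = UNSIGNED_REAL.match(source, index)
    if as_real is None:
        return None
    # Both types share the handle "any digit". The wider match wins,
    # and on a tie the simpler type (integer) is reported.
    if as_integer is not None and as_integer.end() == as_real.end():
        return ("UnsignedInteger", as_integer.group())
    return ("UnsignedRealNumber", as_real.group())
```

With this rule, 32767 is reported as an integer, while 32767e-12 and .2 are reported as real numbers, and a leading minus sign is never part of either.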

Scanner Object Design Considerations


This section discusses the scanner object model, certain core procedures found
in this and other compiler objects, and the display of the scanner's internal data
in XML. It will then address the scanner's event model. The goal is to introduce
principles used in the compiler to achieve solid design and reliability.
I wanted to separate this book from an academic book about compilers, not
by ignoring theory, but by a greater appreciation of the need, in the real world, to
deliver software that works and is delivered on time. 7

7. I find that while I am at times late with my software, especially when I have no say on the
delivery date or am being passively aggressive, what I deliver is sound, as long as I have taken
the time to use a structured approach. That way, I don't get any business as a "consultant"
because the software works. Oops. Seriously, the time-to-market statistic doesn't completely
capture software quality.

The Scanner Object Model

For the scanner and the other objects in the compiler, I have developed an object
model, which is a statement of what is in the object's state.
The scanner's object model consists of the raw source code, which is preserved
in the state as a string variable and a collection of zero, one, or more tokens, where
each token is an object of the type qbToken. The object model of a qbToken consists of
the following:

• The start index of the token in the source code

• The length in characters of the token

• The end index of the token in the source code (which, as a property of the
token, can be calculated from the start index and the length)

• The type of the token: identifier, operator, parenthesis, string, stand-alone
special character, and so forth

Note that there is no need to store the actual value of the token; doing so is
a waste of space. The source code is a part of the scanner state, as is the start and
length of each token; for this reason, we can always get to the token by finding
the Mid (substring) of the source code using the token's start index and length.
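
A minimal sketch of this object model follows, in Python rather than Visual Basic. The class is a hypothetical stand-in for qbToken that keeps only position and type, and `token_value` plays the role of Mid with a 1-based start index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Position and type only; the token's text is never stored."""
    type_name: str
    start_index: int  # 1-based position in the source code
    length: int

    @property
    def end_index(self):
        # Derived from the stored fields, so it cannot get out of step.
        return self.start_index + self.length - 1


def token_value(source, token):
    """Recover the token's text from the source, like Mid(source, start, length)."""
    begin = token.start_index - 1
    return source[begin:begin + token.length]
```

Given the source `x = "string"` and a String token with start index 5 and length 8, the end index is 12 and the recovered value is `"string"`, quotes included.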

Core Object Design


The scanner implements a core object design approach that I find very useful. In
this approach, we implement a core, favored set of properties and methods. This
set has fuzzy boundaries (in the sense that some might not be present in some
objects developed in the methodology) and thus cannot be implemented using
an interface, but conceptually, it is like an interface.
Let's take a look at how the core object design classifies objects and uses the
core procedures of qbScanner: Name, inspect, object2XML, and test. Then we will
look at the simple event model of qbScanner.

Classification of Objects
The approach classifies objects into stateless and stateful objects. Stateless
objects consist of pure code. These objects define static (shared) methods and
properties, and never need to be created using New.

NOTE A good example of stateless objects in the code for this book is
utilities.dll, a large collection of string handlers, math gizmos, and other methods
I have found useful over the years.

Stateful objects, on the other hand, have variables in their Common Declarations.
To execute any property or method of a stateful object (that isn't declared as
Shared), you must create an instance of the object. Many of the stateful objects
that implement the compiler also expose Shared procedures, which do not
interact with the Common Declarations. These procedures are typically tools with
a close association with the object. For example, qbVariableType represents, in its
state, the facts about a variable's type, but it also exposes a set of shared methods
for working with types in the abstract, including a shared method for telling
whether two types enclose each other.

The Name Property

Tell me sweetie, what's my name?


-The Rolling Stones, "Sympathy for the Devil"

All stateful objects have a Name property so we can identify different objects in
output. Name defaults to classNamennnn date time, where nnnn is the sequence
number of the object as created in the process.
Take a look at the New constructor for qbScanner in qbScanner.vb. Note that
it references a Shared variable named _INTsequence. It starts with an underscore
because it is shared. It then contains the Hungarian prefix for its type INT, and
then contains a descriptive name, sequence. I uppercase the Hungarian prefixes
of variables in the Common Declarations section.
We use the threading model to increment the sequence number for the
default Name, because simply adding one to a Shared variable is not safe when
more than one instance of the object is being created in multiple threads: two
instances could end up with the same sequence number.
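
The default naming scheme can be sketched as a shared counter guarded by a lock. The Python class below is illustrative only. `NamedObject` is a hypothetical class, the lock stands in for whatever .NET threading facility the real constructor uses, and the date format is an assumption modeled on the names shown in the figures.

```python
import itertools
import threading
from datetime import datetime


class NamedObject:
    """Gives every instance a default name of the form classNamennnn date time."""

    _sequence = itertools.count(1)     # shared by all instances
    _sequence_lock = threading.Lock()  # guards the shared counter

    def __init__(self):
        # Take the next sequence number under the lock so that two threads
        # creating objects at the same moment never get the same number.
        with NamedObject._sequence_lock:
            number = next(NamedObject._sequence)
        stamp = datetime.now().strftime("%m/%d/%Y %I:%M:%S %p")
        self.name = f"{type(self).__name__}{number:04d} {stamp}"
```

Each object created in the process gets the next four-digit sequence number after the class name, followed by its creation date and time.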

The inspect Method

An important method of stateful objects is the inspect method. This executes
a series of assertions on the variables in the state of the object.
In most objects, the state is organized as a user data type of the type TYPstate
and named USRstate. In some objects, the state is organized as a small object with
its own state, containing variables instead of procedures. The latter technique is
used when the object needs to be fully threadable.
The qbScannerTest program's GUI provides access to the inspect method.
Click the Inspect button (see Figure 5-7) to see a message box, with Yes and No
buttons, informing you that inspection of the test scanner has succeeded:

Inspection of the test scanner has succeeded: click Yes to view the report: click No to return to the
main form


This message must appear. I mean it. This is because inspect, in qbScanner
and the other stateful objects shipped, checks for errors that would be the result
of serious internal problems.
Let's look at what it checks. Click Yes in the message box to see the inspec-
tion report, as shown in Figure 5-10.

Inspection of "qbScanner0001 3/2/2004 8:21:08 PM" at 3/2/2004 8:31:33 PM

The object must be usable: OK
Each token in both the array of scanned tokens, and the array of pending tokens, must pass the inspect
procedure of qbToken. The tokens in the scanned array must be in ascending order with gaps OK but no
overlap. No token's end index (in either array) may point beyond the end of the source code in either the
scanned array, or the pending array: OK
The line number must be greater than or equal to 0: OK
The format of the line number index collection must be valid: this is a collection of three-item
subcollections. Item(1) must be a string containing the key of the index entry: item(2) and item(3) must
be positive integers: item(2) can't be zero: OK
If the nonnull code is fully scanned, the first token's start index should be the same as the position of
the first nonblank character in the source code. The last token's end index should be the same as the
position of the last nonblank character.
If the code is null and indicated as fully scanned the scan count must be zero: OK

Figure 5-10. Inspection report

Four assertions have been tested against the state, the General Declarations
variables of the qbScanner instance. Figure 5-11 is the declaration of the object
state, as the TYPstate structure followed immediately by the state instance,
USRstate. These assertions concern usability, the scanned tokens, line numbers,
and the source code.

    ' ***** State *****
    Private Structure TYPstate
        Dim strName As String                   ' Object name
        Dim booUsable As Boolean                ' Object usability
        Dim strSourceCode As String             ' All source
        Dim intLast As Integer                  ' Index of last token parsed
        Dim objQBtoken() As qbToken.qbToken     ' Some or all tokens
        Dim objTokenNext() As qbToken.qbToken   ' The pending tokens
        Dim intLineNumber As Integer            ' Current line number
        Dim colLineIndex As Collection          ' Line index:
                                                ' Key is _lineNumber
                                                ' Data is subcollection:
                                                '   Item(1): line number
                                                '   Item(2): start index
                                                '   Item(3): length
        Dim booScanned As Boolean               ' True: indicates a completed scan
    End Structure

Figure 5-11. The state ofqbScanner


NOTE These rules should not fail. If they do, this means one of two things.
Either my ham-fisted original code has a bug (otherwise known as an issue or
feature) or you have modified the code. If it is my bug, send me e-mail at
spinoza1111@yahoo.com. If it is your bug, fix it.

Usability

The first assertion is that the object must be usable. Usability is part of the core
approach.
Object-oriented design with stateful objects raises an interesting problem.
This problem existed before "objects," but only object-oriented design gives us
a way of facing the problem squarely.
Old programs of the legacy sort often have thousands of variables and conditions.
Often, only a few combinations of these variables and conditions are
actually valid. Statistics, along with Murphy's Law, predict that these old programs
might enter a state (a combination of values) that is unexpected, and
of course, they do.
The mentality of a non-object-oriented programmer in a language like C is
that "my program is special and will not enter a bad state, ever." This neglects
the fact that, as hero computer scientist Dijkstra pointed out, any one program is
best viewed as a set of related solutions that evolves over time. A payroll program
is a member of a set of related solutions, some of which the user wants, some of which
the user would prefer, some of which the user will put up with, some of which
the user will need next year, and so on. You get the picture.
This means that in a non-object-oriented language, it is just too easy to add
variables over a life cycle in such a way that invalid combinations occur. Many of
these combinations are benign tumors in the sense that they don't change results;
others are malignant.
Using an inspect procedure, especially if it is executed automatically or at
a regular interval, can tell the object if it is "sane" and has valid combinations of
values (depending, of course, on how many conditions are actually tested). What
is interesting is what the object can do if it does find a problematic state. The
object, unlike legacy code, knows it is no longer in a state of grace. And, unlike
legacy code, the object can do something with this knowledge. (Note that this
discovery has nothing to do with compilers but everything to do with building
good software.)
It can, as the objects in the compiler do, set a variable in its state (called
booUsable in the code) to False.
Whenever a Public procedure (property, method, or event handler) is exe-
cuted subsequently, the object can raise an error and return a sensible default,
instead of making life worse for itself and the rest of the world by returning
bogus results, or, worse, doing damage to data outside itself. Therefore, at the
conclusion of the New constructors for objects with state, we explicitly inspect the
initial state, and, if it is valid, we set usability to True.
Many objects with state contain reference objects that occupy .NET's heap
storage, and we consistently follow a .NET rule. This is to expose a Public proce-
dure (called, consistently, dispose in our code), which the object user is urged,
on pain of 20 lashes with a wet noodle, to use when the object is no longer
needed. The original purpose of dispose was to ensure that reference objects
did not clutter the CLR heap unnecessarily. However, in the QuickBasic com-
piler code, objects consistently self-inspect when dispose is executed as a sort of
global sanity check.
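The usability protocol can be sketched as follows. This is a minimal illustration and not the compiler's code: the class lineCounter, its single state variable, the corruptForDemo procedure, and the choice to throw an exception (rather than raise an error and return a default) are all assumptions made for the example. Only the names booUsable, inspect, and dispose come from the text.

```vbnet
Imports System

' Hypothetical stateful object. Only the names booUsable, inspect, and
' dispose come from the text; everything else is invented for this sketch.
Public Class lineCounter
    Private intLineNumber As Integer    ' State: must be 1 or greater
    Private booUsable As Boolean        ' False once a bad state is found

    Public Sub New()
        intLineNumber = 1
        Dim strReport As String = ""
        booUsable = inspect(strReport)  ' Usable only if the initial state is valid
    End Sub

    Public ReadOnly Property Usable() As Boolean
        Get
            Return booUsable
        End Get
    End Property

    ' Returns True when the state is valid. Otherwise, explains the
    ' problem in strReport and marks the object as not usable.
    Public Function inspect(ByRef strReport As String) As Boolean
        strReport = ""
        If intLineNumber < 1 Then
            strReport = "Line number " & intLineNumber.ToString() & " is less than 1"
            booUsable = False
            Return False
        End If
        Return True
    End Function

    ' A Public procedure refuses to run once the object is unusable.
    Public Function nextLine() As Integer
        If Not booUsable Then
            Throw New InvalidOperationException("lineCounter is not usable")
        End If
        intLineNumber += 1
        Return intLineNumber
    End Function

    ' Demonstration only: forces an invalid state.
    Public Sub corruptForDemo()
        intLineNumber = -5
    End Sub

    ' Performs a final self-inspection and retires the object.
    Public Function dispose() As Boolean
        Dim strReport As String = ""
        Dim booValid As Boolean = inspect(strReport)
        booUsable = False
        Return booValid
    End Function
End Class
```

In this sketch, a new lineCounter is usable and nextLine works normally. After corruptForDemo, a call to inspect returns False, and any later call to nextLine is refused instead of returning a bogus line number.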

Scanned Tokens

The next assertion is about the tokens that were scanned. Since each is a stateful
object, each must pass its own qbToken.inspect procedure. Also, the tokens must
always form an ascending series of nonoverlapping tokens. They don't need to
be contiguous and won't be contiguous in general, because blanks may appear
between them. If this rule is violated, the entire meaning of the scanner has
been damaged.
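The ordering rule can be checked in a single pass. The sketch below is an assumption-laden illustration: tokenSpan is a hypothetical stand-in for the start index and length carried by a qbToken, indexes are taken to be 1-based, and every token is assumed to have at least one character.

```vbnet
Imports System

Public Module tokenOrderDemo
    ' Hypothetical stand-in for the part of qbToken used by this check.
    Public Structure tokenSpan
        Public StartIndex As Integer   ' Index of the first character (1-based here)
        Public Length As Integer       ' Number of characters
        Public Sub New(ByVal intStart As Integer, ByVal intLength As Integer)
            StartIndex = intStart
            Length = intLength
        End Sub
    End Structure

    ' True when each token starts after the previous token ends.
    ' Gaps (blanks between tokens) are allowed; overlaps are not.
    Public Function tokensAscend(ByVal objTokens() As tokenSpan) As Boolean
        Dim intPreviousEnd As Integer = 0
        For Each objToken As tokenSpan In objTokens
            ' Assumption: a token always has at least one character.
            If objToken.Length < 1 Then Return False
            If objToken.StartIndex <= intPreviousEnd Then Return False
            intPreviousEnd = objToken.StartIndex + objToken.Length - 1
        Next
        Return True
    End Function
End Module
```

For example, tokens at (start 1, length 3) and (start 5, length 2) pass the check even though position 4 is a blank, while tokens at (1, 3) and (3, 2) fail because they share position 3.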

Line Numbers
The third rule is trivial compared with the other rules. Line numbers start at
one, so the current line number must be at least one. It can't be zero or negative.

Collection Structure
We need a collection structure rule whenever we use the old standby Collection
object to structure a tree or other data structure, by including collections as
members in collections. Here, the collection is an index to map line numbers to
character positions. The key of colLineIndex is the line number, prefixed with an
underscore (because the legacy collection would otherwise treat the line number
as a numeric index and not a key). Its data is the line number, the start
index, and the length of the line. This is represented as a three-item, unkeyed
subcollection.
In developing this collection with structure, we are performing object design
without explicit stateful objects such as "line number index entry" and
"collection of line number index entries." Logistically, it is unnecessary to go crazy
and develop a full-dress object for each and every potential object, and doing so
creates some source bloat, because a project and a project's files exist for each
possible object. Therefore, it makes sense for simple objects, such as an object
that maps line numbers to character positions, to use a collection in a structured
fashion. However, it does require inspect to check the structured collection for
correct structure, since this is not enforced by an explicit object model.
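A rough sketch of such a structured collection and its inspection follows. The procedure names addLine and inspectLineIndex are hypothetical, and the code is written for a current VB.NET compiler (it uses TryCast and Collection.Contains, which postdate the compiler this book targets). The key format and the three-item subcollection follow the description above.

```vbnet
Imports Microsoft.VisualBasic

Public Module lineIndexDemo
    ' Adds one entry: key "_<line number>", data a three-item subcollection.
    Public Sub addLine(ByVal colLineIndex As Collection, _
                       ByVal intLineNumber As Integer, _
                       ByVal intStartIndex As Integer, _
                       ByVal intLength As Integer)
        Dim colEntry As New Collection
        colEntry.Add(intLineNumber)    ' Item(1): line number
        colEntry.Add(intStartIndex)    ' Item(2): start index
        colEntry.Add(intLength)        ' Item(3): length
        colLineIndex.Add(colEntry, "_" & intLineNumber.ToString())
    End Sub

    ' Checks the structure that no explicit object model enforces.
    Public Function inspectLineIndex(ByVal colLineIndex As Collection) As Boolean
        For Each objEntry As Object In colLineIndex
            Dim colEntry As Collection = TryCast(objEntry, Collection)
            If colEntry Is Nothing OrElse colEntry.Count <> 3 Then Return False
            For intIndex As Integer = 1 To 3
                If Not TypeOf colEntry.Item(intIndex) Is Integer Then Return False
            Next
            ' The entry must be retrievable under its own underscore key.
            Dim strKey As String = "_" & CInt(colEntry.Item(1)).ToString()
            If Not colLineIndex.Contains(strKey) Then Return False
            If Not colLineIndex.Item(strKey) Is colEntry Then Return False
        Next
        Return True
    End Function
End Module
```

The inspection is where the cost of skipping a full-dress object shows up: every property the object model would have guaranteed (three items, all integers, key matching the line number) has to be verified by hand.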


Collections of Collections
You'll see more of these collections-on-steroids in the code; get used to them.
Ordinarily, we think of the collection as a hash table or one-dimension array.
However, the true power of the collection is obtained only when you realize
that its entries can be objects, in particular, collections.
Because collections can contain collections, a collection can represent not
only a simple array of basic data type but also an array of records, as has been
shown here.
Recall the bnfAnalyzer tool introduced in Chapter 4. It stores the "parse tree" of
the input BNF in a collection, which contains collections.
Of course, we can go overboard in using the collection in this way. I may have
done so, in fact, in bnfAnalyzer. It might have been actually better to develop
the parse tree as a separate COM object (recall that bnfAnalyzer uses COM).
However, COM objects don't play as nice as .NET objects do, and I wanted to
get the project done quickly.

Source Code

The final inspection rule tests the source code against the scanned tokens. The
first scan token must coincide with the first nonblank character in the source
code; the last scan token must coincide with the final nonblank characters in the
source code.

The object2XML Method


Another core method, named object2XML, converts the USRstate of the object to
XML. XML allows us to display clearly the current state of the object.
In the qbScannerTest program's GUI, click the button labeled Object to XML
(see Figure 5-7). You will see the zoomed XML display, as shown in Figure 5-12.

[Screenshot: the zoomed XML display. It opens with a box comment that describes
the qbScanner class (it scans input source code, uses "lazy" evaluation, and was
developed commencing on 4/30/2003 by Edward G. Nilges), followed by the
qbScanner element with commented Name, Usable, SourceCode (truncated to 100
characters), and Last entries, and the start of the list of parsed tokens.]

Figure 5-12. The output of the object2XML method

This XML has been formatted for easy readability, and it is heavily com-
mented. In particular, the paragraph that describes the class is also available as
the value of qbScanner's read-only, shared About property (another core proce-
dure, which in most objects, will supply information about the purpose of the
class, as well as my name, e-mail address, and Web site).
Options of the XML object, corresponding to check boxes and text boxes on
the scanner test form, allow you to suppress either the leading box comment or
the line comments that describe each state variable.

The test Method

Clicking the Test button in the qbScannerTest program's GUI executes the test
method of the scanner, which presents a test instance with a string containing
all possible tokens and some marginal difficult cases (see Figure 5-8, earlier in
the chapter). It compares the serialized list of actual results with the serialized
expected results. If they match, the object does not complain. If they do not
match, the form will display an error message, and the object will mark itself as
not usable.


Is this a bit much, or what?


An alternative, used in many MIS applications, is for a separate tester object
to test the class. This is a good alternative; however, it creates logistical problems
in that it doubles the count of overall projects. Also, this approach fails to paral-
lel the inspect method, which is the static correspondent to the dynamic test.
Additionally, it sends the object into the cold, cruel world without the ability to
self-test. Therefore, when you are examining a suspicious object, in the cold,
cruel server room at 3:00 AM, you don't have this particular resource.
For this reason, many objects delivered with the compiler expose a test
method. The inspect and test methods lower error probabilities, in the same
way my schoolmates and I lowered the probability of errors in Sister Mary Rose's
arithmetic classes. Sister Mary Rose made us subtract the addend from the sum
to see if the other operand of the addition was the result. Sister Mary Rose made
us add the subtrahend to the difference to see if the second operand resulted.
This was a pain, but it was a genuine error check. Given the frequency of errors
in our business, it makes sense to add code such as inspect and test to provide
the same sort of check.
Code checks don't guarantee bugs won't exist, of course. But Dijkstra reminds
us (strangely for his reputation as an ivory tower theorist) that computer science
is applied mathematics.8 In building a bridge, we don't neglect additional checks
merely because they are extra steps that might be wrong, because in the real world,
they lower error probability. The same goes for programming!

The Event Model

I ain't evil, I'm just good lookin'


-Alice Cooper, "Feed My Frankenstein"

A concern in the scanner (and, as you will see in Chapters 7 and 8, also in the
compiler and the interpreter) is the ability to accurately report progress in scan-
ning, parsing, and interpreting large source programs. I wanted to avoid the
irritating and vague progress reports we sometimes see in Windows.
At the same time, it is a bad mistake to put code that builds forms into an
object whose mission is not to draw pretty forms on the screen. This is
because this code must then import and reference System.Windows.Forms, which
bloats it for no good reason, and worse, locks the code into the Windows client
environment.

8. Dijkstra was less an ivory tower theorist than someone who actually believed that you cannot
separate theory from practice, high-level design from mindless code, and so on. Perhaps for
this reason, two of his results (structured programming and semaphores) are actually useful
to ordinary slobs.


If, all of a sudden, you decide that the code in question would make a spiffy
Web service, you are in a world of hurt when the object, like the scanner or the
quickBasicEngine compiler itself, is large and complex. You must go through the
object and find each and every line of code that has to do with presentation and
make this code conditional on the mode of presentation.
You wind up with Frankencode, a dismal monster howling on the blasted
heath for its author's ass, because it knows, as did Mary Shelley's famous
monster, that its life has been destroyed by its very fabrication as "a thing of shreds
and patches." Unlike Alice Cooper, singing "Feed My Frankenstein" in Wayne's
World, your code might be good looking but will be evil.
Therefore, we need a way to separate presentation from logic and to have
a way for the nonvisual object to display its progress. One way would be to
have the presentation logic inherit the nonvisual object. This makes some
sense in a language that allows multiple inheritance. However, Visual Basic
doesn't allow multiple inheritance, meaning that the presentation logic can
present only one object. Also, it doesn't make much sense to say that a mere
progress report is-a compiler. This, in Shakespearean terms, dresses the
progress report in "borrowed robes."
Instead, we use an event model in the scanner and elsewhere to transmit
events, which can be ignored, used to display progress on a Windows form, or
used to display progress in a Web service. Figure 5-13 shows the event model of
the scanner. Note that to actually obtain these events, qbScanner must be declared
using the WithEvents keyword and inside General Declarations.

' ***** Events *****

Public Event scanEvent(ByVal objQBtoken As qbToken.qbToken, _
                       ByVal intCharacterIndex As Integer, _
                       ByVal intLength As Integer, _
                       ByVal intTokenCount As Integer)
Public Event scanErrorEvent(ByVal strMsg As String, _
                            ByVal intIndex As Integer, _
                            ByVal intLineNumber As Integer, _
                            ByVal strHelp As String)

Figure 5-13. qbScanner event model

The scanEvent event fires each time a new token is found. It provides the token
object, its start index, its length, and the total number of tokens found so far. The
value and type of the token object is found using its properties. This allows the GUI
to extend a progress bar, highlight the code being scanned, or both.
The scanErrorEvent event fires each time a user-related error such as unrec-
ognizable characters occurs. It describes the error, identifies where it occurs by
absolute character position and line number, and, in some cases, provides addi-
tional tips.
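How a nonvisual consumer might receive these events can be sketched as follows. Everything here is hypothetical: miniScanner is a toy that merely splits on blanks, its scanEvent passes the token text as a String (the real event passes a qbToken object), and progressLog stands in for whatever form or service displays progress.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical miniature scanner: it splits on blanks and raises an event
' per token. The real scanEvent passes a qbToken object, not a String.
Public Class miniScanner
    Public Event scanEvent(ByVal strToken As String, _
                           ByVal intCharacterIndex As Integer, _
                           ByVal intLength As Integer, _
                           ByVal intTokenCount As Integer)

    Public Sub scan(ByVal strSource As String)
        Dim intCount As Integer = 0
        Dim intIndex As Integer = 0
        While intIndex < strSource.Length
            If strSource.Chars(intIndex) = " "c Then
                intIndex += 1
            Else
                Dim intStart As Integer = intIndex
                While intIndex < strSource.Length AndAlso strSource.Chars(intIndex) <> " "c
                    intIndex += 1
                End While
                intCount += 1
                ' Report a 1-based start index, the length, and the running count.
                RaiseEvent scanEvent(strSource.Substring(intStart, intIndex - intStart), _
                                     intStart + 1, intIndex - intStart, intCount)
            End If
        End While
    End Sub
End Class

' A nonvisual consumer. A form or a Web service could handle the same event.
Public Class progressLog
    Private WithEvents objScanner As New miniScanner
    Public ReadOnly Lines As New List(Of String)

    Public Sub run(ByVal strSource As String)
        Lines.Clear()
        objScanner.scan(strSource)
    End Sub

    Private Sub onScanEvent(ByVal strToken As String, _
                            ByVal intCharacterIndex As Integer, _
                            ByVal intLength As Integer, _
                            ByVal intTokenCount As Integer) _
                            Handles objScanner.scanEvent
        Lines.Add(intTokenCount.ToString() & ": " & strToken & _
                  " at " & intCharacterIndex.ToString())
    End Sub
End Class
```

Running progressLog.run("a = 10") leaves three entries in Lines: "1: a at 1", "2: = at 3", and "3: 10 at 5". The scanner itself never references a form, so the same class could sit behind a Windows client or a Web service.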
I have followed the object-oriented practice described in this section
consistently in all stateful objects of the compiler. When I make additions to the core
set, I will note them in the book. In particular, further objects will incorporate
a test method, which will, inside an object instance, allocate a test instance as
a local variable and then run a series of prepared tests. This will allow me to
expose on object test forms, similar to the form of qbScannerTest, a Test button
that runs portable regression tests, not inside the form (as is the case here), but
inside the object.

Summary
This chapter described the development of the first major objects (qbScanner,
which has qbTokens) of the compiler for its first task: lexical analysis. We have
used modern techniques to support a legacy language because object-oriented
development makes compiler development a much more visible and less arcane
process.
Before object-oriented development, developing compilers involved a great
number of tables interlinked in complex ways. They had a tendency to get into
combinations of states that resulted in bugs, some of which were exploited by
the compiler's user community and became features.
Object-oriented development does not dramatically increase the speed at
which compilers are developed. In this case, I promised Dan Appleman (this book's
editor) that I would refactor and make more intensively object-oriented the orig-
inal compiler for QuickBasic that I had demonstrated to him at the Visual Studio
rollout festival in 2002. I had decided to do so because it is very hard to explain
a compiler in depth without showing that it is made up of distinct modules and
without using the Windows form to exhibit internal behavior.
Refactoring each object demanded a heavy investment of time, not just in
coding, but also in preparatory documentation. In the preparatory documenta-
tion, I defined the object model, the supporting object state, and the behavior
of each Public procedure. I implemented the core procedures, including inspect
and object2XML. I built a form to show off the object, which is a pain when you're
born to code and not to be a glorified Etch-a-Sketcher.9
Too often, quality is mapped onto time to market. Dan told me he wanted
a quality book, with quality software, and I hope that the use of the
object-oriented paradigm here and in the rest of the book will ensure this. The flyover
mini-compiler of Chapter 3 and the bnfAnalyzer COM object of Chapter 4 were
merely small applications by comparison.

9. In particular, it made me crazy to have to select label colors. What is the color of a scanner?
What is the color of a variable type? "Colorless green ideas sleep furiously" (Noam Chomsky).
The overall goal is to have each form highlight its labels with a memorable primary or bright
color, like a property card in the game of Monopoly. Thus, the scanner's color is dark blue.


Challenge Exercise
From the code of the scanner and/or the text of this book, reverse-engineer the
regular expression that defines each token type supported: identifier, operator,
number, string, and so on.
For example, a string is defined using the regular expression:

"([^"]*(""){0,1})*"

It defines a Visual Basic format string as a series of nonquotes, followed by an
optional doubled double quote (where the bracketed 0,1 makes the parenthesized
doubled double quotes optional), repeated zero or more times and enclosed
in quotes.
Whenever you see a regular expression, treat it with suspicion based on what
has been said in this chapter. Does the regular expression contain ambiguity, in
the form of pairs of elements a, b, such that one or more characters that can end a
can appear at the start of b? It appears not, since if an element is informally
understood to be any subexpression that is a regular expression in its own right,
there are only two elements in the above regular expression:

[^"]* and (""){0,1}

These do not have the problem of ambiguity.


Try this expression out using relab.
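If relab is not at hand, the expression can also be tried from a few lines of code. The sketch below is an illustration only: the module and function names are hypothetical, and the ^ and $ anchors are an addition so that the whole candidate must match.

```vbnet
Imports System.Text.RegularExpressions

Public Module stringRegexDemo
    ' The string pattern, anchored so the whole input must match.
    ' Inside a VB string literal, each double quote is written twice.
    Public Const STRING_PATTERN As String = "^""([^""]*(""""){0,1})*""$"

    ' True when strCandidate is a complete Basic string literal.
    Public Function isBasicString(ByVal strCandidate As String) As Boolean
        Return Regex.IsMatch(strCandidate, STRING_PATTERN)
    End Function
End Module
```

With a quote character q, the candidate q & "abc" & q matches, as does a literal containing a doubled quote such as q & "say " & q & q & "hi" & q & q & q. An unterminated candidate such as q & "abc" does not match.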
Develop the remaining regular expressions for each one of our token types:
apostrophe, ampersand, colon, comma, identifier, operator, parentheses, semi-
colon, string, unsigned integer, unsigned real number, pound sign, dollar sign,
and newline.

Resources
For more information about regular expressions and compiler design, refer to
the following:

Regular Expressions with .NET, by Dan Appleman (electronic publication,
available from http://www.amazon.com/exec/obidos/tg/detail/-/B0000632ZU).
This publication provides the .NET rules for regular expressions.

Mastering Regular Expressions, Second Edition, by Jeffrey E. F. Friedl
(O'Reilly, 2002). This book provides an in-depth look at regular expressions,
primarily in the Unix world.

Advanced Compiler Design and Implementation, by Steven S. Muchnick
(Morgan Kaufmann, 1997). This book gives a comprehensive, in-depth
discussion of high-intensity source and object code optimization for (I
fear) yesterday's high-performance Unix servers (which are being pressured
from one direction by Linux servers and from Redmond by Windows
2003 server). These servers are often RISC (Reduced Instruction Set)
machines. Compilers for these systems are fascinating, not only because
they often need to do more work in object code generation for RISC, but
also because for high-performance applications, they need to do a variety
of optimizations.

CHAPTER 6

QuickBasic Object Modeling
The law wishes to have a formal existence.
- Stanley Fish

In whatsoever mode, or by whatsoever means, our knowledge may relate to
objects, it is at least quite clear, that the only manner it relates to them is by
means of an intuition.
- Kant, Critique of Pure Reason

Under the paving stones lies the beach!
- Assorted, if not motley, French students of 1968

IN THE PREVIOUS CHAPTER, you saw how the lexical analyzer, or scanner, trans-
forms the raw characters of source code into a stream of token objects, where
each token object has a start index and a length. In the next chapter, you'll see
how this stream of token objects is converted to a nested structure of BNF gram-
mar categories, as described in Chapter 4, while also emitting output code for
a "machine," which exists purely as a software simulation of the Nutty Professor
machine.
But before we get to the flagship object quickBasicEngine in the next chapter,
we need to build two .NET objects, qbVariableType and qbVariable, to represent
data types and their values. And to do that, we need to model the data, since it's
always a bad idea to develop a language (whether for the .NET CLR or any other
platform) without a clear idea of how to represent the values and types of values
of the target language. We did not need to concern ourselves with these issues in
the flyover compiler of Chapter 3, because all the values in that example are num-
bers that were easily mapped to the double-precision number type (which is able
to handle integers as well as real numbers). However, in scaling up to our QuickBasic
compiler, we need to do some hard work. We want to make sure that no variable
is instantiated in our compiler without complete, strong typing. The payoff is
that all parts of the quickBasicEngine speak the same language about variables.
In this chapter, we will go through the same cycle of design, code, and test as
in previous chapters. The GUIs for testing the implementations of the QuickBasic
variable type model will give you a hands-on demonstration of what is, honestly,
rather dry (but necessary!) material.

The Abstract QuickBasic Variable Type Model


QuickBasic variables have a structure that is unique to Basic, because of the
variant data type and the user data type. The variant data type is not unique to
Visual Basic prior to .NET; variants appeared in QuickBasic and other Basic
compilers during the 1970s and the 1980s. The user data type, a collection of
subordinate data types organized into a structure (now known as the .NET
structure), also appeared in Basic compilers, including QuickBasic and True Basic
(a version of Basic from the original developers of the language).
Variants, user data types, and arrays can contain each other in complex ways.
Therefore, how support is provided for variable types and values is a nice case
study in object design, demonstrating once again the utility of the core properties
and methods.
The abstract model we must use for QuickBasic variable types can be
represented by a little language for identifying data types. This includes the names
of simple types (known as scalars), such as integer and double, as well as
structured expressions for complex types, such as Variant,Integer (a variant known to
contain an integer) and Array,Integer,1,10 (a one-dimensional array of integers
with a lower bound of 1 and an upper bound of 10).
Here, I present a "requirements analysis," which identifies the features our
representation of QuickBasic variables must support. For us, QuickBasic variable
types come in six different categories: scalars, variants, arrays, user data types
(UDTs), null, and unknown. We'll look at the requirements for each type in the
following sections.

Scalars
Scalars are simple Basic values that can be any one of the following types:

• Boolean: True or False

• Byte: Integers in the range 0..255

• Integer: Math integers in the range -32768..32767 represented as two's
complement values in a 16-bit word

• Long: Two's complement values in a 32-bit word, in the range
-2^31..2^31-1

• Single: Single-precision numbers in floating-point notation

• Double: Floating-point values with a wider exponent and/or a wider
mantissa

• String: Strings of characters
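The integral ranges in this list can be checked against the .NET types of the same width. In the sketch below, the function name fitsScalar is hypothetical, and pairing the QuickBasic Integer with the .NET Short and the QuickBasic Long with the .NET Integer is an assumption made for illustration, based only on the word sizes given above.

```vbnet
Imports System

Public Module scalarRangeDemo
    ' True when lngValue fits the QuickBasic type named strType.
    ' Only the integral scalar types are covered.
    Public Function fitsScalar(ByVal strType As String, _
                               ByVal lngValue As Long) As Boolean
        Select Case strType.ToUpperInvariant()
            Case "BYTE"         ' 0..255
                Return lngValue >= Byte.MinValue AndAlso lngValue <= Byte.MaxValue
            Case "INTEGER"      ' 16 bits: the same range as the .NET Short
                Return lngValue >= Short.MinValue AndAlso lngValue <= Short.MaxValue
            Case "LONG"         ' 32 bits: the same range as the .NET Integer
                Return lngValue >= Integer.MinValue AndAlso lngValue <= Integer.MaxValue
            Case Else
                Throw New ArgumentException("Not an integral scalar type: " & strType)
        End Select
    End Function
End Module
```

Here fitsScalar("Integer", 32767) is True, while fitsScalar("Integer", 32768) and fitsScalar("Byte", -1) are both False.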

NOTE Older Visual Basic developers may remember that strings in Visual
Basic through release 3 were limited to 64KB, in all probability because the
C-language runtime represented string length in an unsigned 16-bit C integer,
which can range from 0 to 2^16-1 (64KB). QuickBasic shared this limit before
release 4, which produced interesting bugs and fascinating hacks for longer
strings. For example, in those pre-object days, I wrote a procedure in a classic
Visual Basic module that stored 64KB chunks in an array. Visual Basic 4 made
it possible to rewrite this as a "long string" object much more elegantly and
without exposing the array, but it also removed any need for the object, since
Visual Basic 4 increased the string limit to about 2^32.

Some scalars involve two's complement or floating-point notation. The integer
value range from a negative, even number to a positive, odd number looks wrong,
since it appears to be one less than it should be. The oddity of the range results
from the fact that most modern computers use two's complement to represent
integers. This notation represents positive numbers, in binary, as you might
expect, padded to whatever the word length happens to be. In this notation, using
a 16-bit word,1 the number 4 is 0000000000000100. Negative numbers, however,
are represented by inverting each bit of the positive value, placing a 0 where you
expect 1 and 1 where 0 is expected, and then adding 1 to the result. For example,
-4 in a 16-bit word becomes 1111111111111100 (the inverted bits of 4,
1111111111111011, plus 1). It all works out, in a Rube Goldberg fashion, as long
as you remember that the range of values, for a 16-bit word, is between -2^15 and
2^15-1. The most attractive feature of this notation is that it avoids the possibility
of negative zero, which in a straight, non-two's-complement notation would be
one followed by zeros. That would complicate both hardware and software
operations.
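The bit patterns are easy to inspect directly. This short sketch (the module and function names are hypothetical) relies on the .NET Convert.ToString overload that renders a 16-bit value in base 2.

```vbnet
Imports System

Public Module twosComplementDemo
    ' Returns the 16-bit two's complement bit pattern of a value.
    Public Function bits16(ByVal shtValue As Short) As String
        ' Convert.ToString omits leading zeros, so pad positive values.
        Return Convert.ToString(shtValue, 2).PadLeft(16, "0"c)
    End Function
End Module
```

bits16(4) returns 0000000000000100 and bits16(-4) returns 1111111111111100. The two extremes of the range, -32768 and 32767, come out as 1000000000000000 and 0111111111111111.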
Floating point is the notation used by mad scientists and disturbed engineers
to represent that-which-is-large and that-which-is-small, such as the size of the
universe, Donald Trump's bank balance, the mass of an electron, or my bank
balance. A floating-point number such as -1.2e-3 consists of a sign, a mantissa, and
an exponent (consisting of the sign and value of the exponent). The sign in the
example is negative. The mantissa consists of just the digits to the left of the
e divider, without the decimal point; in the example, the mantissa is 12. We sort of
drop the decimal point, because the exponent takes over the function of the
decimal point. The end of the floating-point number consists of the letter E (or e),
another optional sign, and an integer number. This is the exponent, and it usually
takes over the function of the decimal point (although the decimal point can be
used, as in the example, in combination with the exponent). Mathematically, the
exponent specifies a power of ten as applied to the mantissa to arrive at a final
value. The value is M * 10^e, where M represents the mantissa with the sign and
decimal point left in, or (when the decimal point is unspecified) implied to the
right of the mantissa. Here is an example: -1.2e-3 = -1.2 * 10^-3 = -.0012.

1. QuickBasic shared this surprisingly narrow integer range with Visual Basic releases 1 through 6.
Its narrowness results from the fact that in QuickBasic's salad days, microcomputers still often
worked in words (units) of memory that were only 16 bits long.
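The mantissa-times-power-of-ten rule can be written out as a short function. This is a sketch under stated assumptions: the module and function names are hypothetical, the literal is assumed to be well formed, and the result is subject to the usual binary rounding of the Double type.

```vbnet
Imports System
Imports System.Globalization

Public Module floatingPointDemo
    ' Splits a literal such as "-1.2e-3" at the E and applies M * 10^e.
    Public Function valueOf(ByVal strLiteral As String) As Double
        Dim intE As Integer = strLiteral.ToUpperInvariant().IndexOf("E"c)
        If intE < 0 Then
            ' No exponent: the literal is an ordinary decimal number.
            Return Double.Parse(strLiteral, CultureInfo.InvariantCulture)
        End If
        ' M: the signed mantissa, with its decimal point left in.
        Dim dblMantissa As Double = _
            Double.Parse(strLiteral.Substring(0, intE), CultureInfo.InvariantCulture)
        ' e: the signed integer exponent that follows the E.
        Dim intExponent As Integer = _
            Integer.Parse(strLiteral.Substring(intE + 1), CultureInfo.InvariantCulture)
        Return dblMantissa * Math.Pow(10, intExponent)
    End Function
End Module
```

valueOf("-1.2e-3") yields a value within rounding error of -0.0012, and valueOf("12e-3") yields a value within rounding error of 0.012.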

Variants
You are probably familiar with the variant, which is a variable type no longer
supported by .NET. Considered strictly as a type, the QuickBasic or old Visual Basic variant
is a container for another type. The contained type can be a scalar, a null, an
unknown, or even an array or UDT, but it cannot be another variant type. Variants
cannot nest within each other.

NOTE If a variant could contain a variant, this would raise the hard-to-
model possibility of multiple-level variants. Code using such variants would
be hard to debug, and in the absence of object-oriented design, complex vectors
and tables would be needed. However, this situation is easy to model in
object-oriented design. The variant-containing variant would simply contain
an object: a distinct variant. One problem would be avoiding loops, presumably
in object inspection, where the same object appears more than once and
the variant directly or indirectly refers to itself.

We need to model two types of variants for different uses:

• Concrete variant types: These contain a known contained type such as
"is a variant integer" or "is a variant string."

• Abstract variant types: These contain an unknown type, which we can
represent because of our unknown variable type (described shortly in the
"Null and Unknown Types" section).

We need the abstract variant for arrays of variants because, in the model,
we cannot declare that an array type is "array of variant integer." In QuickBasic,
a variant array is declared simply as a variant, and it's not possible to declare that
"my array is of variants that must contain integers."

Laying to Rest Urban Legends About Variants


There is an urban legend concerning the variant: it is said to be slow. Of course,
"the variant is slow" isn't even a grammatical statement. What the sayer means
is code that uses variants is slow, because it uses variants, of course.


In the pre-.NET runtime, each variant had to carry type information, familiar to
coders of APIs, and each variant was a vector of storage. As such, some extra code
had to be executed to get to the value of the variant, or to "unbox" the value. And,
of course, the extra bits took space.
But note that experts on software performance, including Steven Skiena in his
1997 book The Algorithm Design Manual, urge programmers to avoid penny-
wise, pound-foolishness. The major determinant of efficiency is, as Skiena shows,
the overall form of the module's execution time formula. For example, if it mul-
tiplies the number of input records by itself in an MIS program, the program
will run fine for small sets of test data, but it will crash when it goes live.
Many MIS programs still have a classic loop form, which means that their exe-
cution time formula is the multiplication of the constant time for processing
a record times N, where N is the number of records. Skiena's lesson is that you
don't want this to be multiplied by N itself. For example, it's obvious that when
sequentially processing a large table, you should not search the entire table or
a table of nearly equivalent size for each record in the table.
The straightforward formula N * K for the efficiency of a program (number of
records times a constant time) will not be substantially changed by replacing
a scalar with a variant, especially when there is a good reason for doing so. The
replacement does not alter the overall execution time formula; it changes only
the time for processing a record by a small, fixed amount.
In COM, there was often a good reason for using variants: they provided a lim-
ited object-oriented capability. They were useful for a primitive, and admittedly
unsafe, form of polymorphism when the COM programmer needed to represent
fuzzy data. For example, a real-world application might exist with an orderQty
(order quantity) in which the user needs to represent a fixed and known num-
ber, a completely unknown value, or a range of values. An example is that some
customers might like a minimum quantity in cases where the order cannot be
fulfilled from the warehouse in full and on time. If the variant effectively and
in context represents this situation-as an integer, a null for the completely
unknown scenario, a string representing a minimum quantity, or even a com-
plex formula (translatable by the compiler itself as a business rule, as I will
show in Chapter 9)-then the variant is a technique for representing an object
in a rather lightweight fashion.
It is also said that variants take too much space. However, depending on the
type, they take a small, constant amount of space. Variants take too much
space only when their additional bytes are repeated in large arrays.
Urban legends about variants can therefore be laid to rest.
However, there are many stressed-out programmers trying to maintain code in
which the overuse of the variant has created a toxic smog. In this toxic smog,
you cannot tell when you look at vntFoobar what it might contain! If you have
experience with older Active Server Pages (ASP) or Visual Basic for Applications
(VBA), you know what I mean: in VBA for ASP, everything was a variant, and it
drives you nuts.


I use polymorphously perverse variants in my remaining COM efforts only after
I've exhausted alternatives and slain a goat. In many cases, I've implemented an
inspect method, which actually checks the variants to make sure they have only
the expected types. For example, the inspect method will run when the object is
terminated and check the example, orderQty, for a number or properly format-
ted string. Of course, this is the cure that might kill, since the inspect method is
more code. It was motivated by my rage to use an object-oriented approach,
which commenced in 1990 when I saw that the UDT of C would not be equal to
the challenges of the coming decade.

Arrays
Arrays should have the following properties:

• Dimensionality: This is unlimited in the model but is usually in the range
  1..3. In our model, all nonarray data types should return a dimensionality
  of 0 as a quick test to see if the object represents an array.

• Lower bound: Although the lower bound can default to 0, in QuickBasic
  the lower bound of an array can be any integer, negative or positive. This
  is the same rule found in Visual Basic COM.2

• Upper bound: Each dimension also has an upper bound, paired with its
  lower bound.

• Entry type: This denotes the variable type of each array entry, as a
  contained qbVariableType (each qbVariableType has a qbVariableType delegate).

User Data Types (UDTs)


UDTs should be modeled as a set of one or more members. Each member
should consist of a member name and its associated qbVariableType.
UDTs can contain the following types:

• Scalars

• Variants

• Arrays

• UDTs

2. This feature has been largely found to be useless, or not useful enough to warrant modifying
the architecture of .NET. The .NET architecture is based on the runtime semantics of C, and C did
not include this dubious feature.

QuickBasic Object Modeling

Unlike variants, UDTs are recursive and can contain UDTs. Within a QuickBasic
UDT, a COM Visual Basic UDT, or a .NET structure, the As clause can be itself a UDT,
with one exception: it cannot be the same UDT as is being defined.
UDTs cannot contain unknown or null data types.

Null and Unknown Types


The null and unknown types are unique in that their value and type are the
same. Null is used only as the default value of a variant, and it corresponds to
Nothing in .NET.
I added the unknown type, since I am developing a method for interpreting
both full-scale programs and business rules using partially unknown values, in
order to be able to manage the effect of using legacy code and complex business
rules. Unknown represents the fact that the variable's type is not known. It's use-
ful as the contained object of the variant that is contained in an array, where we
do not know the type of all of the array's elements.

Type Containment and Convertibility


In Table 6-1, you can see which types can contain which types, or the valid con-
tainment of types in the model.

Table 6-1. Containment of Types


Type       Can Contain
           Scalar   Variant  Array    UDT      Null     Unknown
Scalar     No       No       No       No       No       No
Variant    Yes      No       Yes      Yes      Yes      Yes
Array      Yes      Yes      No       Yes      No       Yes
UDT        Yes      Yes      Yes      Yes      No       No
Null       No       No       No       No       No       No
Unknown    No       No       No       No       No       No
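
The containment rules in Table 6-1 can be held as plain data and queried. What follows is a minimal sketch in Python, not the Visual Basic of the actual qbVariableType class; the names CAN_CONTAIN and can_contain are illustrative assumptions.

```python
# Table 6-1 as a lookup: CAN_CONTAIN[outer] is the set of type categories
# that a type of category `outer` may contain. Names are illustrative only.
CAN_CONTAIN = {
    "Scalar": set(),
    "Variant": {"Scalar", "Array", "UDT", "Null", "Unknown"},
    "Array": {"Scalar", "Variant", "UDT", "Unknown"},
    "UDT": {"Scalar", "Variant", "Array", "UDT"},
    "Null": set(),
    "Unknown": set(),
}

def can_contain(outer, inner):
    """Return True when Table 6-1 allows `outer` to contain `inner`."""
    return inner in CAN_CONTAIN[outer]
```

For instance, can_contain("UDT", "UDT") is true while can_contain("Variant", "Variant") is false, which is the difference between the recursive UDT and the nonrecursive variant.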

A separate issue is the convertibility of types. The difference between
containment and convertibility is that type a can contain type b when an
instance of type a can hold a reference to an instance of type b. Type a is
convertible to type b when all possible values of type a can, without loss of
information or error, be assigned to a variable of type b. Table 6-2 shows the
convertibility of types in the model.

Table 6-2. Convertibility of Types


Type      Can Convert To
          Boolean  Byte  Integer  Long  Single  Double  String  Variant  Null  Unknown
Boolean   Yes      Yes   Yes      Yes   Yes     Yes     Yes     Yes      No    No
Byte      No       Yes   Yes      Yes   Yes     Yes     Yes     Yes      No    No
Integer   No       No    Yes      Yes   Yes     Yes     Yes     Yes      No    No
Long      No       No    No       Yes   Yes     Yes     Yes     Yes      No    No
Single    No       No    No       No    Yes     Yes     Yes     Yes      No    No
Double    No       No    No       No    No      Yes     Yes     Yes      No    No
String    No       No    No       No    No      No      Yes     Yes      No    No
Variant   No       No    No       No    No      No      No      Yes      No    No
Null      No       No    No       No    No      No      No      Yes      Yes   No
Unknown   No       No    No       No    No      No      No      Yes      No    Yes

Note that two arrays cannot be converted to each other in this sense, because
the assignment of an entire array is not supported, and because changing any
attribute of an array makes a distinct type. While the assignment of UDTs is sup-
ported, this is only possible when their list of members is identical, thus no
conversion is involved.
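
Table 6-2 has a regular shape: every type converts to itself and to Variant, and each scalar converts to the scalars listed after it. The Python sketch below encodes that reading of the table; it is an illustration under that assumption, not the author's implementation, and the names SCALAR_ORDER, ALL_TYPES, and is_convertible are hypothetical.

```python
SCALAR_ORDER = ["Boolean", "Byte", "Integer", "Long", "Single", "Double", "String"]
ALL_TYPES = SCALAR_ORDER + ["Variant", "Null", "Unknown"]

def is_convertible(source, target):
    """Return True when Table 6-2 shows Yes for converting source to target."""
    if source not in ALL_TYPES or target not in ALL_TYPES:
        return False                      # arrays and UDTs are not in the table
    if source == target or target == "Variant":
        return True                       # the diagonal and the Variant column
    if source in SCALAR_ORDER and target in SCALAR_ORDER:
        # A scalar converts only to the scalars that follow it in the table.
        return SCALAR_ORDER.index(source) < SCALAR_ORDER.index(target)
    return False
```

Arrays and UDTs fall outside the table, so the sketch simply answers False for them, in line with the paragraph above.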
Figure 6-1 shows some examples of how the object should behave. In the figure,
boxes with a heavy border represent full-scale qbVariableType objects, with
state; boxes with a light border denote the "lightweight" enumerator (ENUvarType)
that represents the variable categories as one of unknown, null, Boolean, byte,
integer, long, single, double, string, variant, array, or UDT.

Figure 6-1 illustrates four scenarios:

[Figure 6-1 panels: Integer; Variant Contains Integer; Array of Integer (with
Dim, LB, and UB fields); Array of Variant (with Dim, LB, and UB fields)]

Figure 6-1. Type object scenarios

• An integer will be represented by a stateful qbVariableType containing the
  value of the integer.

• A concrete variant that contains an integer will be represented by a stateful
  qbVariableType with a reference to a stateful qbVariableType for the integer.

• An array of integers will be represented by a qbVariableType that includes
  dimension as well as a list of lower and upper bounds. This qbVariableType
  will reference an ordinary integer box.

• An array of variants will also be represented by a qbVariableType that
  includes dimension and bounds. This qbVariableType will reference an
  abstract variant that contains the unknown type.

NOTE The list of lower and upper bounds for an array is sometimes referred to
as a dope vector. This has nothing to do with the Three Stooges or Five Stupid
Guys. It merely provides the "dope" about the array: the information.

Type and Value Serialization


Both the qbVariableType object and qbVariable object (which contains the type,
value, and name of a QuickBasic variable) will allow full and reversible
serialization of both variable types and variables. Serializing an object refers
to creating an image of the object in the form of ASCII text. Ideally, the ASCII
text should be totally restricted to printable characters (characters available
on common keyboards and displayed in common fonts such as Times New Roman and
Courier New).
Both objects will expose a toString method, which will return the serialized
state, and a fromString method, which compiles the toString expression back to
the object state. This compilation will use a recursive-descent algorithm for
parsing the serialized state, as described in the next chapter. For variables and
their types, but not for all objects in general, the serialization should be
palindromic,3 such that the toString of the object state yields a string, from
which the object state can be fully reconstructed.
As noted in Chapter 5, in the serialization provided by the core method
object2XML in qbScanner, newlines are allowed, and the serialization is a
multiple-line list. On the other hand, from any method named toString, developers
will probably expect a string without newlines or XML.
Ideally, the serialization should be completely reversible with no loss of
information. The object exposes an xml2Object method or a fromString method that
accepts XML or a serialized string to re-create the state with no loss of
information. This ideal is not always useful. For example, the qbScanner has no
foreseeable need to dump its state and then restore it. However, this need is
quite foreseeable in both a qbVariableType and a qbVariable.
The compiler will need to assemble and communicate both types and values
to the object code, and at runtime, it will be important to provide a complete image
for debugging of QuickBasic types and values. Therefore, both qbVariableType and
qbVariable will need to expose toString and fromString in such a palindromic
manner that it will always be true that object2.fromString(object1.toString)
creates a clone of object1 in object2.
This also implies that both objects need to be cloneable and comparable.
Cloneable .NET objects are objects implementing the ICloneable interface of .NET,
and therefore can return a copy of themselves. Comparable .NET objects are
those that implement the IComparable interface, and therefore return 0 from a
comparison when two objects are duplicates (in this context, clones) and
a nonzero value otherwise.
Let's examine the serialization requirements for the type and value objects.

3. What I mean is that it's based on the palindrome, a string like "aha," which reads the same
forward and backward.

qbVariableType Serialization
We need a language for describing types, since only the simple scalar types can
be identified by enumeration. There are, strictly speaking, an infinite number of
different array types, because each change in dimension or bounds makes a new
array type.
There are also an infinite number of different UDTs. Variants, which can
contain arrays, only complicate the issue. Therefore, the toString method of
qbVariableType and qbVariable should return an expression that, in all cases, is
acceptable to the fromString method and re-creates a clone of the original object.
For a simple scalar, null, or unknown type, this may be the name of the
type: one of null, unknown, Boolean, byte, integer, long, single, double, or
string. For a concrete variant known to contain a scalar, unknown, or null type,
this can be an expression of the form Variant,type, where type is the name of the
simple type. For example, a variant that contains an integer is represented as
Variant,Integer.
The fun starts when we deal with arrays. We need to identify the fact that we
have an array, identify the type of its entry, identify its number of dimensions (one
dimension is probably used in 90% of all MIS programs, but you never know), and
in QuickBasic (as for Visual Basic COM), we need to support nonzero lower bounds.
Therefore, for an array, the type expression should be Array,type,boundList. The
type is the type of each entry. The boundList specifies both dimensionality and
bounds, since it will be a list of 2*n integers, where n is the number of
dimensions: one pair of the form lowerBound,upperBound for each dimension. For
example, a two-dimensional array of integers might be Array,Integer,1,10,0,5 if
it contains ten rows (numbered 1..10) and six columns (numbered 0..5).
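
To make the Array,type,boundList form concrete, here is a small Python sketch that builds such an expression from an entry type name and one (lower, upper) pair per dimension. It is an illustration only: the function name array_type_expression is hypothetical, and the real toString method lives in the Visual Basic qbVariableType class.

```python
def array_type_expression(entry_type, bounds):
    """Build 'Array,type,lb1,ub1,...' from an entry type and (lower, upper) pairs."""
    if not bounds:
        raise ValueError("an array needs at least one dimension")
    parts = ["Array", entry_type]
    for lower, upper in bounds:
        # Each dimension contributes two integers, so n dimensions give 2*n.
        parts.append(str(lower))
        parts.append(str(upper))
    return ",".join(parts)
```

Calling array_type_expression("Integer", [(1, 10), (0, 5)]) yields the ten-row, six-column example from the text; the dimensionality is simply the number of pairs supplied.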
The need for the abstract variant arises at this point. As noted earlier in the
chapter, in specifying a Visual Basic COM or QuickBasic array, you cannot say
"this array is restricted to variant integers." Therefore, the toString/fromString
language for arrays cannot allow the syntax in Array,(Variant,Integer),1,10,0,5.
Instead, the syntax must be Array,Variant,1,10,0,5, and the type should be an
abstract variant that contains the unknown type.4

4. I created the unknown type because of an ultimate goal to write a compiler and an
interpreter for full symbolic evaluation of business rules and source programs with partially or
fully unknown values, but I was delighted to find a use in this context for this feature.

A UDT is represented as UDT,typelist, where typelist is a comma-separated
series of parenthesized type expressions. Each parenthesized type expression
should be in the form name,type. In this form, name is the member name, and type
is another type expression in the toString/fromString language. Each
parenthesized type expression must be a scalar, a variant, an array, or a UDT.
Consider this UDT:

Public Type TYPexample
    intInteger As Integer
    strArray(1 To 10) As String
End Type

This UDT will have the following expression:

UDT,(intInteger,Integer),(strArray,Array,String,1,10)
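
The UDT expression is mechanical to produce once each member's own type expression is known. The following Python sketch shows the idea; udt_type_expression is a hypothetical helper, not a method of the book's classes.

```python
def udt_type_expression(members):
    """Build 'UDT,(name,type),...' from (member name, type expression) pairs."""
    if not members:
        raise ValueError("a UDT needs at least one member")
    # Each member becomes one parenthesized name,type entry, in declaration order.
    entries = ["(%s,%s)" % (name, type_expression) for name, type_expression in members]
    return "UDT," + ",".join(entries)
```

Passing the two members of TYPexample, ("intInteger", "Integer") and ("strArray", "Array,String,1,10"), reproduces the expression shown above.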

Objects vs. Tables


UDT expressions, unlike variant and array expressions, can go on forever. A UDT
can contain another UDT, which can contain another, ad infinitum (and perhaps
ad nauseam). Here's another indicator of the benefits of taking an object-oriented
approach to compiler writing: we do not need to manage a stack or a table of
nested UDTs.
Management of tables was a glory and misery of non-object-oriented compil-
ers. An important skill was (and remains, although not necessarily in compiler
writing) being able to write a hash table algorithm for mapping a large set of
keys onto limited space.
Object-oriented development actually replaces much of the need for complex
tables because the relationship between what were table entries, and what are
now stateful objects, becomes delegation and inheritance, and the efficient
management of the data structures becomes the runtime's problem.
This is why it saddens me to see the continued popularity of pure C, because it
preserves the necessity of linked lists and tables in applications where these tech-
niques are not needed. The claim is that the centralized object management is
less efficient. The insinuation is the centralized Framework is some sort of gov-
ernment bureaucracy and is slow code written by dull fellows. This does not
seem to be the case; quite the opposite, I would say.

Figure 6-2 shows the BNF definition of the qbVariableType toString/fromString
language. After you have read Chapter 7, check the code and its procedures that
start with fromString to see the recursive-descent parsing of this BNF. It is lexically
analyzed by the qbScanner object described in Chapter 5, which means that the
details of lexical syntax are identical to those of the QuickBasic language, as
implemented in our compiler. Note that this syntax allows a variant to contain
an array, as in Variant,(Array,Integer,1,10).

typeSpecification := baseType | udt
baseType := simpleType | variantType | arrayType
simpleType := [VT] typeName
typeName := BOOLEAN | BYTE | INTEGER | LONG | SINGLE | DOUBLE | STRING |
            UNKNOWN | NULL
variantType := abstractVariantType COMMA varType
varType := simpleType | (arrayType)
arrayType := [VT] ARRAY,arrType,boundList
arrType := simpleType | abstractVariantType | parUDT
parUDT := LEFTPARENTHESIS udt RIGHTPARENTHESIS
udt := [VT] UDT,typeList
typeList := parMemberType [ COMMA typeList ]
parMemberType := LEFTPAR MEMBERNAME,baseType RIGHTPAR
abstractVariantType := [VT] VARIANT
boundList := boundListEntry | boundListEntry COMMA boundList
boundListEntry := BOUNDINTEGER,BOUNDINTEGER

Figure 6-2. qbVariableType's toString/fromString language BNF
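
As a preview of the recursive-descent technique, here is a compact Python sketch with one method per nonterminal of Figure 6-2. It is not the book's fromString code, which is written in Visual Basic and scans with qbScanner. The sketch assumes a simple regular-expression tokenizer, omits the optional [VT] prefix, and returns nested tuples of its own devising; TypeParser and all of its method names are hypothetical.

```python
import re

SIMPLE_TYPES = {"BOOLEAN", "BYTE", "INTEGER", "LONG", "SINGLE", "DOUBLE",
                "STRING", "UNKNOWN", "NULL"}

def tokenize(text):
    """Split a type expression into names, integers, commas, and parentheses."""
    tokens = re.findall(r"[A-Za-z_]\w*|-?\d+|[(),]", text)
    if "".join(tokens) != "".join(text.split()):
        raise ValueError("unexpected character in %r" % text)
    return tokens

class TypeParser:
    """One method per nonterminal of Figure 6-2 (the [VT] prefix is omitted)."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """Return the next token in upper case, or None at end of input."""
        return self.tokens[self.pos].upper() if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        """Consume one token, checking it against `expected` when given."""
        if self.peek() is None or (expected is not None and self.peek() != expected):
            raise ValueError("expected %s at token %d" % (expected or "a token", self.pos))
        self.pos += 1
        return self.tokens[self.pos - 1]

    def comma_list(self, parse_item):
        """Parse item { COMMA item } and return the parsed items as a list."""
        items = [parse_item()]
        while self.peek() == ",":
            self.take(",")
            items.append(parse_item())
        return items

    def type_specification(self):      # typeSpecification := baseType | udt
        result = self.udt() if self.peek() == "UDT" else self.base_type()
        if self.peek() is not None:
            raise ValueError("unexpected trailing token %r" % self.peek())
        return result

    def base_type(self):               # baseType := simpleType | variantType | arrayType
        if self.peek() == "VARIANT":   # variantType := VARIANT COMMA varType
            self.take("VARIANT")
            self.take(",")
            return ("Variant", self.var_type())
        if self.peek() == "ARRAY":
            return self.array_type()
        return self.simple_type()

    def simple_type(self):             # simpleType := typeName
        name = self.take()
        if name.upper() not in SIMPLE_TYPES:
            raise ValueError("unknown type name %r" % name)
        return name.capitalize()

    def var_type(self):                # varType := simpleType | ( arrayType )
        if self.peek() != "(":
            return self.simple_type()
        self.take("(")
        inner = self.array_type()
        self.take(")")
        return inner

    def array_type(self):              # arrayType := ARRAY , arrType , boundList
        self.take("ARRAY")
        self.take(",")
        if self.peek() == "VARIANT":   # abstract variant: no contained type allowed
            self.take("VARIANT")
            entry = "Variant"
        elif self.peek() == "(":       # parUDT := ( udt )
            self.take("(")
            entry = self.udt()
            self.take(")")
        else:
            entry = self.simple_type()
        self.take(",")
        return ("Array", entry, self.comma_list(self.bound_entry))

    def bound_entry(self):             # boundListEntry := BOUNDINTEGER , BOUNDINTEGER
        lower = int(self.take())
        self.take(",")
        return (lower, int(self.take()))

    def udt(self):                     # udt := UDT , typeList
        self.take("UDT")
        self.take(",")
        return ("UDT", self.comma_list(self.member))

    def member(self):                  # parMemberType := ( MEMBERNAME , baseType )
        self.take("(")
        name = self.take()
        self.take(",")
        member_type = self.base_type()
        self.take(")")
        return (name, member_type)
```

With this sketch, TypeParser("Variant,(Array,Integer,1,10)").type_specification() succeeds, while Array,(Variant,Integer),1,10,0,5 is rejected with a ValueError, because arrType admits only the abstract variant.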

qbVariable Serialization

The goal of the qbVariable object is to represent the value and reference the type.
Therefore, the overall syntax of the fromString/toString language exposed by
qbVariable should be type:value. The type should be an expression in qbVariableType's
language. The colon is a safe delimiter, because if you examine the qbVariableType
BNF shown in Figure 6-2, you will see that the colon cannot occur anywhere in
a qbVariableType expression. Colons can appear in values (for example, inside
quoted strings), but what matters is that when both a type and a value are
specified, the leftmost colon is always the one that separates them.

NOTE One primary reason for using BNF as described in Chapter 4 is the
ability to examine the BNF, by hand or automatically with a tool similar to
bnfAnalyzer, and make sweeping generalizations about the language, such
as that a colon cannot appear in a serialized qbVariableType.

For example, the qbVariable expression Integer:10 represents an integer with
a value of 10; the expression Variant,String:"ABC" represents a variant with a string
value of ABC. The string follows the rules of QuickBasic and Visual Basic: it must
use double quotes, and doubled internal double quotes represent singly occurring
double quotes.
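
Two small pieces of this notation can be sketched in Python: splitting type:value at the separating colon, and decoding a Visual Basic string literal. Both functions are hypothetical illustrations. In particular, skipping colons that sit inside quotes is an assumption of this sketch, made so that a value given without a type, such as "A:B", is not split by mistake.

```python
def split_type_and_value(expression):
    """Return (type_text, value_text) split at the first colon outside quotes,
    or None when no such colon exists (a type alone or a value alone)."""
    in_quotes = False
    for index, char in enumerate(expression):
        if char == '"':
            in_quotes = not in_quotes      # a doubled quote toggles twice: no net change
        elif char == ":" and not in_quotes:
            return expression[:index].strip(), expression[index + 1:].strip()
    return None

def vb_unquote(literal):
    """Decode a Visual Basic string literal (doubled quotes become single)."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("not a quoted string: %r" % literal)
    body = literal[1:-1]
    if '"' in body.replace('""', ""):
        raise ValueError("internal double quotes must be doubled")
    return body.replace('""', '"')
```

Because a type expression never contains a quote or a colon, the first unquoted colon is also the leftmost colon whenever a type is present.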

A complete notation is supported for arrays. One-dimensional arrays are
supported as comma-separated lists of scalar values, where each scalar value
can be a number or a string. For example, Array,Variant,1,2:1,True represents
the array containing the byte value 1 and the Boolean value True.
Note that the scalar value is always fitted to the narrowest QuickBasic type.
Therefore, we support an additional notation, useful for variants but applicable to
all values. This is the decorated notation type(value), where type names a QuickBasic
type and value is a value. To accurately show a variant array containing an integer
and a Boolean, the expression may be Array,Variant,1,2:Integer(1),Boolean(True).
The use of decoration overrides the selection of the narrower type.
The asterisk may be used to specify a default value. For example, Boolean:*
specifies a Boolean type and the value False. When an asterisk is used as the value
part of an array, this fills all entries with the default for the entry type. For
example, Array,String,1,10:* creates a string array with null strings in each entry.
Any expression for an array can follow individual values with a parenthesized
number, to repeat the entry n times. The expression Array,String,1,10:"A"(10)
creates an array with the letter A in ten entries. In fact, an asterisk may
replace the number inside the parentheses, and this will repeat the default
array value to the end of a one-dimensional array.
Arrays with higher dimensions are represented as lists of parenthesized slices,
where a slice is the entry at the lower dimension. Array,Integer,1,2,1,2:(1,2),(3,4)
represents the type and value of a two-dimensional array.
Array,Integer,1,2,1,2,1,2:((1,2),(3,4)),((5,6),(7,8)) represents the type and
value of a three-dimensional array.
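
The slice notation falls out of a short recursion. In the Python sketch below, nested lists stand in for the compiler's collections, and slices_to_text is a hypothetical name; the real serializer is part of the Visual Basic qbVariable class.

```python
def slices_to_text(values):
    """Serialize a nested list: scalars joined by commas, each sub-list parenthesized."""
    parts = []
    for item in values:
        if isinstance(item, list):
            # A sub-list is a slice of the next lower dimension.
            parts.append("(" + slices_to_text(item) + ")")
        else:
            parts.append(str(item))
    return ",".join(parts)
```

A two-level list gives the two-dimensional value text, and a three-level list gives the three-dimensional one, matching the examples above.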

NOTE qbVariable and qbVariableType expressions aren't very user-friendly.
They are intended for internal production and consumption between objects
of the compiler. They do not form any part of the QuickBasic language.

Although the usual syntax of these qbVariable expressions is type:value,
either the type or the value may be omitted. When the type is omitted, it is
determined empirically from the value alone, as follows:

• If the value is a single number, it is converted to the narrowest QuickBasic
  type. For example, the number 1 as an expression will convert to a byte;
  the number 32767 converts to an integer.

• If the value is a single string in quotes and using Visual Basic rules, it is
  converted to a string.

• If the value is a list of scalars (strings and numbers), it is converted to
  a one-dimensional array with a lower bound of zero. If the values all
  convert to the same type, this will be an array of that type; otherwise,
  this will be a variant array.

• If the value is a list of decorated scalars in the form type(value), it is
  converted to an array. If they are all the same type, the array will be of that
  type; otherwise, it will be an array of variants.

• If the value uses parentheses to specify subcollections in such a way that
  it represents an orthogonal collection, so that all subcollections contain
  the same number of members at all levels, it is converted to an array with
  dimension equal to the depth of parentheses nesting, with a lower bound
  of 0 for each dimension and an upper bound at each dimension that is the
  number of listed elements, or slices. Again, if the entry types are the same,
  the array has the scalar entry type; otherwise, it is a variant. For example,
  (1,2),(1,2) specifies a two-dimensional byte array containing two rows
  and two columns. Entries can be decorated.

• If the value is a list of valid parenthesized members in the form
  (name,type), the type is UDT.

• If the value is the keyword UNKNOWN or NULL, the type is the corresponding type.
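
The first three rules above can be sketched in Python as follows. The integer ranges are those of the QuickBasic Byte (0..255), Integer (16-bit), and Long (32-bit) types. Treating every non-integer number, and every integer too large for Long, as Double is an assumption of this sketch; the function names narrowest_type and entry_type are hypothetical.

```python
def narrowest_type(text):
    """Name the narrowest QuickBasic type that fits one scalar written as text."""
    text = text.strip()
    if text.upper() in ("TRUE", "FALSE"):
        return "Boolean"
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return "String"
    try:
        number = int(text)
    except ValueError:
        float(text)          # still raises ValueError when text is not numeric
        return "Double"      # assumption: non-integers are not narrowed further
    if 0 <= number <= 255:
        return "Byte"
    if -32768 <= number <= 32767:
        return "Integer"
    if -2**31 <= number <= 2**31 - 1:
        return "Long"
    return "Double"          # assumption: too large for Long

def entry_type(values):
    """Entry type for a list of scalars: the shared type, else Variant."""
    kinds = {narrowest_type(value) for value in values}
    return kinds.pop() if len(kinds) == 1 else "Variant"
```

Here narrowest_type("1") gives Byte and narrowest_type("32767") gives Integer, as in the first bullet; entry_type(["1", "True"]) gives Variant because the two entries narrow to different types.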

Figure 6-3 shows the BNF of the qbVariable toString/fromString language.

fromString := fromStringType
fromString := fromStringValue
fromString := fromStringWithValue
fromString := fromStringType COLON fromStringValue
fromString := COLON fromStringValue
fromStringType := baseType | udt
baseType := simpleType | variantType | arrayType
simpleType := [VT] typeName
typeName := BOOLEAN | BYTE | INTEGER | LONG | SINGLE | DOUBLE | STRING |
            UNKNOWN | NULL
variantType := abstractVariantType COMMA varType
varType := simpleType | (arrayType)
arrayType := [VT] ARRAY,arrType,boundList
arrType := simpleType | abstractVariantType | parUDT
parUDT := LEFTPARENTHESIS udt RIGHTPARENTHESIS
udt := [VT] UDT,typeList
typeList := parMemberType [ COMMA typeList ]
parMemberType := LEFTPAR MEMBERNAME,baseType RIGHTPAR
abstractVariantType := [VT] VARIANT
boundList := boundListEntry | boundListEntry COMMA boundList
boundListEntry := BOUNDINTEGER,BOUNDINTEGER
simpleType := [VT] typeName
typeName := BOOLEAN | BYTE | INTEGER | LONG | SINGLE | DOUBLE | STRING |
            UNKNOWN | NULL
variantType := abstractVariantType,varType
varType := simpleType | (arrayType)
arrayType := [VT] ARRAY,arrType,boundList
arrType := simpleType | abstractVariantType
abstractVariantType := [VT] VARIANT
boundList := boundListEntry | boundListEntry, boundList
boundListEntry := BOUNDINTEGER,BOUNDINTEGER
fromStringValue := ASTERISK | fromStringNondefault
fromStringNondefault := arraySlice [ COMMA fromStringValue ]
arraySlice := elementExpression | ( fromStringNondefault )
elementExpression := element [ repeater ]
element := scalar | decoValue
scalar := NUMBER | VBQUOTEDSTRING | ASTERISK | TRUE | FALSE
decoValue := quickBasicDecoValue | netDecoValue
quickBasicDecoValue := QUICKBASICTYPE ( scalar )
netDecoValue := [ SYSTEM PERIOD ] IDENTIFIER
                LEFTPARENTHESIS ANYTHING RIGHTPARENTHESIS
repeater := LEFTPAR ( INTEGER | ASTERISK ) RIGHTPAR

Figure 6-3. qbVariable's toString/fromString language BNF

QuickBasic Variables Mapped to .NET Objects


This section addresses the way in which we map a QuickBasic variable to the
.NET object. The .NET object is capable of accurately representing any possible
QuickBasic variable (with two exceptions in our implementation only, as described
in this section). But what's the best way to do this?
The easier, softer way might be to just use the broadest and most general
type: the .NET Object. But we know that we would need to pay for the apparent
simplicity. Payment will be extracted when the Nutty Professor interpreter tries
to do arithmetic or other operations on pure Objects, because the types of the
results will be determined by .NET rules and not QuickBasic rules. Life will be
less difficult when we use the CLR, but it will still be difficult, because the CLR
won't have QuickBasic type information.
Therefore, I propose the mapping shown in Table 6-3.

Table 6-3. Basic Mapping of QuickBasic Variables to .NET Objects


QuickBasic Variable     .NET Object
Boolean                 Boolean
Byte                    Byte
Integer                 Short integer
Long integer            Integer
Single-precision real   Single
Double-precision real   Double-precision real
String                  String
Array                   Collection
UDT                     A collection of members, each of which is a qbVariable object
Null                    Nothing in the .NET object value and the associated type
Unknown                 Nothing in the .NET object value and the associated type

Of course, this isn't as straightforward as it looks. Let's see how the map-
ping works.

Scalar Mapping
We start off easily, with Booleans represented by .NET Booleans and bytes repre-
sented by .NET bytes. QuickBasic integers are represented accurately in .NET by
short integers. However, .NET integers are 32-bit and cannot accurately repre-
sent QuickBasic integers.

NOTE Of course, all QuickBasic integers in the 16-bit word can indeed be
represented by .NET integers in the 32-bit word. However, code that depends
on the word size for accuracy won't work the same. You may think that code
should not depend on the word size, and it shouldn't in most cases; nonetheless,
the compiler and runtime must account for this fact.
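
To see why the word size matters, consider addition near the top of the 16-bit range. The Python sketch below checks the result against the 16-bit limits and reports an overflow. That the runtime should signal an error here, rather than wrap around, is an assumption of this sketch, and add_quickbasic_integers is a hypothetical name.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def add_quickbasic_integers(left, right):
    """Add two values the way a 16-bit Integer must: reject out-of-range sums."""
    result = left + right
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError("%d does not fit a 16-bit QuickBasic Integer" % result)
    return result
```

The sum 32767 + 1 fits comfortably in a 32-bit .NET integer, so a naive mapping would accept it silently; a 16-bit short integer, or an explicit check like this one, does not.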


QuickBasic long integers, which are 32 bits, are represented by .NET integers.
Now things get a bit messier. QuickBasic single-precision reals are represented
inaccurately by .NET singles. The representation is inaccurate because no .NET
floating-point tool, out of the box, provides support for QuickBasic floating-point
values precisely. To represent QuickBasic single-precision reals, we would need to
write a software simulation for this type of floating-point representation (and we
won't do this).
QuickBasic double-precision reals are represented, again inaccurately, by .NET
double-precision reals. .NET is more mathematically accurate, but the older
inaccuracy isn't accurately simulated (whew).
QuickBasic strings are represented accurately by strings in the compiler
because a limit of 64KB characters is actually enforced, both to be accurate and
also as a sort of trip down memory lane, back to when strings, outside the C lan-
guage, were severely restricted in length. This retro feature can be suppressed by
a compiler option, but you need to go to the code for details.

Array Mapping
Arrays are represented by a collection with the following constraints, which will
be checked by the core inspect method of qbVariable:

• Each collection member that is not a collection containing a slice of the
  array, as described next, is the .NET representation of the QuickBasic
  scalar entry value. For example, a one-dimensional byte array with three
  entries would be a collection of three bytes.

• Subcollections as items represent slices of the array, containing the
  contents of a lower dimension. This means that a two-dimensional array is
  represented as a collection of collections.

It would be a bad mistake to map QuickBasic arrays onto .NET arrays. Each
.NET array has a fixed dimensionality, even when it is dynamically allocated using
ReDim. We would go crazy trying to represent a QuickBasic array using a .NET array,
creating and re-creating arrays.
It would be a simpler matter to just represent any QuickBasic array using
a single .NET array of objects with one dimension, and convert multidimensional
subscripts to a single .NET subscript. The math is easy. However, the overriding
advantage of the collection approach is that much of the access can be pushed down
into an independent object that is outside the compiler. This object is named
collectionUtilities, which provides several tools for working with the classic
collection, including tools for serialization and deserialization, and accessing
recursive collections that contain subcollections. The collectionUtilities tool is
available from the Downloads section of the Apress Web site (http://www.apress.com),
in the egnsf/collectionUtilities/bin folder.
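
For comparison, the "easy math" of the single-array alternative looks like this in a Python sketch. It maps a list of subscripts to one zero-based offset while honoring nonzero lower bounds. The row-major ordering (last subscript varies fastest) is an assumption of the sketch, and flat_index is a hypothetical name; this is the approach the compiler does not take.

```python
def flat_index(subscripts, bounds):
    """Map subscripts to a zero-based offset, row-major, honoring lower bounds."""
    if len(subscripts) != len(bounds):
        raise IndexError("wrong number of subscripts")
    offset = 0
    for subscript, (lower, upper) in zip(subscripts, bounds):
        if not lower <= subscript <= upper:
            raise IndexError("subscript %d outside %d..%d" % (subscript, lower, upper))
        # Scale what has been accumulated by this dimension's size, then add
        # the zero-based position within this dimension.
        offset = offset * (upper - lower + 1) + (subscript - lower)
    return offset
```

For the bounds 1..10 and 0..5, the subscripts (1, 0) map to offset 0 and (10, 5) map to offset 59, the last of the sixty entries.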

Since we have represented arrays as collections and higher-dimension arrays
as collections that contain collections, accessing an element is straightforward.
For example, suppose you have the array that is represented in fromString
notation as Array,Integer,1,3,1,2:((1,2),(3,4),(5,6)). This is a two-dimensional
array, and it is represented as one collection with three items. Each item is a
subcollection with two items. To access this (or any) array, given a list of
subscripts, access the top-level collection and go to the subscripted item. It is
either a collection or, at the highest dimension, it is not. If it is not a
collection, you're finished, and the value can be returned through all recursion
levels. If the item is a collection, reapply the same probe recursively to the
next level down, using the subcollection. At each level of recursion, you examine
one collection: either the main collection or some subcollection.
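
The probe just described can be sketched in a few lines of Python, with nested lists standing in for the Visual Basic collections. The function get_element and its parameters are hypothetical; in the compiler itself, this kind of access is delegated to collectionUtilities.

```python
def get_element(collection, subscripts, bounds):
    """Follow one subscript per nesting level until a non-list value is reached."""
    lower, upper = bounds[0]
    index = subscripts[0]
    if not lower <= index <= upper:
        raise IndexError("subscript %d outside %d..%d" % (index, lower, upper))
    item = collection[index - lower]      # shift by the lower bound
    if not isinstance(item, list):
        return item                       # a scalar entry: finished
    # A subcollection: apply the same probe one level down.
    return get_element(item, subscripts[1:], bounds[1:])
```

For the three-by-two example above, stored as [[1, 2], [3, 4], [5, 6]] with bounds 1..3 and 1..2, the subscripts (2, 1) return 3 after two levels of recursion.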
This recursion has a hidden loop to get to the appropriate depth. There
would be no looping in the use of a single .NET one-dimensional array, just cal-
culation. However, the looping depends on what is almost always a very small
number of dimensions. Roughly 90% of arrays are one-dimensional, 9% are two-
dimensional, and perhaps 1% will have three or more dimensions (perhaps in
rocket science applications, voyages to the fourth dimension, and high finance).

UDTs, Null, and Unknown Mapping


UDTs can be represented by a collection of members, each of which is a
qbVariable object.
Finally, null and unknown types are represented by placing Nothing in the
.NET object value and their associated types, because these types are identical
with their value.

Delegation vs. Inheritance


The qbVariable model provides some opportunities for its state to become invalid,
because it stores the value and the type. The type may be scalar, but the value
may be a collection, or the collection may not have the expected structure of
entries, and so forth. Murphy's law dictates that because this possibility exists, it
will happen, and it requires that the core inspect method of qbVariable check the
correspondence of type and value. The problem seems unavoidable, simply
because there is no one-to-one mapping of QuickBasic types to .NET types.
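
One part of such an inspection, checking that a stored value has the structure its array type promises, can be sketched in Python as follows. Nested lists again stand in for collections, and value_matches_bounds is a hypothetical name, not the book's inspect method, which checks considerably more.

```python
def value_matches_bounds(value, bounds):
    """Check that nested lists have exactly the shape the bound list requires."""
    if not bounds:
        # No dimensions left: the value must be a scalar, not a collection.
        return not isinstance(value, list)
    lower, upper = bounds[0]
    if not isinstance(value, list) or len(value) != upper - lower + 1:
        return False
    return all(value_matches_bounds(item, bounds[1:]) for item in value)
```

A scalar type corresponds to an empty bound list, so a collection stored under a scalar type fails the check, as does a ragged collection stored under a two-dimensional array type.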
One design decision I made was to make separate objects to represent just the
type (qbVariableType) and the type and value (qbVariable). An alternative would be
to have qbVariable inherit qbVariableType, such that a variable would not have
a type; it would instead be a sort of type-with-more-stuff. If a design decision is
hard to put into clear words, sometimes this is a danger sign. The fact that it
sounds basically more sensible to say that a variable "has a" type means that it is
probably the better decision. A variable with a type and value is not a subspecies
of a type at all. Furthermore, inheritance would not solve the problem that the .NET
representation of the type might be at odds with the inherited type attributes.
Therefore, I decided to use delegation rather than inheritance. The next sections
describe the details of the qbVariableType and qbVariable implementations.

The qbVariableType Object


The qbVariableType class represents the type of a variable according to the
abstract model described in the preceding sections. As such, this class
contains no value information and does not identify the variable by name, with
one exception: when it defines a UDT, it identifies member names only. The
source code of qbVariableType is available from the Downloads section of the
Apress Web site (http://www.apress.com), in egnsf/Apress/quickBasic/
qbVariableType/qbVariableType.vb.

qbVariableType State
In terms of object taxonomy, it's obvious that qbVariableType will have a state.
Figure 6-4 shows the State section of the code.

' ***** State *****
Friend Structure TYPstate
    Dim booUsable As Boolean            ' Object usability
    Dim strName As String               ' Instance name
    Dim enuVariableType As ENUvarType   ' Main type
    Dim objVarType As Object            ' Contained type(s):
                                        '   Nothing (for a scalar, unknown or null)
                                        '   Contained type for Variant
                                        '   Entry type for Array
                                        '   Collection for UDT:
                                        '     key is member name:
                                        '     data is 3-member subcollection:
                                        '       Item(1) is member index:
                                        '       Item(2) is member name:
                                        '       Item(3) is member as a qbVariableType
    Dim colBounds As Collection         ' No key: data is 2-element collection:
                                        '   item(1): lowerBound
                                        '   item(2): upperBound
    Dim colTypeOrdering As Collection   ' Type ordering
    Dim booContained(,) As Boolean      ' Type containment
    Dim objTag As Object                ' User object
End Structure
Private USRstate As TYPstate

Figure 6-4. qbVariableType state


Most fields in the state are self-explanatory, but the objTag is a bit of a mys-
tery. It supports the core Tag property shared with a number of the compiler
objects, including qbVariable. This allows us to use code to add objects and data
to a variable type instance in a spontaneous manner to meet extra needs in the
compiler, or in any program that uses qbVariableType as a stand-alone object.
The Tag property is not visible to the user of the QuickBasic system; rather, it is
a convenience for using the object.
The additional fact that the qbVariableType object must, in many cases, delegate
when a variant, an array, or a UDT contains one or more types means that a
full-fledged stateful object, using the core methodology introduced in Chapter 5,
is needed. Therefore, qbVariableType informally implements the core methodology
as a stateful object.
At all times, a qbVariableType instance is either usable or not usable. The
instance becomes usable at the end of a successful new constructor call, and it
remains usable until the object is disposed of or a serious internal error occurs.
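qbVariableType also has a Name property, which defaults to qbVariableTypennnn
date time (a sequence number followed by the creation date and time, as in
qbVariableType0001 3/4/2004 6:51:17 AM) to identify the instance in XML output
and other messages. This doesn't name the type; it names the object instance.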

The object2XML Method


Like qbScanner, qbVariableType exposes a core object2XML method: the start of its
output is shown in Figure 6-5.

Chapter 6

<!--
*********************************************************************************
*
* variableType
*
* This class represents the type of a quickBasicEngine variable, including support
* for an unknown type and Shared methods for relating .Net types to Quick Basic
* types.
*
* This class was developed commencing on 4/5/2003 by
*
* Edward G. Nilges
* spinoza1111@yahoo.COM
* http://members.screenz.com/edNilges
*
* This instance represents the following variable type:
*
* Type:
*     Member1: Variant containing 32-bit Long integer in the range -2**31..2**31-1
*     Member2: Variant containing 32-bit Long integer in the range -2**31..2**31-1
*     Member3: Variant containing Boolean
* End Type: total size is 3
*
* CACHE INFO
*
* A cache of recently parsed variable types is maintained to save time: here is
* the state of the cache.
*
* Cache status: available
* Cache maximum size: 100
* Cache current size: 7
* Cache contains: "Unknown", "Variant,Long", "Long", "Variant,Boolean", ...
*
*********************************************************************************
-->
<qbVariableType>
    <!-- Indicates the usability of the object -->
    <booUsable>True</booUsable>
    <!-- Identifies the object instance -->
    <strName>qbVariableType0001 3/4/2004 6:25:56 AM</strName>
    <!-- Identifies the variable's type -->
    <enuVariableType>vtUDT</enuVariableType>
    <!-- Identifies the type of a contained variable -->
    <objVarType>
        (1
        &quot;Member1&quot;
        Variant,Long)
        (2

Figure 6-5. qbVariableType.object2XML output (beginning)

As shown in Figure 6-6, this XML includes type information in the objVarType
tag when the instance contains a type. In all cases, this can be just the
toString method output (with commas changed to newlines for readability),
rather than the complete XML for the embedded type, because the toString
output contains all the information about the embedded type. In the example,
a UDT is shown with three members, as seen in the comment block.

<qbVariableType>
    <!-- Indicates the usability of the object -->
    <booUsable>True</booUsable>
    <!-- Identifies the object instance -->
    <strName>qbVariableType0001 3/4/2004 6:51:17 AM</strName>
    <!-- Identifies the variable's type -->
    <enuVariableType>vtUDT</enuVariableType>
    <!-- Identifies the type of a contained variable -->
    <objVarType>
        (1
        &quot;Member1&quot;
        Variant,Long)
        (2
        &quot;Member2&quot;
        Variant,Long)
        (3
        &quot;Member3&quot;
        Variant,Boolean)
    </objVarType>
    <!-- Identifies the bounds of an array type -->
    <colBounds>emptyCollection</colBounds>
    <!-- Identifies type ordering -->
    <colTypeOrdering>noCollection</colTypeOrdering>
    <!-- Identifies type containment -->
    <booContained>Unallocated</booContained>
    <!-- User's tag -->
    <objTag>&quot;&quot;</objTag>
</qbVariableType>

Figure 6-6. More qbVariableType.object2XML output

The inspect Method


The inspection rules applied by the core qbVariableType.inspect method are
shown in Figure 6-7.

Inspection of "qbVariableType0001 3/4/2004 6:57:44 AM"
(UDT,(Member1,Variant,Long),(Member2,Variant,Long),(Member3,Variant,Boolean))
at 3/4/2004 6:57:50 AM

Object instance must be usable: OK
Type must be compatible with contained value and/or bounds: OK
Type must be compatible with contained value and/or bounds: OK
Since the container is a UDT, the contained type should be a collection of members
Contained variable type(s) must pass their own inspection: OK

Figure 6-7. qbVariableType inspection


Note that inspection is two layers deep, because a UDT containing three
scalars as seen in Figure 6-5 is two object levels, and qbVariableType always
inspects its constituent objects. For each object, the instance must be usable.
The type must be compatible with the contained value and bounds. For exam-
ple, a user type must have a nonempty collection of members.
Inspection also clones the object to make sure the clone returns a toString that
matches the original toString. For all objects, object2.fromString(object1.toString)
must create the clone of object1 in object2, and the compare method for the two
objects must return True. Both assertions are checked in the inspection.
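
The two clone assertions can be sketched in a few lines. The following Python
model is only an illustration, not the book's Visual Basic code: the TypeModel
class, its fields, and inspect_round_trip are hypothetical stand-ins, and the
model handles only a scalar or a Variant that contains a scalar.

```python
from dataclasses import dataclass


@dataclass
class TypeModel:
    """Hypothetical stand-in for a variable type (not the real qbVariableType)."""
    main: str            # for example "Integer" or "Variant"
    contained: str = ""  # for example "Long" when main is "Variant"

    def to_string(self):
        # Serialize as "Main" or "Main,Contained".
        if self.contained:
            return self.main + "," + self.contained
        return self.main

    @classmethod
    def from_string(cls, text):
        # Parse the same notation that to_string produces.
        main, _, contained = text.partition(",")
        return cls(main.strip(), contained.strip())

    def compare(self, other):
        # Field-by-field equality, generated by the dataclass decorator.
        return self == other


def inspect_round_trip(original):
    """Check both clone assertions and report each one as True (OK) or False."""
    clone = TypeModel.from_string(original.to_string())
    return {
        "Clone returns identical toString value to original object":
            clone.to_string() == original.to_string(),
        "toString used as fromString produces identical objects":
            clone.compare(original),
    }
```

Calling inspect_round_trip(TypeModel("Variant", "Long")) returns a report in
which both assertions hold. A serializer that lost information would fail
the second assertion even if the first one passed.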

A Note on Inspection
Multilevel inspection is time-consuming, and objects that may have multiple
levels are inspected when they are disposed. This may turn out to be exces-
sively time-consuming in a production compiler. In a lab compiler, however, it
is a benefit.
As I scale up, I may need to eliminate, or make optional, the inspection that
now occurs in qbVariableType (and most stateful objects) when the object is
disposed. For a large object, it will clone the object and its constituents, and
this could be too time-consuming even in the lab.
But I would rather preserve the inspect routine and just limit its scope when it
"notices" that the object is "large," because software objects should be reliable.
Unreliable objects give object-oriented programming a bad reputation.
Hardware is built with all sorts of self-checks. I do not understand why any
form of checking in production is suspicious in the real world. But perhaps in
reaction, I sometimes overdo it.

qbVariableType Testing
The .NET application qbVariableTypeTester.exe is available from the Downloads
section of the Apress Web site (http://www.apress.com), in
egnsf\apress\qbVariable\qbVariableTester\bin. You can use this application to
test the qbVariableType object. Run it to see the screen shown in Figure 6-8
(after a one-time introductory screen).


[Screenshot: the qbVariableTypeTester window. A Variable Type Expression text
box (showing Unknown) sits at the top, with Create Variable Type, Random Type,
and Randomize buttons. Below are a list of variable types, a Status box that
logs progress, and buttons for Save Settings, Restore Settings, Clear Settings,
and the Containment tester.]

Figure 6-8. The qbVariableTypeTester program

Creating Test Objects


Enter Variant,Integer: in the text box at the top of the window, and then click
the button labeled Create Variable Type. The object is created and inspected for
validity. Click the Describe button to see an explanation of the type.

    qbVariableTypeTester

    The variableType Form1 has this description:

    Variant containing 16-bit Integer in the range -32768..32767

Click the Inspect button to see the internal inspection report, as shown in
Figure 6-9. This applies a series of assertions to the state of the object, as
we've seen. Here, the contained type is a full-scale type in its own right
and cannot (as in Figure 6-7) be represented as a toString.

Object instance must be usable: OK
Type must be compatible with contained value and/or bounds: OK
Clone returns identical toString value to original object: OK
toString used as fromString produces identical objects: OK
Contained variable type(s) must pass their own inspection: OK

*** Inspection of contained data type object qbVariableType0002 3/4/2004 8:14:41 AM ***
* Inspection of "qbVariableType0002 3/4/2004 8:14:41 AM" (Integer) at 3/4/2004 8:19:54 AM
*
* Object instance must be usable: OK
* Type must be compatible with contained value and/or bounds: OK
* Clone returns identical toString value to original object: OK
* toString used as fromString produces identical objects: OK
***************************************************************************************

Figure 6-9. A qbVariableTypeTester inspection report

Dispose of the object by using the Dispose button.


Next, enter Array,Integer,1,10,1,10: and click Create Variable Type. Then click
Describe to get a description of the array data type.

    qbVariableTypeTester

    The variableType Form1 has this description:

    2-dimensional array with 10 rows (from 1 to 10) and 10 columns (from 1 to 10):
    each element has the type 16-bit Integer in the range -32768..32767: total
    size is 100

Note that each time you create a test object using this interface, the object will
be inspected. If an internal error is found, a report will appear. These reports
won't appear for syntax errors. For syntax errors, a simple dialog box will appear,
and the object will not be created.


Converting to XML
Click the object2XML button to convert the object state to XML. Figures 6-10
and 6-11 show the results.

<!--
*********************************************************************************
*
* variableType
*
* This class represents the type of a quickBasicEngine variable, including support
* for an unknown type and Shared methods for relating .Net types to Quick Basic
* types.
*
* This class was developed commencing on 4/5/2003 by
*
* Edward G. Nilges
* spinoza1111@yahoo.COM
* http://members.screenz.com/edNilges
*
* 2-dimensional array with 10 rows (from 1 to 10) and 10 columns (from 1 to 10):
* each element has the type 16-bit Integer in the range -32768..32767: total size
* is 100
*
* A cache of recently parsed variable types is maintained to save time: here is
* the state of the cache.
*
* Cache status: available
* Cache maximum size: 100
* Cache current size: 3
* Cache contains: "Unknown", "Array,Integer...", "Integer"
*
*********************************************************************************

Figure 6-10. The XML description of the data type starts with a comment block.


<qbVariableType>
    <!-- Indicates the usability of the object -->
    <booUsable>True</booUsable>
    <!-- Identifies the object instance -->
    <strName>qbVariableType0001 3/4/2004 1:02:57 PM</strName>
    <!-- Identifies the variable's type -->
    <enuVariableType>vtArray</enuVariableType>
    <!-- Identifies the type of a contained variable -->
    <objVarType>Integer</objVarType>
    <!-- Identifies the bounds of an array type -->
    <colBounds>(1,10),(1,10)</colBounds>
    <!-- Identifies type ordering -->
    <colTypeOrdering>noCollection</colTypeOrdering>
    <!-- Identifies type containment -->
    <booContained>Unallocated</booContained>
    <!-- User's tag -->
    <objTag>&quot;&quot;</objTag>
</qbVariableType>

Figure 6-11. The XML description of the data type contains the object state.

Note that the object2XML output of Figure 6-10 shows a cache. The cache is used
to avoid unnecessary parsing of fromString expressions. Each time a new fromString
expression is presented to the object, the object parses the expression and saves
the result in a keyed Collection, such that the key is the fromString expression.
The object is saved by forming its comprehensive, or "deep," clone. Later on, when
the same fromString expression is presented again, the object can check the cache
quickly for a copy of the required object.
The XML comment describes the state of the cache. In the example, three
fromString expressions have been parsed and cached (the cache's current size in
Figure 6-10). Up to 100 fromString expressions can be saved in this way. Caching
saves time. For example, running the nFactorial QuickBasic program using the
Nutty Professor interpreter introduced in Chapter 8 takes about 60 seconds on
a contemporary system from start to finish when no caching is performed; it
takes about 40 seconds with caching.
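
The caching scheme can be modeled in a few lines. This Python sketch is an
illustration under stated assumptions, not the qbVariableType source: the
ParseCache class and parse_type function are hypothetical, and the policy of
simply not adding entries once the cache is full is an assumption, since the
text gives only the 100-entry limit.

```python
import copy


class ParseCache:
    """Hypothetical cache of parsed objects, keyed by fromString expression."""

    def __init__(self, max_size=100):
        self.max_size = max_size
        self._items = {}
        self.parse_count = 0  # how many times the parser actually ran

    def __len__(self):
        return len(self._items)

    def get_or_parse(self, expression, parse):
        # Cache hit: hand back a deep clone so callers cannot alter the cache.
        if expression in self._items:
            return copy.deepcopy(self._items[expression])
        # Cache miss: parse, then save a deep clone if there is room.
        self.parse_count += 1
        parsed = parse(expression)
        if len(self._items) < self.max_size:  # assumed policy when full
            self._items[expression] = copy.deepcopy(parsed)
        return parsed


def parse_type(expression):
    """Toy parser: split a comma-separated type expression into its parts."""
    return [part.strip() for part in expression.split(",")]
```

With cache = ParseCache(), two calls to cache.get_or_parse("Variant,Long",
parse_type) run the parser once. Each caller receives its own copy, which is
the point of saving deep clones.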

Stress Testing

You can conduct a comprehensive stress test of most of the functionality of the
object. Click the Stress Test button to see a progress report, as the object exe-
cutes about 50 self-tests. On completion, click Yes to see the test report shown
in Figure 6-12.


Self-test of qbVariableType0001 3/4/2004 1:02:57 PM at 3/4/2004 1:07:59 PM

Creating internal test object
Testing fromString(Unknown)
fromString(Unknown) has succeeded
***** XML dump of test object at 3/4/2004 1:07:59 PM *****
*
* <!--
* *
* * variableType
* *
* * This class represents the type of a quickBasicEngine variable, including support
* * for an unknown type and Shared methods for relating .Net types to Quick Basic
* * types.
* *
* * This class was developed commencing on 4/5/2003 by
* *
* * Edward G. Nilges
* * spinoza1111@yahoo.COM
* * http://members.screenz.com/edNilges
* *
* * This instance represents the following variable type:
* *
* * Unknown: represents an unknown type and/or value

Figure 6-12. qbVariableType test report

Scroll down through the test report to see a series of random (but repeatable)
tests that exercise the functionality of the object and the syntax of its fromString
expressions. You can also click the Randomize button for nonrepeatable tests. The
object is continually self-inspecting during these tests.5

qbVariableType Shared Methods


The qbVariableType object is an appropriate place for several Shared methods
for working with variable types. These methods can be used without creating an
instance of an object. They include the following:

• containedType(type1, type2) tells whether the qbVariableType in type1 is
  "contained" in type2 (as specified in Table 6-1, earlier in the chapter).

• defaultValue(type) provides the default .NET value for a type. For exam-
  ple, it will return 0 for an integer type and a null string for the string type.

• mkRandomType constructs a random but valid toString/fromString expres-
  sion for a type, in the event you want to play around. Its serious purpose
  was to supply test cases to the test method.

• netValue2QBdomain supplies the QuickBasic type that will be used to repre-
  sent a .NET value.

5. While passing these tests proves nothing, in the sense that nothing proves software correct,
the tests have been invaluable to me in regression testing while changing the source code.
They will be of equal value to you if you change the source code.

All of these shared methods and all the unshared properties, methods, and
events of qbVariableType are fully documented in the qbVariableType reference
manual (see Appendix B).
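
To show the idea behind a generator such as mkRandomType, here is a small
Python sketch of random but repeatable type expressions. It is not the book's
method: random_type_expression and random_types are hypothetical names, the
grammar is limited to the scalar, Variant, and Array forms that appear in this
chapter's figures, and the seed parameter is an assumption about how
repeatability could be obtained.

```python
import random

# Scalar type names that appear in this chapter's figures.
SCALARS = ["Boolean", "Integer", "Long"]


def random_type_expression(rng):
    """Return one random type expression such as 'Array,Integer,1,10'."""
    kind = rng.choice(["scalar", "variant", "array"])
    scalar = rng.choice(SCALARS)
    if kind == "scalar":
        return scalar
    if kind == "variant":
        return "Variant," + scalar
    # Array: one or two dimensions, each a lower,upper pair of bounds.
    bounds = []
    for _ in range(rng.randint(1, 2)):
        lower = rng.randint(0, 1)
        upper = lower + rng.randint(0, 9)
        bounds.extend([str(lower), str(upper)])
    return "Array," + scalar + "," + ",".join(bounds)


def random_types(seed, count):
    """Return count expressions; the same seed always gives the same list."""
    rng = random.Random(seed)
    return [random_type_expression(rng) for _ in range(count)]
```

Because the generator is seeded, random_types(1, 20) returns the same list on
every call, which makes a failing test case reproducible. Passing a seed taken
from the clock would give nonrepeatable tests instead.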

The qbVariable Object


The qbVariable object contains the type, value, and name of a QuickBasic variable.
Again, the source code is available from the Downloads section of the Apress Web
site (http://www.apress.com), in egnsf/apress/quickBasic/qbVariable/qbVariable.vb.
All variables (including constants) are "boxed" inside qbVariable objects.
About 75% of the work of qbVariable is accomplished inside its qbVariableType
delegate. The remaining tasks are as follows:

• Record, modify, and return the value

• Record the variable name (different from the object instance name in the
Name property)

• Expose core object procedures, including toString/fromString expression


translation

qbVariable State
The value of qbVariable is recorded in the generic .NET object, objValue, as part
of the object state, as shown in Figure 6-13.

NOTE The objTag is the same as in the qbVariableType state, described earlier
in the chapter. It supports the core Tag property shared with a number of the
compiler objects, including qbVariableType. This allows using code to add
objects and data to a variable instance in a spontaneous manner to meet
extra, unforeseen needs. For example, the compiler stores the variable's index
in its collection of variables in the variable's Tag.


    ' ***** State *****
    Private Structure TYPstate
        Dim booUsable As Boolean                 ' Object usability
        Dim strName As String                    ' Instance name
        Dim strVariableName As String            ' Variable name
        Dim booVariableNameDefaults As Boolean   ' True: variable name has default value
                                                 ' False: variable name has been changed
        Dim objDope As qbVariableType.qbVariableType ' Variable type
        Dim objValue As Object                   ' Variable value: .Net scalar or collection
        Dim objTag As Object                     ' User object
    End Structure
    Private USRstate As TYPstate

Figure 6-13. qbVariable state

The objValue object has the following values, which are checked for consis-
tency with the qbVariableType delegate, which is in objDope (so-called because the
variable type gives us the "dope" about the variable). This check is performed by
the qbVariable.inspect method.

• If the type is unknown or nothing, objValue is Nothing.

• If the type is QuickBasic scalar, objValue contains its corresponding
  mapped .NET type.

• If the type is variant, objValue is a distinct qbVariable delegate, which is
  never a variant. It can be a scalar qbVariant, null (this is the default),
  unknown, or an array.

• If the type is one-dimensional array, objValue is a collection containing
  .NET values for the array elements.

• If the type is n-dimensional array, objValue is a collection. This collection
  must be orthogonal: either a collection of noncollection items or of col-
  lections. In the latter case, each collection must have the same number of
  items as all the other collections at its level, and each collection must be
  recursively orthogonal.

• If the type is UDT, objValue is a nonorthogonal collection of qbVariable
  objects, one for each UDT member.
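
The orthogonality rule for n-dimensional arrays lends itself to a short
recursive check. The Python sketch below uses nested lists in place of .NET
collections; shape_of and is_orthogonal are hypothetical names chosen for
this illustration, not procedures from qbVariable.

```python
def shape_of(value):
    """Return the shape of a nested list as a tuple, or None if ragged.

    A noncollection item has the empty shape (). A list is orthogonal only
    when every item has the same shape as the first item.
    """
    if not isinstance(value, list):
        return ()
    if not value:
        return (0,)
    shapes = [shape_of(item) for item in value]
    first = shapes[0]
    if first is None or any(shape != first for shape in shapes):
        return None
    return (len(value),) + first


def is_orthogonal(collection):
    """True for a list of scalars, or a list of equally shaped lists."""
    return isinstance(collection, list) and shape_of(collection) is not None
```

A list such as [[1, 2], [3, 4]] passes with shape (2, 2). Both [[1, 2], [3]]
(unequal lengths) and [1, [2]] (a scalar mixed with a collection) fail.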

It would be very glamorous in the academic sense to make the scalar entries
of arrays qbVariable values instead of .NET values. This would appear to reduce
multiple objects to one object. But if a large array collection consists of a mas-
sive collection of stateful qbVariable values, each will burden the CLR heap, and
each will take time to allocate, create, and access. To avoid this overhead, array
elements are .NET scalars. They are protected against outside tampering by the
array qbVariable.


Variable Modification and Value Return


To modify any variable, including subscripted array entries, qbVariable exposes
the valueSet method. valueSet(value) can set any nonarray/non-UDT to a .NET
value object.
For one-dimensional arrays, valueSet(value,index) modifies the value at the
index. valueSet(value,index1,index2) does the same in a two-dimensional array.
For any array, valueSet(value,indexes) sets the entry, where indexes provides all
subscripts as a comma-separated list.
For UDTs, valueSet(value,member) identifies the member and sets it to the
value. Note that a UDT member can be a UDT. Therefore, member can be in the form
name1.name2..., a sequence of period-separated names down to the last member to
access nested UDTs. If the UDT member is an array, valueSet(value,member,indexes)
can access the UDT member.
valueSet cannot change the preexisting type of the qbVariable. Only
fromString(e) can set the type and value, if e is an expression, as described in
the "qbVariable Serialization" section earlier in this chapter.
The value method of qbVariable returns values of qbVariables. It has the same
syntax as valueSet, without the value parameter. For example, objQBvariable.value
returns a scalar value. objQBvariable.value("member01.member02") retrieves
member02, in the UDT that is member01, inside the variable.
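
The period-separated member path can be illustrated with nested dictionaries
standing in for nested UDTs. This Python sketch is a model only: get_member
and set_member are hypothetical names, not the signatures of value and
valueSet, and the type check is a simplified stand-in for the rule that
setting a value cannot change the preexisting type.

```python
def get_member(udt, path):
    """Follow a path such as 'member01.member02' down through nested UDTs."""
    node = udt
    for name in path.split("."):
        node = node[name]
    return node


def set_member(udt, path, value):
    """Set the member at the end of the path without changing its type."""
    *parents, last = path.split(".")
    node = udt
    for name in parents:
        node = node[name]
    if last not in node:
        raise KeyError(last)
    # Model of "cannot change the preexisting type": reject other types.
    if type(node[last]) is not type(value):
        raise TypeError("member " + last + " already has a different type")
    node[last] = value
```

Given variable = {"member01": {"member02": 5}}, the call
get_member(variable, "member01.member02") returns 5, and set_member accepts
another integer for that member but rejects a string.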
qbVariable also exposes most of the core procedures that are exposed by
qbVariableType, including inspect, test, Name, and object2XML. For a comprehen-
sive reference manual, see Appendix B.

qbVariable Testing
The qbVariableTest.exe application, available from the Downloads section of the
Apress Web site (http://www.apress.com), in
egnsf/apress/quickBasic/qbVariable/qbVariableTest/bin, will test the qbVariable
object and allow you to enter expressions in qbVariable's toString/fromString
language that specify value and type. Figure 6-14 shows the program's interface.


[Screenshot: the qbVariableTest window. The Variable Expression text box
contains Variant,Integer:vtInteger(32767). Buttons include Create, Clone,
Inspect, Dispose, Stress Test, XML, Describe, Dope, and toString. Below are
a list of variables and a Status box that logs loading, creation of the
variable, and its inspection.]

Figure 6-14. The qbVariableTest.exe program

In the text box at the top of the window, enter Variant,Integer:vtInteger(32767),
as shown in Figure 6-14, and click the Create button. Then click the XML
button to see how the object, along with the type delegate, appears in XML, as
shown in Figure 6-15.


<!--
*
* qbVariable
*
* This class represents the type and value of a Quick Basic variable
*
* This class was developed commencing on June 24 [year illegible] by
*
* Edward G. Nilges
* spinoza1111@yahoo.COM
* http://members.screenz.com/edNilges
*
-->
<qbVariable>
    <!-- Indicates the usability of the object -->
    <booUsable>True</booUsable>
    <!-- Identifies the object instance -->
    <strName>qbVariable0001 3/4/2004 1:35:16 PM</strName>
    <!-- Identifies the variable -->
    <strVariableName>[illegible]</strVariableName>
    <!-- True indicates that the variable name has default value -->
    <booVariableNameDefaults>True</booVariableNameDefaults>
    <!-- Dope as a qbVariableType -->
    <objDope>
        <!--
        *
        * variableType

Figure 6-15. XML representation a/the qbVariable

Note that in both qbVariable and qbVariableType, the toString serialization is
almost as comprehensive as the XML and more compact (should you decide to
use these objects, as free software, in any project where you need to represent
QuickBasic values and their types).

Using XML to Capture Types and Values


I've used XML as a notation primarily to show the documentation of variable
types and their values in a readable fashion. For example, this application of
XML might be useful if you're working with a legacy Basic program that creates
output for database consumption, and you need to accurately record, in addi-
tion to data values, the types of data. In comparing data from multiple source
databases, the values may mean more in comparison to the limits expressed in
their types: a 16-bit integer can mean something different when it is converted
to 32 bits. The selection of a 16-bit representation might be a pure accident, and
the user may want values outside the range -32768 through 32767, or it may
mean that the value is limited to the range, and a value outside the range is an
error. For example, "number of employees reporting to a manager" is definitely
a small integer (unless your company manages by horde).
In effect, a data type is, in many cases, a business rule. Whether the CEO wills
it or not, restricting gross pay values to a 16-bit integer representation means
(1) all employees must be paid in whole dollars and (2) no employee will earn
more than $32,767.00 per pay period.
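
The arithmetic behind this business rule is easy to check. The Python sketch
below is only an illustration; fits_signed is a hypothetical helper that tests
whether a value is a whole number inside the range of a signed integer with
the given number of bits.

```python
def fits_signed(value, bits):
    """True if value is a whole number within a signed integer of this width."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False  # fractional amounts (and Booleans) are not whole numbers
    lowest = -(2 ** (bits - 1))      # -32768 for 16 bits
    highest = 2 ** (bits - 1) - 1    #  32767 for 16 bits
    return lowest <= value <= highest
```

Under this rule, fits_signed(32767, 16) holds, but a gross pay of 32768 does
not fit in 16 bits, and neither does 1234.56, which is not a whole number of
dollars. The value 32768 does fit once the representation is widened to 32
bits.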
qbVariable and qbVariableType allow you to obtain one-dimensional serializa-
tion in the form of toString/fromString expressions, or more self-descriptive
and two-dimensional XML tags. Either can be stored in a database field without
loss of fidelity. However, these objects do not support an xml2Object method.
I have never found a fully satisfactory XML parser (commercial parsers tend to
enforce goals I don't necessarily share). XML is easy to parse using recursive
descent, but the issues are very complex when you consider the size to which
XML can grow and the resulting need for incremental parsing of part of an
XML file.
This book could, I suppose, include a chapter on "XML parsing for fun and
profit," because an XML parser, like a classic compiler, can consist of a lexical
analyzer and parser. The problem is that there may be no profit in getting
involved in XML wars. It's also no fun to fight with the boss over which parser
to select.

Under the CLR Lies the Beach!


I'm ambivalent. I love and scorn these object-oriented digressions, for in them,
"the law wishes to have a formal existence," as our academic friend Stanley Fish
(a literary critic, turned administrator at the University of Illinois) says. In
object-oriented design, the absolute need for an object has a tendency to haunt
designers, especially if they neglect to implement the object and make do with
pure code in procedures.
My ambivalent love and scorn is based on the sheer amount of work it takes
to create a halfway decent object-oriented program that meets the promises of its
detailed design. I love to code, but at the same time, I sometimes would rather
read Kant, play basketball, or go to the beach. Object-oriented design doesn't
really free you to go to the beach, but it does let you do more work, because it
makes new objects possible. In fact, sometimes these objects seem to force you
to create them, because they have a cruel beauty. You love them, but they don't
love you back (this is great preparation for your real love life, if you're a total
masochist and assuming you have one).
The original QuickBasic compiler I designed simply mapped all values to
a .NET object. This worked easily for simple scalar values including integers and
strings. As you've seen in this chapter, it took some more effort to make it work
for arrays. I could have just told you, "to represent the array, I stick a .NET
collection in the value object at such and such a place, and you are now free to
go look at my wonderful code, and good luck." But that wouldn't be very helpful.
Instead, I needed to describe the data architecture of the compiler.
Managers may tear their hair when programmers break a problem down into
unforeseen components; the user wanted X, not X. Y. Z. But one lesson that can be
derived from any number of disasters is that objects, and before them reusable
procedures, want very much to have a formal existence.
In actual programming, it is common for a programmer to see needs beyond
the formal requirements written by nonprogrammers, and it is important to be
able to communicate these needs. For example, the programmer may understand
that a problem is best solved by a language that describes instances of the prob-
lem and write a compiler for that same language.
No tedious recitation of individual procedures and what they do can replace
the broad understanding afforded by describing what an object is. "This object
represents all possible QuickBasic variables" summarizes an intuition and is, for
this reason, better than a list of procedures, all of which must run in harmony to
provide the unmentioned variable object.
In this chapter, you've seen the design, development, and testing of the
qbVariableType and qbVariable objects. Like the qbScanner object introduced in
Chapter 5, creating these objects involves a disciplined methodology for require-
ments analysis, detail design before coding, and developing the core procedures
(including Name, inspect, and object2XML). This takes time.
I have found the payoff for this object-oriented approach to be large, and one
that fulfills the unmet promise of the structured methods of old. Structured pro-
gramming promised, but did not (for the most part) deliver, chunks of code that
would snap together with a satisfying "thunk," "ka-chunk," or "bada-bing." The
modular and structured methods often failed to obtain the desired productivity
gains, because they only seemed to make bad designs worse through incorrect
problem analysis. Object-oriented design, done right, seems indeed to deliver the
hoped-for "ka-chunk" sound. Because the prerequisite for delivering the object is
so much analysis, it has a tendency to concentrate the mind (kind of like the
prospect of a hanging).
But both structured programming and object-oriented development require
a big time investment. Because of this, management put the structured methods
on a very back burner in the 1970s and now gives object-oriented development
low priority. For this reason, it is best to practice the arts described here, and ear-
lier in Chapter 5, in secret. In particular, it is just wrong to describe this art as
"better," when at your office, better means faster.

Summary
This chapter has been a real forced march, and I do hope you are as tired as
I am, but not tired of me.
We have done a requirements analysis, as mere programmers, of what it
means to fully support a classic variable, whether in QuickBasic or old Visual
Basic, as a value or as a container for another simple value, an array of values, or
a collection of disparate values.
In this analysis, we designed two little languages, produced by
qbVariableType.toString and qbVariable.toString, and consumed by
qbVariableType.fromString and qbVariable.fromString. The toString/fromString
language of qbVariableType specifies all possible types, and it is a subset of the
language of qbVariable. In the code, lexical analysis uses the scanner described
in Chapter 5, and hand-coded, recursive, top-down descent sets type and
value objects to the specified values. This approach to the tactical use of parsing
is fully described in the next chapter. As in Chapters 4 and 5, we then ran a test
GUI for the objects to see how they run and self-test.
The object-oriented approach shows how to factor the very large problem of
building the front end of a compiler. The tables for this model would be very com-
plex, whether implemented as linked lists or tables in a relational database. Linked
lists are an excellent technique in pure C, and it might be a very good idea to cre-
ate compiler tables as an Access, SQL Server, or Oracle database, thereby using
SQL to retrieve data that would automatically persist between invocations of
your compiler.
Objects, however, increase the ratio of nouns to verbs, since they tend to
change a description of the code from the description of actions (such as "search
the identifier table") to a description of data (such as "the variable object, which
has a variable type"). This may make the discussion more understandable, because
instead of listing tables and verbs (procedures), the object becomes a storyboard.6
But we need to cut to the chase. In the next chapter, we'll get to the
quickBasicEngine itself. This object uses all the objects of Chapters 5 and 6
(qbScanner, qbVariableType, and qbVariable) to scan, parse, and interpret (using
an onboard Nutty Professor interpreter) QuickBasic code.

Challenge Exercise
Translate the following variable types and variables to the fromString/toString
notation of qbVariableType and qbVariable. Make sure you can create and inspect
the variable types using qbVariableTypeTester. Make sure you can create and
inspect the variables using qbVariableTest.

• As a type: an integer

• As a value: an integer that contains 10

• As a type: a variant that contains a long integer

6. Of course, you may have other ideas. Please e-mail me your thoughts at
[email protected].

• As a value: a variant that contains the long integer 32768

• As a value: a string that contains a single double quote (hint: remember
that our strings follow Visual Basic rules)

• As a type: the two-dimensional integer array with rows numbered 1..10
and columns numbered 0..3

• As a value: the array in the previous item with zeros in all entries, repre-
sented using the shortest possible fromString

• As a type: the UDT consisting of an integer and a string

Resources
For more information about data modeling for compilers, refer to the following:

Compiler Construction for Digital Computers, by David Gries (John Wiley,
1971). This book is out of print, but of historical interest. It is about writ-
ing compilers principally for IBM mainframes and spends a good deal of
time on table management.

The Algorithm Design Manual, by Steven S. Skiena (Telos Press, 1997).
This book is an excellent reference for the theory of "complexity," which,
in computer science, means efficiency metrics.

CHAPTER 7

The Parser and Code Generator for the QuickBasic Compiler

Cut to the chase.
-Old Hollywood saying

IN HOLLYWOOD'S TERMS, this is the chapter where the widow in arrears is tied to
the railroad track by the heinous landlord, young Jack overtakes the onrushing
locomotive to rescue the distressed widow, gets the widow a second mortgage
on the Web, sends the villainous landlord to rehab, and organizes a men's retreat
for the boys back at the ranch.
Or, if you prefer, this is where Luke Skywalker defeats the Dark Side of the
Force and finds that dad is Darth Vader, which just goes to show you.
After much object development at the mother ship, we have arrived at the flag-
ship, and indeed the largest object in our compiler object fleet, quickBasicEngine.
A brilliant editor of mine has said, in so many words, that programmers are homeys
who be chillin' when they see code. What I think he meant was that I need to
supplement a theoretical discussion with at least a mad dash through the over-
all solution architecture of quickBasicEngine and its roughly 10,000 lines of code.
I will postpone discussion of the onboard Nutty Professor interpreter that is
included in quickBasicEngine until the next chapter. Here, I will cover the parser
algorithms as generated from the BNF definition of the language described in
Chapter 4.
This chapter will show how the parsing procedures can be manually, but
rapidly, cranked out as multiple-algorithm implementations using a simple set
of rules. You will see how the compiler generates individual instructions as
objects and how this allows us to associate as much data as is appropriate to
each instruction, including data that ties the instruction to the source code to
aid in debugging. Just because we're implementing a legacy language, there is
no reason for using retrograde methods from the dawn of man.
I will also introduce the fascinating topic of compiler optimization, demon-
strating how constant expressions are evaluated by default during parsing and
how the compiler eliminates unnecessary operations in a safe manner. Finally,
I will present an end-to-end example of the compilation and execution of a very
simple program ("Hello world," everyone's favorite).

The Recursive-Descent Algorithms


The word Algorithms is plural in this section's title because a separate algorithm
must be constructed for each production in the BNF grammar. Each parser method
in our compiler needs to pass a series of parameters expressing state or reference
state in Common Declarations, and the most important single fact in the state is
the position of the next token.
Before we look at the "meta" algorithm for actually coding individual
recursive-descent procedures for the grammar symbols in your BNF, let's
review recursive-descent algorithms in general.

Recursive-Descent Approaches
Recursive descent is one of the oldest parsing algorithms. It is not a compiling
algorithm, per se, because it has little to do with scanning or code generation.
It has to do with recognizing the language.
Hero computer scientist Niklaus Wirth, the creator of the Pascal language,1
said that recursive descent must be used for block-structured languages such
as Pascal. This is extreme, but as a manual parsing method, it is the most
understandable.
Two general approaches to parsing exist: top-down and bottom-up. In the
top-down, or goal-oriented, method of recursive descent, you decide on an over-
all goal or task and break it down into smaller tasks. In bottom-up algorithms,
you instead run through the series of scanned tokens with auxiliary data struc-
tures, and enter a variety of higher states as these symbols are seen to build
higher structures. Basically, in top-down algorithms, you start with program and
go down to token; in bottom-up algorithms, you start with token and go up to
program.
Both approaches can be automated by parser generators, but on the whole,
bottom-up is better automated because of its complex data structures. Top-down
recursive descent is easier to understand, and, as a tactical solution to quick pars-
ing, it is nonpareil. So, top-down is the method used for our compiler's parsing
procedures.

1. Niklaus Wirth was an early proponent of safe, as opposed to merely efficient, computing.

Cain's Amulet: A Recursive-Descent Story

One of the oldest discussions of recursive descent was in terms of fathers
and sons.
The first man (Adam, if you like) is told, "Compile a program," and as you'll
recall from Chapter 4, program := sourceProgram | immediateCommand. In other
words, a program is a source program or an immediate command.
The first man has two sons (Cain and Abel, if you like). Since Cain only knows
how to compile source programs, Adam tells Cain to try to compile a source
program. (All of this predates the messier business between Cain and Abel.)
Cain goes off, and he and his descendants compile the source, or not, depending
not on their ability, but on whether the source is valid. Cain and his descendants
pass among themselves a magic amulet, which at all times tells them the next
source token to be examined.
Cain knows that a source program is one or more option statements (that must
appear at the start of the code and nowhere else), followed by a list of newline-
separated statements. He passes the amulet to his eldest son, OptionParser, who
checks for either no option statements or a small list. If OptionParser finds option
statements with errors, he has to report failure to Cain and also give Cain back
the amulet. Cain then goes back to Adam and reports failure. Adam can then
hand the amulet to Abel to see if an expression exists.
The goal at the end happens to be that the amulet points one past the end of
the source code as presented to Adam. Success, but an incomplete parse, is
actually failure, because it means (as in the case of the simple program I=O###)
that there are invalid sequences beyond the end.
My Cain and Abel exposition ignores some important details, the most important
being the fact that Cain and Abel will need to use the same "descendants" in
many cases. Think of these as laborers on their farm, and not as physical descen-
dants, who move between the two farms as journeymen. The most important
journeyman is the expression parser, which is needed to compile both source
programs and expressions.

Recursive-Descent Procedures
If you know how to code individual recursive-descent procedures as methods in
Visual Basic .NET or another language, then using a generator may be overkill
for small to medium parsing tasks, including parsing common languages.
The simplest case occurs when the production is of the form a := b, where b
as the right-hand side (RHS) may be complex but does not contain any direct or
indirect recursive reference back to a; that is, a does not occur in b, nor does any
part of b break down into an RHS that includes a. This simple case can be handled
by a series of checks for terminals and grammar symbols. In quickBasicEngine, for
example, each grammar symbol method is a Boolean function that returns True
or False. On success, each advances the token index one symbol beyond the end
of the context parsed.
In quickBasicEngine, each terminal is checked by calling the procedure
compiler__checkToken_. This procedure checks for either a literal terminal value (such
as a comma when expected) or a class of terminal values (such as an identifier).

NOTE As part of the coding convention for the compiler, all Private methods
are terminated with an underscore. All members that are called exclusively by
a higher procedure, including all procedures called exclusively by the Private
method compiler_, are prefixed by the name of the caller. For example, the
name of the procedure responsible for checking both expected token values
and expected types is compiler__checkToken_, with two underscores after
compiler. The first underscore shows that compiler_ is private; the second sep-
arates its name from its suffix.

Basically, in b, which can be a complex sequence as long as it does not refer
to a directly or indirectly, symbols can be separated by spaces (which indicate
implicitly that they must follow each other left to right) or by the alternation operator.
Symbols that are separated by spaces must each appear. Symbols that are
separated by the alternation operator (the vertical bar) are alternate possibilities,
of which one or the other should appear. Consider the production a := b | (c d),
where the grammar symbol a consists of either a b or a c that is followed by a d.
The code for this would be as follows:

If b Then Return(True)
If Not c Then Return(False)
If Not d Then Return(False)
Return(True)

This simplified code ignores "Cain's amulet": the index of the next token that
is known by all compiler procedures and available for modification by all compiler
procedures. In the actual code of quickBasicEngine, the index is passed by refer-
ence to the compiler procedures as intIndex. Nearly all modifications of intIndex
(the "amulet") take place at the lowest possible level, when compiler__checkToken_
is called to check for a token.
In quickBasicEngine, nearly all procedures can assume that a successful parse
of a grammar category has advanced intIndex exactly one token beyond the end
of the code that corresponds to the grammar category, and that an unsuccessful
parse will leave the index unchanged. For this reason, many of the more complex
procedures start with the declaration and saving of intIndex, so that they can
reset it cleanly on a False exit.
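
The save-and-restore discipline for the token index can be sketched as a small, self-contained program. This is only an illustration under stated assumptions: the token list is a plain string array, and checkToken and parseA are hypothetical stand-ins, not the procedures of quickBasicEngine.

```vbnet
Imports System

Module AmuletSketch
    ' Hypothetical stand-in for the engine's token checker:
    ' on a match, advance the index; otherwise leave it alone.
    Public Function checkToken(ByVal tokens As String(), _
                               ByRef intIndex As Integer, _
                               ByVal strExpected As String) As Boolean
        If intIndex >= tokens.Length Then Return False
        If tokens(intIndex) <> strExpected Then Return False
        intIndex += 1
        Return True
    End Function

    ' a := b | (c d)
    Public Function parseA(ByVal tokens As String(), _
                           ByRef intIndex As Integer) As Boolean
        Dim intIndex1 As Integer = intIndex      ' save the amulet on entry
        If checkToken(tokens, intIndex, "b") Then Return True
        If checkToken(tokens, intIndex, "c") AndAlso _
           checkToken(tokens, intIndex, "d") Then Return True
        intIndex = intIndex1                     ' failed: hand the amulet back unchanged
        Return False
    End Function

    Sub Main()
        Dim intIndex As Integer = 0
        Dim booOK As Boolean = parseA(New String() {"c", "d"}, intIndex)
        Console.WriteLine("{0} {1}", booOK, intIndex)   ' True 2

        intIndex = 0
        booOK = parseA(New String() {"c", "x"}, intIndex)
        Console.WriteLine("{0} {1}", booOK, intIndex)   ' False 0
    End Sub
End Module
```

In the second call, the check for c succeeds and advances the index before the check for d fails, which is exactly the situation in which the saved value is needed.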

NOTE I emphasize precision in handling the index because, for error mes-
sages and debugging, it is important to be precise as to the scope of any one
grammar symbol!

Revisiting the BNF Design


As you learned in Chapter 4, dangers arise when the grammar construct can
appear directly or indirectly on the RHS of its own production. For example, there
is a problem when a binary expression such as a Or b can be replaced by one that
contains recursive instances of the same grammar category, as in a Or b Or c.
The key is to design the BNF to avoid the infinite looping of left recursion seen
in orExpression := orExpression Or orExpression, or the more tricky problem of
associativity seen in orExpression := orFactor Or orExpression. Changing the first
production directly into code will loop when it tries to parse the first orExpression.
The second production will work for Or operations but will evaluate the RHS first. If
converted to subtraction or division, as in addFactor := mulFactor multiplyOrDivide
addFactor, this will create wrong answers at high speed.
The general form of any expression factor of a binary operator is seen in
the production for addFactor (a factor of an addition or subtraction operation):
addFactor := mulFactor [addFactorRHS]. We know that any mulFactor (any expres-
sion valid as the factor of a multiplication or division operation) can also be an
addFactor, although the reverse is not true (1+1 must be parenthesized to work as
the factor in (1+1)*2). We also know that it can be followed by any multiplicative
operator. Therefore, the production addFactorRHS expands to addFactorRHS :=
mulOp mulFactor [addFactorRHS]. And, from Chapter 4, we know the right sort of
recursion here is one that will not loop. By the time we parse mulOp and mulFactor,
we know that we have increased the index, and this is because we know that nei-
ther mulOp nor mulFactor can ever be satisfied with null strings. As predicted in
Chapter 4, this is how we can produce the code for a solid BNF.
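
The difference between the two designs shows up as soon as division is involved. The sketch below is an illustration, not the engine's code: it assumes a mulFactor is just a number, evaluates directly instead of emitting Polish code, and uses hypothetical procedure names. It parses 100/10/5 once with the addFactorRHS form and once with the right-recursive form.

```vbnet
Imports System
Imports System.Globalization

Module LeftAssocSketch
    ' mulFactor := number   (a simplifying assumption for this sketch)
    Public Function mulFactor(ByVal tokens As String(), ByRef intIndex As Integer, _
                              ByRef dblValue As Double) As Boolean
        If intIndex >= tokens.Length Then Return False
        If Not Double.TryParse(tokens(intIndex), NumberStyles.Float, _
                               CultureInfo.InvariantCulture, dblValue) Then Return False
        intIndex += 1
        Return True
    End Function

    ' addFactorRHS := mulOp mulFactor [addFactorRHS]
    ' dblValue carries the value of everything to the left.
    Public Function addFactorRHS(ByVal tokens As String(), ByRef intIndex As Integer, _
                                 ByRef dblValue As Double) As Boolean
        Dim intIndex1 As Integer = intIndex
        If intIndex >= tokens.Length Then Return False
        Dim strOp As String = tokens(intIndex)
        If strOp <> "*" AndAlso strOp <> "/" Then Return False
        intIndex += 1
        Dim dblRight As Double
        If Not mulFactor(tokens, intIndex, dblRight) Then
            intIndex = intIndex1
            Return False
        End If
        ' Combine with the left side BEFORE looking further right
        If strOp = "*" Then dblValue *= dblRight Else dblValue /= dblRight
        addFactorRHS(tokens, intIndex, dblValue)   ' optional tail; False just ends the chain
        Return True
    End Function

    ' addFactor := mulFactor [addFactorRHS]
    Public Function addFactor(ByVal tokens As String(), ByRef intIndex As Integer, _
                              ByRef dblValue As Double) As Boolean
        If Not mulFactor(tokens, intIndex, dblValue) Then Return False
        addFactorRHS(tokens, intIndex, dblValue)
        Return True
    End Function

    ' The tempting form addFactor := mulFactor [mulOp addFactor]: evaluates the RHS first
    Public Function addFactorRightRecursive(ByVal tokens As String(), ByRef intIndex As Integer, _
                                            ByRef dblValue As Double) As Boolean
        If Not mulFactor(tokens, intIndex, dblValue) Then Return False
        If intIndex >= tokens.Length Then Return True
        Dim strOp As String = tokens(intIndex)
        If strOp <> "*" AndAlso strOp <> "/" Then Return True
        Dim intIndex1 As Integer = intIndex
        intIndex += 1
        Dim dblRight As Double
        If Not addFactorRightRecursive(tokens, intIndex, dblRight) Then
            intIndex = intIndex1
            Return True
        End If
        If strOp = "*" Then dblValue *= dblRight Else dblValue /= dblRight
        Return True
    End Function

    Sub Main()
        Dim tokens As String() = {"100", "/", "10", "/", "5"}
        Dim intIndex As Integer = 0
        Dim dblValue As Double = 0
        addFactor(tokens, intIndex, dblValue)
        Console.WriteLine(dblValue)    ' 2, that is, (100/10)/5

        intIndex = 0
        addFactorRightRecursive(tokens, intIndex, dblValue)
        Console.WriteLine(dblValue)    ' 50, that is, 100/(10/5)
    End Sub
End Module
```

Both versions terminate, because each recursive call is made only after the index has moved forward, but only the first gives the answer the language requires.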

Examining a Parser Procedure


Basically, all the compiler parsing procedures have the same general structure.
Here, we'll examine the addFactorRHS procedure as an example of how these pro-
cedures work. Figure 7-1 shows the procedure header for addFactorRHS.

' addFactorRHS := mulOp mulFactor [addFactorRHS]
Private Function compiler__addFactorRHS_(ByRef intIndex As Integer, _
                                         ByVal objScanned As qbScanner.qbScanner, _
                                         ByRef colPolish As Collection, _
                                         ByVal intEndIndex As Integer, _
                                         ByVal strSourceCode As String, _
                                         ByRef objConstantValue As qbVariable.qbVariable, _
                                         ByRef colVariables As Collection, _
                                         ByVal intCountPrevious As Integer, _
                                         ByRef booSideEffects As Boolean, _
                                         ByVal intLevel As Integer) As Boolean

Figure 7-1. addFactorRHS procedure header

Note that the methods in a subset of the Private methods of quickBasicEngine
have a common form starting with compiler__addFactor_. They all start with
compiler__, and in place of a sentence or phrase describing purpose, their
comment header is a BNF production. These procedures are recognizers for
productions. They are passed the same set of parameters, including intIndex
(the next token) and the scanned code in objScanned. They attempt to recognize
the production in their header comment, returning True on success and False
on failure. On success, they solemnly guarantee that intIndex (which is passed
ByRef) will point one beyond the end of the successfully recognized context. On
failure, they make an equally solemn covenant that intIndex will have its value
on entry.
These procedures could have been generated by an automated tool (with
minor changes to the generated code), but were instead generated by an ordinary
slob (me), using the very good editor in Visual Studio in a few hours. One flaw is
that making systematic changes to these procedures is tedious and error-prone.
Another possible flaw happens to be the pile of parameters you see in the
procedure, commencing with the ByRef token index intIndex. It exists because
the original version of the code was written in a non-object-oriented style: one
single object not factored into objects for scanning and for variables. This
object did not have a single state in Common Declarations containing all change-
able object-level variables. For this reason, information about the compiler
state, including the token position, was passed as a combination of reference
and value parameters.
In the current version of the compiler, the complete state of the compiler
is in one place (the OBJstate object in the Common Declarations section of the
quickBasicEngine class), and all parser procedures could access this one place.
However, I decided to keep the older standard of passing items as parameters
and not referring to the state in most compiler procedures, because it shows
clearly what each procedure is interested in evaluating and changing. Having
said this, however, in my next language engine, I will refer instead to a common
state, or a state passed as a reference parameter.

Figure 7-2 shows the beginning of the procedure body. The first step raises
an event that indicates to the GUI that a parse is starting. Note that the parse is
tracked using three events: parseStartEvent, parseEvent, and parseFailEvent. In
the "The Dynamic Big Picture" section later in this chapter, you will see how the
GUI uses these events for progress reports.

raiseEvent_("parseStartEvent", "addFactorRHS")
If intIndex > intEndIndex Then Return compiler__parseFail_("addFactorRHS")
Dim intIndex1 As Integer = intIndex
Dim strMulOp As String
If Not compiler__mulOp_(objScanned, _
                        intIndex, _
                        strSourceCode, _
                        strMulOp, _
                        intLevel + 1) Then Return compiler__parseFail_("addFactorRHS")
Dim objConstantValueRHS As qbVariable.qbVariable
Dim booSideEffectsLHS As Boolean = booSideEffects

Figure 7-2. addFactorRHS start

The next step is to check whether the token index is beyond the end of the
code, and the end of the code is passed by value as intEndIndex. It is not equiva-
lent to the end of the source program or immediate expression, because when
compiling inside a parenthesized subexpression, the end index for the expres-
sion parser, of which addFactorRHS is a part, is the position of the closing
parenthesis. The expression parser calls itself recursively when it finds a paren-
thesized subexpression. This step is not needed in recognizers that can assume
when called that the main index is not beyond the end of the context. But here,
addFactorRHS is called in a recursive loop when it finds a multiplicative operator
and a multiply factor (see the BNF in Figure 7-1). On entry, it needs to know
whether it is past the end. If this index is past the end, failure is indicated by
calling the compiler__parseFail_ procedure, which raises parseFailEvent and
returns False.
Then compiler__mulOp_ is called to check for a multiplication operator (aster-
isk, forward slash for normal division, backward slash for integer divide, or Mod).
Note that we have, in terms of American baseball, struck out if a multiplicative
operator is not found and must call the parse failure routine in this case. This is
because the BNF requires that the addFactorRHS start with a multiplicative operator.
The two Dim statements that precede this call exploit an elegant new feature of
.NET: its allocation of local (Dim) variables just in time. The first Dim both declares
a save area for the token index and initializes it to intIndex. The next Dim just
declares a string work area.

NOTE Visual Basic 6 and earlier editions allocated all local variables on entry
and deallocated them on exit, and their name scope was the complete proce-
dure. In Visual Basic .NET, variables are assigned storage and their default
value (or the value assigned in the Dim statement) when the Dim statement is
encountered. If the Dim statement is encountered in an If..Then..Else..End
If structure, or inside a loop, the variable may be referred to inside that
structure. However, if the variable is "Dim'd" between Then and Else, it may
not be referred to after the Else and before the End If. The variable loses its
place after control leaves the structure. For practical coding, this means
that variables can be placed near the point of use, which makes code more
readable. The only execution penalty occurs when the variable is defined in
a Do or For loop, and for this reason, variables should be allocated outside
Do and For loops.
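
The scoping rule in this note can be seen in a few lines. The following is a minimal sketch; the module and the describe function are hypothetical names used only for illustration.

```vbnet
Imports System

Module DimScopeSketch
    Public Function describe(ByVal intValue As Integer) As String
        If intValue > 0 Then
            ' Declared at the point of use, visible only in this branch
            Dim strSign As String = "positive"
            Return strSign
        Else
            ' strSign is not visible here: referring to it in this branch
            ' would be rejected by the compiler.
            Dim strOther As String = "zero or negative"
            Return strOther
        End If
    End Function

    Sub Main()
        Console.WriteLine(describe(5))     ' positive
        Console.WriteLine(describe(-1))    ' zero or negative
    End Sub
End Module
```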

As shown in Figure 7-3, the next step checks for a multiply factor to match
the multiplicative operator. If this isn't found, the compile fails. Otherwise, we
can emit code in compiler__binaryOpGen_, which will emit the interpreter's
Multiply opcode.

If Not compiler__mulFactor_(intIndex, _
                            objScanned, _
                            colPolish, _
                            intEndIndex, _
                            strSourceCode, _
                            objConstantValueRHS, _
                            colVariables, _
                            booSideEffects, _
                            intLevel + 1) Then
    Return compiler__parseFail_("addFactorRHS")
End If
objConstantValue = compiler__binaryOpGen_(strMulOp, _
                                          booSideEffectsLHS, _
                                          objConstantValue, _
                                          booSideEffects, _
                                          objConstantValueRHS, _
                                          colPolish, _
                                          intIndex1)

Figure 7-3. Next step in addFactorRHS

Because of the optimization features of constant evaluation (folding) and
lazy evaluation (discussed in the "Code Optimization" section later in this chap-
ter), we need to call the code generator through a wrapper for all binary operators.
This wrapper will not only generate code, but it will also test for the opportunity
to combine constants, as in the expression a+1+2. This will take care of left-associative
operators, including addition, subtraction, multiplication, and division, where
all operators to the left must be evaluated first.

NOTE It's true that addition and multiplication are "symmetrical" such that
they can be evaluated either way. But quite apart from the issue of Mod, it is
bad practice for compiler writers to depart from the language specification,
which typically will tell them how to evaluate and in what order. This is
because of the finite precision of computer numbers, which will give different
answers if the evaluation order is changed.
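
The effect of finite precision on evaluation order is easy to observe. The sketch below (hypothetical names, ordinary Double arithmetic) adds the same three numbers in two groupings and prints both sums in round-trip format. With IEEE 754 double-precision arithmetic, the two results normally differ in the last digit, so the comparison prints False; the exact digits are left for you to observe rather than asserted here.

```vbnet
Imports System

Module EvaluationOrderSketch
    ' Left operands first, as the language specification requires
    Public Function leftFirst(ByVal a As Double, ByVal b As Double, _
                              ByVal c As Double) As Double
        Return (a + b) + c
    End Function

    ' The same sum regrouped, as a careless optimizer might do
    Public Function rightFirst(ByVal a As Double, ByVal b As Double, _
                               ByVal c As Double) As Double
        Return a + (b + c)
    End Function

    Sub Main()
        Dim dblLeft As Double = leftFirst(0.1, 0.2, 0.3)
        Dim dblRight As Double = rightFirst(0.1, 0.2, 0.3)
        Console.WriteLine(dblLeft.ToString("R"))
        Console.WriteLine(dblRight.ToString("R"))
        Console.WriteLine(dblLeft = dblRight)
    End Sub
End Module
```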

Then we can proceed to the rest of the RHS, shown in Figure 7-4, for the
case where there are several operators of multiplication precedence, such as
a*b/c Mod d. We note the current position and recursively call ourselves. If this
returns False, and the noted index is the same as intIndex, this means that we
have gone past the end of the RHS and are finished. Whatever lies to the right
(probably a newline) is the concern of another grammar symbol (probably the
statementBody).

Dim intIndex2 As Integer = intIndex
If Not compiler__addFactorRHS_(intIndex, _
                               objScanned, _
                               colPolish, _
                               intEndIndex, _
                               strSourceCode, _
                               objConstantValue, _
                               colVariables, _
                               intCountPrevious + 1, _
                               booSideEffects, _
                               intLevel + 1) _
   AndAlso _
   intIndex2 <> intIndex Then Return compiler__parseFail_("addFactorRHS")
compiler__parseEvent_("addFactorRHS", _
                      False, _
                      intIndex1, _
                      intIndex - intIndex1, _
                      intCountPrevious, _
                      colPolish.Count - intCountPrevious, _
                      "", _
                      intLevel)
Return (True)

Figure 7-4. End of addFactorRHS

For unusually long expressions, such as a*b/c/d/e*f, compiler__addFactorRHS_
calling itself for each RHS will load the stack proportionally. However, deep stack
programming is no crime,2 as long as the underlying runtime efficiently pushes
and pops the stack.
If at any time the recursive call yields False but the index changes, this is
an error, and the procedure returns False, which is the value of the parseFail
procedure.
The final step fires the "parse event," which is emitted by parser procedures
when a grammar symbol has been successfully parsed. quickBasicEngine uses
events for reporting its progress (as does qbScanner, described in Chapter 5). The
compiler produces a progress report that identifies the grammar category and
the start and end of the corresponding source code during parsing.
The initial compiler referenced the Windows Forms object and had extra code
to display forms it built on the fly with progress information. However, after I con-
verted this edition of the compiler to a Web service for a potential client, I realized
that it was quite wasteful to have an engine that referenced both Web GUI tools
and Windows forms, not only in terms of total size, but also in terms of software
maintenance. Therefore, this version avoids any references to GUI namespaces.
Instead, it produces events to show its progress, and these events can be used to
show progress on Windows or the Web, or ignored in a server.
The final step is to return True, showing success.
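
The event-based progress reporting described above can be sketched with ordinary Visual Basic .NET events. This is an illustration under assumptions: the event names follow the text, but their parameter lists, the ParserSketch class, and the number recognizer are hypothetical and much simpler than those of quickBasicEngine.

```vbnet
Imports System
Imports System.Collections.Generic

Public Class ParserSketch
    ' Hypothetical signatures; the engine's real events carry more data
    Public Event parseStartEvent(ByVal strGrammarCategory As String)
    Public Event parseEvent(ByVal strGrammarCategory As String, _
                            ByVal intStartIndex As Integer, _
                            ByVal intLength As Integer)
    Public Event parseFailEvent(ByVal strGrammarCategory As String)

    ' Recognizes a single integer token and reports its progress through events
    Public Function parseNumber(ByVal tokens As String(), _
                                ByRef intIndex As Integer) As Boolean
        RaiseEvent parseStartEvent("number")
        Dim intValue As Integer
        If intIndex >= tokens.Length OrElse _
           Not Integer.TryParse(tokens(intIndex), intValue) Then
            RaiseEvent parseFailEvent("number")
            Return False
        End If
        intIndex += 1
        RaiseEvent parseEvent("number", intIndex - 1, 1)
        Return True
    End Function
End Class

Module EventSketch
    ' The "GUI" here is just a list of progress lines
    Public colLog As New List(Of String)

    Private Sub onStart(ByVal strGrammarCategory As String)
        colLog.Add("start " & strGrammarCategory)
    End Sub

    Private Sub onParse(ByVal strGrammarCategory As String, _
                        ByVal intStartIndex As Integer, _
                        ByVal intLength As Integer)
        colLog.Add("parsed " & strGrammarCategory & " at " & intStartIndex.ToString())
    End Sub

    Private Sub onFail(ByVal strGrammarCategory As String)
        colLog.Add("failed " & strGrammarCategory)
    End Sub

    Public Sub runDemo(ByVal tokens As String())
        Dim objParser As New ParserSketch
        AddHandler objParser.parseStartEvent, AddressOf onStart
        AddHandler objParser.parseEvent, AddressOf onParse
        AddHandler objParser.parseFailEvent, AddressOf onFail
        Dim intIndex As Integer = 0
        objParser.parseNumber(tokens, intIndex)
    End Sub

    Sub Main()
        runDemo(New String() {"42"})
        runDemo(New String() {"x"})
        For Each strLine As String In colLog
            Console.WriteLine(strLine)
        Next
    End Sub
End Module
```

The parser class references no GUI namespace. A Windows form, a Web page, or a server that ignores progress entirely can each attach, or not attach, its own handlers.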

Why Not Use an Automated Tool Instead of Manual Recursive Descent?

I cranked out all of the compiler parsing procedures manually in a day or so,
using the excellent GUI of Visual Studio.
An automatic code generator using top-down recursive descent can be built;
I wrote one in the Rexx language. However, I think that if the programmer has
a good editor, developing a parser generator doesn't save enough time to be
worthwhile. Indeed, commercial compiler developers have, in place of simple
parser generators, more comprehensive compiler kits, which handle a variety
of tasks, including lexical analysis, parsing, and code generation.

2. Saul Rosen's 1968 anthology of compiler papers, Programming Systems and Languages, showed
it was clearly the Europeans who liked the stack, while Americans like John Backus were opti-
mizing small sets of registers, swapping data in and out. Later on, Charles Moore's Forth
language showed that in terms of expressivity, deep stack languages were better than
register-oriented languages, including Fortran, for complex languages. In 1979, I imple-
mented a compiler and interpreter in 1KB on a programmable calculator for the stack
language Mouse, a simplified Forth. This language had the semantic power of Visual Basic,
including recursion, but used single characters for operations to save space.

Manual recursive descent is no more efficient than an automated tool, since
these tools use best practice. Therefore, I do not justify the manual method as
more efficient. Instead, it demonstrates in code how a parser works. The best
reason for learning to write a manual recursive descent parser quickly is that
quick parsers for small languages can be developed rapidly, as you've seen in
the toString/fromString languages of qbVariable and qbVariableType,
described in Chapter 6.
Visual Basic programmers of the world unite, for I think I have shown you that
you can write the most critical parts of a compiler simply. This shows that lan-
guage snobbery in computer science is almost as bad as any illusion that any
human language is better than another.
If you desire to write compilers for a living, you must learn C++, yacc, Bison, and
other tools of the trade. This is because so many existing compilers are written
in C++. But you can obtain an understanding of what goes into building a com-
piler, which is critical to your success as a compiler writer, in Visual Basic or
C#. To see what I mean, convert the entire compiler to C++ and C#, or any lan-
guage you do not know, in order to learn the language and the structure of
quickBasicEngine at one time.

The qbPolish Object


As you'll see in the next chapter, the virtual machine for executing compiler code
is included in quickBasicEngine as the Nutty Professor interpreter. In the code, it
comprises all methods that start with interpreter_. Like the simple virtual machine
for arithmetic expressions presented in Chapter 3, this machine obeys Polish
instructions (in the reverse Polish order used here, the operators follow the operands)
that interact with an interpreter stack to produce results. Recall that in Chapter 3,
we compiled expressions into simple Polish tokens to calculate expression values.
Here, we will examine the qbPolish object for representing operation codes
usable by the Nutty Professor interpreter. The qbPolish object consists of data and
is only a carrier of this data. Other than logic to support its core methods, the
qbPolish object doesn't do anything to its data. For this reason, a test interface (as
seen in Chapter 6 for qbVariable and qbVariableType) hasn't been provided for
qbPolish.
The core object2XML method shows the state of all Polish objects emitted for
the simple program Print "Hello world", as shown in Figure 7-5. (This XML is
generated by the quickBasicEngine.object2XML method, available on the extended
GUI, discussed in the "The Dynamic Big Picture" section later in this chapter.)

<!-- Polish collection -->
<colPolish>
    <qbPolish01><qbPolish>
        <!-- Indicates usability of object -->
        <booUsable>True</booUsable>
        <!-- Names the object instance -->
        <strName>qbPolish0002</strName>
        <!-- Identifies the op code -->
        <enuOpCode>opPushLiteral</enuOpCode>
        <!-- Identifies the first token responsible for this op code -->
        <intStartIndex>2</intStartIndex>
        <!-- Identifies how many tokens are responsible for this op code -->
        <intLength>1</intLength>
        <!-- Identifies the operand (if any) -->
        <objOperand>&quot;Hello world&quot;</objOperand>
        <!-- Comments the opcode -->
        <strComment></strComment>
    </qbPolish></qbPolish01>
    <qbPolish02><qbPolish>
        <booUsable>True</booUsable>
        <strName>qbPolish0003</strName>
        <enuOpCode>opPushLiteral</enuOpCode>
        <intStartIndex>3</intStartIndex>
        <intLength>1</intLength>
        <objOperand>String:vtString(ChrW(13) &amp; ChrW(10))</objOperand>
        <strComment></strComment>
    </qbPolish></qbPolish02>
    <qbPolish03><qbPolish>
        <booUsable>True</booUsable>
        <strName>qbPolish0004</strName>
        <enuOpCode>opConcat</enuOpCode>
        <intStartIndex>3</intStartIndex>
        <intLength>1</intLength>
        <objOperand>&quot;&quot;</objOperand>
        <strComment></strComment>
    </qbPolish></qbPolish03>
    <qbPolish04><qbPolish>
        <booUsable>True</booUsable>
        <strName>qbPolish0005</strName>
        <enuOpCode>opPrint</enuOpCode>
        <intStartIndex>3</intStartIndex>
        <intLength>1</intLength>
        <objOperand>&quot;&quot;</objOperand>
        <strComment></strComment>
    </qbPolish></qbPolish04>
    <qbPolish05><qbPolish>
        <booUsable>True</booUsable>
        <strName>qbPolish0006</strName>
        <enuOpCode>opEnd</enuOpCode>
        <intStartIndex>3</intStartIndex>
        <intLength>1</intLength>
        <objOperand>&quot;&quot;</objOperand>
        <strComment></strComment>
    </qbPolish></qbPolish05>
</colPolish>

Figure 7-5. qbPolish XML

Each opcode is a stateful object, rather than a mere opcode enumerator value
as in the example in Chapter 3. Each opcode exposes the operation defined as an
enumerator, an operand in some cases, a comment, and its source as the start
and the length, in tokens, of the source code responsible for the opcode.
Figure 7-5 shows the following for each of the five opcodes generated for the
Print "Hello world" program:

• Each instance is usable (it better be), and its name is qbPolishnnnn, where
nnnn is the sequence number of the object, as generated within one com-
piler invocation.

• The opcode is specified next. The first opcode is pushLiteral, which causes
the Nutty Professor interpreter to push its operand (the string "Hello world")
on the interpreter's stack.

• It is important for debugging tools that the qbPolish object model support
linkage of the object code back to source. Therefore, the next two tags spec-
ify the first scanned token responsible for generating this opcode and the
total number of scan tokens responsible. Only one scan token (the quoted
string "Hello world") is literally responsible for emitting pushLiteral.

• The next XML tag provides the literal operand for pushLiteral.

• The last XML token for each Polish opcode is a comment that can be asso-
ciated with the opcode.

NOTE To keep the display of the Polish code in XML to a manageable size,
I used the GUI to set an option in the compiler (on the GUI's Tools > Options
menu). It suppresses the generation of Polish opcodes for Rem statements and
the addition of comments to Polish opcodes. However, by default, the compiler
generates Polish opcodes for Rem statements and comments opcodes to allow
the Polish code to self-document at some cost to its speed. When the option to
suppress this material is not in effect, the last XML token's value will be push
string constant.

As you can see, a collection of Polish opcodes is created by the compiler, and
it can be converted, along with the rest of the compiler's state, to XML.
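
To make the idea of an opcode as a data carrier concrete, here is a minimal sketch. The field names follow the XML tags in Figure 7-5, but the PolishSketch class, the OpCodeSketch enumeration, and the tiny run loop are assumptions made for illustration; they are not the qbPolish class or the Nutty Professor interpreter.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Public Enum OpCodeSketch
    opPushLiteral
    opConcat
    opPrint
    opEnd
End Enum

' A carrier of data only, loosely modeled on the tags of Figure 7-5
Public Class PolishSketch
    Public enuOpCode As OpCodeSketch
    Public objOperand As String       ' a String here; an assumption for simplicity
    Public intStartIndex As Integer   ' first token responsible for the opcode
    Public intLength As Integer       ' number of tokens responsible
    Public strComment As String = ""

    Public Sub New(ByVal enuOp As OpCodeSketch, ByVal strOperand As String, _
                   ByVal intStart As Integer, ByVal intLen As Integer)
        enuOpCode = enuOp
        objOperand = strOperand
        intStartIndex = intStart
        intLength = intLen
    End Sub
End Class

Module PolishRunSketch
    ' A toy stack machine: returns what the program would have printed
    Public Function run(ByVal colPolish As List(Of PolishSketch)) As String
        Dim colStack As New Stack(Of String)
        Dim objOutput As New StringBuilder
        For Each objOp As PolishSketch In colPolish
            Select Case objOp.enuOpCode
                Case OpCodeSketch.opPushLiteral
                    colStack.Push(objOp.objOperand)
                Case OpCodeSketch.opConcat
                    Dim strRight As String = colStack.Pop()
                    Dim strLeft As String = colStack.Pop()
                    colStack.Push(strLeft & strRight)
                Case OpCodeSketch.opPrint
                    objOutput.Append(colStack.Pop())
                Case OpCodeSketch.opEnd
                    Exit For
            End Select
        Next
        Return objOutput.ToString()
    End Function

    ' The five opcodes of Figure 7-5, rebuilt by hand
    Public Function helloWorld() As List(Of PolishSketch)
        Dim colPolish As New List(Of PolishSketch)
        colPolish.Add(New PolishSketch(OpCodeSketch.opPushLiteral, "Hello world", 2, 1))
        colPolish.Add(New PolishSketch(OpCodeSketch.opPushLiteral, ChrW(13) & ChrW(10), 3, 1))
        colPolish.Add(New PolishSketch(OpCodeSketch.opConcat, "", 3, 1))
        colPolish.Add(New PolishSketch(OpCodeSketch.opPrint, "", 3, 1))
        colPolish.Add(New PolishSketch(OpCodeSketch.opEnd, "", 3, 1))
        Return colPolish
    End Function

    Sub Main()
        Console.Write(run(helloWorld()))
    End Sub
End Module
```

Because each opcode carries intStartIndex and intLength, a debugger built on such objects could map any instruction back to the tokens that produced it.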

Code Optimization
Ever since the first compilers, compiler developers have noticed that compilers
can assist the programmer in generating efficient code. In this section, I'll
demonstrate two entry-level techniques for code optimization. In MSIL code
generation, more advanced techniques, such as the global analysis of blocks and
their structure, are not important in the front end of a compiler, since JIT compilers
do a lot of optimization behind the scenes.

quickBasicEngine does simple optimization in the form of constant folding
and lazy evaluation.

Constant Folding
A surprisingly large number of programs contain expressions like 32767-2, where
a piece of the expression or the entire expression consists of constants, and it's
pretty obvious that this expression is mathematically and computationally
equivalent to 32765. The reason can be clarity of expression, the use of symbolic
constants, or the generation of code automatically.

NOTE A common example is found in the quickBasicEngine code itself,
which contains concatenated strings segmented for readability on different
lines. "A" & _ <newline> "B" is reduced in scanning to the obvious constant
expression "A" & "B", and the compiler reduces this, by the method described
in this section, to one string.

I suppose you could write a preprocessor to simplify the source code. However,
this would complicate your life. That's because a decent preprocessor would need to
parse the complete source program. Even if you could cleverly factor the job and
reuse the same parser in both your preprocessor and the compiler, there still would
be two passes in the old style, and the optimization pass would be a waste of time
for most programs not containing constant expressions. Instead, it's relatively easy
to perform such calculations during parsing.
Take a look at Figure 7-1 again. One of its numerous parameters (which,
as I mentioned earlier, could be part of state) is objConstantValue, a qbVariable
described in Chapter 6.
In the example of 32767-2, on entry to addFactorRHS, objConstantValue will
actually be 32767. This is because addFactorRHS is called exclusively from
addFactor, and addFactor's first BNF step (in its BNF
addFactor := mulFactor [addFactorRHS]) is to check for a mulFactor.
The check for a mulFactor will descend to compiler_term_, whose job it is
to check for the basic term of any expression. This can be a number; a string;
a subscripted or unsubscripted identifier; a function call; or a complete, inner,
parenthesized expression.
In the scenario, however, compiler_term_ will find the scanner token corre-
sponding to the number 32767. Because it has found a constant, compiler_term_
will set its by reference objConstantValue parameter to a qbVariable of the most
appropriate type (QuickBasic integer) for the token value, which in this case, just
fits into a QuickBasic integer, represented by a .NET short integer.

When addFactorRHS moves on past the additive operator (here, the minus sign)
to find another constant, it then has constant values other than Nothing in
objConstantValue (the left-hand side value 32767) and objConstantValueRHS (the
right-hand side value 2). The procedure compiler_binaryOpGen_ is responsible
for emitting code for binary operators (including multiply, divide, and Mod),
and it finds that the operation can be performed on the values.
But there's a tricky aspect to this. The compiler_binaryOpGen_ method cannot
use .NET arithmetic to perform the evaluation, since this will give different results
from QuickBasic in some cases. Instead, it must use the Nutty Professor inter-
preter to perform the evaluation, or the compilation with constant folding might
give different answers than compilation without this optimization. Furthermore,
the binary operator generator must use the settings of quickBasicEngine. Suppose
that you were to add a setting that affected the way in which arithmetic was per-
formed (not likely, but possible). The compile-time evaluation must perform
exactly as the engine has been set. Fortunately, a nifty method of quickBasicEngine
is available: the evaluate method, described in the next section.

NOTE You can turn off constant folding by using the Constant Folding
property of the quickBasicEngine object.
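
Folding during the parse can be sketched as follows. This is an illustration in Python rather than the book's Visual Basic .NET, not the quickBasicEngine code: the class FoldingParser, the helper evaluate, and the opcode names push and pushVar are assumptions made for the example, and the 16-bit range check merely stands in for "evaluate the way the engine would, not the way the host language would."

```python
import re


def evaluate(op, lhs, rhs):
    """Stand-in for the engine's own evaluator (an assumption of this sketch).

    The 16-bit range check is one example of a rule that the host
    language's arithmetic would not enforce by itself.
    """
    result = {"+": lhs + rhs, "-": lhs - rhs, "*": lhs * rhs}[op]
    if not -32768 <= result <= 32767:
        raise OverflowError("result does not fit a 16-bit Integer")
    return result


class FoldingParser:
    """Recursive descent over numbers, identifiers, +, - and *."""

    def __init__(self, source, fold=True):
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*]", source)
        self.pos = 0
        self.fold = fold
        self.code = []  # emitted RPN as (opcode, operand) pairs

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def term(self):
        """Emit a push; return the constant value, or None for a variable."""
        token = self.tokens[self.pos]
        self.pos += 1
        if token.isdigit():
            self.code.append(("push", int(token)))
            return int(token)
        self.code.append(("pushVar", token))
        return None

    def binary_op_gen(self, op, lhs, rhs):
        """Fold when both sides are constants; otherwise emit the operator."""
        if self.fold and lhs is not None and rhs is not None:
            value = evaluate(op, lhs, rhs)
            del self.code[-2:]  # drop the two constant pushes
            self.code.append(("push", value))
            return value
        self.code.append((op, None))
        return None

    def mul_factor(self):
        value = self.term()
        while self.peek() == "*":
            self.pos += 1
            value = self.binary_op_gen("*", value, self.term())
        return value

    def add_factor(self):
        value = self.mul_factor()
        while self.peek() in ("+", "-"):
            op = self.tokens[self.pos]
            self.pos += 1
            value = self.binary_op_gen(op, value, self.mul_factor())
        return value


def compile_expression(source, fold=True):
    parser = FoldingParser(source, fold)
    parser.add_factor()
    return parser.code
```

With folding on, compile_expression("32767-2") yields a single push of 32765; with fold=False it yields two pushes and a subtract. No second pass is needed, because each parsing procedure hands its constant value (or None) back to its caller.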

The Evaluate Methods


quickBasicEngine supports three functions along with corresponding methods:
evaluate, eval, and run. You can include the functions in source code, and you
can execute the methods from .NET programs.

• The evaluate method evaluates an expression using all the options and
data of the engine running evaluate. The expression may be a single
expression or a series of assignment statements, each prefixed by the key-
word Let, followed by an expression using the assigned variables.

• The eval method is "lightweight." It creates a new instance of
quickBasicEngine to ensure that the expression is evaluated using the
default values of all object properties. eval is lightweight because it is
Shared. While quickBasicEngine.evaluate(string) must be run on
a New object, quickBasicEngine.eval(string) may be run on a code-only
object that has not been created.

• The run method runs complete programs consisting of one or more
executable statements.
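
The difference between an instance method that honors the object's settings and a Shared method that always uses defaults can be sketched like this. The example is Python, not the book's Visual Basic .NET; the Engine class and its decimals setting are hypothetical, and the expressions are parsed with Python's own ast module rather than a QuickBasic parser.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


class Engine:
    def __init__(self, decimals=None):
        # Hypothetical setting that changes how results are reported
        self.decimals = decimals

    def evaluate(self, expression):
        """Instance method: uses the settings of this particular engine."""
        value = self._walk(ast.parse(expression, mode="eval").body)
        return value if self.decimals is None else round(value, self.decimals)

    @classmethod
    def eval(cls, expression):
        """Class-level method: builds a throwaway engine with default settings."""
        return cls().evaluate(expression)

    def _walk(self, node):
        # Accept only numeric constants combined with + - * /
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](self._walk(node.left), self._walk(node.right))
        raise ValueError("unsupported expression")
```

Engine(decimals=2).evaluate("1/3") reflects the caller's setting, while Engine.eval("1/3") ignores any existing engine and so always gives the default behavior.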

The evaluate, eval, and run functions are not available in their full glory in
Visual Basic and other commercial products, for a very good reason. If a prod-
uct with the power of Visual Basic exposed the general ability to interpret the
code of the language submitted as strings, developers could rather easily
license copies of Visual Basic to others, by using Visual Basic to build a GUI.
Of course, corporate developers are allowed to extend the power of the VBA
language engine to users with no fees for internal applications, but shrink-wrap
vendors cannot do this. One of the joys of developing software for open release
is the fact that you don't need to worry about giving away the store, since you
have already done so.

Evaluation
The evaluate method takes a string consisting of a QuickBasic expression,
compiles the string, and returns its value as a qbVariable. In fact, an evaluate
function is provided as part of the language: evaluate(string) will return its
value as a QuickBasic type. This function has been used in several Basic compilers, and it
was supported in the Rexx language. It is very useful because it allows the devel-
oper to extend to the user the ability to specify logic as data and business rules.
Here, compiler_binaryOpGen_ can call compiler_constantEval_ to do the eval-
uation using the current settings of quickBasicEngine, using an internal evaluate
method. compiler_binaryOpGen_ also implements the second form of optimiza-
tion, known as lazy evaluation.
Lazy evaluation is the elimination of mathematical, logical, and string
operations known to be unnecessary. Examples include A+0 (always the same as A),
B And False (always False), and C & "" (always the same as C).
Lazy evaluation is related to math arcana in the form of the theory of groups.
In math, a group is a set of elements (such as numbers, Boolean values, or strings)
and a set of operations defined over those elements, usually an additive opera-
tion and a multiplicative operation. Also, groups have a unity element and a zero
element. The unity element of a group is characterized by the fact that whenever
it is applied to another element using the multiplicative operator, the value of
the second element is unchanged. The zero element has the same effect when it
is applied using the additive operator. For the numbers, the unity element is
one, and the zero element is, of course, zero. The unity element in the group
that consists of the Boolean values (True and False) and their operators (Or
and And) represents truth, whereas its zero element is falsehood.
The group consisting of strings has, strictly speaking, no multiplicative oper-
ator, but it has addition cognate to string concatenation. Therefore, although
strings have no unity, their zero is the null string.

This means that compiler_binaryOpGen_ can apply the same logic to the binary
operator when one of its parameters objConstantValueLHS or objConstantValueRHS is
the unity, or zero, element of their group. Here are some examples:

• When either is zero and both are numeric, the code for stacking the alter-
nate element can be generated instead of addition.

• When either is one and both are numeric, the same code can be generated
instead of multiplication. In fact, this code replaces division when the
RHS is one.

• When either is False and both are Boolean, the code for False can be gen-
erated instead of And.

• When either is True and both are Boolean, True can be generated
instead of Or.

• When either is zero-length and both are strings, the non-null string can
be stacked instead of using a concatenation.
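
The rules in this list can be condensed into one decision function. This is a sketch in Python rather than the book's Visual Basic .NET, not the body of compiler_binaryOpGen_: the function simplify, the UNKNOWN marker, and the tuple results are assumptions made for the example, and the operand type (kind) is assumed to be known to the compiler already.

```python
UNKNOWN = object()  # marks an operand whose value is not known at compile time


def is_const(operand, value):
    """True when the operand is a compile-time constant equal to value."""
    return operand is not UNKNOWN and operand == value


def simplify(op, lhs, rhs, kind):
    """Decide what to generate for `lhs op rhs`.

    Returns ("operand", "lhs" or "rhs") to stack just that operand,
    ("constant", value) to stack a constant, or ("emit", op) when no
    rule applies and the operator must be generated as usual.
    """
    if kind == "numeric":
        if op == "+":
            if is_const(lhs, 0):
                return ("operand", "rhs")
            if is_const(rhs, 0):
                return ("operand", "lhs")
        if op == "*":
            if is_const(lhs, 1):
                return ("operand", "rhs")
            if is_const(rhs, 1):
                return ("operand", "lhs")
        if op == "/" and is_const(rhs, 1):
            return ("operand", "lhs")
    elif kind == "boolean":
        if op == "And" and (is_const(lhs, False) or is_const(rhs, False)):
            return ("constant", False)
        if op == "Or" and (is_const(lhs, True) or is_const(rhs, True)):
            return ("constant", True)
    elif kind == "string":
        if op == "&":
            if is_const(lhs, ""):
                return ("operand", "rhs")
            if is_const(rhs, ""):
                return ("operand", "lhs")
    return ("emit", op)
```

Note that division is simplified only when the right-hand side is one, since 1/A is not A. A production compiler would also have to confirm that an operand it discards (as in B And False) has no side effects.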

Optimization in the form of constant folding and lazy evaluation can be
applied by the recursive-descent parser inline in place of a separate pass over
either the source code or the object code, and this may be one benefit of
developing the recursive-descent parser by hand. 3

The Architecture of the Compiler


The overall architecture of the compiler as a Visual Basic .NET object solution
is illustrated in Figure 7-6. You can see the delegated relationship of the
compiler objects. Notice how it shows that an object "has a" collection of delegates
with a double line. There is only one scanner, for example, which has a
collection of tokens.

3. Of course, there is no reason why in yacc the optimizations could not be inserted in tags, but
overall, the manual method provides a little more insight into what's going on under the hood.

Figure 7-6. quickBasicEngine object overview

The Nutty Professor interpreter is embedded in the code, and is therefore more
closely bound to, or bolted onto, the engine. This is mostly an artifact of scheduling
pressures; ideally, the Nutty Professor interpreter would be a separate object.
Figure 7-7 shows the solution architecture of the testing GUI, qbGUI. Most of
the projects in the solution should be either familiar or understandable. qbGUI is
the startup project and the only form-based project. You will recognize old friends
described in previous chapters, including qbScanner, qbVariable, and qbVariableType.
Also notice that documentation can be attached to a .NET project. Here, each
project has a readme.txt file, which describes the project goals, changes, and
open issues. The solution as a whole has a solutionReadMe.txt file with the
solution goals, changes, and open issues.

[Figure 7-7: Solution Explorer tree for the qbGUI solution (13 projects):
collectionUtilities, qbGUI, qbOp, qbPolish, qbScanner, qbToken, qbTokenType,
qbVariable, qbVariableType, quickBasicEngine, utilities, windowsUtilities, and
zoom, plus a Solution Items folder holding solutionReadMe.txt. Each project
shows its References, AssemblyInfo.vb, its .vb source files, and a readme.txt.]

Figure 7-7. qbGUI solution architecture

The following projects in the qbGUI architecture may not be familiar:

• collectionUtilities is a project for the collectionUtilities.dll. This object,
which is stateful but lightweight, consists of a set of utilities for dealing
with collections, including converting them to and from strings. It is useful
with collections containing subcollections, and the compiler uses this type
of structure to represent trees.

• utilities generates utilities.dll, a stateless collection of utilities for math
and string handling. It is independent of any presentation environment
such as Windows or the Web.

• windowsUtilities generates windowsUtilities.dll, a stateless collection of
utilities for Windows presentation.

• zoom generates a visual object for "zooming" in on Windows controls and
expanding their contents to a read-only text box, from which they can be
copied to the Clipboard. 4

In the next section, we'll run the compiler, which includes the ability to
examine its operations. This will give you the "dynamic" big picture.

The Dynamic Big Picture


As noted in the previous section, qbGUI is the testing GUI for quickBasicEngine.
Obtain the code for this book from the Downloads section of the Apress site
(www.apress.com), if you haven't already done so. Run qbGUI.exe, which will be in
the bin folder of the folder labeled qbGUI. The first screen that you see (the Easter
egg) appears only the first time you start the program, as shown in Figure 7-8. 5

4. In general, I try to stay away from developing a lot of my "own" visual controls, so forms don't
become too "welcome-to-my-world." However, the zoom project seemed to fulfill a genuine
need.
5. You'll notice the quotations on the qbGUI Easter egg. I first saw quotations in Bill McKeeman's
excellent, now out-of-print book on how to write a parser generator in PL/I, A Compiler
Generator. In introducing the need for a formal BNF notation, he quoted American poet
Emily Dickinson: "After great pain, a formal feeling comes." It was a reminder that the ulti-
mate guarantor of software correctness and efficiency is the person behind the machine.

Edward G. Nilges' Simulation of Microsoft Quick Basic

This application and form is an interface to the quickBasicEngine class

This class compiles and interpretively runs a subset of the Quick Basic language. Note that the
phrase Quick Basic is the intellectual property of the Microsoft corporation.

This class was developed by

Edward G. Nilges
spinoza1111@yahoo.com
http://members.screenz.com/edNilges

To Darlene, Eddie and Peter (junglee Peter): for in dreams begin responsibilities.

"Computer science is no more about computers than astronomy is about telescopes."

- Edsger Dijkstra 1930-2002

"But the man who knows the relation between the forces of nature and action, sees how some
forces of Nature work upon other forces of Nature, and becomes not their slave."

- Bhagavad-Gita

"I could be bound in a nutshell and count myself king of absolute space."
- Shakespeare, Hamlet

OK

Figure 7-8. Easter eggfrom qbGUI

NOTE qbGUI will use your registry with sensitivity. It will create one folder in
the proper place (VB and VBA Program Settings) labeled qbGUI. At any time,
you can reset the product simply by deleting qbGUI.

Click OK to see the simple qbGUI window shown in Figure 7-9. This screen
is meant for the public. Since you've come this far with me on the arcana of the
compiler, you might as well click the More button to see the full monty.6

[Figure 7-9: the simple qbGUI window, with File, Tools, and Help menus;
Evaluate, Run, View, More, and Close controls; and output and status areas
with Zoom buttons]

Figure 7-9. Simple interface

Once you have the expanded display, click the Replay check box in the
lower-left corner of the window. Then select File > Load Source > Code,
navigate to egnsf/Apress/QuickBasic, and open the file helloWorld.bas, to see the
display shown in Figure 7-10. This will allow you to examine a simple compile
operation step by step.

6. Full monty is British slang for "the whole thing." The term became more well-known after the
release of the film called The Full Monty, in 1997.

[Figure 7-10: the expanded qbGUI window. The source box contains
Print "Hello world". Below it, the Customer Engineering Zone offers XML,
Inspect, and Test buttons and a test event log, above the Scanned Tokens,
Parse Outline, RPN, Stack, and Storage list boxes, each with a Zoom button.]

Figure 7-10. The full monty, obtained by checking the Replay box in the lower-left
corner and loading helloWorld.bas

The full display contains the code for the famous 'Hello world' program.
Click Run to see the screen shown in Figure 7-11. The most obvious effect is the
"green screen" output of running 'Hello world,' but more interesting is the his-
tory at the bottom (obtained by checking Replay).

[Figure 7-11: the expanded qbGUI window after the run. The event log in the
Customer Engineering Zone shows entries such as
"3/12/2004 9:44:31 AM Running code at IP 3", above the Scanned Tokens,
Parse Outline, RPN, Stack, and Storage list boxes.]

Figure 7-11. Effect of running the 'Hello world' program

Let's review this history to get an overview of how a very simple program is
handled by the QuickBasic compiler.

Scanning the Tokens


Click the Reset button at the bottom of the form. Then click the Step button twice.
This will use qbScanner to scan the two tokens in the program: the Print identifier
and the "Hello world" string, as shown in Figure 7-12. qbGUI handles scanner
progress events with displays in its list box. Click the Zoom button at the top
right of the Scanned Tokens list box to see the full scan.

tokenTypeIdentifier on line 5 at 1 to 5
tokenTypeString on line 13 at 7 to 19

Figure 7-12. Scan result

Essentially, two qbTokens exist in a collection in the state of the quickBasicEngine.
Confirm that their start and end indexes, as serialized, correspond to the start
and end in the code.
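
The index check can be reproduced with a toy scanner. This sketch is in Python rather than the book's Visual Basic .NET and is not qbScanner: the Token type, the scan function, and the two-rule regular expression are assumptions made for the example. Indexes are 1-based and inclusive, as in the scan display.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    start: int  # 1-based index of the first character
    end: int    # 1-based index of the last character
    text: str


# Only the two token types needed for this program: strings and identifiers
TOKEN_RE = re.compile(r'(?P<string>"[^"]*")|(?P<identifier>[A-Za-z][A-Za-z0-9_]*)')


def scan(source):
    """Return the tokens of source with their start and end positions."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        # match.end() is exclusive and 0-based, so it equals the 1-based last index
        tokens.append(Token(match.lastgroup, match.start() + 1, match.end(), match.group()))
    return tokens


tokens = scan('Print "Hello world"')
```

The identifier runs from 1 to 5 and the string, including its quotes, from 7 to 19, which matches the positions reported in the scan result.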

Viewing Parsing and Code Generation


Click the Step button several times and watch the parsing and code generation.
The first change to the screen will be the emission of code, because the compiler
generates a leading comment to the object code. Then the compiler will outline
the parse as it finds higher and higher level constructs, to arrive at the parse and
RPN code display shown in Figure 7-13.

1 opPushLiteral "Hello world"
2 opPushLiteral String:vtString(ChrW(13) & ChrW(10))
3 opConcat
4 opPrint
5 opEnd
program: source code from 1 to 19: object code from 1 to 6
sourceProgram: source code from 1 to 19: object code from 1 to
sourceProgramBody: source code from 1 to 19: object code from
openCode: source code from 1 to 19: object code from 1 to 5

Figure 7-13. Complete parse and code

The parse outline may look intriguing. Click the Zoom buttons. From the
Zoom box, copy the parse outline to a Notepad or Word file (with Courier New as
a monospace font) to see the outline, as shown in Figure 7-14.

program: source code from 1 to 19: object code from 1 to 6
sourceProgram: source code from 1 to 19: object code from 1 to 5: Expression
sourceProgramBody: source code from 1 to 19: object code from 0 to 4
openCode: source code from 1 to 19: object code from 1 to 5
sourceProgramBody: source code from 1 to 19: object code from 1 to 4
statementBody: source code from 1 to 19: object code from 2 to 5
unconditional: source code from 1 to 19: object code from 2 to 5
print: source code from 1 to 19: object code from 1 to 5: Print
expressionList: source code from 7 to 19: object code from 2 to 2
expression: source code from 7 to 19: object code from 1 to 2: No Or occurs
orFactor: source code from 7 to 19: object code from 2 to 2
andFactor: source code from 7 to 19: object code from 2 to 2
notFactor: source code from 7 to 19: object code from 2 to 2
likeFactor: source code from 7 to 19: object code from 2 to 2
concatFactor: source code from 7 to 19: object code from 2 to 2: No RHS
relFactor: source code from 7 to 19: object code from 2 to 2: No RHS
addFactor: source code from 7 to 19: object code from 1 to 1
mulFactor: source code from 7 to 19: object code from 2 to 2
powFactor: source code from 7 to 19: object code from 2 to 2
term: source code from 7 to 19: object code from 2 to 2
"PRINT": source code from 1 to 5: object code from 0 to -1
"PRINT": source code from 1 to 5: object code from 0 to -1

Figure 7-14. Parse outline

Now the outline just looks strange. What's going on here?


Notice that this resembles an outline for a paper, written by a graduate stu-
dent a hair from the edge, who forgot to number the lines, but who is, in general,
too conscientious. An outline happens to be mathematically equivalent to a tree,
and this is the complete parse tree for the 'Hello world' program.
The compiler per se does not produce this output. Instead, as explained earlier
in the chapter, it fires nonvisual events, which can then be displayed by the GUI.
You may have noticed while watching this display being built that it actually
appears in reverse, with the lowest element (term) appearing first. This seems to
contradict our description of the algorithm as top-down. But, in fact, it did oper-
ate top-down. It set itself the goal of program and kept calling procedures, down
to term. However, the lowest procedure (compiler_term_) was the first procedure
to find anything when parsing the expected material (an expression) to the right
of the Print keyword, so it was the first to fire an event. Therefore, the
productions were added to the list box bottom-up. Also, Print, as a separate parse tree
element, is after the information to its right in the parse tree. This is because it
was parsed before the expression.
The outline shows how even a simple program has a somewhat complex
structure. It is a program, which can be a sourceProgram. A sourceProgram consists of one
or more Option statements, followed by a sourceProgramBody. A sourceProgramBody
consists of an open source (source code that isn't contained in a function or
subroutine),7 mixed with functions and subroutines. Each statement is an optional
numeric or symbolic label, followed by a statementBody. A Print statement is an
example of an unconditionalStatement.
You get the picture. This is a "descent" right down to the term, which is the
string parsed for us by qbScanner, "Hello world."
The algorithm seems inefficient, but do not confuse efficiency with the fact
that you have to think. In fact, the time taken is in a linear relationship with the
nesting depth of the program-its If .. Then nesting, its loop nesting, and the
complexity of its expressions. As this overall complexity increases, the algorithm
takes more time, but in steady and nonexponential correspondence with the
growth of complexity. Therefore, even a silly statement with excess parentheses,
such as A = (((((5))))), won't generate an explosively increasing number of
steps, just a silly parse tree.
Note that this particular compiler never actually constructs parse trees,
although many compilers do. Many compilers will construct the parse tree in
memory, and this can be both time-consuming and space-consuming as pro-
grams get large. But our parse tree is lightweight because it literally exists only
as events fired by quickBasicEngine. The GUI can reconstruct the parse tree for
presentation completely from parseStartEvent, parseFailEvent, and parseEvent,
which you saw in the code for addFactorRHS earlier in the chapter. Using an idea
from MIS programming, this demonstrates separating the business logic (pars-
ing) from the presentation tier, because the business of a parser is parsing.
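
The idea of a parse tree that exists only as events can be sketched in a few lines. This is Python rather than the book's Visual Basic .NET, and it is not the quickBasicEngine code: the function parse_print, its drastically reduced grammar (program, print, expression, term), and the on_parse callback standing in for parseEvent are all assumptions made for the example.

```python
def parse_print(tokens, on_parse):
    """Parse `Print <string>` top-down, calling on_parse(name, depth)
    as each production is recognized. No tree is ever built here."""
    pos = 0

    def term(depth):
        nonlocal pos
        if pos < len(tokens) and tokens[pos].startswith('"'):
            pos += 1
            on_parse("term", depth)
            return True
        return False

    def expression(depth):
        found = term(depth + 1)
        if found:
            on_parse("expression", depth)
        return found

    def print_statement(depth):
        nonlocal pos
        if pos < len(tokens) and tokens[pos].lower() == "print":
            pos += 1
            if expression(depth + 1):
                on_parse("print", depth)
                return True
        return False

    def program(depth):
        found = print_statement(depth + 1)
        if found:
            on_parse("program", depth)
        return found

    return program(0)


# The "GUI" side: collect events, then present them as an indented outline.
events = []
parse_print(["Print", '"Hello world"'],
            lambda name, depth: events.append((name, depth)))
outline = ["  " * depth + name for name, depth in reversed(events)]
```

The calls run top-down, but term is the first to succeed, so the events arrive innermost first. Showing the newest event at the top, as reversed(events) does here, puts program on the first line and term on the last.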
The reason for the linear relationship between source program size and time
is that there is little backup or lookahead in recursive descent. The only place look-
ahead occurs is in compiler_term_. When this method finds a left parenthesis at
the start of the candidate term, it must use a simple parenthesis balancing loop
to find the right parenthesis.
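
A parenthesis balancing loop of this kind takes only a counter. The sketch below is in Python rather than the book's Visual Basic .NET and is not the code of compiler_term_; the function name find_matching_paren and the use of a plain token list are assumptions made for the example.

```python
def find_matching_paren(tokens, start):
    """Return the index of the ')' that matches the '(' at tokens[start],
    or -1 if the parentheses never balance."""
    if tokens[start] != "(":
        raise ValueError("start must index a left parenthesis")
    depth = 0
    for index in range(start, len(tokens)):
        if tokens[index] == "(":
            depth += 1
        elif tokens[index] == ")":
            depth -= 1
            if depth == 0:
                return index
    return -1


tokens = list("(((((5)))))")
outer = find_matching_paren(tokens, 0)
```

The tokens strictly between the two indexes can then be handed to the expression parser as a complete inner expression.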

A Note on Tactical Parsing


"Tactical" parsing is the design and parsing of a little language for data and
business rules. Tactical parsing is useful in many hard problems such as this
one, where a large or infinite number of cases occur.
Many MIS projects have come to grief when the designers or programmers dis-
cover that the user did not want to handle a small number of cases but a large
number of combined cases. The usual strategy is more work, as the program-
mers chase their tails to code Case statements for the possibilities, until they
realize that they are engaged in a voyage measured in parsecs rather than yards.

7. A major difference between QuickBasic and Visual Basic is that in QuickBasic, executable
statements, as opposed to module-level declarations, can exist outside functions and
subroutines and form part of an implicit main procedure. I call this "open" source (not to be
confused with either free software or free beer).

Everyone then piles on the user to convince her that she was wrong in wanting
too much. I would ask her, like Leonard Cohen in Bird on the Wire, "Why do you
ask for so much? Why not ask for more?" That's because a user with many
elements that combine may want, without being able to express it, not a set of
cases, but a language for describing cases, and programmers who can design
a language.
Of course, giving people a new language is a venture fraught with hazards. My
experience is that they are never grateful, and like Shakespeare's Caliban (in The
Tempest) are likely instead to say, "Thou taught'st me language, and my profit
on't, is I know how to curse." To avoid this, you can actually hide the language
in a GUI, or as we have done, make it strictly an internal language for produc-
tion and consumption by objects. The beauty of this gesture is that it typically
generates a more powerful system that is easier to debug and maintain.
For example, I worked at one firm that was trying to debug a program that ana-
lyzed phone records for billing purposes. The problem was that conference-calling
and other features interacted to produce an unlimited number of cases. I devel-
oped a language that specified the state transitions of the underlying switch
and an interpreter that simulated the switch in Cobol by reading the state
transitions. I got the engineers to approve the state transitions, and then produced
bills by essentially simulating the calls. Case closed.

Examining the Generated Code


Next, zoom the RPN box to examine the generated code, as shown in
Figure 7-15.

1 opPushLiteral "Hello world"
2 opPushLiteral String:vtString(ChrW(13) & ChrW(10))
3 opConcat
4 opPrint
5 opEnd

Figure 7-15. Generated RPN code for 'Hello world'


The following instructions are generated:

1. The first instruction is the opPushLiteral, which pushes its operand as
a qbVariable onto a stack found in the interpreter_ method.

2. Since the Print statement was not followed by a semicolon, by the
arcane (but perfectly formalizable, as we have seen in Chapter 4) rules
of QuickBasic, we must insert a hard Windows newline. Therefore, the
next statement pushes a qbVariable with the nonprintable value
displayed, qbVariable.toString having converted the value to a display.

3. The next statement simply concatenates the string and the newline.

4. The string is then printed. Of course, for quickBasicEngine, which doesn't
know anything about either Windows or the Web, this is a can of worms.
It has no place, as a pure server, to place the string! Therefore, just as in
the case of progress, the engine fires an event with the print string,
letting the GUI worry about whether to display the text in a control (as we
have, in fact, done), format it using HTML, or even print it to a
daisy-wheel printer.
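
The four steps above, plus the final opEnd, can be traced with a toy stack machine. This is a sketch in Python rather than the book's Visual Basic .NET, not the Nutty Professor interpreter: the run function, the tuple encoding of the instructions, and the on_print callback (standing in for the print event) are assumptions made for the example.

```python
def run(code, on_print):
    """Execute (opcode, operand) pairs; report printed text through on_print."""
    stack = []
    ip = 0  # instruction pointer
    while ip < len(code):
        opcode, operand = code[ip]
        if opcode == "opPushLiteral":
            stack.append(operand)
        elif opcode == "opConcat":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "opPrint":
            on_print(stack.pop())  # the engine itself displays nothing
        elif opcode == "opEnd":
            break
        else:
            raise ValueError("unknown opcode: " + opcode)
        ip += 1
    return stack


HELLO = [
    ("opPushLiteral", "Hello world"),
    ("opPushLiteral", "\r\n"),  # ChrW(13) & ChrW(10), the Windows newline
    ("opConcat", None),
    ("opPrint", None),
    ("opEnd", None),
]

printed = []
leftover = run(HELLO, printed.append)
```

Here the "GUI" is just a list that collects whatever the engine asks to have printed; a form, an HTML page, or a printer driver could subscribe in the same way.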

Finally, click the Step button a few more times to see the execution of the
program and its effect on the stack. There will be more about actual interpreta-
tion in the next chapter.

Converting the State to XML


Next, move to the Customer Engineering Zone area, which allows you to con-
vert quickBasicEngine's state to XML and inspect the state for internal errors,
since quickBasicEngine implements the core object methodology described in
Chapter 5.
Click XML, and select and copy the results. Paste them in a maximized Notepad
file. The start of the XML will look like Figure 7-16.

<!--
***************************************************************
*                                                             *
* quickbasicEngine                                            *
*                                                             *
*                                                             *
* This class compiles and interpretively runs a subset of the *
* Quick Basic language. Note that the phrase Quick Basic is   *
* the intellectual property of the Microsoft corporation.     *
*                                                             *
* This class was developed by                                 *
*                                                             *
* Edward G. Nilges                                            *
* spinoza1111@yahoo.com                                       *
* http://members.screenz.com/edNilges                         *
*                                                             *
***************************************************************
-->
<quickbasicEngine>
<!-- Indicates object usability -->
<booUsable>True</booUsable>
<!-- Object instance's name -->
<strName>quickbasicEngine0001 3/12/2004 9:58:29 AM</strName>

Figure 7-16. Start of quickBasicEngine's XML

In Notepad, scroll down to see the XML of the scanner delegate and the collec-
tion of qbPolish instructions. The end of the XML will show a variety of properties,
as you can see in Figure 7-17. Of course, for a simple program, most of these values
are default.

<!-- Collection utilities -->
<objCollectionUtilities>collectionUti ... </objCollectionUtilities>
<!-- Indicates compiled status -->
<booCompiled>True</booCompiled>
<!-- Indicates assembler status -->
<booAssembled>True</booAssembled>
<!-- Source code type -->
<enuSourceCodeType>program</enuSourceCodeType>
<!-- Constant folding -->
<booConstantFolding>False</booConstantFolding>
<!-- Removal of "degenerate" operations -->
<booDegenerateOpRemoval>False</booDegenerateOpRemoval>
<!-- Value of recent immediate expression -->
<objImmediateResult>&quot;&quot;</objImmediateResult>
<!-- Indicates whether Option Explicit is in effect -->
<booExplicit>False</booExplicit>
<!-- Subroutine/function table -->
<usrSubFunction><subFunctionTable></subFunctionTable></usrSubFunction>
<!-- Subroutine/function table index -->
<colSubFunctionIndex>emptyCollection</colSubFunctionIndex>

Figure 7-17. Part of the XML of quickBasicEngine

The XML for the 'Hello world' program will consist of the usability and name
of the object, followed by the XML of the scanner state, as described in Chapter 5.
It will contain a null collection of variables (with the XML name colVariables),
because the 'Hello world' program doesn't contain any variables. It will contain
a Polish collection of opcodes identical to the one shown in Figure 7-5, and it
will end with a miscellaneous set of values for the engine, such as its queue of
information for the legacy Read Data statement.
XML lets us capture not only a set of business rules as a QuickBasic
expression, but also the execution properties that change, subtly or in the
extreme, how those rules are evaluated. Using XML, we have a shot at capturing
some logic, including its execution environment, and placing it in a file.

The Non-Mercator Projection


What I am trying to master here is the non-Mercator, or nondistorting,
"projection" of logic to data: its serialization. As you know, a Mercator
projection map distorts information, whereas a globe doesn't distort it.
Recently I had to fly from Chicago to Hong Kong. At first, my fellow travelers
and I were rather confused, because we went due north, and the route map
projected on the video screen showed us going a roundabout way. Of course,
we were going the shortest polar route in almost a straight line due north and
then due south. The trip featured spectacular polar views and the north coast
of Siberia, far more interesting than the movies on tap.
When we left the Canadian Arctic islands, the Mercator projection of the world
map made it appear that we needed to hang a left to get to Siberia, but a globe
makes it clear that you need to go straight.
Similarly, suppose a database contains numbers that are in 16-bit integer
format for a reason, and it is converted to a database with 32-bit integers
(because the numbers were converted from 16-bit integers to text, and then by
a C program to 32-bit C integers). A business rule associated with the data
for a reason (this field must range between -32768 and 32767) has been lost.
The concern began when I realized that in popular databases, including SQL
Server, the insertion of a trigger or stored procedure could change the meaning
of a field. This means that statically comparing two editions of a database
would yield wrong results.

Inspecting the State


Next, click the Inspect button on the full monty display to carry out a rather
comprehensive inspection of quickBasicEngine's state, as shown in Figure 7-18.

Customer Engineering Zone

The object must be usable: OK
quickBasicEngine
The object must pass its own inspection:
scanner
Inspection of qbScanner000... 3/12/2004 9:58:5...
The object must be usable: OK
Each token in both the array of scanned tokens, ...
The line number must be greater than or equal to ...
The format of the line number index collection mu...
If the nonnull code is fully scanned, the first t...
If the code is null and indicated as fully scann...
The Polish collection must be a collection of qb...
Inspection of qbPolish instance "qbPolish0006" at ...
The object must be usable: OK
The operation code can't be the Invalid enumerat...
The start index of the source code for the op mu...
Start index is 3: length is 1

XML | Inspect | Test | Test event log

Figure 7-18. Inspection of quickBasicEngine's state

The inspection is a monster, without apology. Some of its complexity is caused
by the inclusion of the results of inspection of delegates. For example, as you can
see in Figure 7-18, inspection first checks the scanner delegate after making sure
that quickBasicEngine hasn't marked itself unusable. In conformance to our core
methodology, the inspection checks for internal errors. These are errors caused
by my boneheaded coding or your own ham-fisted changes to the source code.
We have, as your manager would say, "drilled down" through the levels of
quickBasicEngine. But what's amazing is that this engine is, in turn, supported by
the giants at Microsoft who developed the .NET Framework. The fact is that the
engine can run code with reasonable efficiency, while producing an amusing
display of progress and monitoring its own health. What this means is that the
.NET Framework is very efficient, and there is no excuse for not adding quality.

Error Taxonomy
Three types of potential errors are recognized and handled by the
quickBasicEngine object:

Errors in logic of the code: There are two subtypes of this type of error:
bugs in the code I have developed and any errors you may add while
modifying the compiler. Errors in the logic of the code detected by this
code itself (such as in the inspect method) will result in calls to the low-
level errorHandler utility exposed by the utilities object, which typically
displays a message box. These errors will usually mark the instance object
as not usable, so it does not damage your data.

Errors in using the object interface: This type of error could be in a GUI
or an object using the QuickBasic engine as a Web service. Mistakes in
calling the object also result in calls to utilities.errorHandler, but do
not mark the object as unusable.

Errors in QuickBasic coding and logic: Errors in coding QuickBasic and
errors in logic are handled by a userErrorEvent for GUI display.
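
The three-way split above can be sketched in a few lines. This is only an illustration in Python, not the engine's Visual Basic code: the class EngineSketch, its method names, and the toy parenthesis check are all assumptions made for the example, and a list of messages stands in for message boxes and GUI events.

```python
class EngineSketch:
    """Hypothetical stand-in for an engine object; not the book's class."""

    def __init__(self):
        self.usable = True      # plays the role of the usability flag
        self.messages = []      # stands in for message boxes and GUI events

    def internal_error(self, text):
        # Type 1: a bug in the engine itself; report it and retire the object.
        self.messages.append("internal: " + text)
        self.usable = False

    def interface_error(self, text):
        # Type 2: the caller misused the interface; report it, stay usable.
        self.messages.append("interface: " + text)

    def user_error(self, text):
        # Type 3: a mistake in the user's source code; hand it to the GUI.
        self.messages.append("user: " + text)

    def evaluate(self, source):
        if not self.usable:
            self.interface_error("object is not usable")
            return None
        if not isinstance(source, str):
            self.interface_error("source must be a string")
            return None
        if source.count("(") != source.count(")"):
            self.user_error("unbalanced parentheses")
            return None
        return len(source)      # placeholder for real compilation
```

Only the first kind of error flips the usability flag; the other two leave the object ready for the next call.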

Summary
We have, I trust, cut to the chase. You have learned how to generate recursive-
descent parsers for a sizable language, as well as how to generate code, including
optimized code that eliminates unnecessary runtime computation.
We then ran the qbGUI program to step through an exceedingly simple program
and see how it is parsed. The same overall approach was used as in Chapter 3's
fly-over compiler, but here, the object model means that the elements handled are
themselves reference objects on the .NET heap, rather than simple values on the
.NET stack.
I hope I have shown that writing a compiler is a nontrivial task, but one that
is doable; for the dialogue in programming is always between simplicity and
complexity. In order to give the user a simple experience, we have to wrestle with
complexity.
The methods are not unique to Basic and can transfer to your own language
development. You may want to develop a language for the disabled and "parse"
their gestures. You may want to develop a language for dancers and "parse" their
leaps. You may want to teach language to gorillas in the wild. You may want to
develop a language to dodge responsibility and spin events to your best advan-
tage. You may have some killer ideas for a programming language, as did the
developers of Ruby and Python. Or you may wish to help a user who needs to
compile very old source code, for which the compiler has disappeared ("retro"
computing, or computing for old guys).

If a sequence of gestures has a logic, it has a grammar. And if the grammar
can be formalized in BNF, a nonvisual language engine may do your altruistic job.
In the next two chapters, I will complete the picture by showing in more detail
how the compiled code can be assembled and executed. Chapter 8 will describe
how to develop an assembler and a software interpreter that uses .NET to simulate
an appropriate target machine.

Challenge Exercise
Run the qbGUI program. What happens when you click the Test button in the
Customer Engineering Zone of the full monty display?
Try testing some simple expressions. If you enter a math expression and
click the Evaluate button, the "green screen" will show its value. Try entering
a real program and clicking the Run button.

Resources
See Compilers: Principles, Techniques and Tools, by Alfred Aho, Ravi Sethi, and
Jeffrey Ullman (Addison-Wesley, 1985) for much more information about parsing
and compiler optimization. This is the famous "dragon" book referenced in
earlier chapters.

CHAPTER 8

Developing Assemblers
and Interpreters
No, I'm not interested in developing a powerful brain. All I'm after is just
a mediocre brain, something like the President of the American Telephone
and Telegraph Company.
-Alan Turing

TURING'S HOPE has not been realized; the CEO of AT&T was a pretty smart cookie.
Furthermore, if Dijkstra is right, the nature of computer intelligence is constituted
in the ability to manipulate symbols and not conscious choice and awareness.
However, Turing made a discovery in 1936 all the same. Obeying an algorithm
is itself obeying an algorithm...which manages to be obvious, or profound, or
stupid, or all three: "To follow the rules, follow the rules for following the rules."
In the last chapter, you learned how the quickBasicEngine generates code.
In this chapter, I will discuss the details of assembling code with jumps and with
Go To instructions and related issues having to do with assemblers.
The compiler generates qbPolish objects and stores them in a collection. These
qbPolish objects reference each other using labels, and in a rather clerical (but rather
tricky) operation, these labels must be translated to numeric addresses.
A more exciting operation, discussed in the second half of this chapter, is the
simulation of a computer by an interpreter. Rather surprisingly, even a complex
computer architecture can be completely imitated by another computer architec-
ture (even one less powerful or less complex) using software. In most cases, the
simulation will be slower than a native implementation, but this is not necessarily
the case when the computer doing the imitation is several orders of magnitude
faster.
This chapter will discuss assembly in the context of the simple assembler
embedded in the QuickBasic compiler. I will then discuss the design of the onboard
Nutty Professor interpreter, a software machine for executing the qbPolish objects
emitted by the compiler.

Assemblers
Let's take a look at assemblers in general and in their historical context, and then
examine the simple assembler embedded in the quickBasicEngine.

Assemblers, in General
Assemblers have been around since the earliest days of computers, although para-
doxically, assemblers may postdate compilers. This is because the earliest computer
scientists, including Charles Babbage, John von Neumann, and Konrad Zuse, did
not work as lowly programmers. Instead, they prepared equations for the earliest
programmers to enter by keying or setting switches.
Of the three pioneers I have mentioned, Konrad Zuse also developed in the
early 1940s the Plankalkül, which was a prototype of a high-level "compiled"
language. Because this predates the first assemblers, the first compilers
predated the first assemblers (cf. A History of Modern Computing, Second Edition
by Paul E. Ceruzzi [MIT Press, 2003]).
Later in the 1940s, Grace Murray Hopper (an early ENIAC programmer and an
officer in the United States Navy) started to "reuse" the code for common equations
by borrowing tapes containing ENIAC codes and lending her own code in return.
This activity related more to the early compilers than to assemblers, but Hopper's
team was, as noted in Chapter 1, the first to see the economic value of saving pro-
grammer time.
The first assemblers were developed by working programmers to avoid hav-
ing to code in straight binary machine language. In fact, John von Neumann (the
Hungarian emigre mathematician who, at Princeton's Institute for Advanced Study,
is credited with the stored program concept) did not think that an expensive and
rare computer should be used at all to make its programmer's life easier.
Nonetheless, the earliest commercial mainframes of the 1950s were shipped
with assemblers after managers discovered that programming was much more
time-consuming than originally thought, and because compilers were harder to
develop at the time than assemblers (modern compiler theory not yet having been
developed).
These early assemblers required the early programmer to specify actual
machine operations, but allowed him or her to identify storage locations with
mnemonic names. Such assemblers took over the job of assigning numeric
locations to the names.
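
As a rough illustration of that bookkeeping, here is a minimal Python sketch. The function name assign_locations, the starting location 500, and the sample names are all invented for the example; a real assembler of the period would also account for the size of each storage area.

```python
def assign_locations(names, first_location=0):
    """Give each distinct mnemonic name the next free numeric location."""
    table = {}
    next_location = first_location
    for name in names:
        if name not in table:          # a name seen before keeps its location
            table[name] = next_location
            next_location += 1
    return table


# Names in order of first appearance in a hypothetical source deck.
locations = assign_locations(["TOTAL", "COUNT", "TOTAL", "RATE"],
                             first_location=500)
```

The programmer writes TOTAL everywhere; only the assembler needs to know that it lives at location 500.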

From Machine Language to Assembler Language


In January 1970, I took one of the first computer classes offered for academic
credit at Roosevelt University, Chicago, which was taught by the great Max
Plager, still on the math faculty at Roosevelt. As I mention in Chapter 1, this
class was conducted during a lot of university upset and chaos.
Max had us students code our first program in actual machine language so we
would appreciate assemblers and Fortran more.

I sat down in Northwestern University's then-new library on a cold January
day and coded my first program, a program to make change for a ten dollar
bill, in machine language. Fortunately, the target machine was the decimal
IBM 1401, whose architecture allowed addresses to be specified as three-digit
decimal numbers.
These digits were in fact 6-bit characters in the 1401's 64-character set. A rather
strange, but logical, system was used to "address" memory past location 1000
as memory was, over the lifetime of the 1401, expanded to several kilobytes in
later models.
When writing in machine language, I had to create a careful flowchart and keep
track of the position of all variables, writing the code on a sheet of graph paper.
A philosophy major, engaged in an elaborate draft-dodging scheme, I was
impressed by the parallels between machine language and symbolic logic,
and math.
After coding, I then brought the program to the lab. Max showed us how to
punch it on cards and place the resulting deck behind a two-card loader. Max
and I watched as my program proceeded to load itself on top of the loader, for
I had forgotten that I wasn't supposed to use the memory used by Max's loader
between 333 and 400.
I fixed the problem and the program worked, but I certainly learned the joys, and
the miseries, of machine language programming. My lesson came in handy a year
or so later when I debugged the Fortran compiler, as described in Chapter 1.
Later on, as a 1401 programmer for the university, I would occasionally build
quick, one-time utilities by toggling in machine language from the auxiliary 1401
console. This was an era when many universities and corporations placed their
mainframe computers on display; Roosevelt's 1401 was behind plate glass win-
dows on street level at Michigan and Congress, and one block north a Burroughs
system was on display.
However, although the Burroughs programmers were men in gray flannel suits,
I killed the serious image Roosevelt wanted to project with a scruffy appearance
and shoulder-length hair. In a way, that has become standard; the administra-
tion put up with me to get results (including Fortran support as I've described)
as a one-man skunk works.

The writing of a basic assembler is easily mastered (if you ignore efficiency)
in any language that supports keyed collections.
The assembler must scan each line of assembler source code for its constituent
parts, usually including an instruction label, a mnemonic op code, one or more
operands, and comments describing the operation. Older assemblers that I used
(including the IBM 1401 "Symbolic Programming System") forced the programmer
to put these fields in fixed columns, usually on an IBM punch card.

Paper tape and newer assemblers (commencing with IBM 1401 Autocoder
and IBM 360 BAL) gave programmers more freedom because they allowed fields to
be separated with blanks or commas and used a primitive form of lexical analysis
as described in Chapter 5 to separate the individual tokens.
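
A free-form line of this kind can be split with very little code. The sketch below is in Python and assumes conventions that are not taken from any particular assembler: a label ends with a colon, operands are separated by commas or blanks, and a semicolon starts a comment.

```python
def split_line(line):
    """Split one free-form assembler line into
    (label, opcode, operands, comment)."""
    code, _, comment = line.partition(";")       # comment runs to end of line
    tokens = code.replace(",", " ").split()      # blanks and commas separate fields
    label = None
    if tokens and tokens[0].endswith(":"):
        label = tokens.pop(0)[:-1]               # strip the trailing colon
    opcode = tokens[0] if tokens else None
    operands = tokens[1:]
    return label, opcode, operands, comment.strip()
```

For example, split_line("LOOP: ADD TOTAL, RATE ; accumulate") yields the label LOOP, the op code ADD, the operands TOTAL and RATE, and the comment text.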
Operators were typically looked up in a fixed table of operators and their
numeric codes, typically using the well-known algorithm called a binary search.
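
A minimal Python sketch of such a lookup follows. The mnemonics and numeric codes in OP_TABLE are made up for the example; the only requirement is that the table is kept sorted by mnemonic so that the standard bisect module can perform the binary search.

```python
from bisect import bisect_left

# Hypothetical mnemonics and numeric codes, kept sorted by mnemonic.
OP_TABLE = [("ADD", 1), ("JMP", 7), ("LOAD", 3), ("STORE", 4), ("SUB", 2)]
MNEMONICS = [name for name, _ in OP_TABLE]


def lookup_opcode(mnemonic):
    """Binary search of the fixed operator table; None if not found."""
    i = bisect_left(MNEMONICS, mnemonic)
    if i < len(MNEMONICS) and MNEMONICS[i] == mnemonic:
        return OP_TABLE[i][1]
    return None
```

Because the table is fixed when the assembler is built, it can be sorted once by hand, and each lookup then needs only about log2(n) comparisons.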
The data labels were slightly more complex to find because a fixed table
could not be used. Instead, the best programmers of assemblers built tables of
the operands used and employed a hash method to access these tables.
In a hash method, a large number of names has to be mapped onto a limited
space for fast retrieval. Of course, if efficiency does not matter, you can simply
build (for example, using Redim in Visual Basic) a table that grows as more and
more distinct names are found and use a linear search.
However, the execution time grows rapidly as the number of variables
increases. Each time a variable is used, it must be looked up with, on average, n/2
probes of the list of n variables. As a function not only of the number of
variables n, but also of the number m of occurrences of all variables, the
execution time is proportional to m*n/2. As m or n grows, assembly slows
dramatically.
Early assembler programmers therefore developed variants of hash tables,
and this technology was used by Microsoft to build the collection with a key.
The best hash algorithm will consider an identifier as a number and take
some part of this number, which is bounded by the space available for the hash
table (like Hamlet, quoted in the qbGUI Easter egg of Chapter 7, the hash table
can be bound in a nutshell but counts itself king of infinite space). For example,
if 256 table entries are allocated, a fairly good (by no means optimal) hash
algorithm might take the last byte of the operand name as the hashed index. The
last byte is probably better than the first byte for most input programs to the
assembler, as the first byte might have distinctly nonrandom prefixes (many
identifiers, for example, might start with the letter I, and the use of systematic
prefixes for identifiers, sometimes known as Hungarian notation, will tend to
create many identifiers with the same first letter).
However, because more than one operand can have the same last byte and
thus the same index in the table, the algorithm needs a plan for a "collision." It
turns out the most effective plan is simply to proceed to the next empty entry
of the table (wrapping back to the start of the table as needed) and use this as
the entry.
At worst, a small linear search, usually restricted to one or two entries, results.
One further complication occurs when entries have to be deleted in an assembler
or compiler that allows symbols of limited scope, which have to be thrown away
when their context is compiled; for example, a compiler that supports variables
local to procedures, like the Visual Basic compiler, must throw away local
variables. The Visual Basic 6 compiler had to throw away all local variables at the
end of the procedure; the Visual Basic .NET compiler must throw away variables
at the end of each block as well as at the end of the procedure, because Visual
Basic .NET supports the declaration of variables inside For loops, Do loops, With
clauses, and other blocks.
The deleted symbol's hash table entry must be located and tagged as free,
but it is not quite the same as it was before it was used. This is because a
search for another symbol, one whose probe sequence passes through the
deleted symbol's entry, must not be stopped by the freed entry.
Ordinarily, in searching a hash table just to find a symbol, the first unused
entry encountered shows that the search is complete, and has failed, because if
the symbol hashing initially to entry n was in the table, it would be in the first
available entry to the right of n, wrapping around to the beginning. But if
a deleted entry is found, it may have hashed to another initial starting location;
therefore, the search must continue.
The solution is to mark deleted entries specially so that they are distinct
from empty entries. For example, a .NET solution might be to set deleted entries
to a blank while making sure empty entries are Nothing. Many of these tech-
niques were discovered by early writers of assemblers.
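
The last-byte hash, the "next entry" collision plan, and the special marking of deleted entries fit together as in the following Python sketch. It is an illustration only: the class name SymbolTable and its methods are invented here, names are assumed to be nonempty ASCII strings, and a sentinel object plays the role that a blank plays in the .NET suggestion above.

```python
EMPTY = None          # a never-used entry ends a search
DELETED = object()    # a freed entry is skipped, but does not end a search


class SymbolTable:
    def __init__(self, size=256):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, name):
        # Hash on the last byte of the name, then walk right, wrapping around.
        i = name.encode("ascii")[-1] % self.size
        for _ in range(self.size):
            yield i
            i = (i + 1) % self.size

    def find(self, name):
        for i in self._probe(name):
            slot = self.slots[i]
            if slot is EMPTY:
                return None
            if slot is not DELETED and slot[0] == name:
                return slot[1]
        return None

    def insert(self, name, value):
        free = None                      # first reusable entry seen so far
        for i in self._probe(name):
            slot = self.slots[i]
            if slot is EMPTY:
                if free is None:
                    free = i
                break
            if slot is DELETED:
                if free is None:
                    free = i
            elif slot[0] == name:
                self.slots[i] = (name, value)   # update an existing symbol
                return
        if free is None:
            raise RuntimeError("symbol table is full")
        self.slots[free] = (name, value)

    def delete(self, name):
        for i in self._probe(name):
            slot = self.slots[i]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot[0] == name:
                self.slots[i] = DELETED
                return True
        return False
```

If three names ending in the same byte are inserted and the middle one is deleted, a search for the third still succeeds, because the search walks over the deleted marker instead of stopping there.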
But keep in mind you may never need to create a hash table for your parsers
and language tools because the classic collection of VB and other languages and
the .NET Framework collections solve the problem.
You can write a better-performing collection than those provided in the
.NET Framework by taking advantage of recent research in hash algorithms, or
by encapsulating knowledge of the keys to be hashed. But my experience in
doing this produced at best only a marginal speed advantage of about 15%.
One problem with the collection, whether used in Visual Basic 6 or in Visual
Basic .NET, is that it is a collection of untyped objects. This means in practice
that code that uses collections (of any type) might be altered, erroneously, to
contain objects of the wrong type. Also, retrieval is slowed because the pure
objects used by the collection need to be converted to the right type.
There are, however, many solutions for this problem.
The collection can be an object that inherits the collection member as
a base type. Or, you can wait for the next edition of Visual Basic .NET, which will
allow you to use "generic types." Or, you can bite the bullet and implement
a strongly typed hash table as an array with a strong type. Or, you can use the
solution of the quickBasicEngine, which is to use collections and to inspect them
for correct types.
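
The last of these options, an untyped collection plus an inspection, might look like the following Python sketch. It does not reproduce the engine's own inspect method; the class InspectedCollection and the wording of its report are assumptions made for the illustration.

```python
class InspectedCollection:
    """Untyped keyed collection plus an inspection of element types."""

    def __init__(self, expected_type):
        self.expected_type = expected_type
        self.items = {}

    def add(self, key, item):
        # No check here: like an untyped collection, anything can get in.
        self.items[key] = item

    def inspect(self):
        """Return a list of complaints; an empty list means the
        inspection passed."""
        return [
            "item %r has type %s, expected %s"
            % (key, type(item).__name__, self.expected_type.__name__)
            for key, item in self.items.items()
            if not isinstance(item, self.expected_type)
        ]
```

The trade-off is that a wrong entry is caught when the inspection runs rather than when the entry is added, which is cheaper on each insertion but reports the problem later.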
Assemblers often have features to make programmers' lives easier. For
example, it was a chore to have to use a literal, such as the number one, by nam-
ing it and defining its position as a labeled instruction. Therefore, early common
assemblers allowed the programmer to use literal values, usually numbers, and
the assembler took over their assignment to storage.

A significant development was the macro assembler, which allowed the pro-
grammer to define sequences of defined opcodes as a new opcode. And it was then
a short step to the conditional macro assembler.
Conditional macro assemblers select sequences of code for assembly based
on conditions and the values of symbols. In fact, the Visual Basic preprocessor
statements that commence with the pound sign (such as #If ... Then, #Else, and
#End If) represent a conditional compiler version of this facility.
Conditional macro assembly was used mostly by manufacturers to ship cus-
tomizable and modifiable source code. For example, the IBM mainframe system of
the 1970s and 1980s, Virtual Machine, Conversational Monitor System (VM/CMS),
was shipped to large clients in the form of source code.
The client would set symbols in a special area or through a primitive GUI,
and the assembler would then use these values to select the actual source code
for that client's installation.
The modern C and C++ preprocessor is an almost complete conditional macro
"compiler" that supports the definition, assignment, and computation of
compile-time values in addition to traditional if .. then .. else statements.
Some conditional macro assemblers included the ability to branch to labels,
which meant that the engine underlying the conditional macro assembler was in
fact a general-purpose, simulated computer available at assembly or compile
time that could engage in complex calculations to determine the final source
code presented to the assembler.
In fact, some of these products were not even used to generate code to the
assembler at all. Instead, they generated code for other environments or even, in
some cases, documents.

NOTE Working on an early cellular system, I used the conditional macro
assembler on the IBM mainframe to generate Z-80 assembly code for this
early microprocessor. It wasn't my idea, but it worked.

The nearly extinct language PL/I, developed for IBM mainframe programming,
extended all computational power to the macro writer with all the power,
and obfuscatory potential, that this implied. Basically, proprietary software
vendors don't deliver source code to their customers any more; therefore, there
is no reason to deliver, as before, highly and generally customizable source to
customers. But this may change. There is increasing interest in obtaining source
code instead of object code because of the greater quality and safety of the former,
whether as "open" source or as a commercial product. This may cause a return to
the shipment of source that can be customized, using a preprocessor.
Also, writing a full preprocessor would be a useful exercise and would create
a product unlinked to any one programming language, because there is no reason
why the macro processor has to care about the language it processes. It would
provide the ability to have a single source image of a large software system rather
than multiple copies with changed code, with one limitation: it might have
trouble with the fact that today, source code is created not as flat files but as
project "trees."
For example, a large Visual Basic source could be used with a preprocessor
to generate either Visual Basic 6 or VB .NET source code.
Here is an example. The following code uses a C preprocessor to condition-
ally generate a debugging statement:

#if (debugMode)
#define DEBUG(proc,msg) MsgBox("Debug message from ...
#else
#define DEBUG(proc,msg) ' No debugging
#endif

The DEBUG symbol is a macro symbol. When debugging is in effect, the DEBUG
symbol is replaced here by a MsgBox that includes the parameter names proc and
msg; when debugging isn't in effect, the DEBUG symbol is replaced by a comment.
The preceding example is not usable inside the Visual Studio GUI (because
the C preprocessor is not called for VB .NET programs), but you could create an
external build system that would work if you needed to.
The advantage of the approach is that one source representation can sup-
port debugging after the system is placed into production, with no runtime cost.
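
To show how little such an external step needs to know about the language it processes, here is a small Python sketch of the conditional part alone. It is an assumption-laden toy, not the C preprocessor: it handles only #if NAME, #else, and #endif on lines of their own, it does not nest, and it performs no macro substitution.

```python
def preprocess(lines, symbols):
    """Tiny conditional preprocessor: #if NAME / #else / #endif, no nesting.

    lines   -- source lines in any language
    symbols -- dict mapping symbol names to true or false values
    """
    output = []
    emitting = True
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#if "):
            emitting = bool(symbols.get(stripped[4:].strip()))
        elif stripped == "#else":
            emitting = not emitting
        elif stripped == "#endif":
            emitting = True
        elif emitting:
            output.append(line)      # ordinary text passes through untouched
    return output


source = [
    "x = 1",
    "#if debugMode",
    'MsgBox("x set")',
    "#else",
    "' No debugging",
    "#endif",
    "y = 2",
]
debug_build = preprocess(source, {"debugMode": True})
production_build = preprocess(source, {"debugMode": False})
```

The debug build keeps the MsgBox line and the production build keeps only the comment, so both are produced from one source image.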
Macro assembly and preprocessors, especially the C++ preprocessor, however,
have a terrible reputation, and programmers who use the C and C++ preprocessor
have been known to be punished by 20 lashes with a wet noodle.
There are two reasons for this. One was pointed out to me by my son several
years ago when he taught me about object design. Many of the jobs that were
formerly performed by macro processing are now accomplished in a cleaner and
safer way using OO concepts such as overloading and encapsulation. If (for
example) some customers want version A of a method, which exposes an extra
parameter, and others don't want this parameter to be exposed, it might make more
sense to use overloading to provide both versions rather than using a preprocessor
to generate the desired signature.
Another reason for the unpopularity of the preprocessor is the way in which
extensive use of macro processing creates unnecessary complexity.
However, I happen to disagree with the many writers on this topic (such as
the very droll Bill Blunden, who has written Software Exorcism: A Handbook for
Debugging and Optimizing Legacy Code [Apress, 2003], a guide to maintaining
code and pronouncing curses upon the original authors) who feel that using
extended definitional facilities is always and everywhere the sign of a flawed
character. That's because the whole point of this book is that at times it makes
sense to develop a language for a problem solution. In fact, important solutions
have been developed using the C and C++ preprocessor, including Bjarne
Stroustrup's first C++ compiler, which was written using C preprocessor statements.
Although the era of macro processing in programming languages may be
over, the technique is still important in areas including text processing, and
may represent for you a problem solution when you have to process text with
substitution.

Assembly in the quickBasicEngine


This chapter will use a simple nFactorial program in Figure 8-1 to demonstrate
assembly and interpretation concepts. This program calculates the value of any
integer times all of its predecessor integers down to 2.

' ***** CALCULATION OF N FACTORIAL *****
DIM N
DIM F
PRINT "ENTER N"
INPUT N
IF N<>INT(N) THEN
    PRINT "N VALUE " , N , " IS NOT AN INTEGER"
    END
END IF
IF N<=0 THEN
    PRINT "N VALUE " , N , " IS NOT A POSITIVE NUMBER"
    END
END IF
F = 1
DIM N2
FOR N2 = N TO 2 STEP -1
    F = F * N2
NEXT N2
PRINT "THE FACTORIAL OF " , N , " IS " , F

Figure 8-1. The nFactorial program

A simple onboard assembler named assembler_ is included in
quickBasicEngine.vb to translate the code generated by the compiler to code
acceptable to the Nutty Professor interpreter. The assembler is needed
for two reasons:

1. The quickBasicEngine generates forward-jumping Go To instructions
with symbol labels. The NP interpreter actually needs the index of the
qbPolish object in the collection in order to perform the jump.

2. The compiler "decorates" the output code with remarks to show what
instructions are generated from which lines of source code, as shown
in Figure 8-2; this decoration needs to be removed when a user option
(available on an options form in the GUI, as I will describe) is set.

1 opRem 0: ***** ' ***** CALCULATION OF N FACTORIAL *****
2 opRem 0: ***** DIM N
3 opRem 0: ***** DIM F
4 opRem 0: ***** PRINT "ENTER N"
5 opPushLiteral "ENTER N": Push string constant
6 opPushLiteral String:vtString(ChrW(13) , ChrW(10)): Terminate print line
7 opConcat : opConcat(s,s): Replaces stack(top) and stack(top-1) with stack(top-1)&stack(top)
8 opPrint : opPrint(x): Prints (and removes) value at top of the stack
9 opRem 0: ***** INPUT N
10 opInput : Read from standard input to stack(top)
11 opPop 1: Pop the stack to N
12 opRem 0: ***** IF N<>INT(N) THEN
13 opNop 0: Push lValue N contents of memory location
14 opPushLiteral 1: Push indirect address
15 opPushIndirect : Push contents of memory location
16 opNop 0: Push lValue N contents of memory location
17 opPushLiteral 1: Push indirect address
18 opPushIndirect : Push contents of memory location
19 opInt : Round to integer function
20 opPushNE : Replace stack(top) by opPushNE(stack(top-1), stack(top))
21 opJumpZ LBL1: Jump to False code
22 opRem 0: ***** PRINT "N VALUE " , N , " IS NOT AN INTEGER"
23 opPushLiteral "N VALUE ": Push string constant
24 opNop 0: Push lValue N contents of memory location
25 opPushLiteral 1: Push indirect address
26 opPushIndirect : Push contents of memory location
27 opConcat : Replace stack(top) by opConcat(stack(top-1), stack(top))
28 opPushLiteral " IS NOT AN INTEGER": Push string constant
29 opConcat : Replace stack(top) by opConcat(stack(top-1), stack(to...

Figure 8-2. Assembler code list, prior to assembly, of part of the nFactorial
program

For now, ignore the details of the individual instructions. As you can see
from the "decoration" consisting of opRem instructions and opNop instructions
(neither of which has any effect on execution and both of which can be
removed by the assembler), the compiler has generated many instructions for
each line of source code. I'll discuss what these instructions do in the next
section of this chapter.

The listing in Figure 8-2 for the nFactorial QuickBasic program starts with
four remarks, the first of which repeats the header comment, the second and
third of which include declarations, and the fourth of which heads the generated
assembler code to print the prompt for the value of N. The assembler should
remove these lines.
Take a look at line 21 in Figure 8-2. opJumpZ examines the top of the "stack"
maintained by the interpreter (a LIFO stack similar to that seen in Chapter 3) for
zero, and goes to LBL1 when the top of the stack is zero.

NOTE You may wonder why we use Go To: is not Go To a thing of darkness?
The answer is that although Go To is not absolutely necessary in machine
language, I use it here because so many languages at the machine level do so.
At the end of Shakespeare's Tempest, Prospero says, "This thing of darkness
I acknowledge mine."

It is the assembler's job to track down all pseudo opcodes of the type
opLabel, record their position in a keyed collection (where the key is the label
and the data is the position), and replace each occurrence of each label by its
position. This job is complicated, of course, by the fact that when you remove
labels (and sometimes the comment decoration), the position of the label
changes and has to be adjusted; this is the "tedium" of the assembler's clerical
task I mentioned in the previous section.
The compiler's assembler code is found in the quickBasicEngine Private
method assemble_. The assembler makes two complete passes over the input
source, which is in the collection of qbPolish tokens named colPolish.

NOTE I will discuss the qbPolish object in more detail in the next section.
Here, understand that it is an object with state that represents one instruction
to the Nutty Professor interpreter and which is called a Polish token in honor
of the Polish logicians mentioned in Chapter 3.


The first pass is the most difficult because it must remove labels, opRems and
opNops from the code while tracking the effect of removal on the value of labels;
opRems are instructions that do nothing but contain compiler comments, and opNops
are instructions that do nothing, period.
The first pass therefore has the form of a Do While loop that proceeds through
the code, and inside this loop you find a For loop. The For loop's job is to advance
from the current point to the next "real" instruction that is not a label. If this For
loop finds labels, it must record their position and value in a temporary labels
collection.
Each time the For loop finds a real instruction, a separate For loop, also inside
the Do loop, backs up over the preceding instructions and deletes each label,
remark, and no operation it passed. The deletion process consists of executing
the dispose method of the qbPolish object and removing it from the colPolish
collection.
Then all new labels found inside the first inner For loop are added to the
real label table. It is very possible that during the pass through the first loop,
you did not know the position of a label. However, because you moved the
labels to a temporary area during the first For loop, all the labels found now
have a known address.
This is the surprising tedium of assembly I mentioned. Assembly, and code
generation inside a compiler, rather resembles DLL hell both in the tedium and
because you are close to the goal when the tedium occurs.
The quickBasicEngine also allows you to assemble without removing any
labels, remarks, or no operations. The qbGUI application gives you access to this
option as shown in Figure 8-3. Assembly is, of course, simpler when the option
to keep labels is retained.


[Figure 8-3 is a screenshot of the qbGUI Options form. Its groups are Optimization
(Constant Folding; Remove comments & labels during assembly, checked; Remove
degenerate operations), Inspect compiler objects, Parse Display (No parse display,
Outline parse display, XML formatted parse display), Tracing (Source trace, Object
trace, Parse trace), and Miscellaneous (Event Log, Inspect Quick Basic Engine,
Stop Button), with Cancel and Close buttons.]

Figure 8-3. Use Tools > Options on the main menu of qbGUI to see this form.

Pass two is straightforward by comparison. It does not remove code and
instead only replaces the Operand property of each qbPolish instruction when it
recognizes that the qbPolish instruction is a jump style operation. It looks up the
Operand in the label table by key and replaces the Operand with the position
recorded under that key.
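
The two passes can be sketched in a few lines of Visual Basic .NET. This is a minimal
sketch and not the code of assemble_: the Instr class is a hypothetical stand-in for
qbPolish, the opcode names are taken from the listings, and the treatment of opForTest
as a jump style operation is an assumption based on its label operand. Instead of
deleting entries in place, the sketch copies the real instructions to a new list,
which sidesteps the position adjustments while building the same kind of label table.

```vbnet
Imports System
Imports System.Collections.Generic

Module AssemblerSketch
    ' Hypothetical stand-in for qbPolish: an opcode name plus one operand.
    Class Instr
        Public Op As String
        Public Operand As Object
        Sub New(ByVal opName As String, Optional ByVal arg As Object = Nothing)
            Op = opName
            Operand = arg
        End Sub
    End Class

    Function Assemble(ByVal source As List(Of Instr)) As List(Of Instr)
        Dim labels As New Dictionary(Of String, Integer)
        Dim output As New List(Of Instr)

        ' Pass one: drop labels, remarks, and no operations. Each label is
        ' recorded with the 1-based position of the next real instruction.
        For Each ins As Instr In source
            Select Case ins.Op
                Case "opLabel"
                    labels(CStr(ins.Operand)) = output.Count + 1
                Case "opRem", "opNop"
                    ' Decoration only: nothing is emitted.
                Case Else
                    output.Add(New Instr(ins.Op, ins.Operand))
            End Select
        Next

        ' Pass two: replace the label operand of each jump style operation
        ' with the position recorded for that label.
        For Each ins As Instr In output
            Select Case ins.Op
                Case "opJump", "opJumpZ", "opJumpNZ", "opForTest"
                    ins.Operand = labels(CStr(ins.Operand))
            End Select
        Next
        Return output
    End Function

    Sub Main()
        Dim source As New List(Of Instr) From {
            New Instr("opRem", "IF N<>INT(N) THEN"),
            New Instr("opPushLiteral", 1),
            New Instr("opJumpZ", "LBL1"),
            New Instr("opNop"),
            New Instr("opEnd"),
            New Instr("opLabel", "LBL1"),
            New Instr("opPushLiteral", 2)
        }
        For Each ins As Instr In Assemble(source)
            Console.WriteLine("{0} {1}", ins.Op, ins.Operand)
        Next
    End Sub
End Module
```

In this run the remark, the no operation, and the label disappear, and the operand of
opJumpZ changes from LBL1 to 4, the position of the first real instruction after the
label.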
Figure 8-4 shows the result of assembling the code in Figure 8-1 (with
removal of comments and labels enabled).


1 opPushLiteral "ENTER N"
2 opPushLiteral String:vtString(ChrW(13) & ChrW(10))
3 opConcat
4 opPrint
5 opInput
6 opPop 1
7 opPushLiteral 1
8 opPushIndirect
9 opPushLiteral 1
10 opPushIndirect
11 opInt
12 opPushNE
13 opJumpZ 24
14 opPushLiteral "N VALUE "
15 opPushLiteral 1
16 opPushIndirect
17 opConcat
18 opPushLiteral " IS NOT AN INTEGER"
19 opConcat
20 opPushLiteral String:vtString(ChrW(13) & ChrW(10))
21 opConcat
22 opPrint
23 opEnd 0
24 opPushLiteral 1
25 opPushIndirect
26 opPushLiteral 0
27 opPushLE
28 opJumpZ 39
29 opPushLiteral "N VALUE "
30 opPushLiteral 1
31 opPushIndirect
32 opConcat
33 opPushLiteral " IS NOT A POSITIVE NUMBER"
34 opConcat
35 opPushLiteral String:vtString(ChrW(13) & ChrW(10))
36 opConcat
37 opPrint
38 opEnd 0
39 opPushLiteral 1
40 opPop 2
41 opPushLiteral 3
42 opPushLiteral 1
43 opPushIndirect
44 opPopIndirect
45 opPushLiteral 2
46 opRotate 1
47 opPushLiteral 0
48 opPushLiteral 1
49 opSubtract
50 opRotate 1
51 opForTest 60
52 opPushLiteral 2
53 opPushIndirect
54 opPushLiteral 3
55 opPushIndirect
56 opMultiply
57 opPop 2
58 opForIncrement
59 opJump 51
60 opPopOff
61 opPopOff
62 opPopOff
63 opPushLiteral "THE FACTORIAL OF "
64 opPushLiteral 1
65 opPushIndirect
66 opConcat
67 opPushLiteral " IS "
68 opConcat
69 opPushLiteral 2
70 opPushIndirect
71 opConcat
72 opPushLiteral String:vtString(ChrW(13) & ChrW(10))
73 opConcat
74 opPrint
75 opEnd

Figure 8-4. Assembly results without comments or labels

Take a look at line 13 (the JumpZ, jump on zero, instruction). Ignoring until later
its meaning in context, just note that it now jumps not to LBL1, but to instruction 24,
which is the first real instruction after LBL1.
The compiler's graphical user interface, qbGUI, available at the Apress Web site
(http://www.apress.com) as an executable, allows you to run to the end of assembly
using menu commands to see the result. Bring up qbGUI and, using the File menu,
navigate to egnsf\apress\quickBasic and load the file nFactorial.BAS to see the
window shown in Figure 8-5. (Because you've already run the compiler, it should
expand to the More info screen; but if it does not, click the More button.)


[Figure 8-5 is a screenshot of the qbGUI main window with nFactorial.BAS loaded. The
source pane shows the start of the program (DIM N, DIM F, PRINT "ENTER N", INPUT N,
and the two IF tests on N); below it are the Evaluate and Run buttons and Zoom
buttons for the Scanned Tokens, Parse Outline, RPN, and Stack boxes.]

Figure 8-5. nFactorial example

Using the Tools menu, click Compile to get a compiled version of the RPN
object code, and if you use the Zoom button on the RPN label after compiling,
you will see the assembly code in Figure 8-1. Again, using the Tools menu, click
Assemble to get the assembly with labels, remarks, and no operations removed.
This section has explained how assembly basically creates an efficient machine
language representation of code. The next step is to see how it is possible to
"build your own computer" without soldering computer parts together, starting
fires, or chipping your nails on top of a real computer by crafting an interpreter.

Interpreters
In this section, I'll cover interpreters in general and historical context, and then
examine the Nutty Professor interpreter embedded in the quickBasicEngine.


Interpreters, in General
Interpreters may have been invented in Alan Turing's 1936 paper "On Computable
Numbers, with an Application to the Entscheidungsproblem." The formidable
insertion of a monster German word in the title of Turing's paper is reflective of the
fact that prior to WWII, German was like English is today: a lingua franca of science
owing to the prestige of German science and mathematics. The
"Entscheidungsproblem" was the problem of the decidability of mathematics, and
whether or not all mathematical statements could be proved and/or whether
mathematics was even consistent.
Turing's concern in the paper was to formalize the notion of following a rule,
that is, executing an algorithm. He developed the ultimate paper computer, the
ultimate Nutty Professor computer, and the ultimate Reduced Instruction Set
Computing (RISC) machine because of its simplicity. This computer is called
a Turing machine, and you read about it in Chapter 4. Turing machines were an
important discovery in the history of software, for without them, we would not
be nearly as free to represent logic as data.
Back in the real world, interpreters came into commercial use when early
models of new computers were being designed and programmed before the
hardware was fully available.
When IBM introduced the IBM System/360 in 1964, a large number of busi-
ness customers had invested quite a lot of effort in coding programs for older
IBM 7094 and IBM 1401 architectures. A form of interpretation came to their res-
cue courtesy of hardware-assisted interpretation... also known as emulation.
Firmware in the 360 caused the operation codes and operands of older
machines to be unpacked and simulated in native 360 instructions, often at
a faster rate than the original computer executed its native instruction set.
This allowed 1401 developers to migrate to the 360 without losing their exist-
ing investment in software.
There is, in other words, a range of interpreters, commencing with slow and
simple interpreters, more complex interpreters such as the Nutty Professor inter-
preter for the quickBasicEngine, and virtual machines.
Slow and simple interpreters have a well-deserved reputation as being ineffi-
cient, and, in fact, the Nutty Professor interpreter belongs in this class.

The Onboard Nutty Professor Interpreter


The Nutty Professor interpreter simulates a stack-oriented machine with an
architecture designed for QuickBasic.
I call it the Nutty Professor interpreter because one way of academically
exploring computer architecture happens to be designing machines and writing
software interpreters for those machines, and doing this is an excellent way to
learn low-level machine language programming (not, of course, in the sense of
a job skill, but conceptually).
This interpreter is based on objects of type qbPolish. Each instance of a qbPolish
object represents a single instruction to the interpreter, and each instance
contains an opcode, the associated operand (or Nothing), an optional comment, and
the token start index and length of the source code that generated the opcode.
The operators supported are defined by the stateless class qbOp, and Table 8-1
identifies each one.

Table 8-1. qbPolish Op Codes

Op                  Description
opAdd               Replaces stack(top) and stack(top-1) with stack(top-1)+stack(top).
opAnd               Replaces stack(top) and stack(top-1) with stack(top-1) And
                    stack(top). Note that the And operator is not short circuited: the
                    operator expects two values at the top of the stack.
opAndAlso           Replaces stack(top) and stack(top-1) with stack(top-1) AndAlso
                    stack(top). Note that the AndAlso operator is short circuited. If the
                    value at the top of the stack is False, the second value isn't created
                    or stacked. Instead, the code that develops this value is skipped.
opAsc               Replaces stack(top) by its ASCII value.
opCeil              Replaces stack(top) with first integer n > stack(top).
opChr               Replaces stack(top) with its ASCII character value.
opCircle            Draws a circle on the graphic screen; stack(top-2) is the x coordinate,
                    stack(top-1) is the y coordinate, and stack(top) is the radius.
opCls               Clears the simulated QuickBasic screen.
opCoGo              Computed GoTo/GoSub.
opConcat            Concatenates stack(top) and stack(top-1).
opCos               Replaces stack(top) with its cosine; expects numeric.
opDivide            Replaces stack(top) and stack(top-1) with stack(top-1)/stack(top).
opDuplicate         Duplicates the value at the top of the stack.
opEnd               Stops processing immediately; expects nothing on the stack.
opEval              Evaluates stack(top) as a QuickBasic expression using lightweight
                    evaluation. A new quickBasicEngine with default options is used to
                    evaluate stack(top).

opEvaluate          Evaluates stack(top) as a QuickBasic expression using heavyweight
                    evaluation. A new quickBasicEngine with the same options
                    as the current engine is used to evaluate stack(top).
opFloor             Replaces stack(top) with first integer n < stack(top).
opForIncrement      Increments or decrements the For control value in a For loop.
opForTest           Jumps to the For exit when contents of control variable location
                    are greater than final value (when step value is positive); jumps to
                    the For exit when contents of control variable location are less
                    than final value (when step value is negative).
opIif               Replaces stack(top-2)..stack(top) with stack(top-1) when
                    stack(top-2) is True, with stack(top) otherwise.
opInput             Reads a number or a string to stack(top) by generating the
                    compiler input event.
opInt               Replaces stack(top) with integer part.
opInvalid           Invalid marker op intended in certain contexts to flag the opcodes
                    with a deliberate error; not used in the current compiler.
opIsNumeric         Replaces stack(top) with True when stack(top) is a number,
                    False otherwise.
opJump              Jumps to location in operand; expects integer.
opJumpIndirect      Jumps to location identified at the top of the stack.
opJumpNZ            Jumps to location when stack(top) <> 0 (pops the stack top).
opJumpZ             Jumps to location when stack(top) = 0 (pops the stack top).
opLabel             Identifies position of a code label or statement number; inserted
                    by the compiler and removed by the assembler.
opLCase             Replaces the string at stack(top) with its lowercase translation.
opLen               Replaces stack(top) by its length as a string.
opLike              Compares two strings at the stack top for a pattern match,
                    replacing them by True or False.
opLog               Replaces stack(top) by its natural logarithm.
opMax               Replaces stack(top) and stack(top-1) with the maximum
                    value found.
opMid               Replaces stack(top-2)..stack(top) with the substring of
                    stack(top-2) starting at stack(top-1) using the length at
                    stack(top).

opMin               Replaces stack(top) and stack(top-1) with the minimum
                    value found.
opMod               Replaces stack(top) and stack(top-1) with the integer division
                    remainder from stack(top-1) \ stack(top).
opMultiply          Replaces stack(top) and stack(top-1) with
                    stack(top-1)*stack(top).
opNegate            Reverses the sign of the value at the top of stack.
opNop               Does nothing.
opNot               Replaces stack(top) with Not stack(top).
opOr                Replaces stack(top) and stack(top-1) with stack(top-1)
                    Or stack(top) (Or is not short circuited).
opPop               Sends stack(top) to a memory location.
opPopIndirect       Sends stack(top) to a memory location at stack(top-1);
                    removes stack(top), leaves stack(top-1) alone.
opPopOff            Removes stack(top) without sending it to a memory location.
opPower             Replaces stack(top) and stack(top-1) with
                    stack(top-1)^stack(top).
opPrint             Prints (and removes) value at top of the stack, strictly by
                    raising the PrintEvent (does not display).
opPush              Pushes the contents of a memory location specified in its
                    operand.
opPushArrayElement  Replaces elements at the top of the stack by an array element:
                    expects n+1 entries at the top of the stack. The entry at top-n is
                    the high-order array subscript down to the low-order subscript
                    at top-1, and the top of the stack should be the qbVariableType
                    to which the new element needs to be converted before it is
                    pushed. For example, a reference to the first row, second
                    column of the intArray integer array (intArray(1, 2)) that
                    needs to be converted to a Long will compile to the stack
                    frame 1, 2, Long.
opPushEQ            Replaces stack(top) and stack(top-1) by -1 when
                    stack(top-1)=stack(top), 0 otherwise.
opPushGE            Replaces stack(top) and stack(top-1) by -1 when
                    stack(top-1)>=stack(top), 0 otherwise.

opPushGT            Replaces stack(top) and stack(top-1) by -1 when
                    stack(top-1)>stack(top), 0 otherwise.
opPushIndirect      Pushes the contents of a memory location indexed at stack(top),
                    replacing the index.
opPushLE            Replaces stack(top) and stack(top-1) by -1 when
                    stack(top-1)<=stack(top), 0 otherwise.
opPushLiteral       Pushes a literal string or number.
opPushLT            Replaces stack(top) and stack(top-1) by -1 when
                    stack(top-1)<stack(top), 0 otherwise.
opPushNE            Replaces stack(top) and stack(top-1) by -1 when
                    stack(top-1)<>stack(top), 0 otherwise.
opPushReturn        Pushes the subroutine's return address.
opRand              Seeds the random number generator to unpredictable values.
opRead              Reads from the data statements to stack(top).
opRem               Equivalent to a NOP.
opReplace           Replaces all occurrences of the string at stack(top-1) by the
                    string at stack(top) in the string at stack(top-2). Replaces all
                    entries by the translated string.
opRnd               Pushes an unseeded random number on the stack.
opRndSeed           Pushes a seeded random number on the stack (seed is
                    stack(top), and is replaced).
opRotate            Exchanges stack(top) with stack(top-n); when n=0 this is a NOP.
opRound             Rounds stack(top-1) to stack(top) digits.
opSgn               Replaces stack(top) with its signum (0 for 0, 1 for positive,
                    -1 for negative).
opSin               Replaces stack(top) with its sine.
opSqr               Replaces stack(top) with its square root.
opString            Replaces stack(top) and stack(top-1) with n copies of the
                    character at stack top, where n is at stack(top-1).
opSubtract          Replaces stack(top) and stack(top-1) with
                    stack(top-1)-stack(top).
opTrace             Changes trace settings.

opTracePop          Restores trace settings from a LIFO stack.
opTracePush         Saves trace settings in a LIFO stack.
opTrim              Replaces stack(top) with trimmed string (leading and trailing
                    blanks removed).
opUCase             Replaces the string at stack(top) with its uppercase translation.

The interpreter in the method interpreter_ is a very large case statement that
moves through the Polish collection and jumps to individual support routines.
A stack collection keeps the working elements in the form of qbVariable objects.
The interpreter is rather slow because it imposes a "strongly typed" frame
on top of the stack. For each operation as defined with its description in qbOp,
the stack frame expected is specified in a string form that lists the expected
types of the operation: for example, the expected types of the Add operation
are "number, number". A pop routine obtains the expected stack as an array, and
returns it to the caller.
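
The shape of such an interpreter, a case statement over opcodes plus a pop routine
that checks a typed frame, can be sketched as follows. This is a minimal sketch under
stated assumptions and not the code of interpreter_: Instr is a hypothetical stand-in
for qbPolish, plain Double values replace qbVariable objects, only a handful of
opcodes are handled, and the frame strings follow the "number, number" example above.

```vbnet
Imports System
Imports System.Collections.Generic

Module InterpreterSketch
    ' Hypothetical stand-in for qbPolish: an opcode name plus one operand.
    Class Instr
        Public Op As String
        Public Operand As Object
        Sub New(ByVal opName As String, Optional ByVal arg As Object = Nothing)
            Op = opName
            Operand = arg
        End Sub
    End Class

    ' Pops one value per entry in the frame string (for example "number,number")
    ' and checks its type. Values come back deepest entry first.
    Function PopFrame(ByVal work As Stack(Of Object), ByVal frame As String) As Object()
        Dim kinds As String() = frame.Split(","c)
        Dim values(kinds.Length - 1) As Object
        For i As Integer = kinds.Length - 1 To 0 Step -1
            If work.Count = 0 Then Throw New InvalidOperationException("Stack underflow")
            values(i) = work.Pop()
            If kinds(i) = "number" AndAlso Not TypeOf values(i) Is Double Then
                Throw New InvalidOperationException("Expected a number on the stack")
            End If
        Next
        Return values
    End Function

    Function Run(ByVal code As List(Of Instr)) As Stack(Of Object)
        Dim work As New Stack(Of Object)
        Dim ip As Integer = 1                  ' 1-based, like the listings
        While ip <= code.Count
            Dim ins As Instr = code(ip - 1)
            ip += 1
            Select Case ins.Op
                Case "opPushLiteral"
                    work.Push(ins.Operand)
                Case "opMultiply"
                    Dim f As Object() = PopFrame(work, "number,number")
                    work.Push(CDbl(f(0)) * CDbl(f(1)))
                Case "opSubtract"
                    Dim f As Object() = PopFrame(work, "number,number")
                    work.Push(CDbl(f(0)) - CDbl(f(1)))
                Case "opJumpZ"
                    Dim f As Object() = PopFrame(work, "number")
                    If CDbl(f(0)) = 0 Then ip = CInt(ins.Operand)
                Case "opEnd"
                    Exit While
                Case Else
                    Throw New NotSupportedException(ins.Op)
            End Select
        End While
        Return work
    End Function

    Sub Main()
        Dim code As New List(Of Instr) From {
            New Instr("opPushLiteral", 23.0),
            New Instr("opPushLiteral", 8.0),
            New Instr("opMultiply"),
            New Instr("opPushLiteral", 100.0),
            New Instr("opSubtract"),
            New Instr("opEnd")
        }
        Console.WriteLine(Run(code).Peek())    ' prints 84
    End Sub
End Module
```

The type check on every pop is where the time goes: each arithmetic operation pays for
a string split and a type test before it does any arithmetic at all.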
About the only advantage of its slow rate of execution is that you can sit back
and watch how it works in the GUI, or go to the fridge for a beer, or read Motor
Trend. Also, as in the case of the flyover compiler in Chapter 3, you can replay the
interpretation, as well as the scanning and compilation, in the GUI.
To see how the interpreter executes the nFactorial program, call up the qbGUI
application, and load the nFactorial.BAS program. Being sure that the More screen
is shown, go to Tools > Options to set up the options shown in Figure 8-6. In this
form, enable the Object trace option in the Tracing box.


[Figure 8-6 is a screenshot of the qbGUI Options form with Remove comments & labels
during assembly checked in the Optimization group, Outline parse display selected
in the Parse Display group, and Object trace checked in the Tracing group.]

Figure 8-6. Options for testing the nFactorial.BAS application

Click the Run button to watch the nFactorial program execute. It will prompt
you for input when the interpreter sends the GUI an input event: try 5. You should
see the screen in Figure 8-7.


[Figure 8-7 is a screenshot of the qbGUI window after the run. The trace pane shows
the final opEnd instruction ("Stops processing immediately") with storage entries
N = vtByte(5), F = vtByte(120), and N2 = vtDouble(1); the event log records
"Running code at IP 72" through "Running code at IP 75".]


Figure 8-7. qbGUI screen after execution of the nFactorial program with Object
trace enabled

Check the result, which should be 120. Then, click the Zoom button on the
Storage list box to see the strongly typed storage of the quickBasicEngine, which
is contained in the colVariables collection of the quickBasicEngine state (see
Figure 8-8).

1 N Variant,Byte:vtByte(5)
2 F Variant,Byte:vtByte(120)
3 N2 Variant,Double:vtDouble(1)

Figure 8-8. Storage after execution of the nFactorial program


Because the Dim statements for N, F, and N2 (the input number, the factorial,
and the "work" copy of N used in the For loop) are untyped, N, F, and N2 are each
variants. Note that the displays of each storage item are the output of the toString
method for qbVariable as described in Chapter 6.
Because you selected the Object trace as shown in Figure 8-6, the upper-right
hand of the screen will contain trace blocks like those in Figure 8-9. Figure 8-9
shows the trace block for the concatenate instruction that finishes the display of
the result.

***** IP: 73 ************************************
* Opcode: opConcat                              *
*                                               *
* ********************************************* *
* * opConcat(s,s): Replaces stack(top) and    * *
* * stack(top-1) with stack(top-1)&stack(top) * *
* ********************************************* *
*                                               *
* ***** Storage *********************           *
* * 1 N Variant,Byte:vtByte(5)      *           *
* * 2 F Variant,Byte:vtByte(120)    *           *
* * 3 N2 Variant,Double:vtDouble(1) *           *
* ***********************************           *
*                                               *
* ***** Stack ************************          *
* * string:vtString(ChrW(13) & Ch... *          *
* * string:vtString("THE FACTORIA... *          *
* ************************************          *
*************************************************

Figure 8-9. Trace information

Figure 8-9 shows the interpreter's situation just prior to the execution of the
opConcat opcode to join two strings; the strings being joined are
"THE FACTORIAL OF 5 IS 120" and the newline that terminates the Print operation.
The trace block shows the opcode, and it documents the function of the opcode.
As in the case of the flyover compiler of Chapter 3, you can click the Replay
check box at the bottom of the screen to save and store each scan, parse, and
execution step, and replay the complete process in detail.
I'd like to show you one final feature of the compiler at this point: its test
method, which tests the complete compiler after options or source code is
changed. It exercises most of the functions of the compiler.
Recall from the discussion of core methodology in Chapter 4 that I prefer to
write complex objects with their own test method so that they can be rapidly
tested after a change or in installing the software.
Click the Test button in the customer engineering zone. You may get slightly
different results than the ones you see in Figure 8-10 if I add more tests between
this writing and installation of the compiler software on the Apress Web site, but
you should see the number of test cases actually run, the total time for the tests,
and the important message "The test succeeded".

9 test cases took 4 second(s)

The test succeeded

Testing the expression "1+1"

expected result: "2": actual result: "2"

Testing the expression """Ooga"" & ""Chukka"""

expected result: "OogaChukka": actual result: "OogaChukka"

Testing the expression "5478/3+21-((4+1)*8)+.1"

expected result: "1807.10000000149": actual result: "1807.10000000149"

Figure 8-10. Test report

The report will start with three expressions, and it will contain several pro-
grams including Hello World and nFactorial. Note that the Print statements don't
affect the screen of the qbGUI because the test method creates a quickBasicEngine
and intercepts its Print events to test them against expected results.
Thus like qbScanner, qbVariableTypeTester, and qbVariableTest, qbGUI can test
the underlying object if you alter the source code.
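
The idea of testing by intercepting Print events can be sketched like this. It is a
minimal sketch, not the engine's test method: TinyEngine is a hypothetical toy whose
"program" is just a list of strings to print, and the names RunCase and OnPrint are
assumptions made for illustration. What it shows is the pattern: create a fresh
engine, subscribe to its print event, run, and compare the captured text with the
expected result.

```vbnet
Imports System
Imports System.Text

' Hypothetical toy engine: it "runs" a program by printing each string in turn.
Class TinyEngine
    Public Event PrintEvent(ByVal output As String)

    Public Sub Run(ByVal lines As String())
        For Each item As String In lines
            RaiseEvent PrintEvent(item)
        Next
    End Sub
End Class

Module SelfTestSketch
    Private captured As New StringBuilder()

    ' Collects printed text instead of sending it to a screen.
    Private Sub OnPrint(ByVal output As String)
        captured.Append(output)
    End Sub

    Function RunCase(ByVal lines As String(), ByVal expected As String) As Boolean
        captured.Length = 0
        Dim engine As New TinyEngine()
        AddHandler engine.PrintEvent, AddressOf OnPrint
        engine.Run(lines)
        RemoveHandler engine.PrintEvent, AddressOf OnPrint
        Return captured.ToString() = expected
    End Function

    Sub Main()
        Dim ok As Boolean = RunCase(New String() {"Hello", " World"}, "Hello World")
        Console.WriteLine(If(ok, "The test succeeded", "The test failed"))
    End Sub
End Module
```

Because the handler belongs to the test and not to a form, nothing reaches the screen
while the cases run.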

Summary
In this chapter, you've seen how to "assemble" the code by translating symbolic
labels inserted by the compiler to numeric indexes and optionally removing labels,
source code information, and comments. You've then run the Nutty Professor
interpreter to simulate the quickBasicEngine on your system, and you've seen
(depending on your hardware's capabilities) why the interpretation process is slow.
You also learned how to continually test the quickBasicEngine.
Therefore, let's see if you can generate more efficient code by using the
Common Language Runtime, which I'll discuss in the next chapter.

CHAPTER 9

Code Generation to the Common Language Runtime

We all live in a virtual machine, a virtual machine, a virtual machine.
-Sung at IBM Share conferences in the era of
Conversational Monitor System/Virtual Machine

What's worth doing well is worth doing slowly.
-Gypsy Rose Lee

I FEAR THIS CHAPTER may be a bit of an anticlimax, and this is because in this book
I have deliberately focused primarily on the front end of a compiler, including
language design, language specification using Backus-Naur Form, lexical analy-
sis, and parsing.
In order to avoid the distracting issues of the Common Language Runtime
and ilasm, I even defined the onboard Nutty Professor interpreter (a stack-based
virtual machine tuned towards the needs of QuickBasic) with machine language
that directly supports QuickBasic needs.
It was also important to develop data types as objects because I wanted to
avoid a common pitfall of the tyro language designer: defining a language that
encourages the use of untyped variables including the .NET object or the COM
variant.
There are many sources of valuable content for using the Common Language
Runtime and its associated tools. At the entry level, Visual Basic .NET and the .NET
Platform: An Advanced Guide, by Andrew Troelsen (Apress, 2001) is still solid on
basic interaction through the Reflection types with the CLR.
At a more advanced level, Serge Lidin, the actual developer of Microsoft's
ilasm (which assembles textual IL source into an executable assembly), has written
Inside Microsoft .NET IL Assembler (Microsoft Press, 2002); this book describes not
only how to get started writing assembler code, but also how to write real code, since
the author gives comprehensive reference information on the opcodes and the
important issue of the loader, which combines multiple assemblies and links
them into a run unit.


In this chapter, I will simply describe how the quickBasicEngine is able in a pro-
totype sense to generate CLR code ... which runs much faster than Nutty Professor
code, though without the benefit of allowing you to see its inner workings.
What I've implemented transforms a subset of possible QuickBasic expres-
sions, the part that does math with constants, into CLR instructions for fast
execution.
For the rest, I shall have to use the sleazy academic practice of leaving the
fun part of coding full code generation to you, the reader. I do expect that
because you have fully documented source code through the Apress Web site
(http://www.apress.com), this will be a relatively easy task, and I've dedicated the
final part of this chapter to showing how this can be done.

CLR Generation in the quickBasicEngine


Unlike the full-featured compiler, I've kept the CLR generator very simple.
Try it out. Run the qbGUI executable (make sure the More display is in effect),
and type in a math expression using addition, subtraction, multiplication, and
division, exclusively.
Compile the code (click the Compile item of the Tools menu) and then zoom
the RPN box to see the window shown in Figure 9-1.

1 opPushLiteral 23: Push numeric constant
2 opPushLiteral 8: Push numeric constant
3 opMultiply : Replace stack(top) by opMultiply(stack(top-1), stack(top))
4 opPushLiteral 100: Push numeric constant
5 opSubtract : Replace stack(top) by opSubtract(stack(top-1), stack(top))
6 opEnd : Generated at end of code

Figure 9-1. Expression compiles to the indicated Nutty Professor code


As you see, the expression compiles to the following stack operations:
push 23, push 8, multiply the two numbers at the top of the stack, push 100,
and subtract.
Close the Zoom display and click the Run button to run the code. You need
to look for the result in the Stack box because no code is generated to print the
result, and unlike the Evaluate button (which will display the result on a green-
on-black output screen), the Run button leaves the expression value on top of
the stack, as shown in Figure 9-2.

[Figure 9-2 is a screenshot of the Stack box, which holds the value of the expression.]

Figure 9-2. Result of running the expression code

Even for this small amount of code, the interpreter took a noticeable amount
of time. Let's up the ante and use MSIL!
Go to the Tools menu and select the menu item named Run the MSIL Code,
and in a flash you should see the results shown in Figure 9-3.

After transliteration to Intermediate Language, this code produces 84

Figure 9-3. Result of running Microsoft Intermediate Language in the CLR

OK, I hope this is reasonably cool. Here is how it works.

The msilRun method of the quickBasicEngine takes existing Nutty Professor
code, whether assembled or not, and attempts to create a CLR program. I say
"attempts" because at this writing this method only translates a small subset of
operations for demonstration purposes.
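
To make the idea concrete, here is one way a translation of this kind could look. It
is a minimal sketch under stated assumptions, not the code of msilRun_: Instr is a
hypothetical stand-in for qbPolish, every number is treated as a Double, and the
sketch uses System.Reflection.Emit.DynamicMethod, a class from later versions of the
.NET Framework than the one this book targets. Each Nutty Professor opcode in the
arithmetic subset maps to one CLR opcode, because both machines are stack machines.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Reflection.Emit

Module MsilSketch
    ' Hypothetical stand-in for qbPolish, restricted to numeric operands.
    Class Instr
        Public Op As String
        Public Operand As Double
        Sub New(ByVal opName As String, Optional ByVal arg As Double = 0)
            Op = opName
            Operand = arg
        End Sub
    End Class

    ' Translates the arithmetic subset into IL and returns a callable delegate.
    Function Translate(ByVal code As List(Of Instr)) As Func(Of Double)
        Dim method As New DynamicMethod("expr", GetType(Double), Type.EmptyTypes)
        Dim il As ILGenerator = method.GetILGenerator()
        For Each ins As Instr In code
            Select Case ins.Op
                Case "opPushLiteral"
                    il.Emit(OpCodes.Ldc_R8, ins.Operand)
                Case "opAdd"
                    il.Emit(OpCodes.Add)
                Case "opSubtract"
                    il.Emit(OpCodes.[Sub])
                Case "opMultiply"
                    il.Emit(OpCodes.Mul)
                Case "opDivide"
                    il.Emit(OpCodes.Div)
                Case "opEnd"
                    Exit For
                Case Else
                    Throw New NotSupportedException(ins.Op)
            End Select
        Next
        ' The expression value is on top of the CLR stack: return it.
        il.Emit(OpCodes.Ret)
        Return CType(method.CreateDelegate(GetType(Func(Of Double))), Func(Of Double))
    End Function

    Sub Main()
        Dim code As New List(Of Instr) From {
            New Instr("opPushLiteral", 23),
            New Instr("opPushLiteral", 8),
            New Instr("opMultiply"),
            New Instr("opPushLiteral", 100),
            New Instr("opSubtract"),
            New Instr("opEnd")
        }
        Console.WriteLine(Translate(code).Invoke())   ' prints 84
    End Sub
End Module
```

Any opcode outside the subset raises NotSupportedException, which mirrors the
"attempts" caveat above.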


NOTE As a side benefit, I will show you how the compiler implements full
thread capability simply so that multiple instances of the compiler can run
simultaneously, the compiler can be running while the user interacts with
a form, and multiple procedures in one instance of the compiler can run
simultaneously. The compiler is fully threadable.

Here is the code of the msilRun method:

Public Function msilRun() As Object
    Return dispatch_("msilRun", True, Nothing, "Returning Nothing")
End Function

All the actual functionality of the msilRun method is contained in the Private
msilRun_ method because in general you should try to keep Public routines sim-
ple shells around Private code. But note that you call msilRun by way of a private
dispatcher routine.
The purpose of this dispatcher routine is to make the quickBasicEngine mul-
tithreaded by placing the needed threading logic in one place. You must lock the
state on entry to the dispatcher and release the lock on exit.
This is because at any time in the dispatcher itself or in a Private procedure
called by the dispatcher, you may need to reference the variables concentrated
in the user data type usrState (of type TYPstate) in the General Declarations sec-
tion of the quickBasicEngine.
If two procedures running in separate threads reference and change these
variables simultaneously, the execution of the program will be unpredictable
and buggy.
This strategy is rather primitive and broad. While any thread is executing
code inside the dispatcher, other threads trying to run procedures will simply
queue up and pound, as it were, on the door of the SyncLock, waiting their
turn, like the kids in the house with one bathroom. A more fine-grained approach
would be to identify the specific zones of the dispatcher that interact
with the state and lock only these zones.
Several different procedures come to the dispatcher and identify the specific
functionality they need. The problem that you create in this design is that every
time a new Public property or method is created, the dispatcher must be
upgraded with a new execution case. The advantage of a dispatcher is that it
allows you to concentrate logic, here locking, in one place and to wrap it
around every call that passes through.
Here is the dispatcher code, somewhat shortened:

Private Overloads Function dispatch_(ByVal strProcedure As String, _
                                     ByVal booFlag As Boolean, _
                                     ByVal objDefault As Object, _
                                     ByVal strDefaultHelp As String, _
                                     ByVal ParamArray objParameters() _
                                                      As Object) _
                                     As Object
    Dim strDummy As String
    Select Case UBound(objParameters)
        Case -1
            Return (dispatch_(strProcedure, _
                              strDummy, _
                              objDefault, _
                              strDefaultHelp))
        Case 0
            Return (dispatch_(strProcedure, _
                              strDummy, _
                              objDefault, _
                              strDefaultHelp, _
                              objParameters(0)))
        Case 1
            Return (dispatch_(strProcedure, _
                              strDummy, _
                              objDefault, _
                              strDefaultHelp, _
                              objParameters(0), _
                              objParameters(1)))

        Case Else
            errorHandler_("Internal programming error: " & _
                          "too many parameters", _
                          "dispatch_", _
                          "Making object unusable and returning Nothing", _
                          Nothing)
            OBJstate.usrState.booUsable = False
            Return (Nothing)
    End Select
End Function
' --- Returns the reference value
Private Overloads Function dispatch_(ByVal strProcedure As String, _
                                     ByRef strOutstring As String, _
                                     ByVal objDefault As Object, _
                                     ByVal strDefaultHelp As String, _
                                     ByVal ParamArray objParameters() _
                                                      As Object) _
                                     As Object
    SyncLock OBJthreadStatus
        If Not checkAvailability_(strProcedure, strDefaultHelp) Then
            Return (objDefault)
        End If
        OBJthreadStatus.startThread()
    End SyncLock
    Dim objReturn As Object = objDefault
    SyncLock OBJstate
        If checkUsable_(strProcedure, strDefaultHelp) Then
            With OBJstate.usrState
                Select Case UCase(strProcedure)

                    Case "MSILRUN"
                        objReturn = msilRun_

                    Case Else
                        errorHandler_("Invalid dispatch method " & _
                                      _OBJutilities.enquote(strProcedure), _
                                      "dispatch_", _
                                      "Marking object unusable and " & _
                                      "returning default", _
                                      Nothing)
                        .booUsable = False
                End Select
            End With
        End If
    End SyncLock
    SyncLock OBJthreadStatus
        OBJthreadStatus.stopThread()
    End SyncLock
    Return (objReturn)
End Function
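
The shape of this dispatcher (count the thread in, take one broad lock on the state, select the private routine by name, count the thread out) can be sketched compactly. The following is a minimal illustration in Python rather than Visual Basic, and it is not the engine's code: the Engine class, its fields, and its method names are all invented for the example, and threading.Lock stands in for SyncLock.

```python
import threading


class Engine:
    """Toy stand-in for quickBasicEngine; every name here is illustrative."""

    def __init__(self):
        self._state_lock = threading.Lock()   # role of SyncLock OBJstate
        self._status_lock = threading.Lock()  # role of SyncLock OBJthreadStatus
        self._running = 0                     # threads currently in the dispatcher
        self._usable = True
        self.calls = 0                        # shared state changed by private code

    def msil_run(self):
        # Public routine: a thin shell around the dispatcher.
        return self._dispatch("msilRun", default=None)

    def _dispatch(self, procedure, default=None):
        with self._status_lock:
            self._running += 1
        try:
            with self._state_lock:            # broad lock: one thread at a time
                if not self._usable:
                    return default
                if procedure.upper() == "MSILRUN":
                    return self._msil_run()
                self._usable = False          # unknown procedure: mark unusable
                return default
        finally:
            with self._status_lock:
                self._running -= 1

    def _msil_run(self):
        # Safe to touch shared state: the caller holds the state lock.
        self.calls += 1
        return self.calls
```

Because every public call funnels through _dispatch, adding a new public method means adding a new case there, which is the maintenance cost described above.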

Note that the dispatcher takes and releases a lock not only on the usrState but
also on a minor player, OBJthreadStatus, which keeps track of running threads so
that you can monitor them in the qbGUI form. To see this, run the qbGUI executable
and bring up the Options form used in the last chapter. Click the check box
labeled "Stop button", return to the main form, and load the nFactorial.bas
demonstration program used in Chapter 8. Run this to see a new form that will
allow you to stop a runaway compile or run (see Figure 9-4).

[Figure: the Stop form, showing the opcodes being executed (opPushLiteral String:vtString, opConcat) and the status line "Running: 1 threads running"]

Figure 9-4. Stop button

As you can see in the dispatcher code, there is a case for the msilRun method
that sets the return object to the value returned from msilRun_, which does all
the work of translating into CLR code using Reflection.
The code in msilRun_ requires the following imported namespaces, identified
at the beginning of the quickBasicEngine:

Imports System.Reflection
Imports System.Reflection.Emit
Imports DotNetAssembly = System.Reflection.Assembly

System.Reflection provides the basic tools that allow you to discover the prop-
erties of your types, methods, and fields. System.Reflection.Emit provides you with
the ability to create code in the CLR. DotNetAssembly is an alias for
System.Reflection.Assembly, which you need to create and load the single dynamic
assembly that will execute the compiled functions.
Here is the code of msilRun_ itself:

Private Function msilRun_() As Object
    With OBJstate.usrState
        If (.colPolish Is Nothing) Then
            errorHandler_("Cannot run MSIL code: " & _
                          "no Polish code is available", _
                          "msilRun_", _
                          "Returning Nothing", _
                          Nothing)
            Return Nothing
        End If
        Dim objAsmName As AssemblyName
        Dim objAsm As AssemblyBuilder
        Dim objClass As TypeBuilder
        Dim objILgenerator As ILGenerator
        Dim objMethod As MethodBuilder
        Dim objModule As ModuleBuilder
        Try
            objAsmName = New AssemblyName
            objAsmName.Name = "msilRun"
            objAsmName.Version = New Version("1.0.0.0")
            objAsm = _
                AppDomain.CurrentDomain.DefineDynamicAssembly _
                (objAsmName, AssemblyBuilderAccess.Run)
            objModule = objAsm.DefineDynamicModule(objAsmName.Name)
            objClass = objModule.DefineType(objAsmName.Name, _
                                            TypeAttributes.Public)
            objMethod = objClass.DefineMethod(objAsmName.Name & "_", _
                                              MethodAttributes.Public, _
                                              Type.GetType("System.Double"), _
                                              Nothing)
            objILgenerator = objMethod.GetILGenerator
        Catch objException As Exception
            errorHandler_("Not able to initialize MSIL generation: " & _
                          Err.Number & " " & Err.Description, _
                          "msilRun_", _
                          "Returning nothing", _
                          objException)
        End Try
        Dim intIndex1 As Integer
        Dim objArgument As Object
        Dim objNextOpcode As OpCode
        Dim objNextValue As Object
        With .colPolish
            For intIndex1 = 1 To .Count
                loopEventInterface_("Generating MSIL code", _
                                    "collection item", _
                                    intIndex1, _
                                    .Count, _
                                    0, _
                                    "")

                With CType(.Item(intIndex1), qbPolish.qbPolish)
                    If msilRun__qbOpcode2MSIL_(.Opcode, objNextOpcode) Then
                        objILgenerator.Emit(objNextOpcode)
                    Else
                        If .Opcode = ENUop.opPushLiteral Then
                            If UCase(.Operand.GetType.ToString) = _
                               "QBVARIABLE.QBVARIABLE" Then
                                objNextValue = _
                                    CType(.Operand, qbVariable.qbVariable).value
                            Else
                                objNextValue = .Operand
                            End If
                            Try
                                objILgenerator.Emit(OpCodes.Ldc_R8, _
                                                    CDbl(objNextValue))
                            Catch
                                Exit For
                            End Try
                        End If
                    End If
                End With
            Next intIndex1
            If intIndex1 <= .Count Then
                errorHandler_("Not able to convert Polish code to MSIL", _
                              "msilRun_", _
                              "Returning Nothing", _
                              Nothing)
                Return Nothing
            End If
            objILgenerator.Emit(OpCodes.Ret)
            Dim objReturn As Object
            Try
                objClass.CreateType()
                Dim objType As Type = objAsm.GetType(objClass.Name)
                Dim objInstance As Object = Activator.CreateInstance(objType)
                Dim objMethodInfo As MethodInfo = _
                    objType.GetMethod(objMethod.Name)
                objReturn = objMethodInfo.Invoke(objInstance, Nothing)
            Catch objException As Exception
                errorHandler_("Not able to run MSIL: " & _
                              Err.Number & " " & Err.Description, _
                              "msilRun_", _
                              "Returning Nothing", _
                              objException)
                Return Nothing
            End Try
            Return (objReturn)
        End With
    End With
End Function
Private Function msilRun__qbOpcode2MSIL_ _
                 (ByVal enuPolishOpcode As qbOp.qbOp.ENUop, _
                  ByRef enuMSILopcode As OpCode) As Boolean
    Select Case enuPolishOpcode
        Case ENUop.opAdd : enuMSILopcode = OpCodes.Add
        Case ENUop.opAnd : enuMSILopcode = OpCodes.And
        Case ENUop.opDivide : enuMSILopcode = OpCodes.Div
        Case ENUop.opEnd : enuMSILopcode = OpCodes.Nop
        Case ENUop.opMultiply : enuMSILopcode = OpCodes.Mul
        Case ENUop.opNegate : enuMSILopcode = OpCodes.Neg
        Case ENUop.opNot : enuMSILopcode = OpCodes.Not
        Case ENUop.opOr : enuMSILopcode = OpCodes.Or
        Case ENUop.opSubtract : enuMSILopcode = OpCodes.Sub
        Case Else
            Return False
    End Select
    Return (True)
End Function

msilRun_ is a function with no parameters because it gets all its input from
the colPolish collection in usrState, which contains all the Polish operations.
msilRun_ will iterate through colPolish and compile what operations it can. If it
finds an operation it cannot convert, it will give up, report failure through an
error handler that throws an error, and return Nothing to the caller.
However, the first and rather formidable job of msilRun_ is to create the
objects needed to build a dynamic assembly in the first place. The job is formi-
dable because the objects need to tie together in one and only one way.

Dim objAsmName As AssemblyName
Dim objAsm As AssemblyBuilder
Dim objClass As TypeBuilder
Dim objILgenerator As ILGenerator
Dim objMethod As MethodBuilder
Dim objModule As ModuleBuilder
Dim objObject As Object
Try
    objAsmName = New AssemblyName
    objAsmName.Name = "msilRun"
    objAsmName.Version = New Version("1.0.0.0")
    objAsm = AppDomain.CurrentDomain.DefineDynamicAssembly _
             (objAsmName, AssemblyBuilderAccess.Run)
    objModule = objAsm.DefineDynamicModule(objAsmName.Name)
    objClass = objModule.DefineType _
               (objAsmName.Name, TypeAttributes.Public)
    objMethod = objClass.DefineMethod(objAsmName.Name & "_", _
                                      MethodAttributes.Public, _
                                      Type.GetType("System.Double"), _
                                      Nothing)
    objILgenerator = objMethod.GetILGenerator
Catch objException As Exception
    errorHandler_("Not able to initialize MSIL generation: " & _
                  Err.Number & " " & Err.Description, _
                  "msilRun_", _
                  "Returning nothing", _
                  objException)
End Try

This code first defines several objects:

• objAsmName: The name of the assembly to contain the compiled code. Note
that this is an object and not a string, because the CLR requires (primarily
for security and interoperation) a structured name that includes the string
name, the version, and the locale information.

• objAsm: The assembly itself, a container for the type and the method. Many
assemblies contain multiple classes (also referred to as types), but this
assembly will contain one class.

• objClass: The class is a type builder because you need to not only use it,
but also assign its properties.

• objILgenerator: This is the object that will enable you to emit code to the
method itself.

• objMethod: Again, the method is a builder for the same reason the class is
a type builder.

• objModule: You might need multiple classes and namespaces in more
advanced projects, so the module is a container for the class.

The next step, in the Try...Catch block, is to create the objects.

objAsmName = New AssemblyName
objAsmName.Name = "msilRun"
objAsmName.Version = New Version("1.0.0.0")
objAsm = AppDomain.CurrentDomain.DefineDynamicAssembly _
         (objAsmName, AssemblyBuilderAccess.Run)
objModule = objAsm.DefineDynamicModule(objAsmName.Name)
objClass = objModule.DefineType _
           (objAsmName.Name, _
            TypeAttributes.Public)
objMethod = objClass.DefineMethod(objAsmName.Name & "_", _
                                  MethodAttributes.Public, _
                                  Type.GetType("System.Double"), _
                                  Nothing)
objILgenerator = objMethod.GetILGenerator

You first create the assembly "name" as an object that contains the string name
and the version; in this simple application, you don't need to identify the locale.
The next step is to define a new, "dynamic" assembly with the structured name
and an access mode. The mode you select is Run, because all you want to do is
run the code. You could instead choose Save if all you needed to do was save the
code to disk, or RunAndSave to do both.
Inside the new assembly, you create the single module and then the one-
and-only class (also known as type).
For the one-and-only method, you need to specify its name (msilRun_), its
scope as Public, and its return type; the return type is Double because all you can
compile are arithmetic operations. The final argument to DefineMethod would
specify the parameter types expected by the method if it had any.
Finally, you need to assign an IL generator to the method as the hose that
will transmit specific CLR instructions.
The next segment of code is pretty straightforward, so I won't reproduce it
here; instead, see the complete listing. It loops through the colPolish collection,
converting each entry to a qbPolish.
Note how in more than one place the compiler needs to convert collection
entries to specific types. This is because as of Visual Studio .NET 2003 a "generic"
facility is absent. Generics would allow you to specify that the colPolish collection
always contains objects of type qbPolish. This feature is planned for the next
release of Visual Studio (which shipped as Visual Studio 2005) for Visual Basic
and C#.
Then the code calls msilRun__qbOpcode2MSIL_ to see if the colPolish opcode can be
"transliterated" one for one to a CLR opcode. msilRun__qbOpcode2MSIL_ returns True
and passes back the IL opcode in its reference parameter, or returns False on
failure. If the opcode cannot be translated, it might be any one of the large
number of opcodes not supported by your prototype.
In Nutty Professor machine and assembler code, the opPushLiteral opcode has an
operand that is a constant represented either as a .NET value or as a qbVariable. You
determine what type it is and try to emit an instruction for the CLR Ldc_R8 opcode.
Note that Ldc_R8 loads onto the CLR stack a constant operand that is a double-
precision value represented in 8 bytes. What we call a push in the Nutty Professor
interpreter is a load in the CLR. Furthermore, note that the Nutty Professor inter-
preter has one instruction that obtains the type of the operand from an object or
the GetType of a .NET value, whereas the CLR is more strict: it uses distinct opcodes
for distinct data types, which makes the CLR more reliable and efficient at the
same time.
If the generation of the IL for the value fails, it is because the operand in the
interpreter's code cannot be converted to Double; most likely the expression deals
with strings, which the demo object code generator cannot handle.
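
The translation loop just described can be sketched as follows. This is an illustration in Python, not the engine's code: instructions are assumed to be simple (opcode, operand) pairs, the opcode names are borrowed from the listing, and the output uses the textual IL mnemonics (ldc.r8, add, and so on) in place of Reflection.Emit calls.

```python
# Zero-address opcodes that map one for one onto an IL mnemonic.
ZERO_ADDRESS = {
    "opAdd": "add", "opSubtract": "sub", "opMultiply": "mul",
    "opDivide": "div", "opNegate": "neg", "opEnd": "nop",
}


def translate(polish):
    """Translate (opcode, operand) pairs into (mnemonic, operand) pairs.

    Returns None as soon as an instruction cannot be converted.
    """
    il = []
    for opcode, operand in polish:
        if opcode in ZERO_ADDRESS:
            il.append((ZERO_ADDRESS[opcode], None))
        elif opcode == "opPushLiteral":
            try:
                # Only values convertible to a double can be loaded.
                il.append(("ldc.r8", float(operand)))
            except (TypeError, ValueError):
                return None          # for example, a non-numeric string
        else:
            return None              # opcode outside the supported subset
    il.append(("ret", None))         # return the value left on the stack
    return il
```

Unlike the listing, this sketch gives up on any opcode it does not recognize, which is the behavior the prose describes.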
The "piece of resistance," of course, is where the code is actually run.

objILgenerator.Emit(OpCodes.Ret)
Dim objReturn As Object
Try
    objClass.CreateType()
    Dim objType As Type = objAsm.GetType(objClass.Name)
    Dim objInstance As Object = Activator.CreateInstance(objType)
    Dim objMethodInfo As MethodInfo = _
        objType.GetMethod(objMethod.Name)
    objReturn = objMethodInfo.Invoke(objInstance, Nothing)
Catch objException As Exception
    errorHandler_("Not able to run MSIL: " & _
                  Err.Number & " " & Err.Description, _
                  "msilRun_", _
                  "Returning Nothing", _
                  objException)
    Return Nothing
End Try
Return (objReturn)

To complete the method, you must emit the Ret opcode to return to the caller.
The Try block then uses objClass (which, you'll recall, is a type builder) to "bake"
the ingredients into an executable pie. You then have to obtain the created type
as a Type, and make an instance of that type.
Note that having to create an instance is a somewhat unnecessary require-
ment because the very simple method you create is stateless and could be, in
terms of both C# and C++, a static method; in Visual Basic, it could be a Shared
method in a class with no variables in its general declarations.
Using the instance object, you invoke, or run, the method without any param-
eters. Because the method leaves one value on the stack, the CLR returns this as
the function value, which msilRun_ in turn returns to its caller.
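
The overall build, bake, and invoke sequence has a rough analog in other runtimes. The sketch below uses Python's standard ast module and compile function as a stand-in for Reflection.Emit; it is an analogy only. Python builds an expression tree rather than emitting stack opcodes, and build_function, the BINARY table, and the (opcode, operand) instruction format are assumptions made for the example.

```python
import ast

# Binary Polish opcodes and the AST operator each one corresponds to.
BINARY = {"opAdd": ast.Add, "opSubtract": ast.Sub,
          "opMultiply": ast.Mult, "opDivide": ast.Div}


def build_function(polish):
    """Compile (opcode, operand) pairs in RPN order into a callable."""
    stack = []
    for opcode, operand in polish:
        if opcode == "opPushLiteral":
            stack.append(ast.Constant(value=float(operand)))
        elif opcode == "opNegate":
            stack.append(ast.UnaryOp(op=ast.USub(), operand=stack.pop()))
        elif opcode in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(ast.BinOp(left=left, op=BINARY[opcode](), right=right))
        else:
            raise ValueError("unsupported opcode: " + opcode)
    if len(stack) != 1:
        raise ValueError("expression must leave exactly one value")
    tree = ast.fix_missing_locations(ast.Expression(body=stack[0]))
    code = compile(tree, "<msilRun>", "eval")   # the "bake" step
    return lambda: eval(code)                   # invoking runs the compiled code


run = build_function([("opPushLiteral", 2), ("opPushLiteral", 40),
                      ("opAdd", None), ("opPushLiteral", 2),
                      ("opMultiply", None)])
print(run())
```

Here the returned callable needs no instance at all, which mirrors the observation above that the generated method is stateless.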

Towards a Complete Object Code Generator


The simple prototype could itself form a basis for a complete object code genera-
tor, but you'd have to expand the Select Case statement in msilRun__qbOpcode2MSIL_
to handle all "zero address" opcodes as well as add logic to the main msilRun_ rou-
tine to handle strings, functions, and I/O.
Two better ways would be to either change the compiler__genCode_ procedure
or else develop a macro processor to do the translation from Nutty Professor
opcodes to the CLR.
compiler__genCode_ could use a compile-time switch to translate its input,
including the opcode and the operand, to CLR.
The macro processing approach would define each possible Nutty Professor
opcode as one or more CLR opcodes and then expand the former to the latter.
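
As a sketch of the macro idea, the following Python keeps the expansion rules in a table and applies them to a list of (opcode, operand) pairs. The table contents are illustrative rather than a verified mapping for the engine, and opSquare is a hypothetical opcode included only to show a one-to-many expansion.

```python
# Each source opcode expands to one or more target instructions.
# "{operand}" is replaced by the source instruction's operand.
MACROS = {
    "opAdd": ["add"],
    "opSubtract": ["sub"],
    "opMultiply": ["mul"],
    "opPushLiteral": ["ldc.r8 {operand}"],
    "opSquare": ["dup", "mul"],      # hypothetical opcode: x * x
}


def expand(polish):
    """Expand (opcode, operand) pairs into a flat list of target instructions."""
    out = []
    for opcode, operand in polish:
        if opcode not in MACROS:
            raise KeyError("no macro defined for " + opcode)
        for template in MACROS[opcode]:
            out.append(template.format(operand=operand))
    return out
```

Supporting a new opcode then means adding a table entry rather than a new case in the code.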
You'd need to retain the Nutty Professor level to be able to watch code execute,
or simulate this functionality in CLR to trigger the interpretEvent of the main com-
piler. This event sends information to an event handler (which in qbGUI places the
updated stack on the screen and refreshes the highlight of the RPN code), and it
would have to be changed to carry CLR information.

Challenge Exercise
Develop the MSIL object code generator for the quickBasicEngine.

Summary
Both the Nutty Professor interpreter and the CLR are stack machines that exe-
cute opcodes that interact with the stack. In Chapter 8, you saw how a simulator
for the CLR itself could be written in a similar manner to the interpret_ method
of the quickBasicEngine.
Unlike many Microsoft products, .NET and the CLR were developed in an
open spirit. Officially, we don't know how the Visual Basic interpreter for VB 6 pro-
grams worked apart from the fact that it was a C++ program that at times imposed
some strange performance penalties. But the operation of any valid interpreter for
the CLR is well defined, even if you don't have its code, by the governing standard.
The open standards create an aftermarket of opportunities for unemployed
compiler writers, not only for .NET compilers for traditional languages, but also
for business rule compilers for expressing the rules of the organization in a highly
maintainable, and reasonably efficient, form.
Beyond the books by Andrew Troelsen and Serge Lidin on basic and
advanced use of the Common Language Runtime and ILASM, a considerable
amount of content is shipped with Visual Studio on the CLR and ILASM.

CHAPTER 10

Implementing
Business Rules
It's not simple unless it's complete.
-Larry Ellison, CEO of Oracle

MANY ENTERPRISE SYSTEMS require the implementation of a large set of so-called
business rules, which interact in complex ways and tend to be hard to change
and hard to document. Junior programmers often think that as a program gets
more complex, it will be necessarily larger and have more interactions, but
Edsger Dijkstra's 1968 discovery of structured programming shows that this isn't
necessarily the case. When a program gets to a certain level of complexity, its
actual text can become radically simpler by thinking of logic as input data, and
translating the input data to interpreted code.
This chapter describes how you can use the QuickBasic engine in particular,
and compiler design theory in general, to handle complex business rules, a source
of some difficulty in large enterprise systems. But first, we'll look at some real-
world problems with business rules and how they were solved.

Business Rule Solutions


Problems with handling business rules are common in the real world. Here, we'll
look at two examples of what can happen. The first is an example from a book,
and the other is a "war story" from my own personal experience.

Step-by-Step Engineering Instructions


Gerald Weinberg, in his book, The Psychology of Computer Programming,
described a large auto manufacturer's system that was failing repeatedly. It was
supposed to transform a customer's requirements for his built-to-order auto into
a set of step-by-step engineering instructions. The system was creating specifi-
cations for autos with ridiculous configurations, such as without any doors. The
step-by-step assembly procedures were impossible for the lads to follow on the
shop floor. It appears from Weinberg's story that the requirements were being
translated into sequential assembly steps with many dependencies.
The code was, according to Weinberg, a rather confusing mess, and a pro-
grammer was assigned to rescue the project in a way that will be completely
familiar to modern programmers (the story happened long ago).1 After examin-
ing the code, the programmer realized that he could rewrite the buggy program
faster than he could fix it, by creating a program that read tables, looked up the
configuration requests in the table, and found the specific sequence of steps.
Weinberg doesn't go into detail about what the program-
mer went through, but it is clear that he recognized logic as data, and that the
existing, buggy solution did not make manifest the business rules. The business
rules were hidden in the complex code, and for this reason weren't being cor-
rectly implemented. Weinberg's hero seems to have rethought his problem as
less a question of implementing a specific set of business rules and more as an
exercise in implementing a second level of logic, in the form of a system that
read, "compiled," and processed the actual engineering requirements.

Digital Switch Usage


A program to bill users for complex usage of a digital switch was failing in a manner
that reminded me, at the time, of Weinberg's story of the auto assembly program. It
was failing to handle conference calls and other calls with complexity over and
above a simple two-person call.
This program was written in Cobol using the latest structured techniques, but
all that meant was that the program was a structured mess of exceedingly small
routines. These routines called each other in such a complex way that nobody was
able to predict what the program would do for any particular telephone call above
a certain level of complexity.
In projects like these, you spend a few rather disheartening days, gazing at
the code and making notes, learning what it's about. A danger is that you will get
obsessed prematurely with one solution. Any one flash of intuition must be criti-
cally examined, lest its attractiveness lie in merely giving you something to do.
One particular brain flash turned out to be the solution. I realized that what
the program needed to do was re-create a call when all it had was a call event,
such as "caller removed the phone from the hook."2 Therefore, I needed to get
from an event to a call. How could I do this?

1. In these programmer-to-the-rescue situations, the company retaining your services is told by
your "broker" or direct boss that you leap tall buildings in a single bound and walk on water.
This sort of oversell used to be more the norm, but it still happens. You waltz or foxtrot in,
and in many cases you do solve a problem. In other cases you don't, because you can't or the
problem isn't solvable.

I realized that the mere Cobol program had to act as if it were a digital switch,
and thereby create calls from events. I knew in general how the Cobol program had
to act to simulate a Private Branch Exchange (PBX), which is a limited but general-
purpose (in Turing's sense) digital computer, since these switches are commonly
state machines.
Starting in a "start" state, they wait for symbols that consist of atomic user
events, such as user picks up the phone, user dials digit x, user hangs up, and user
throws phone across room (sorry, just making sure you're awake).
Characteristic of this type of state machine is the fact that a state and sym-
bol fully determine a new state and perhaps a list of actions. I demonstrated to
the client that the simple transitions could be obtained from the original design-
ers of the switch and placed into a fixed table defined using the Cobol OCCURS
clause. Then, as the program read the file of events, it would actually retrace the
sequence of events the original switch had encountered. The billing people were
then able to identify at which points an actual call completed and how to appor-
tion the bill, for example, by dividing the cost of the call by the number of
conference callers. The solution worked (lucky me), and the client was happy.
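
A table-driven replay of this kind can be sketched briefly. The following Python is an illustration under invented assumptions, not the original Cobol design: the states, events, actions, and the even cost split are all made up for the example, and a real switch would have a far larger transition table.

```python
# (state, event) -> (next state, action). All entries are illustrative.
TRANSITIONS = {
    ("idle", "offhook"): ("dialing", None),
    ("dialing", "digit"): ("dialing", None),
    ("dialing", "answer"): ("connected", "start_call"),
    ("dialing", "onhook"): ("idle", None),
    ("connected", "join"): ("connected", "add_party"),
    ("connected", "onhook"): ("idle", "end_call"),
}


def replay(events):
    """Rebuild calls from (time, event) records by retracing the switch."""
    state, calls, current = "idle", [], None
    for time, event in events:
        # A state and a symbol fully determine the next state and action;
        # an unknown pair raises KeyError rather than being guessed at.
        state, action = TRANSITIONS[(state, event)]
        if action == "start_call":
            current = {"start": time, "parties": 2}
        elif action == "add_party":
            current["parties"] += 1
        elif action == "end_call":
            current["end"] = time
            calls.append(current)
            current = None
    return calls


def cost_per_party(call, rate_per_second):
    """Apportion the cost of a call evenly among its parties."""
    return (call["end"] - call["start"]) * rate_per_second / call["parties"]
```

The billing logic never has to reason about raw events; it sees only completed calls, each with a start, an end, and a party count.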

The Dilbert Factor


Younger developers need to be cautious. Even a brilliant solution, if it involves
any new code, represents a financial risk to management. This means that you
need to do due diligence to make sure that a solution to the problem does not
already exist.
Another problem younger developers will encounter is the "Dilbert factor."
Scott Adams's popular comic strip seems to be on the side of the ordinary devel-
oper because it mocks (sometimes cruelly) low-level managers. In recent years,
it has mocked offshore and immigrant developers.
A very interesting deconstruction of the Dilbert phenomenon by Norman Solomon,
The Trouble with Dilbert: How Corporate Culture Gets the Last Laugh (Common
Courage Press, 1997) describes how the real Scott Adams hated his job and uses
irony to essentially exploit the ordinary working folks he left behind.

2. As a recovering philosophy major, I call these moments ontological moments. Ontology isn't the
study of how to get onto the bus. It's the theory of the fundamental constituents of reality. This
was an ontological moment because I realized that the entity analysis of just what a phone call
might be had never been carried out, and as a result, the programmer had no clear pathway
from event to call. Business rules must be based on a clear business ontology in this sense,
which ordinary users know without having to study philosophy.

The Dilbert factor is the belief, subtly fostered by the strip, that (1) everything
interesting and worthwhile has been done in computing, and that for this rea-
son work should be the boring installation and cleanup of existing solutions;
and (2) even if there's room for innovation, we here at XYZ company are proba-
bly going to be laid off, we don't have a clue, and it's best not to take any risks.
Because of the Dilbert factor, you need to present your idea carefully, you need
to do your homework, and above all, you need to be sensitive to the feelings of
the designers of the bad solution if one exists. (But since I'm an insensitive clod
who treads on other people's feelings when I am not otherwise engaged in
walking on water or shooting myself in the foot, my advice in this area is some-
what limited.)

Logic As Data
As you know, client/server and Web systems should be organized into two or more
tiers. The simplest design separates the GUI from the logic/data side. This allows
the GUI to be developed in Visual Basic when it runs on Windows and in HTML,
ASP, JavaScript, and other technologies when it runs on the Web.
The next step is to divide the logic from the data, and when this is done, the
logic consists of business rules. Normally, business rules are code in Visual Basic,
C#, and other compiled languages. In some cases, business rules linked directly
to data appear in Transact-SQL procedures. There is nothing wrong with this, as
long as the business rules are completely specified by the end user and are rea-
sonably static.
When the rules change, problems arise. This is because changes to the rules,
caused either by users changing their minds or by changes in business needs, cause
a lot of work in the typical Windows development environment. Programmers must
obtain the current version from the source code library. They must then determine
where to change the code and, of course, make the changes. The changes must
then be tested, and a new version built and installed.
The code changes are simple in many scenarios and might involve tweaking
a few operators or changing the value of a constant, but the interactions with the
source code library and the installation process constitute a large, fixed invest-
ment of programmer time. That's why it makes sense to represent much of the
logic in such a way that authorized users can change the logic without bothering
the programmers. It makes sense because the programmers are freed to concen-
trate on new problems and to work on the data and presentation tiers, where
their skills are best leveraged. It also makes sense because end users can under-
stand rules presented as data.
Logic as data also means less exposure to multiple versions of source code
with different business rules, a source of bugs. Of course, if the logic consists of
text business rules stored as SQL Server or Oracle fields, there can be multiple
copies of the database. But security procedures normally protect the end user from
using the wrong version of a database, whereas no such protection exists against
using multiple versions of compiled code, except internal programming procedures.
In some environments, these procedures may be more than adequate. In many
other situations, the user will prefer to see a layer of business rules in the data.
In his book, What Not How: The Business Rules Approach to Application
Development (Addison-Wesley, 2000), C. J. Date (a relational database pioneer)
makes the case for the use of pure predicate logic for the control of the business
organization. Here, complex expressions-including but not restricted to mathe-
matical, logical, and relational expressions as seen in ordinary programming
languages-would be used to implement the mission, goals, and constraints of
the business such as a formalized version of "For all customers, their satisfaction
level must never be less than 5; if it is, call the customer to find out what's wrong."
Although this level of control is what many users want, they also find that it
locks them into a dependence on a vendor. Therefore, my humbler and tactical
(as opposed to strategic) approach has the ordinary programmer using the logic
as data approach in specific situations, rather than across the board, and from
the view of the executive suite. Furthermore, this book is about what it takes to
develop a processor for business rules, while management texts typically assume
they are already available.

Case Study: Credit Granting


This is a song to celebrate banks,
Because they are full of money and you go into them and all
you hear is clinks and clanks,
Or maybe a sound like the wind in the trees on the hills,
Which is the rustling of the thousand dollar bills.
Most bankers dwell in marble halls,
Which they get to dwell in because they encourage deposits
and discourage withdrawals,
And particularly because they all observe one rule which woe
betides the banker who fails to heed it,
Which is you must never lend any money to anybody unless
they don't need it.
-Ogden Nash, "Bankers Are Just Like Anybody Else, Except Richer"

There are many areas in which representing logic as data makes sense. I've
mentioned a customer engineering application and a combined business and
engineering application earlier in this chapter. Legal applications and "expert"
diagnostic/strategic systems also generate rule sets of a complexity and rapid-
ity of change that strain the typical software change cycle.
The case study we'll look at in this section is taken from the credit industry,
presenting a part of a credit-rating solution.

Credit Assessment Systems


Many grantors of credit, whether consumer credit or home equity, use complex
models to assess the ability and willingness of people to repay their loans. This is
because if you restrict your loans to blue-chip citizens, who pay their bills on time
and floss daily, you will have no market, for the very good reason that these good
people don't need to borrow money. They are your competition, because they are
buying equities and mutual funds with their spare cash. Therefore, credit grantors
have been forced to expand their markets repeatedly over the years. In the 1950s,
only solid citizens like James Bond had the early credit cards issued by Diners
Club and American Express.
But then, the early credit grantors discovered a very interesting fact: people
actually are very willing to repay their debts and on time. This was first noticed
when the returning GIs of World War II faithfully repaid their home and school
loans. The civil rights movement also sent a message to these companies:
Americans of all races can be deadbeats or good risks, completely independent
of race. Companies were forced by 1960s legislation to ignore race on credit
applications, and found, much to their delight, that the market was bigger and
nearly all the new credit users were solid risks.
The credit grantors realized that even their primitive systems were a resource
that they could study to find out ideal risks. Behavioral studies of the era discov-
ered that men as debtors will do almost anything to repay automobile loans, and
would go hungry rather than lose their moped, Honda, TransAm, or GTO. Later
on, studies conducted internationally by the Grameen Bank discovered that poor
women worldwide are very conscientious about repaying loans that allow them
to set up their own businesses and talk back to their husbands. 3
However, the companies were obviously unwilling to use a random-number
generator (a lottery) to take a chance on applicants. Instead, they have, over
time, developed two approaches:

• The somewhat older approach is to apply a series of rules, or questions,
such as does the borrower own or rent, is the borrower employed and for
how long, and what about the borrower's credit record. These rules are
documented and explained to the borrower. I call this the "open system."

• A more recent approach, employed by a company in the US, uses a complex
metric (a proprietary trade secret) to give a single number that represents
the borrower's creditworthiness. This saves thinking time, since the only
business rule is whether the score is above a number set by the lender. I call
this the "closed system."

3. Cultural and religious factors play a role in credit; for example, Islamic law forbids high
interest rates and advises the credit grantor that he is taking just as much a risk of nonpayment
as is the debtor and is therefore bound by shari'a to investigate the borrower's ability to repay
and his own risk aversion.


The advantages of the open system are twofold: the company can do
a detailed analysis, and the open system is more easily internationalized. For
example, a detailed analysis may find, by a complex application of the rules to
existing payment records, that people who rent in a particular Louisiana parish
and whose income is less than $15,000 a year always pay their debts on time,
but people in the same parish with incomes higher than $15,000 never pay their
debts on time. The single number may or may not reflect this research. And, of
course, the open system is more easily used in international markets where
there is no preexisting credit scoring database.
A disadvantage of the open system is that borrowers can circumvent the rules
if they know the rules. This isn't true with the closed system, since you cannot
question the finality of the single number.
Many companies combine the new single-number, closed system with their
own rules.
In our example, we want loan officers to have a "calculator" to evaluate the
credit-worthiness of applicants for consumer credit, first-time home loans, home
equity loans, and so on. But in addition, the calculator should itself be
changeable, that is, programmable, when the loan supervisor wants to change the
rules. Our tool will use the open system, in that it will make the rules clear,
indeed in such a way that the calculator itself will explain the rules.

Credit Evaluation Considerations


In credit evaluation, annual income is probably the most important data point.
Undischarged bankruptcy is also important as a strong negative, although many
lenders will lend to people who have declared bankruptcy during the statutory
period of seven years. From the records maintained by credit-reporting
companies, the lender can obtain the number of bills the customer has paid
late, after 30 days and after 60 days.
Housing is a consideration for many lenders. They tend to like homeowners,
although they seldom inquire whether the home is paid for. Renters are far more
numerous and less loved by lenders, except insofar as they are willing to pay
higher interest rates. Additionally, our application will have an "other" category,
for those who do not rent or own. Consider, for example, social critic Barbara
Ehrenreich's identification, in her book Nickel and Dimed (Owl Books, 2002), of
large numbers of honest, working people who pay their debts but live in motels.
Our system will not output a yes/no decision. Rather, it will generate an
annual percentage rate because we've decided (1) to offer a favored demographic
of middle income people a promotional rate and (2) to charge a higher rate to
people with high incomes. 4

4. Note that the latter varies somewhat from common practice, but I would like to show how
the flexibility of logic as data makes it possible to recognize either a reality we've discovered
or the populist tendency of the boss.


This solution can use the QuickBasic engine as a nonvisual object to compile
and run a set of rules transliterated by us from a user-oriented notation into
strict Basic. The transliteration from the user's notation to Basic can be done
with the qbScanner object that was used to construct the QuickBasic engine.
The user's format is condition, action: comments. When the condition is
true, the action is taken. The condition can use the same operators as are used
in QuickBasic, since they will be familiar as simple math to the end user. It can
use the data names annualIncome, bankrupt, thirtyDayPastDue, sixtyDayPastDue,
owns, rents, other, and otherDescription. The only annoyance is the fact that
the user cannot put spaces between word breaks in data names. This can be
overcome by careful language design, but it's probably not important enough
to warrant the effort.
The action will be either decline or a number interpreted as the loan's
annual percentage rate (for example, .10). Note that the action is represented
as a weakly typed .NET object, because it is either a single-precision annual
percentage rate or the string "decline".5
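
The condition, action: comments format is simple enough to split mechanically. The following Python sketch is only an illustration and is not the qbScanner-based code in creditEvaluation. The names Rule and parse_rule are hypothetical, and the sketch assumes that the comments begin at the first colon and that the action follows the last comma before that colon.

```python
from typing import NamedTuple, Union


class Rule(NamedTuple):
    condition: str
    action: Union[float, str]   # an APR such as 0.10, or the string "decline"
    comment: str


def parse_rule(text: str) -> Rule:
    """Split 'condition, action: comments' into its three parts."""
    head, sep, comment = text.partition(":")
    if not sep:
        raise ValueError("rule needs a ':' before its comments")
    condition, sep, action_text = head.rpartition(",")
    if not sep:
        raise ValueError("rule needs a ',' between condition and action")
    action_text = action_text.strip()
    if action_text.lower() == "decline":
        action: Union[float, str] = "decline"
    else:
        action = float(action_text)   # raises ValueError if not a number
    return Rule(condition.strip(), action, comment.strip())
```

Returning either a float or the string "decline" mirrors the weakly typed action described above; a stricter design might use two separate result types.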

The Credit Evaluation Calculator Application


Run the sample application creditEvaluation.exe, available from the Downloads
section of the Apress Web site (http://www.apress.com). After an introductory
message box, you will be asked if you want a set of example rules. Click Yes to see the
creditEvaluation main form with an example, as shown in Figure 10-1.

5. The word decline is used by credit grantors to remind the borrower that a loan in our society
is a contract into which both parties freely enter.


[Screenshot: the creditEvaluation main form. The Applicant Standing area shows
Annual Income 20000, Thirty day past due 0, Sixty day past due 0, Housing set to
Rents, and Bankruptcy (undischarged) unchecked. The Credit scoring rules area has
Add Rule, Edit Rule, and Delete Rule buttons and a Thorough rule application
check box, and lists these rules:]

annualIncome < 5000, decline: Insufficent annual income
annualIncome >= 5000 And annualIncome <= 15000 And Not Bankrupt And ThirtyDay < 2 And SixtyDay = 0, .10: In the midrange group we accept most applicants at a very favorable rate
annualIncome >= 15000 And annualIncome <= 25000, decline: This income range is declined by this firm as company policy
annualIncome > 25000 And Not Bankrupt, .15: The high income client will pay a higher interest rate at our firm
defaultPolicy, decline: Rejects other applicants

Figure 10-1. Credit Evaluation Calculator with an example

The calculator has been set up to enforce these rules:

• If the applicant's annual income is less than $5,000.00, decline. We keep
income levels low in this example for clarity.

• Our favorite demographic group is people with a very moderate income
between $5,000 and $15,000, who have a good history, including no
bankruptcy within the past 7 years, not more than one bill paid past 30 days
in the statutory period, and no bills paid past 60 days.

• We have discovered that people in the target demographic area with incomes
between $15,000 and $25,000 don't pay their bills; therefore, we decline to do
business with them. In order to avoid the appearance of discrimination,
we need to give a bona fide business reason for doing so, and it is because
they don't pay their bills.


• Our company would like to concentrate its business among people of
moderate income, but if high-income, good risks apply, we accept them at
a higher rate. (If they are foolish enough to apply to us, when our mission
statement is "loans for the honest poor," let's soak them for spare change.)

These rules in particular, and any set of rules in general, may or may not be
logically sealed, or airtight, in the sense that one or more rules exist for each logical
possibility. The set of rules used by the calculator is not logically sealed. Consider,
for example, what happens when the applicant makes more than $25,000 but has
an undischarged bankruptcy. Since this set of rules is not logically sealed, we need
a default decision, which is decline here. The Default Policy button in the Credit
scoring rules section allows you to define the default policy. Later in this chapter,
in the "Handling Contradictory and Redundant Rules" section, I will discuss an
application of compiler and symbolic interpretation technology that can analyze
complex sets of rules to see whether they are sealed, and if multiple rules apply
to one case.
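
To make the role of the default decision concrete, here is a small Python sketch of first-match evaluation over the four example rules. It is an illustration only: the lambda encoding, the dictionary keys, and the evaluate function are my assumptions here, not the way creditEvaluation is written.

```python
# Each rule: (predicate over the applicant data, action); order matters.
RULES = [
    (lambda a: a["annualIncome"] < 5000, "decline"),
    (lambda a: 5000 <= a["annualIncome"] <= 15000 and not a["bankrupt"]
               and a["thirtyDay"] < 2 and a["sixtyDay"] == 0, 0.10),
    (lambda a: 15000 <= a["annualIncome"] <= 25000, "decline"),
    (lambda a: a["annualIncome"] > 25000 and not a["bankrupt"], 0.15),
]
DEFAULT_POLICY = "decline"


def evaluate(applicant):
    """Return (action, rule number); rule number is None for the default."""
    for number, (condition, action) in enumerate(RULES, start=1):
        if condition(applicant):
            return action, number
    return DEFAULT_POLICY, None
```

An applicant with an income of 30000 and an undischarged bankruptcy matches none of the four rules, so the sketch falls through to the default policy, which is exactly the unsealed case described above.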
A requirement is that we explain the rules to the credit analyst and the appli-
cant in an understandable way. This is why the rules contain a comment field.
Also, code inside creditEvaluation uses an instance of the qbScanner object to
transliterate the rules from source form to lengthier explanations. To see a spe-
cific example, click the third rule in the list box, which excludes the income range
from 15000 to 25000. Then click the button labeled Explain this rule, on the right
side of the form, to see an explanation of the rule:

Since we want to make sure the user knows that coding < (less than) versus
>= (greater than or equal to) has a different result, we use words to clearly state
the effect of the rule. Since we're using the QuickBasic engine to compile rules,
this transliteration can be applied to any changes in the rules.
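
The idea of turning operators into words can be sketched in a few lines of Python. This is not the qbScanner code: the token pattern and the two phrase tables below are hypothetical, chosen so that the output resembles the wording of the application's explanations.

```python
import re

# Hypothetical phrase tables; the real application's wording may differ.
OPERATORS = {
    "<=": "is less than or equal to",
    ">=": "is greater than or equal to",
    "<>": "does not equal",
    "<": "is less than",
    ">": "is greater than",
    "=": "equals",
}
NAMES = {
    "annualincome": "annual income",
    "bankrupt": "undischarged bankruptcy",
    "thirtyday": "30-day overdue reports",
    "sixtyday": "60-day overdue reports",
}

# Two-character operators come first so "<=" is not read as "<" then "=".
TOKEN = re.compile(r"<=|>=|<>|[<>=]|[A-Za-z_]\w*|\d+(?:\.\d+)?")


def explain_condition(condition: str) -> str:
    """Replace each operator and data name by its English phrase."""
    words = []
    for token in TOKEN.findall(condition):
        if token in OPERATORS:
            words.append(OPERATORS[token])
        elif token.lower() in NAMES:
            words.append(NAMES[token.lower()])
        elif token.lower() in ("and", "or", "not"):
            words.append(token.lower())
        else:
            words.append(token)           # numbers and unknown names pass through
    return " ".join(words)
```

Because the translation works token by token, any rule the user edits is explained by the same tables, with no separate documentation to maintain.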
You can document any set of rules completely by clicking the button labeled
Explain (in the Credit scoring rules section). You should see the report illustrated in
Figure 10-2. You can copy and paste this report into documentation.


1. If annual income is less than 5000, the application will be declined
(Insufficent annual income)

2. If annual income is greater than or equal to 5000 and annual income is less
than or equal to 15000 and not undischarged bankruptcy and 30-day overdue
reports is less than 2 and 60-day overdue reports equals 0, the application will
be accepted with an Annual Percentage Rate of 0.1 (In the midrange group we
accept most applicants at a very favorable rate)

3. If annual income is greater than or equal to 15000 and annual income is less
than or equal to 25000, the application will be declined (This income range is
declined by this firm as company policy)

4. If annual income is greater than 25000 and not undischarged bankruptcy, the
application will be accepted with an Annual Percentage Rate of 0.15 (The high
income client will pay a higher interest rate at our firm)

5. If no other rules apply, the application will be declined (Rejects other
applicants)

Figure 10-2. Explaining the rules

Applying Rules
Let's try applying the rules to the starter applicant standing. Take a look at the
applicant information, which provides the applicant's credit history.

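[Screenshot: the Applicant Standing area, showing Annual Income 20000, the
Thirty day past due and Sixty day past due counts, Housing set to Rents (the
other choices are Owns and Other, please describe), and Bankruptcy
(undischarged) unchecked.]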

The applicant has an income of $20,000, pays her bills on time, has no undis-
charged bankruptcy, and rents her home.
Click the large Evaluate Credit button in the upper-left corner of the screen,
and then wait. This prototype software is fully scanning, parsing, and interpret-
ing the code, and its progress appears at the bottom of the screen. It takes about
30 seconds on my Pentium 4 (much of this time is spent in making the progress
report). Later in this chapter, in the "Improving the Credit Evaluation Calculator"
section, I will suggest some ways to make this prototype faster.
Since a system requirement is that we explain the result to the applicant, the
screen in Figure 10-3 should appear. This screen provides a statement, similar to
the preliminary explanations shown in Figure 10-2, which can be provided to
the user.


[Screenshot: the creditEvaluation main form after evaluation, with the same
applicant standing and rule list as in Figure 10-1. The result area reads:]

The application has been declined

Rule 3: Because annual income is greater than or equal to 15000 and annual
income is less than or equal to 25000, the application has been declined (This
income range is declined by this firm as company policy)

[The progress log at the bottom of the form shows lines such as
"3/21/2004 10:46:03 AM Executing compiled code at 82".]

Figure 10-3. Credit evaluation

Let's examine how the rules (which are not, after all, in QuickBasic on the
screen) are transliterated into QuickBasic source code using the independent
lexical analyzer. Any one rule can be viewed as its corresponding QuickBasic
code. Highlight the third rule (the rule that was used in the previous example)
and click the button labeled Show Basic code to see the Basic form of the
business rule:

If annualIncome >= 15000 And annualIncome <= 25000 Then
    Print "decline" , " " , 3
    End
End If

Note how the Basic code communicates results.


quickBasicEngine and the Nutty Professor interpreter do not do any printing.
Needless to say, they don't search for and activate your printer. Nor do they display
on the screen, because quickBasicEngine needs to be completely independent of
any graphical environment to move from the Windows environment of qbGUI to
a Web service.


Instead, the Print command raises the interpreterPrintEvent event, which is
intercepted and parsed by the credit evaluation software, which then displays the
decision and the applicable rule number. This is what you see when you click
the Show Basic code button.
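
The pattern of printing through an event rather than to a device can be sketched as follows. This Python fragment is an analogy, not the quickBasicEngine API: TinyEngine, do_print, and on_print are hypothetical names, and the text format handed to the handler (decision first, rule number last, separated by white space) is an assumption.

```python
class TinyEngine:
    """Stand-in for an interpreter that reports output through callbacks only."""

    def __init__(self):
        self.print_handlers = []          # subscribers to the "print event"

    def do_print(self, *items):
        text = " ".join(str(item) for item in items)
        for handler in self.print_handlers:
            handler(text)                 # no console, no printer


decisions = []


def on_print(text):
    """Parse 'decision ... ruleNumber' as emitted by the generated code."""
    fields = text.split()
    decision_text, rule_number = fields[0], int(fields[-1])
    if decision_text.lower() == "decline":
        decision = "decline"
    else:
        decision = float(decision_text)   # an accepted application carries its APR
    decisions.append((decision, rule_number))


engine = TinyEngine()
engine.print_handlers.append(on_print)
engine.do_print("decline", " ", 3)        # what rule 3's Print would emit
```

Since the engine only calls its subscribers, the same engine could serve a Windows form or a Web service; each host decides what a "print" means.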

Viewing the Code in the QuickBasic GUI

You can see how the code is executed in the QuickBasic GUI. The button labeled
All Basic code displays the complete program transliterated from the rules. Click
this button and copy and paste the code into the code section of the qbGUI pro-
gram. Then click Run in qbGUI to see the results, as shown in Figure 10-4. If the
More screen is displayed, you will see the detailed, step-by-step execution of the
business rules.

[Screenshot: the qbGUI application after running the business rule code. It
shows the progress log (lines such as "3/21/2004 10:56:39 AM Running code at
IP 110"), the scanned tokens, the parse outline, the RPN code list, and the
storage list, each with its Zoom button.]

Figure 10-4. Running the business rule code in the qbGUI application

In qbGUI, click the Storage Zoom button to see the business rule data storage,
as shown in Figure 10-5. Note that each data point has been compiled as a Variant,
with the narrowest type for its current value; therefore, annualIncome and rents
have the Integer type and the values 20000 and -1, respectively, while the other
data points are zero bytes. The rents variable is -1, not Boolean and True, owing
to the way in which the value of the Rents radio button is converted to the value
of the rents qbVariable.

01 annualIncome Variant,Integer:vtInteger(20000)
02 thirtyDay Variant,Byte:vtByte(0)
03 sixtyDay Variant,Byte:vtByte(0)
04 bankrupt Variant,Byte:vtByte(0)
05 owns Variant,Byte:vtByte(0)
06 rents Variant,Integer:vtInteger(-1)

Figure 10-5. Business rule data storage

When the Rents radio button's Checked status is passed to the valueSet method
of qbVariable (discussed in Chapter 6), it is a .NET object, and valueSet tries to
assign it to a series of widening QuickBasic data types, starting with the byte. If it
started with Boolean, the integer value -1 would be converted to Boolean, and in
many cases, this would be wrong.
The .NET representation of True is -1, and this fails to convert to a byte but
can be converted to a QuickBasic Integer (represented by a .NET Short Integer).
Therefore, the selected type is Integer. Since quickBasicEngine is not as fussy as
is .NET Visual Basic with Option Strict, the Integer value -1 can still be used in
Boolean rules such as "rents or owns."
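
The "try the narrowest type first" behavior can be imitated in a few lines. The Python sketch below is not the valueSet method of qbVariable; the function name is hypothetical, and the order Byte, Integer, Long, Single and the numeric ranges (0 to 255, 16-bit, and 32-bit) are assumptions based on the usual sizes of those types.

```python
def narrowest_type(value):
    """Pick the first QuickBasic-style type, narrowest first, that holds value."""
    if isinstance(value, bool):
        value = -1 if value else 0        # True becomes -1, as the text describes
    if isinstance(value, int):
        if 0 <= value <= 255:
            return "Byte", value
        if -32768 <= value <= 32767:
            return "Integer", value
        if -2**31 <= value <= 2**31 - 1:
            return "Long", value
    if isinstance(value, (int, float)):
        return "Single", float(value)
    return "String", str(value)
```

With these assumptions, an unchecked radio button (False) lands in a Byte holding 0, while a checked one (True) becomes -1 and needs an Integer, which matches the storage shown in Figure 10-5.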
In qbGUI, click the RPN Zoom button to see the generated Nutty Professor
code. Figure 10-6 shows what this code looks like when it contains comments.
When you selected a rule and clicked the Show Basic code button in the
creditEvaluation application, you saw an End statement in the generated code,
which brings us to another feature of that application. Scroll to the bottom of
the RPN list box in the main form of qbGUI to see it at location 155.


45 opRem 0: ***** If annualIncome >= 5000 And annualIncome <= 15000 And Not Bankrupt And ThirtyDay < 2 And SixtyDay = 0 Then
46 opNop 0: Push lValue annualIncome contents of memory location
47 opPushLiteral 1: Push indirect address
48 opPushIndirect : Push contents of memory location
49 opPushLiteral 5000: Push numeric constant
50 opPushGE : Replace stack(top) by opPushGE(stack(top-1), stack(top))
51 opNop 0: Push lValue annualIncome contents of memory location
52 opPushLiteral 1: Push indirect address
53 opPushIndirect : Push contents of memory location
54 opPushLiteral 15000: Push numeric constant
55 opPushLE : Replace stack(top) by opPushLE(stack(top-1), stack(top))
56 opAnd : Replace stack(top) by opAnd(stack(top-1), stack(top))
57 opNop 0: Push lValue Bankrupt contents of memory location
58 opPushLiteral 4: Push indirect address
Figure 10-6. Assembly code for a business rule
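
To see how stack code of this kind evaluates a condition, consider the following toy interpreter in Python. It handles only the opcodes that appear at locations 46 through 56 of the listing, and it is a sketch rather than the Nutty Professor interpreter: the run function and the tuple encoding are hypothetical, and it uses Python's True and False where the real interpreter has its own value representation.

```python
def run(program, memory):
    """Execute (opcode, operand) pairs on a stack; return the final stack."""
    stack = []
    for opcode, operand in program:
        if opcode in ("opRem", "opNop"):
            continue                           # comments and no-ops do nothing
        elif opcode == "opPushLiteral":
            stack.append(operand)
        elif opcode == "opPushIndirect":
            stack.append(memory[stack.pop()])  # replace address by its contents
        elif opcode == "opPushGE":
            right, left = stack.pop(), stack.pop()
            stack.append(left >= right)
        elif opcode == "opPushLE":
            right, left = stack.pop(), stack.pop()
            stack.append(left <= right)
        elif opcode == "opAnd":
            right, left = stack.pop(), stack.pop()
            stack.append(left and right)
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return stack


# Locations 46-56 of Figure 10-6; memory location 1 holds annualIncome.
PROGRAM = [
    ("opNop", 0), ("opPushLiteral", 1), ("opPushIndirect", None),
    ("opPushLiteral", 5000), ("opPushGE", None),
    ("opNop", 0), ("opPushLiteral", 1), ("opPushIndirect", None),
    ("opPushLiteral", 15000), ("opPushLE", None),
    ("opAnd", None),
]
```

Calling run(PROGRAM, {1: 20000}) leaves a single False on the stack, because 20000 passes the first comparison and fails the second.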

Handling Contradictory and Redundant Rules

By default, when the check box on the calculator's main form labeled Thorough
rule application is not selected, each successful rule causes the rule evaluation to
be terminated. We need to allow the user to thoroughly apply all rules to detect
contradictory and redundant rules. This is not because "users aren't programmers"
(in fact, many are). It's needed because programmers and users make mistakes
when managing large sets of rules.
Three types of contradictory situations may exist in this application:

• A benign contradiction (from our point of view and not that of the cus-
tomer) occurs when more than one rule indicates that the customer should
be declined. A benign contradiction is resolved by declining the user and
providing him all the reasons, so that he can recover his creditworthiness
with us.


• An APR contradiction occurs when two or more rules indicate that the
customer should be accepted at different annual percentage rates. An APR
contradiction is resolved by giving the customer the lowest interest rate;
otherwise, she will complain if another customer in a similar situation
receives a better rate.

• A fatal contradiction exists when one or more rules indicate that the cus-
tomer should be accepted and other rules indicate a decline. Here, the rule
set is broken, and a decision cannot be made based on our system; the
analyst needs to fix the erroneous collection of rules.

Let's see how each type of contradiction is handled. This will also let you see
how rules can be added and edited.

Benign Contradictions

First, let's add a benign contradiction. Make sure that the Thorough rule appli-
cation check box on the main calculator form is checked, indicating that you
want all of the rules to be applied. Click the Add Rule button above the rule list
to see the ruleEntry form. Double-click bankrupt in the Data Names list to get
a condition of bankrupt, and enter the explanation Cannot lend when there is
an undischarged bankruptcy, as shown in Figure 10-7. Make sure the Decline
radio button is selected in the Policy section, and then close the form.

[Screenshot: the ruleEntry form. Condition: bankrupt. Policy: Decline selected
(the alternative is Accept at this APR). Explanation: Cannot lend when there is
an undischarged bankruptcy. Below are the Data Names list and the Example Rules
list, with Cancel and Close buttons.]

Figure 10-7. Adding the bankruptcy rule


Click the Bankruptcy (undischarged) check box in the Applicant Standing
area of the main form to indicate that the applicant has a bankruptcy. Then click
Evaluate Credit to see the rejection and its explanation, as shown in Figure 10-8.
In the explanation, two rules are explained in the benign case, where the
applicant has been declined for two reasons.

[Screenshot: the creditEvaluation main form after evaluation. The rule list now
ends with "bankrupt, DECLINE: Cannot lend when there is an undischarged
bankruptcy". The result area reads:]

The application has been declined

Rule 3: Because annual income is greater than or equal to 15000 and annual
income is less than or equal to 25000, the application has been declined (This
income range is declined by this firm as company policy)

Rule 5: Because undischarged bankruptcy, the application has been declined
(Cannot lend when there is an undischarged bankruptcy)

Figure 10-8. Multiple declines explained in thorough rule application

Click the button labeled All Basic code to see the code that is generated
when the rule application is thorough, as shown in Figure 10-9.


Let annualIncome = 20000
Let thirtyDay = 0
Let sixtyDay = 0
Let bankrupt = True
Let owns = False
Let rents = True
Let other = False
Let otherDescription = ""
If annualIncome < 5000 Then
    Print "decline" , " " , 1
    decisionMade = True
End If
If annualIncome >= 5000 And annualIncome <= 15000 And Not Bankrupt And ThirtyDay < 2 And SixtyDay = 0 Then
    Print 0.1 & " " , 2
    decisionMade = True
End If
If annualIncome >= 15000 And annualIncome <= 25000 Then
    Print "decline" , " " , 3
    decisionMade = True
End If
If annualIncome > 25000 And Not Bankrupt Then
    Print 0.15 & " " , 4
    decisionMade = True
End If
If bankrupt Then
    Print "DECLINE" , " " , 5
    decisionMade = True
End If
If Not decisionMade Then
    Print "decline" , " " , 6
    decisionMade = True
End If

Figure 10-9. Basic code for thorough application of rules

The thorough code moves the default action to the bottom and uses the
decisionMade flag to indicate whether the default rule needs to fire.
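
The difference between the two generated programs is small enough to sketch. The Python function below is an illustration of the idea and not the generator inside creditEvaluation. The name generate_basic is hypothetical, and placing the default action last in both modes is an assumption, since the complete nonthorough program is not shown here.

```python
def generate_basic(rules, thorough):
    """Build Basic source from (condition, action) pairs.

    Nonthorough code ends evaluation at the first rule that fires;
    thorough code lets every rule fire and tracks decisionMade.
    """
    lines = []
    for number, (condition, action) in enumerate(rules, start=1):
        shown = '"decline"' if action == "decline" else str(action)
        lines.append(f"If {condition} Then")
        lines.append(f'    Print {shown} , " " , {number}')
        lines.append("    decisionMade = True" if thorough else "    End")
        lines.append("End If")
    # Default policy: the last resort when no rule has fired.
    default_number = len(rules) + 1
    if thorough:
        lines.append("If Not decisionMade Then")
        lines.append(f'    Print "decline" , " " , {default_number}')
        lines.append("End If")
    else:
        lines.append(f'Print "decline" , " " , {default_number}')
    return "\n".join(lines)
```

In thorough mode no End statement is emitted, so every rule whose condition holds reports its decision, and the flag keeps the default from firing afterward.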
The interpreterPrintEvent handler in creditEvaluation in this scenario
intercepts two events. The first is the detection that the applicant's annualIncome is
within the excluded range of 15000 to 25000, and here the event handler receives
a decision of decline and a rule number of 3. The second is the detection of an
undischarged bankruptcy, with a decision of decline and a rule number of 5.
Each time the event handler code receives a new decision, it executes the
following logic:

1. If the evaluation is unassigned in the module global OBJevaluation,
OBJevaluation is assigned to the new evaluation.

2. If the evaluation is a string (which will be "decline") and a new
evaluation is also "decline", this is a benign contradiction.

3. If the evaluation is a number (which is the APR for an acceptance) and
a new evaluation is another acceptance, we continue to accept, using
whichever APR is more favorable.

4. If the evaluation is a number and the new evaluation is decline, or vice
versa, the rules are invalid.
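
The four steps above amount to folding each new decision into a running evaluation. Here is a minimal Python sketch of that fold, under stated assumptions: the names UNASSIGNED, INVALID, combine, and evaluate are hypothetical, "more favorable" is taken to mean the lower APR, and a rule set that has been found invalid stays invalid.

```python
UNASSIGNED = object()          # nothing decided yet
INVALID = "invalid rules"      # fatal contradiction


def combine(current, new):
    """Fold one new decision (an APR number or "decline") into the evaluation."""
    if current == INVALID:
        return INVALID                     # once broken, stays broken
    if current is UNASSIGNED:
        return new                         # step 1
    if current == "decline" and new == "decline":
        return "decline"                   # step 2: benign contradiction
    if current != "decline" and new != "decline":
        return min(current, new)           # step 3: keep the lower APR
    return INVALID                         # step 4: accept versus decline


def evaluate(decisions):
    """Apply combine to each decision in the order the rules fired."""
    result = UNASSIGNED
    for decision in decisions:
        result = combine(result, decision)
    return result
```

For example, evaluate(["decline", "decline"]) stays "decline", evaluate([0.1, 0.05]) settles on 0.05, and evaluate([0.1, 0.05, "decline"]) reports the rule set as invalid.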

APR Contradictions

Now, let's try an APR contradiction, where multiple APRs are specified. Delete
the rule you just added by clicking the rule bankrupt, DECLINE, and then clicking
the Delete Rule button. Add the rule shown in Figure 10-10. Here, we've decided
to accept people at the high end of the favored range (whose annual income is
between $14,000 and $15,000) with a promotional APR of 5%.

[Screenshot: the ruleEntry form. Condition: annualIncome > 14000 And
annualIncome <= 15000. Explanation: Promotional rate. Policy: Accept at this
APR selected. Below are the Data Names list and the Example Rules list.]

Figure 10-10. Adding the promotional rule

Return to the main form, and then set the applicant's income to 14500. Turn
off the Bankruptcy indicator. Click Evaluate Credit to see the screen in Figure 10-11.

The application has been accepted: the annual percentage rate shall be 0.05

Rule 2: Because annual income is greater than or equal to 5000 and annual income
is less than or equal to 15000 and not undischarged bankruptcy and 30-day
overdue reports is less than 2 and 60-day overdue reports equals 0, the
application has been accepted with an Annual Percentage Rate of 0.1 (In the
midrange group we accept most applicants at a very favorable rate)

Rule 5: Because annual income is greater than 14000 and annual income is less
than or equal to 15000, the application has been accepted with an Annual
Percentage Rate of 0.05 (Promotional rate)

Note that rule 5 confirms a preceding acceptance: selecting lowest applicable APR

Figure 10-11. Results of an APR contradiction

The customer is given the best APR in a clear and documented fashion,
because we've treated logic as data.


Fatal Contradictions

Now, let's see what happens when we insert a "fatal" contradiction. We'll try to
decline any applicant who is a renter.
Go to the ruleEntry form and enter the rule shown in Figure 10-12. (In gen-
eral, creditEvaluation requires that each rule be rather verbose and specify all
the conditions that apply, which for many applications, is a good thing.)

[Screenshot: the ruleEntry form. Condition: rents. Policy: Decline selected.
Explanation: Decline all renters! My tenant is a bum. Below are the Data Names
list and the Example Rules list, with Cancel and Close buttons.]

Figure 10-12. Frivolous rule

Close the ruleEntry form and click Evaluate Credit again, to see the report
shown in Figure 10-13.

The decision cannot be performed because the rules are not consistent

Rule 2: Because annual income is greater than or equal to 5000 and annual income
is less than or equal to 15000 and not undischarged bankruptcy and 30-day
overdue reports is less than 2 and 60-day overdue reports equals 0, the
application has been accepted with an Annual Percentage Rate of 0.1 (In the
midrange group we accept most applicants at a very favorable rate)

Rule 5: Because annual income is greater than 14000 and annual income is less
than or equal to 15000, the application has been accepted with an Annual
Percentage Rate of 0.05 (Promotional rate)

Rule 7: Because rents home, the application has been declined (Decline all
renters! My tenant is a bum.)

Note that rule 5 confirms a preceding acceptance: selecting lowest applicable APR

Note that rule 7 contradicts a preceding rule

Figure 10-13. Frivolous rule causes this report


The rule was a bad rule, since it logically contradicts two other rules. When
business rules of this or greater complexity are encoded in a programming lan-
guage, the bad rule would be at best dead code (where the renter test follows the
other tests); at worst, it would be live code that prevents code that contradicts its
effect from executing.

Improving the Credit Evaluation Calculator


Considerable improvements can be made to the approach shown in
creditEvaluation. For example, the Basic code for the benign case, shown earlier
in Figure 10-9, could be optimized by the creditEvaluation form code, and this
would be a worthwhile exercise. Consider that the calculator assigns and then
uses several variables that reflect the applicant's standing. We've noted how they
are variants and the Nutty Professor interpreter imposes a "tax" on their
retrieval.
Instead of this technique, it would be a simple matter to replace the names
by Applicant Standing values. This may be combined with the constant evalua-
tion optimizing feature of the quickBasicEngine's compiler (described in Chapter 7)
to speed up evaluation considerably by reducing the Nutty Professor code. If you
add complete MSIL generation, the speed of evaluation will approach, and in some
cases exceed, that of hard-coded business rules.
Or, the code seen in Figure 10-9 can be copied and modified if the user decides
the rules won't change, and the code can be transferred to a business object. For
example, you can add declaration statements for each data point and easily con-
vert the code to a Visual Basic .NET validation rule.
Another enhancement would be static evaluation of the rules, added to the
dynamic evaluation we obtain when the check box labeled Thorough rule
application is checked. The problem with dynamic application is that it reports
contradictions (whether benign, APR, or fatal) only for a specific case in which
the contradictions arise. A more intensive analysis of the static semantics of the
rules would reveal flaws for the user at a deeper level. For example, note that the
credit evaluation rules specify a series of contiguous income bands. A trivial but
dangerous error would be to leave gaps inadvertently, so that certain people
receive the default treatment.
Note that the first rule shown in Figure 10-3 specifies that annualIncome
must be less than 5000, and the second rule states that annualIncome must be
greater than or equal to 5000, since the opposite of less than is not greater than; it
is greater than or equal to. (This isn't "programming" knowledge; it's high
school math.) Suppose the user mistakenly uses greater than in the second
rule. People with incomes of exactly $5,000 won't be treated correctly.
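A boundary gap of this kind can be caught without running any applicant through the rules. The following is a minimal sketch in Python (used here only as compact notation; it is not part of creditEvaluation). It assumes each rule has already been reduced to an income band with inclusive or exclusive end points. The names covers and uncovered_points, and the 15000 threshold, are hypothetical. Because coverage can change only at a threshold, probing each threshold and one point between neighboring thresholds is enough when the bands are simple intervals.

```python
import math

def covers(band, income):
    """True when income falls inside band = (low, low_inclusive, high, high_inclusive)."""
    low, low_inclusive, high, high_inclusive = band
    above = income > low or (low_inclusive and income == low)
    below = income < high or (high_inclusive and income == high)
    return above and below

def uncovered_points(bands):
    """Return probe incomes that no band covers (an empty list means no gap was found)."""
    thresholds = sorted({t for low, _, high, _ in bands
                         for t in (low, high) if math.isfinite(t)})
    if not thresholds:
        return []
    # Probe below the first threshold, at every threshold, between neighbors,
    # and above the last threshold.
    candidates = [thresholds[0] - 1]
    for a, b in zip(thresholds, thresholds[1:]):
        candidates += [a, (a + b) / 2]
    candidates += [thresholds[-1], thresholds[-1] + 1]
    return [x for x in candidates if not any(covers(band, x) for band in bands)]

# Hypothetical bands: "< 5000", ">= 5000 and < 15000", ">= 15000".
correct_bands = [(-math.inf, False, 5000, False),
                 (5000, True, 15000, False),
                 (15000, True, math.inf, False)]
# The same bands with the second rule mistakenly written as "> 5000".
flawed_bands = [(-math.inf, False, 5000, False),
                (5000, False, 15000, False),
                (15000, True, math.inf, False)]

gaps_in_correct = uncovered_points(correct_bands)
gaps_in_flawed = uncovered_points(flawed_bands)
```

With the flawed bands, the only income reported is the threshold itself, which is exactly the applicant who would silently fall through to the default rule.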
Since logic is represented as data, there are two approaches that can be used
to prevent these problems:


• A stress test button could be added to the creditEvaluation form to provide
random applicant information, and the output can be examined by
the user.

• A more advanced method would be to create a set object and make each
rule the defining rule of each set. Then set operations (including intersection,
union, and complement) could determine the existence of error
sets, including, in the example, the set of all people making exactly 5000,
who are wrongly declined.

For the more advanced set solution, you would need the ability, probably
inside the Nutty Professor interpreter, to do symbolic calculation with unknown
variables. This would be a version of the interpreter that when given (let's say)
the And operation, a stack value of False, and another stack value of unknown,
would push False, transforming the unknown value back into a known value.
The symbolic version of the Nutty Professor interpreter would need to use
fairly advanced math in cases where it had to compute with values known as
ranges of values. For example, to symbolically "add" 5 to a value known to be in
the range 10..20 is to get the range 15..25. The object-oriented approach empowers
this type of development since it can define sets, ranges, and unknown
values as objects and value types.
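The two symbolic operations just described are small enough to sketch. The following Python fragment is an illustration only, not code from the Nutty Professor interpreter. The names UNKNOWN, symbolic_and, and Range are hypothetical, and Python's None stands in for the unknown value.

```python
from dataclasses import dataclass

UNKNOWN = None  # stands for a value the symbolic interpreter cannot determine

def symbolic_and(a, b):
    """And over True, False, and UNKNOWN: a known False decides the result."""
    if a is False or b is False:
        return False
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return True

@dataclass(frozen=True)
class Range:
    """A value known only to lie between low and high, inclusive."""
    low: float
    high: float

    def __add__(self, other):
        # Adding a range shifts both ends by the other range's ends;
        # adding a plain number shifts both ends by that number.
        if isinstance(other, Range):
            return Range(self.low + other.low, self.high + other.high)
        return Range(self.low + other, self.high + other)

known_again = symbolic_and(False, UNKNOWN)   # False: the unknown no longer matters
still_unknown = symbolic_and(True, UNKNOWN)  # UNKNOWN: the result depends on it
shifted = Range(10, 20) + 5                  # Range(low=15, high=25)
```

Subtraction, multiplication, and comparison over ranges need more care (signs and overlapping ranges), which is where the "fairly advanced math" comes in.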

Summary
This chapter has shown you that, with a performance penalty, treating logic as
data provides a new level of flexibility and control for real applications. And note
that as long as you have an effective tokenizing tool for transforming the exter-
nal representation of business rules on a form or in a database and a compiler/
interpreter, it's not necessary to write lexical analyzers and compilers to get to
this level of flexibility.
Consider applying the techniques in the preceding chapters to get any user
with a business rule problem to a level that genuinely eliminates "programming."
The elimination of programming has long been a Philosopher's Stone. Many pro-
grammers claim it is not possible. However, this chapter shows that if pure,
declarative, nonprocedural logic isn't programming, it is indeed possible to
eliminate, for classes of applications, traditional procedural programming, and
that it has benefits even in the area of mere documentation, as our automated
explanations show.
In the next and final chapter, we need to address the issue of language
design as it occurs in crafting a notation for an end user, as in our simple
condition,action:comment notation, or in creating a traditional language.


Challenge Exercise
Refine the rules in this chapter, by taking into account a new fact: homeowners
in the 5000..15000 band are better risks than renters and other housing categories.
Give the homeowners a better rate, and test the new rules.
For a real challenge, find a new industry with which you are familiar and
define a set of business rules. Using Visual Basic .NET, design a calculator using
creditEvaluation as a model and the quickBasicEngine DLL. For example, if you are
selling life insurance to put yourself through school and your sales calls involve
pricing life insurance based on the age of the applicant, his financial standing, and
whether he smokes, you can design a simple GUI, using quickBasicEngine to evaluate
the rules (using quickBasicEngine's eval method) to come up with a yes/no
accept/decline decision, or a rate versus decline. Your code will gather the input
data from the screen and assemble a valid QuickBasic expression, which will then
be passed to quickBasicEngine.

Resources
The following are some resources for more information about handling busi-
ness rules:

The Psychology of Computer Programming: Silver Anniversary Edition, by
Gerald M. Weinberg (Dorset House, 1998). This book is a well-known classic,
first printed in 1972. Weinberg, a former IBM "wild duck" employee,
concluded from his experience in Big Blue that programmers and their
managers systematically disregarded human factors and psychology, and
related the story from auto manufacture that I passed along in the "Step-by-Step
Engineering Instructions" section of this chapter.

Softwar: An Intimate Portrait of Larry Ellison and Oracle, by Matthew
Symonds (Simon & Schuster, 2003). Larry Ellison is an interesting guy
who matured from just wanting to have fun to a real commitment to his
company.


The Trouble with Dilbert: How Corporate Culture Gets the Last Laugh, by
Norman Solomon (Common Courage Press, 1997). Dilbert communicates
hopelessness and lack of initiative as a positive virtue that masquerades
as cool. As such, he reminds me of Herman Melville's Bartleby the
Scrivener, who was so burned out that he preferred not to do much of
anything. Norman Solomon shows how this is passive-aggressive, and
while it may manufacture consent to corporate policies, it is a recipe for
nonproductive organizations that serve, at best, only an inner ring of top-level
people. Norman Solomon also observes that the comic strip never
mocks or disrespects top-level executives, only middle managers.

What Not How: The Business Rules Approach to Application Development,
by C. J. Date (Addison-Wesley, 2000). C. J. Date's original vision was that
we would be able to write the rules once for the organization in a nonprocedural
language. The reality is that not only are the rules hard to
discover, but also that different stakeholders want different rules. That's
why my example stays at a low level: that of "loans for the honest poor."
The boss is easy to identify, and while he occasionally makes mistakes
("deny all renters!"), he is able to admit his mistakes.

CHAPTER 11

Language Design:
Some Notes
Confucius, hearing this, said, "Don't bother explaining that which has
already been done; don't bother criticizing that which is already gone;
don't bother blaming that which is already past."
- Confucius, The Analects

MASTER KONG Fu (Confucius' real name) might have thought that in this book I
have "explained that which has already been done," "criticized that which has gone,"
and perhaps even "blamed that which is past." But, like most great ones, Kong con-
tradicted himself, for even he was concerned with transmitting the past and scorned
originality for its own sake. I do believe that as so much knowledge is increasingly
encapsulated in products, we forget how much work the basics represent.
You have learned techniques for specifying a language precisely in Chapter 4,
and for building a lexical analyzer in Chapter 5. You learned how to parse and
interpret this language in Chapters 6 through 9, and saw how to apply this tool
to a practical, real-world problem in Chapter 10.
This chapter addresses, broadly and generally, how to design a language.
This in itself warrants an entire book and a thorough knowledge of software his-
tory, in order to avoid repeating mistakes. Here, I will simply talk about four
important issues:

• Determining the goals of your language design

• Deciding on the semantics of your language

• Deciding on the syntax of your language

• Documenting your language

Determining Your Goals


Your first step in language design should be goals definition. This is primarily
identifying the audience of users who will use your language. You may wish to
design a language for yourself, purely for fun; or you may have a genuine prob-
lem to solve for a client, such as how to represent a set of logic statements as
data, as demonstrated in the preceding chapter.

It is unlikely but possible that you may have some ideas for more productive
general programming of .NET applications. In this case, you need to keep your
audience in mind. It's not enough for you to be more productive with a unique
language; you need to convince your prospective audience that they, too, will be
more productive with your solution. This is an almost impossible task. Managers
call it a "people" task. I call it social engineering.

How Hard Is It to Overcome Network Externality?
The Dvorak typewriter keyboard was invented in the 1930s by an
inventor who recognized correctly that the existing arrangement, which persists
today on computer keyboards, makes the most common letters easily
accessible to most of us who have two hands, but also makes typing slower.
The original keyboard was arranged to avoid jams, so that fast operators of
the machines of the 1890s would not cause two typefaces to arrive at the
paper at the same time.
But by the time the Dvorak keyboard was introduced, a more than critical
mass of typists had jobs only because they tested out at high rates of speed
on the existing equipment, and they were not about to retrain on the Dvorak
typewriter! Typing was brutally difficult to learn well (Tennessee Williams'
play The Glass Menagerie shows the misery of a poor Southern gentlewoman
being forced to learn it in business school), and the typists did not want to go
through the pain all over again.
As a consultant, Charles Moore, the author of the Forth language (which is
based on RPN) insisted that his prospective clients allow him to write applica-
tions and all tools in this language. If they said no, he was willing to reject the
work. Few programmers have Moore's combination of willingness to part with
opportunities and willingness to maintain, in effect, his own infrastructure.
In fact, during the 1950s and 1960s, a critical mass of programmers did notice
the applicability of Polish logic to programming, and many modern procedural
languages, including C, rely on the stack, of Polish logic, to run. However, they
cover up this fact by using infix, "normal" syntax at the level of source code.
Of course, if we lived in a World State, in which children at a young age were
ripped from their mother's arms and taught the Dvorak typewriter and Polish
logic, then both the Dvorak typewriter and Forth would be universally used.
Fortunately, this is not the case.
Because we are different, there is no one solution for general programming-
not even the C language, which is showing its age and is unsafe for the general
programming of simple applications.


However, you may well discover that you are more productive using your own
notation in general, procedural programming. For example, highly skilled but
dyslexic programmers have been known to secretly use system facilities to
create a comfortable environment, without desiring to make their different
abilities the norm.1
The non-Dvorak keyboard, and the prevalence of operating systems like
Windows (which cause purists to shudder and gag) are both examples of the
network externality, in which success is reinforced as long as your product fits
into an existing technical infrastructure. As far as the "ideal computer system"
is concerned, be well advised that your typical user might be a poor Southern
gentlewoman in reduced circumstances forced to put up not only with your
ideal but also with you. The acceptance of something new in any one case is
going to take into account far more than the bits and the bytes. It will also be
based on the way in which the new paradigm fits in the existing network.

The #define capability of the C language can be used to create a completely
different language at the lexical level. But no standard way exists to restore the
program to a standard C without the #define. This means there is no way of lexically
changing the style back to the norm. The problem was the same in PL/I's
large redefinition capability. In general, any compiler writer who provides a macro
facility needs to make one that can be used to change a nonstandard program
back to the norm or to a different, nonstandard style.
A tool like qbScanner (described in Chapter 5), as long as it conforms to the
language definition at the lexical level of individual operators, identifiers, and so
forth, can be used to make these transformations. If lexical transformation
were indeed flexible, this would reduce the volume of debates over a maintainable
style. 2
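To make the idea of a reversible lexical transformation concrete, here is a minimal sketch in Python. It does not use qbScanner or reproduce its interface. A deliberately tiny regular-expression scanner stands in for a real one, and the names TOKEN, retoken, and invert, as well as the house-style keyword and_then, are hypothetical. The point is only that a transformation applied to whole tokens, rather than to raw text, leaves string literals alone and can be undone with the inverse mapping.

```python
import re

# A toy scanner: a string literal, an identifier, or any other single character.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_][A-Za-z0-9_]*|.', re.DOTALL)

def retoken(source, mapping):
    """Replace whole identifier tokens according to mapping; leave everything else alone."""
    return "".join(mapping.get(token, token) for token in TOKEN.findall(source))

def invert(mapping):
    """Build the mapping that changes the nonstandard style back to the norm."""
    return {new: old for old, new in mapping.items()}

house_style = {"AndAlso": "and_then"}  # hypothetical personal notation
standard = 'If a AndAlso b Then Print "AndAlso"'

personal = retoken(standard, house_style)
restored = retoken(personal, invert(house_style))
```

The round trip is exact only if the personal names do not already occur in the standard program, which is one reason the transformation has to be aware of the language's lexical rules.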

1. Microsoft provides comprehensive facilities for the differently abled (just as the blind "see"
that which is hidden from the view of the sighted, the handicapped are differently abled in
that their difference should be added to the sum total of their insight) to use computers, but
not as many to program computers. The SIGCAPH (Special Interest Group for Computers and
the Physically Handicapped) of the Association for Computing Machinery, and
similar organizations worldwide, address programming for the deaf and other groups, but
mostly when the different ability makes the candidate attractive as a programmer of existing
systems.
2. Program maintainability is another issue that needs to be addressed in relation to differently
abled programmers, because we want to maintain the code written by the differently abled.
But in terms of corporate needs, maintainability is often exaggerated by programmers in
search of job security.


But on the whole, it is unlikely that your goal will be to save the world with
a new programming language. Typically, your goal will be humbler, like my goal
to adequately bill switch users for complex calls, described in Chapter 10.
Far from creating a general-purpose language, you may wish to create, for
a user, a language that is deliberately limited, so the user doesn't get into trouble.
The creditEvaluation software described in Chapter 10 is an example of a program
designed to meet this goal. The user doesn't need to think step-by-step or
procedurally. The system orders the default rule so it appears last, and it watches
for collisions of the benign form (where an applicant is declined because of mul-
tiple rules), the APR form (where multiple annual percentage rates are entailed
by multiple rules and we select the highest), and the fatal form (where the rules
are contradictory).
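The three collision forms can be stated compactly. The following Python sketch is an illustration under stated assumptions, not the creditEvaluation code. It assumes that every rule that fires yields either the string "decline" or a numeric annual percentage rate, and it treats a decline arriving together with a rate as the contradictory (fatal) case. The function name resolve and the sample rates are hypothetical.

```python
def resolve(outcomes):
    """Classify the outcomes of all rules that fired.

    outcomes holds one entry per fired rule: "decline" or a numeric APR.
    Returns (collision_kind, decision).
    """
    declines = [o for o in outcomes if o == "decline"]
    rates = [o for o in outcomes if o != "decline"]
    if declines and rates:
        return ("fatal", None)            # the rules contradict each other
    if declines:
        kind = "benign" if len(declines) > 1 else "none"
        return (kind, "decline")          # declined, possibly by several rules
    if len(set(rates)) > 1:
        return ("APR", max(rates))        # several rates: select the highest
    return ("none", rates[0] if rates else None)

benign = resolve(["decline", "decline"])
apr = resolve([9.5, 12.0])
fatal = resolve(["decline", 9.5])
single = resolve([9.5])
```

Because the rules are data, the order in which they appear does not matter to this classification, which is what frees the user from thinking procedurally.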
Such languages have a tendency to disappear into the woodwork; that is, the
business rules are stored as data. An example would be where you discover that
a SQL column's value is sent to a transaction center and parsed, and used to drive
a SELECT statement strangely akin to the interpreter method of quickBasicEngine.
In fact, logic as data may be said to occur when a data field drives a process.
A customer name field does not drive a process; it's just data. On the other
hand, a customer request code field used to select from a large number of
options does drive a process.

Deciding on the Semantics of Your Language


The syntax of a language (both programming and human) consists of the rules for
forming valid constructs. In human language, semantics is the study of meaning; in
programming languages, semantics is the study of the meaning of the program:
for the most part, the program's effect at runtime.
It is necessary to "put the cart before the horse" and make the semantics
decisions first, since these will influence the syntactical decisions.
Here are five major issues involved in making semantic decisions:

• Object-oriented or traditional

• Interpreted, compiled, or both

• Backdoor problems

• Data typing

• Mathematical and logical details

Let's take a look at each of these issues.


Object-Oriented or Traditional
Your first instinct, in all probability, will be to keep it simple for the end user and
not implement object orientation, because the end user wouldn't understand it.
Well, you need to look deep within yourself, for many times we say the end user
won't understand it, when the truth is that we don't understand it.
In fact, the experience of the early designer (the late Ole-Johan Dahl) of
the prototype object-oriented language, Simula, was that end users found the
object-oriented paradigm far more understandable than the procedural para-
digms of Fortran and Algol, because code was tightly coupled to objects familiar
in the industrial shop floors where the end users worked. Dahl did not babble
on in computerese about new processes and new files, but instead about Simula
proto-objects with a clear relationship to the daily work of the shop floor.
One problem, which I haven't addressed in this book, is the question of how
to develop a compiler and/or interpreter for object-oriented code. In fact, the
object-oriented paradigm itself comes to the rescue. An investigation into the
System namespace of CLR will make it clear that object-oriented approaches are
closed in the benign sense; a closed system is one whose objects combine to form
new members of the same system. An Object is an Object is an object, and in par-
ticular, within an object-oriented compiler within an object-oriented language,
an Object can be represented by an object. Contrast this with traditional develop-
ment, whether of complex MIS programs or compilers.
Entities multiply within MIS and compiler development. For each user
object, a table or file is typically created, and the designers focus on its care
and feeding, often in excess of what the user wants. On the other hand, object-oriented
development tries to ensure a one-to-one mapping between the nouns
that the user wants and what we are working on.
Within traditional compilers, there was often a one-to-many, or even many-
to-many, relationship between what the user (here, the application programmer)
wanted and the entities of the compiler. For example, the IBM Fortran compiler
I encountered in 1972 (described in Chapter 1) was divided into 99 phases. These
phases had nothing to do with Fortran per se and instead were necessitated by
the small storage of the machine. The designers had to write special code to
manage the transition between phases.
In an implementation of an object-oriented system like .NET, whether the
proprietary implementation created by Microsoft or the open implementation
created by the Mono organization (see http://www.go-mono.org), many, many
objects need to be created. Using tables would create many more relationships
between the target and the implementation; therefore, the only sensible way to
develop an object-oriented system seems to be with an object-oriented approach.


Interpreted or Compiled
Traditional procedural languages fell into two broad categories: interpreted or
compiled. Languages like Fortran, Algol, and C were meant to be compiled to
efficient object code. But soon after the introduction of these tools, a need was
seen for fast compile times, even at the expense of efficiency, especially in one-time
prototyping in industrial settings and student coding in universities.
An early effort was PUFFT, described in Chapter 1, which compiled to a sort
of bytecode in order to provide Purdue students the ability to get their assign-
ments done on time in a mainframe environment. Another early effort was
Basic, whose first compilers compiled to generally undocumented bytecodes.

NOTE An old joke: How do you debug a C program? Answer: Change your
major. The attraction of CompSci 101 for Boneheads is that if your programs
work, you get an A, while in Literarily Theorizing Jane Austen and Relating It
to Women and Their Lives in the Post-Colonial Era, you need to behave yourself.
The attraction is also the downside: if your code doesn't work, you get an
F. Therefore, students demand good turnaround, which PUFFT was the first
to provide.

If you want, through your language, to provide fast turnaround and comprehensive
debugging, the language should be designed with this goal in mind.
Often, but not universally, such languages use a single, weak type to represent
all data. This way, the debug messages can present data easily. A popular weak
type is the string, which is the least narrow value object in .NET. But if you
desire efficiency, the language will need strong data types, as described in the
"Data Typing" section coming up soon, because you need to avoid runtime
conversions.
However, the issue of security, if it is one of your issues, complicates this dis-
tinction. Interpreted, weakly typed languages like VBA have been found to host
crude viruses. These aren't anything like the very vicious, industrial-strength
viruses (like SoBig in the summer of 2003), which are coded by extremely knowl-
edgeable, if evil, people. Rather, they are like the Outlook viruses, which used an
innocent and helpful feature to run macros. The Outlook viruses were run by
unsuspecting users in the late 1990s when they opened certain e-mail. These
viruses executed in a pesky and self-replicating fashion. This is why the CLR for
.NET is strongly typed: so that remote platforms can determine what a remote
executable will do, as far as possible.
If your language is interpreted and weakly typed, you need to determine
whether its use will create exposure in the form of crude Trojans, viruses, and
worms, and whether, in its intended environment, this risk is acceptable.


Backdoor Problems
This section describes some business exposures that can be unintentionally cre-
ated by overeager compiler developers. I call these "backdoor" problems.
One little-known problem can be described as unintentionally (or inten-
tionally) giving away the store. Consider the eval, evaluate, and run methods of
quickBasicEngine (described in Chapter 7). The eval method evaluates an expres-
sion in source form and returns its value. The evaluate method does the same job
using the current settings of quickBasicEngine. The run method acts as if a string
contained a QuickBasic program, and it compiles and interprets the string.
If you plan to sell a language for money, you need to know that providing
this level of functionality will mean that for common use, your users don't need
extra copies of the compiler. Instead, because the compiler runs at interpretation
time, the user simply can build a GUI around your compiler to get a new copy.
In open source, university, and older large-corporate environments, this
anxiety about giving away the code doesn't exist. For example, the Rexx language
for running interpreted code contained a function that executed Rexx source
code. However, at the time I encountered Rexx, its use was restricted to universi-
ties and large companies running the Conversational Monitor System on IBM
mainframes.
Backdoor problems exist whenever language facilities are of one class such that
any object can be explicitly processed (as in Reflection) by another object. They can
have unpredictable effects when programmers discover unintended uses for the
facility, and you should assess their impact if you want to make money. If, on the
other hand, you don't want to make money, this is not a concern.

Data Typing
quickBasicEngine provides a selection of data types current when QuickBasic was
in vogue: Boolean, Byte, Integer, Long, Single, Double, String, Variant, and Array.
The native types supported in the CLR are Boolean, Byte, (16-bit) Short,
(32-bit) Integer, (64-bit) Long, Single, Double, String, and Object. A strongly typed
language designed for .NET should probably support these data types. The CLR
also supports, on behalf of C# and C++, the unsigned integer types Unsigned Short,
Unsigned Integer, and Unsigned Long (which will also be supported natively in the
next version of Visual Basic .NET).
Beyond this starter set, you may decide that the user's needs demand a new
primitive type. For example, the hash table Collection of traditional Visual Basic
genuinely replaces the need to create special-purpose code for fast access to
tables.
However, one lesson from the PL/I language is still germane: A language can
become bloated with a variety of cool primitive types to the point where there are
so many features that the programmer doesn't know which ones are optimum.
So, the programmer finds a suboptimum subset and cultivates her own personal
style based on the subset, making her code hard to understand and debug.

The Power of Keeping Things Simple and Focused


A parallel problem of feature-bloat occurred in machine design. As more and
more mainframes came onstream in the 1950s, 1960s, and 1970s, manufacturers
naturally added more and more features in the form of complex instructions.
For example, the IBM 360 came with Translate and Translate and Test opcodes,
which appeared to allow the programmer to scan and modify strings quickly.
But, in many cases, the speed advantage was illusory. On simpler, smaller
machines, the advanced instructions took many cycles because they were
implemented in firmware, as hard-coded instructions that carried out tasks
dictated by the regular opcodes. Furthermore, when IBM introduced virtual
memory in the early 1970s, it found that the Translate and Translate and Test
instructions could induce page faults when the translated string lay across the
boundary between two pages.
The furthest evolution of the tendency was probably Digital Equipment
Corporation's (DEC's) VAX line of minicomputers and mainframes, which pro-
vided an elegant and comprehensive line of instructions. However, it was
discovered here that, as in the case of PL/I, assembler programmers had a hard
time utilizing this embarrassment of riches. It was also discovered that com-
piler writers had a hard time writing code generators that could utilize these
large instruction sets.
The reaction, in the 1980s, was the Reduced Instruction Set (RISC) movement,
in which entrepreneurial companies, including MIPS and Sun, discovered that
a simpler instruction code (one that, in some cases, even excluded multiply and
divide) allowed compilers to generate fast programs by finding the right combi-
nation of simple opcodes. They also used optimization techniques, including
the simple constant evaluation and degenerate opcode elimination methods
described in Chapter 7.
The discovery was similar to the discovery made by the initial designers of
the C language, who did not have the time or the manpower to add, as part of
the language, all sorts of cool features. Instead, they decided that libraries of
code, either brought in using the call mechanism or included using the pre-
processor's #include command, could, in effect, provide the extended facilities.
What's more, they could be removed and replaced by better code, in a more
modular fashion.


In fact, one of the charges in the anti-Linux lawsuit filed by SCO, a company
that owns a commercial Unix, is that Linux's runtime libraries for C programs
were not scaled up to industrial strength until an abortive partnership between
IBM and SCO in 2000. At that time, IBM was able to look at C libraries that had
benefited from 15 years of testing and improvement within AT&T, Bell Labs,
and Lucent. SCO maintains that a C-written system can be very different,
depending on the libraries.
The situation in kernel operating system design is parallel. This design focuses
on the basic job of any operating system, which is apportioning resources, such
as computing time and I/O facilities. We ordinarily think of an operating system
as something like Windows 2000, a vast empire of device drivers, DLLs, APIs, and
fun games. However, in kernel design of the operating system, developers focus,
like the hedgehog of the proverb and not the fox, on one thing. A kernel doesn't
drive devices or expose tools for programmers; instead, it gives processes time
slices and access to resources. Around the kernel, various drivers and GUIs (also
known as skins) provide the final computing experience to the end user. But all
of these extras must go through the kernel to get work done. The kernel approach
thus restricts the operating system to the basics and allows itself to be retrofit
with different layers of functionality.
RISC design, the C language's use of libraries, and kernel operating systems
demonstrate the power of keeping things simple and focused, and argue strongly
for a language that provides users with definitional capabilities in place of facil-
ities (like the Collection) that they could code, copy, or buy from others.

Mathematical and Logical Details


Over the years, language designers have found that our pre-computer notions
about math have failed to anticipate how a mathematical or logical expression is
actually evaluated. This refers to the labor process of evaluation, which the traditional
mathematicians (before Turing) regarded as simple and clerical. A well-known
example, to which I have referred in previous chapters, is the semantics of the
Boolean operators And and Or.

Lazy vs. Busy And and Or


In C, logical And is represented by two ampersands, and it has always been "lazy."
If a in a && b (a And b) is False, then b is not evaluated. If you think about it, since
And requires that a and b be True, evaluating b is a waste of time when a has been
found to be False, in a left-to-right evaluation.
Also in C, logical Or is represented by two strokes, and it is also lazy. When a
in a || b (a Or b) is True, then b is not evaluated.


In Visual Basic, the semantics, or runtime effect, of And and Or is different,
and the evaluation is "busy." In a And b, when a is False, b is always evaluated. In
a Or b, when a is True, b is still examined. This may not be consistent in all of the
many versions of Basic, but it was the case in Visual Basic as well as QuickBasic,
and it remains so in Visual Basic .NET (despite the fact that the Visual Basic .NET
team follows the Tao, or way, of C).
In fact, prior to March 2001, Microsoft attempted a change to the runtime
effect of And and of Or from busy to lazy, only to be subject to a hue and cry from
users who thought this would make programs hard to convert. Here, the user community
was wrong and the Microsoft team was right, but Microsoft caved in and
changed the semantics of And and Or back to the old way. Fortunately, Microsoft
also had its clever devils on the compiler team implement two very slick operators:
AndAlso and OrElse, which I've talked about in Chapters 3 and 4.
AndAlso and OrElse are lazy and work exactly the same as C's && and ||, and,
as I've said in earlier chapters, they should replace all use of And and Or. But, by
now you may ask, "Why is this issue important?"
It's important because, while it's true that for simple variables a and b, both the
lazy and the busy ways of evaluation are equivalent in effect, suppose b is a func-
tion with side effects, such as opening a needed database. If the code containing
the Boolean logic is mindlessly converted (let's say) from Visual Basic to C, the
worst type of bug in the world might occur: a bug unnoticed until it is too late.
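The difference is easy to demonstrate in any language that has both behaviors available. The following Python sketch is an illustration only, since the book's languages are Visual Basic and C. Python's own and operator is lazy, and the hypothetical busy_and function imitates the busy behavior, because the arguments of an ordinary function call are all evaluated before the call. The names open_database and have_credentials are likewise hypothetical.

```python
calls = []

def open_database():
    """Stand-in for a function with a side effect; it records that it ran."""
    calls.append("open_database")
    return True

def busy_and(a, b):
    # Both arguments were already evaluated by the caller, as with a busy And.
    return a and b

have_credentials = False

# Busy evaluation: open_database() runs even though the result must be False.
busy_result = busy_and(have_credentials, open_database())
busy_calls = len(calls)

# Lazy evaluation: the right-hand side is skipped once the left side is False.
lazy_result = have_credentials and open_database()
lazy_calls = len(calls) - busy_calls
```

Both expressions produce False, so a test that checks only the value would pass in either case; the difference shows up only in whether the database was opened.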
Prior to Pascal, for which the busy And and busy Or were consciously selected,
this issue was rather invisible, and at times, it was left to the discretion of the
compiler developers, a bad idea. Today, lazy evaluation pretty much rules the
world; it is standard in C, Perl, JavaScript, and Java.
Because I am such a total dweeb on this issue, I could not resist implementing
AndAlso and OrElse operators in quickBasicEngine. 3 To see the compiler effect
of lazy and busy evaluation, run qbGUI.exe (the testing GUI for quickBasicEngine,
introduced in Chapter 7) and enter the following code:

Print False And eval(False)

Recall that the eval function evaluates QuickBasic expressions by creating
a new quickBasicEngine, and observe that this code will take a long time, relatively,
to evaluate both sides of the And operator. Run the code to see, of course,
a 0, which is how unformatted False appears in a normal Print statement. In
a More view, click the Zoom button of the RPN box to see the Nutty Professor
assembler language that appears in Figure 11-1.

3. This departs from an exact implementation of QuickBasic, but at this writing, the compiler
isn't standard in all respects, anyway.


1 opRem 0: ***** Print False And eval(False)
2 opPushLiteral 0: Push the False
3 opPushLiteral String:vtString("False"): Function parameter 1
4 opEval 0: Evaluate the Quickbasic expression (lightweight) "False"
5 opAnd : Replace stack(top) by opAnd(stack(top-1), stack(top))
6 opPushLiteral String:vtString(ChrW(13) & ChrW(10)): Terminate print line
7 opConcat : opConcat(s,s): Replaces stack(top) and stack(top-1) with stack(top-1)&stack(top)
8 opPrint : opPrint(x): Prints (and removes) value at top of the stack
9 opEnd : Generated at end of code

Figure 11-1. Assembly language for a busy And

Notice that line 2 pushes False, and line 3 pushes the string "False" for
evaluation by the opEval opcode in line 4, despite the fact that the eval is always
unnecessary. Of course, the False value that is pushed in line 2 could be a vari-
able, a function, or a subexpression. Likewise, the eval could be far more complex
and time-consuming; however, it is always evaluated.
Next, change the And operator to AndAlso, and compile and/ or run the code.
Then click the Zoom button of the RPN box to see the Nutty Professor assembler
language that appears in Figure 11-2.

1 opRem 0: ***** Print False AndAlso eval(True)
2 opPushLiteral 0: Push the False
3 opDuplicate : AndAlso: duplicate stack and skip RHS when LHS is False
4 opJumpZ 7: opJumpZ(n): Jumps to location when stack(top) = 0 (pop the stack top)
5 opPushLiteral String:vtString("True"): Function parameter 1
6 opEval 0: Evaluate the Quickbasic expression (lightweight) "True"
7 opLabel "LBL1": AndAlso jump target for False
8 opPushLiteral String:vtString(ChrW(13) & ChrW(10)): Terminate print line
9 opConcat : opConcat(s,s): Replaces stack(top) and stack(top-1) with stack(top-1)&stack(top)
10 opPrint : opPrint(x): Prints (and removes) value at top of the stack
11 opEnd : Generated at end of code

Figure 11-2. Assembly language for a lazy AndAlso


In this example, line 2 still pushes False because constant folding (the
replacement of constant expressions described in Chapter 7) is not in effect.
Then line 3 uses the opDuplicate opcode to make a copy of the value at the top of
the stack, and executes opJumpZ to both test and remove the value at the top of the
stack. The opJumpZ operation transfers control to the label at line 7 if the top of the
stack is zero or False, and thereby avoids the evaluation when it is unnecessary.
In the challenge exercise for this chapter, you will repeat this experiment for
busy Or and lazy OrElse.
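
The same difference can be observed outside qbGUI. The following Python sketch is not part of quickBasicEngine, and the names probe and busy_and are invented for illustration. It records which operands are actually evaluated. An ordinary function call stands in for the busy And, because Python evaluates every argument before the call, and Python's own and operator stands in for the lazy AndAlso.

```python
calls = []

def probe(name, value):
    """Stand-in for an operand with a side effect: it logs that it ran."""
    calls.append(name)
    return value

def busy_and(left, right):
    # Both arguments were already evaluated by the caller,
    # which mirrors the behavior of Basic's And.
    return bool(left) and bool(right)

# Busy: the right-hand side runs even though the left is False.
busy_result = busy_and(probe("lhs", False), probe("rhs", False))
busy_calls = list(calls)

# Lazy: Python's "and" skips the right-hand side, like AndAlso.
calls.clear()
lazy_result = probe("lhs", False) and probe("rhs", False)
lazy_calls = list(calls)

print(busy_result, busy_calls)
print(lazy_result, lazy_calls)
```

Both forms produce False, but only the busy form runs the right-hand operand. If that operand opened a database, the two programs would behave differently in exactly the way described above.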

Floating-Point Math

Another issue can be floating-point mathematics. My recommendation in this area
is that you implement an open standard such as that of the Institute of Electrical
and Electronics Engineers (IEEE), if you expect mad scientists, disturbed engineers,
or Nutty Professors to use your language (see
http://www.research.microsoft.com/~hollasch/cgindex/coding/ieeefloat.html).
Of course, if your platform already
implements this important standard, you don't need to worry about it. However,
if you are writing any sort of retargetable compiler, this can be an issue.

NOTE Early compilers forced many users to become numerical analysts in
spite of themselves in order to predict how their code would evaluate expressions,
and mere humility would probably declare that it's unlikely that your
implementation will be more useful than an open standard.
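
To make the idea of an open floating-point standard concrete, here is a small Python sketch that splits a double into the sign, exponent, and fraction fields of the 64-bit IEEE format. It assumes that the Python float on your platform is an IEEE double, which holds for the common implementations but is an assumption all the same. The name decompose_double is invented for this example.

```python
import struct

def decompose_double(x):
    """Split a float into (sign, biased exponent, fraction) bit fields."""
    # Pack as a big-endian 8-byte double, then reread the same bytes
    # as an unsigned 64-bit integer so the bits can be masked.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)      # 52 bits
    return sign, exponent, fraction

print(decompose_double(1.0))
print(decompose_double(-2.0))
```

A retargetable compiler that commits to a published layout of this kind can promise the same results on every target, which is the point of following the standard.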

String Handling

Another form of complexity is in string handling. You should probably decide
how long strings may be. Basically, if you use a "sentinel" character, as does the
C language to delimit strings (standard C strings are delimited by the ASCII null
character, which has the value zero), you've decided that the sentinel character
cannot be a member of the string. This may hurt the character's feelings. Far
more important, it means that a large number of strings (strictly speaking, an
infinite number of distinct strings) cannot be represented in your language.
However, the alternative also is a limitation, and, interestingly, it means that
a very large but finite set of strings is representable by your language. This
approach allocates a separate number to hold the string length. In the case of
Visual Basic, this number is a 4-byte unsigned integer and capable of representing
strings up to 2^32-1 characters long.
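
The two representations can be compared in a few lines of Python. This is only a sketch with invented function names, and it works on bytes rather than on any real C or Visual Basic string. The sentinel form cannot hold a zero byte, while the counted form can hold any byte but is bounded by what fits in its 4-byte length field.

```python
import struct

def encode_sentinel(data: bytes) -> bytes:
    """C style: the string ends at the first zero byte."""
    if b"\x00" in data:
        raise ValueError("the sentinel byte cannot appear inside the string")
    return data + b"\x00"

def decode_sentinel(buf: bytes) -> bytes:
    # Everything up to, but not including, the first zero byte.
    return buf[:buf.index(b"\x00")]

def encode_counted(data: bytes) -> bytes:
    """Counted style: a 4-byte unsigned length, then the bytes themselves."""
    if len(data) > 0xFFFFFFFF:
        raise ValueError("string is too long for a 4-byte length field")
    return struct.pack("<I", len(data)) + data

def decode_counted(buf: bytes) -> bytes:
    (length,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + length]

tricky = b"a\x00b"   # contains the sentinel value
print(decode_counted(encode_counted(tricky)))
print(decode_sentinel(encode_sentinel(b"abc")))
```

Calling encode_sentinel(tricky) raises ValueError, which is the representational gap described above.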


Another mistake in string handling is being ASCII-centric. C originally made
provisions for only the ASCII character set, which in fully extended form supports
only 256 characters and is inadequate for many world languages. Note that
XML did not make this mistake and allows full Unicode representation. Visual
Basic has been repaired in this regard; its older Asc and Chr functions (which
return the numeric value of a character and the character value of a number in
the range 0..255, respectively) have been replaced by AscW and ChrW, which work
for double-byte characters.
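
As a rough illustration, Python's built-in ord and chr can play the roles of AscW and ChrW. They are stand-ins only, not the Visual Basic functions, and fits_in_one_byte is a name invented for this sketch.

```python
omega = "\u03a9"            # GREEK CAPITAL LETTER OMEGA
code = ord(omega)           # plays the role of AscW
round_trip = chr(code)      # plays the role of ChrW

def fits_in_one_byte(ch):
    """True when the character's code lies in the range 0..255."""
    return ord(ch) <= 255

print(code, round_trip == omega)
print(fits_in_one_byte("A"), fits_in_one_byte(omega))
```

A language whose character functions stop at 255 has no way to name the omega at all.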

Deciding on the Syntax of Your Language


Syntax decisions generally follow semantic decisions. If, for example, you have
decided on a procedural language, there are strong arguments in favor of making
it look like C. The developers of Java, Perl, and many other languages have used
C as their basis. An alternative is to use the less "friendly" syntax found in the
Ada and Eiffel languages, which are marginally "harder" to code because their
designers were concerned with correct mission-critical code.4
C's syntax and semantics have numerous flaws. C encourages overly terse
coding styles, and it uses delimiters and opcodes in ways that, at least in the
past, were unique to C. An example is the C operator that consists of the question
mark and colon in two different places, as in a ? b : c, which returns b when a
is True (that is, evaluates to a number other than 0) or returns c when a is False.
This operator is like the IIf of Visual Basic, with the important difference that ?:
does not evaluate c unnecessarily when a is True, or b unnecessarily when a is
False. IIf evaluates both sides and is easier to read.5
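
The contrast between IIf and ?: can be sketched in Python. The iif function below is a hypothetical imitation of IIf, not the Visual Basic function itself, and Python's conditional expression stands in for C's ?: operator.

```python
evaluated = []

def branch(name):
    """Stand-in for a branch expression: it logs that it was evaluated."""
    evaluated.append(name)
    return name

def iif(condition, when_true, when_false):
    # An ordinary function: both branch arguments are evaluated
    # by the caller before this body ever runs.
    return when_true if condition else when_false

busy = iif(True, branch("b"), branch("c"))
busy_evaluated = list(evaluated)

evaluated.clear()
# The conditional expression evaluates only the branch it selects.
lazy = branch("b") if True else branch("c")
lazy_evaluated = list(evaluated)

print(busy, busy_evaluated)
print(lazy, lazy_evaluated)
```

Both forms return the same value, but the function form also runs the branch it discards.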
Some combined semantic and syntactical constructs of C should have been
drowned in the bathtub at birth, including C's overly general for statement.
Starting out as the promise of a straightforward For as seen in Visual Basic's For
and the Fortran Do, the for in C (as in for (intIndex1 = 0; intIndex1 < intLimit;
intIndex1++)) suddenly and without warning allows you to code a Do. That is, the
second semicolon-separated clause can be anything. If it returns anything but
zero, the loop starts, and to terminate the loop, the second clause must return
zero. This is asking for trouble because of the weak typing of C, in which numbers
can change, unpredictably, into truth values. But strengthening the type system
can correct this problem.

4. I use quotes because it's news to me that programs should be always easy to write. What's
worth doing well is worth doing slowly, and Ada and Eiffel impose constraints that have been
shown to create better software with less programmer self-abuse.
5. Although I will admit that the ?: operator has its own gnomic charm once you start dweeb-
ing out with it.


To see an illustration of the real difference, bring up qbGUI.exe (the
QuickBasic engine's GUI) and key in the code in Figure 11-3. What will print?

Limit = 10
For I = 1 To Limit
    Print I
    Limit = 5
Next I

Figure 11-3. Testing For in qbGUI

The equivalent for loop in C is this:

for ( i = 1; i <= limit; i++ ) { printf("%d\n", i); limit = 5; }

It will print 1 through 5, because the limit can be changed in the loop (which is
nearly universally an unsafe practice), and because the second expression in the
semicolon-separated list of expressions in the for loop header is re-evaluated on
every iteration, so it reads the refreshed values of its operands.
But, if you run the preceding code in qbGUI, Visual Basic 6, or Visual Basic
.NET, the For will print a list of numbers from 1 to 10. It will ignore the change to
the limit.
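
The two rules can be placed side by side in a short Python sketch. Both functions are hypothetical models written for this comparison, not code from quickBasicEngine. The first copies the limit once before the loop starts, as Basic's For does, and the second rereads it on every pass, as C's for does.

```python
def basic_style_for(limit):
    """Limit captured by value before the loop, as in Basic's For."""
    printed = []
    final_value = limit        # evaluated once
    i = 1
    while i <= final_value:
        printed.append(i)
        limit = 5              # changes the variable, not the captured copy
        i += 1
    return printed

def c_style_for(limit):
    """Condition re-evaluated on every pass, as in C's for."""
    printed = []
    i = 1
    while i <= limit:          # reads the current value each time
        printed.append(i)
        limit = 5
        i += 1
    return printed

print(basic_style_for(10))
print(c_style_for(10))
```

The first call collects the numbers 1 through 10 and the second collects 1 through 5, matching the behavior described for Basic and for C.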
C's approach is flawed because the Do construct of C already provides this
capability, and missing is the ability to provide a checkable for loop header.
The qbGUI implementation of the rule that for is by value is shown by zoom-
ing and examining the commented assembly language code for the preceding
example, as shown in Figure 11-4.


1 opRem 0: ***** Limit = 10
2 opPushLiteral 10: Push numeric constant
3 opPop 1: Assign expression 10 to Limit
4 opRem 0: ***** For I = 1 To Limit
5 opPushLiteral 2: Push the control variable I
6 opPushLiteral 1: ctlVariable->ctlVariable,initialValue
7 opPopIndirect : ctlVariable,initialValue->ctlVariable
8 opNop 0: ctlVariable->ctlVariable,finalValue
9 opPushLiteral 1: Push indirect address
10 opPushIndirect : Push contents of memory location
11 opRotate 1: ctlVariable,finalValue->finalValue,ctlVariable
12 opPushLiteral Byte:vtByte(1): finalValue,ctlVariable->finalValue,ctlVariable,stepValue
13 opRotate 1: finalValue,ctlVariable,stepValue->finalValue,stepValue,ctlVariable
14 opLabel "LBL1": For loop starts here
15 opForTest 29: Test For condition using the stack frame
16 opRem 0: ***** Print I
17 opNop 0: Push lValue I contents of memory location
18 opPushLiteral 2: Push indirect address
19 opPushIndirect : Push contents of memory location
20 opPushLiteral String:vtString(ChrW(13) & ChrW(10)): Terminate print line
21 opConcat : opConcat(s,s): Replaces stack(top) and stack(top-1) with stack(top-1)&stack(top)
22 opPrint : opPrint(x): Prints (and removes) value at top of the stack
23 opRem 0: ***** Limit = 5
24 opPushLiteral 5: Push numeric constant
25 opPop 1: Assign expression 5 to Limit
26 opRem 0: ***** Next I
27 opForIncrement : For loop increment or decrement
28 opJump 14: Jump back to start of For loop
29 opLabel "LBL2": For loop exit target
30 opPopOff : Remove the For stack frame
31 opPopOff : opPopOff(x): Removes stack(top) without sending it to a memory location
32 opPopOff : opPopOff(x): Removes stack(top) without sending it to a memory location
33 opEnd : Generated at end of code

Figure 11-4. Compiling the For statement

Notice in line 13 that, with some pain, we create a stack frame for the For, as
illustrated in Figure 11-5.


ctlVariable  ->  Value of ctlVariable
stepValue
finalValue

Figure 11-5. Stack frame for the For statement

Take a look at finalValue. Its value is pushed on the stack in steps 9 and 10,
and this is why the loop will execute ten, not five, times.
Notice that ctlVariable (I) is pushed on the stack in step 5. Note that it is 2,
which is not I's value but its location. This is because most dialects of Basic (including
our quickBasicEngine, Visual Basic .NET, and Visual Basic 6) allow change to the
control variable, although this is terrible practice.6
Change the code in Figure 11-3 as shown here:

Limit = 10
For I = 1 To Limit
    Print I
    I = 10
Next I

In qbGUI, as well as in Visual Basic 6, Visual Basic .NET, and most versions of
Basic, this code will print 1 and stop. This is because, as Figure 11-5 shows, the
control variable is referenced from the stack by its location, not copied onto the stack. Of course,
this is what we want if we need to use the variable in the loop, although to change
it is poor practice.
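
The asymmetry between the limit and the control variable can be modeled in a few lines of Python. This is a simplified sketch of the stack frame idea, not the quickBasicEngine interpreter: the dictionary memory, the tuple frame, and the function names are all invented here, and only a positive step is handled. The frame holds a copy of the final value and the step, but only the location of the control variable.

```python
def run_for(body):
    memory = {"Limit": 10, "I": 1}        # I = 1 is the initial value
    # finalValue (a copy), stepValue (a copy), ctlVariable (a location)
    frame = (memory["Limit"], 1, "I")
    printed = []
    while True:
        final_value, step_value, ctl = frame
        if memory[ctl] > final_value:     # roughly the job of opForTest
            break
        body(memory, printed)
        memory[ctl] += step_value         # roughly the job of opForIncrement
    return printed

def change_limit(memory, printed):
    printed.append(memory["I"])
    memory["Limit"] = 5                   # ignored: the frame has its own copy

def change_control(memory, printed):
    printed.append(memory["I"])
    memory["I"] = 10                      # honored: the frame refers to I itself

print(run_for(change_limit))
print(run_for(change_control))
```

Changing Limit inside the body leaves all ten iterations in place, while assigning 10 to I ends the loop after the first pass.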
From the standpoint of syntax, we should, as this example shows, stay as close
as possible to the user's natural expectations of the semantics.
A final syntax consideration, seen in the credit evaluation application in
Chapter 10, is whether syntax is important if the users have a GUI that enters
rules. Generally speaking, it remains a good idea to have a documented "serial-
ization" standard for the business rules, to allow both power users and support
personnel to modify the rules in XML or as straight text files.

6. Unlike MIS programmers, compiler writers cannot be dissing code; we need to compile
pathological, if not psycho, programs.


Documenting Your Language


If your language is actually for procedural programming, it's a good idea to write
at least two documents: a tutorial and a reference manual.
The tutorial should walk the new user through the creation of a set of simple
programs, starting perhaps with the infamous Hello World program, and then
some simple tasks germane to what the beginner wants to do. The tutorial needs
to be thorough enough so that as the tyro reads your article or book, he or she
can be simultaneously running code successfully as a powerful way to reinforce
your lessons and stay awake.
The job of the reference manual is very different and often neglected.
Experienced programmers tend to get through just enough of a "for dummies"
tutorial (just the Hello World), and then think they "grok" enough to start doing
what they really need to do to complete a job. These aren't folks in week-long or
semester-long computer classes. Rather, they have maybe a day to get up to
speed, and they often want to do their own projects in your language. They need
a comprehensive reference manual, rather than a partial tutorial.
The reference manual should include the formal BNF definition of the language,
as described in Chapter 4. Surprisingly, this was never done for Visual
Basic before .NET. If it's possible that some of your audience won't understand
pure BNF, you can use the bnfAnalyzer program to transform the BNF into a list
of the language nonterminals, a list of the terminals, and a list of the BNF rules
in an outline form, again as shown in Chapter 4.

Summary
This chapter addressed four important issues regarding language design: the
goals, the semantics, the syntax, and the documentation. The bottom line of all
these considerations is that you need to write the language reference manual
before writing the compiler. If you have a specific target audience in mind, host
a tea party, bun fight, or conference to get them to buy into your goals. Or, more
sensibly, you can just create the new language, unleash it on the Internet, and be
damned; in fact, this is how many useful new languages were created.
The lesson of the failure of Esperanto, an attempt to design a global language,
is applicable. In practice, programming languages behave like real languages,
with dialects, extensions, and pidgins proliferating. The Algol team attempted to
do it right according to the Eurocentric and social-engineering notions popular
in Europe and in American universities in the 1950s. They hosted any number of
international bun fights and meetings, only to discover (as have social reformers
throughout history) that actually getting people to "do it my way" is hard, if not
impossible.
The founder of modern Colombia and Venezuela, Simon Bolivar, compared
revolution to plowing the sea. Many programming managers find that managing
programmers is like herding bobcats. Programming language design is difficult
for the same reason.
Indeed, the best way to be successful in this venture is Taoist in the sense
that you, like water, just follow your instincts with no expectation of riches or
fame. It isn't true that the best languages were designed in this way. C has defi-
ciencies directly related to the humility of its designers, and the development of
Algol was aborted by the marketplace, but might have worked. However, if we
follow the rule that we are happiest when we do what we want, then no one in
his right mind would ever want to create another Algol-another massive social-
engineering effort to get programmers to code one way.

Challenge Exercise
Repeat the experiment we did in the "Lazy vs. Busy And and Or" section of this
chapter with lazy and busy Or. Compile a Or eval("b") to determine what will
happen when a is True and confirm that this will evaluate the eval. Then compile
a OrElse eval("b") to confirm that this will not unnecessarily evaluate the eval.

Conclusion
I never blame myself when I'm not hitting. I just blame the bat, and if it
keeps up, I change bats. After all, if I know it isn't my fault that I'm not
hitting, how can I get mad at myself?
-Yogi Berra

Many programmers, having learned on the job, are curious about computer "sci-
ence." This book, I hope, has motivated you to use .NET to investigate an area of
computer science unexplored by many programmers.
On September 11, I was appalled by the unprecedented loss of life. I was
also saddened a few months later when one of the FBI field agents assigned to
tracking the hijackers reported in Congressional testimony that she had no
way to enter simple Boolean queries of the form terroristAssociation AndAlso
attendsFlightSchool. The separate queries were possible, but their Boolean
combination was not, according to the FBI whistle-blower, Coleen Rowley.
Had the system been any one of a large number of mainframe or network-based
systems, it would be, as far as I can tell, simple for a programmer to
develop such queries by defining the BNF of the additional queries using the
techniques in Chapter 4, developing a scanner for identifiers and operators
using the techniques in Chapter 5, and developing a recursive-descent parser as
described in Chapter 7.


However, the attitude that such techniques are rocket science seems to have
been a minor contributing factor in a tragedy, and if at a minimum, I can show
a proactive approach, I am more than satisfied.
On a more positive note, I feel confident that your new knowledge of the
DNA of computer science, indeed, the way it propagates, gives you a better sense
of how your source code actually runs and illuminates some of the darker corners
of the CLR.
If you decide to write a production .NET compiler, I urge you to get your
hands on a copy of Aho, Sethi, and Ullman's "dragon book" (Compilers: Principles,
Techniques and Tools), to which I have referred more than once in this book.
That's because I've only scratched the surface and got you started, in the way we
programmers get started: hands-on examples.
When I started out, developing compilers was rocket science. In 1970, com-
piler developers were not in all cases fully aware of how choices made by the
coders of compilers (such as how to evaluate a Boolean operator) were not mere
crotchets and conveniences, but became part of the reality of the compiler. But
many years of intense development in the Unix world under the long-gone
corporate sponsorship of the former AT&T monopoly taught a generation of
programmers how compilers work and are best constructed. I have meant little
disrespect by characterizing these characters as gnomes of Unix (I meant some
disrespect, because that is healthy).
.NET developers have, in their own quiet way, absorbed the lessons learned,
most especially the value of open standards in particular and glasnost in general.
You can find, for example, a large amount of useful source code in the .NET
releases, including a full C# compiler. Partly due to the surprising success of
Linux, more and more products are available as source code, and this trend
will make compiler and parser development a growth field in the future.
In this book, I've shown you an extremely Basic approach towards compiler
design theory. I do not want to give the impression that this is all you need to
know. However, I have seen the power of a low-level, grassroots parser in simpli-
fying a genuine user problem, and this motivated me to write this book.
The technology we use every day should not be a sort of mystery accessible
only to a temple priesthood; this has always tended to retard and even reverse
progress. Although we need to use each other's production, it is nevertheless
good to know how things work. I demur from the Dilbert philosophy, that we
should not worry our pretty, little heads about what goes on under the hood of
society or its technology, and instead take our anger out on hard-working middle
managers for doing their rather thankless job (of herding polecats and losing
golf games with the CEO). In fact (and as Krishna admonishes Arjuna in the
qbGUI Easter egg), knowledge is freedom, for the man or woman who knows the
relations between the forces of nature is no longer their slave. A compiler,
although a mathematical artifact, is part of nature. The rest is television.

APPENDIX A

quickBasicEngine
Language Manual
Then anyone who leaves behind him a written manual, and likewise anyone
who receives it, in the belief that such writing will be clear and certain, must
be exceedingly simple-minded.
-Plato

Plato was wrong. The attitude expressed has caused a lot of mystification and a lot
of damage. French philosopher Jacques Derrida has shown how Plato's preference
for speech over writing (which includes as a sub case the automatic preference for
tutorials "for dummies" over reference manuals) runs through our culture as a pre-
sumption that results in prejudice against a well-meaning reference manual.
But because real programmers (who Plato might consider Sophists) prefer
reference manuals for many purposes, this appendix forms the comprehensive
reference manual for the programming language that is actually supported by
quickBasicEngine. This appendix describes the low-level lexical syntax supported
by quickBasicEngine, the keywords and system functions of this language, and
the parser syntax in Backus-Naur Form (BNF). It then identifies each of the built-
in functions supported by quickBasicEngine.

NOTE The language of quickBasicEngine supports only a subset of the
QuickBasic language, with extensions including the AndAlso and OrElse
operators. Also, QuickBasic remains, as a name and as a product, the
intellectual property of Microsoft. QuickBasic, in other words, refers to
the language that was supported by Microsoft's QuickBASIC for MS-DOS
and Windows. quickBasicEngine (expressed in camelCase) refers to the
.NET object that supports a dialect of QuickBasic, where a dialect of a
language is a language that overlaps it, containing most of its features
(but not necessarily all) and extensions.


Lexical Syntax
Input for quickBasicEngine consists of the string containing either an executable
program or expression. This string may contain blanks, tabs, and tokens. Outside
strings and comments, blanks and tabs are ignored. This string may consist of
multiple lines, and line breaks (see the newline token in Table A-1) are significant.
There is no limit on the length of a line.
Table A-1 lists the supported token types.

Table A-1. Token Types Supported by quickBasicEngine


Token              Notes
Identifiers        Identifiers must start with a letter but may contain digits, the
                   underscore, and letters. There is no limit on the length of identifiers.
                   Some tokens, including Mod (division remainder), have the form of
                   identifiers but are recognized later by the parser as operators.
Operators          The operators supported are +, -, *, /, and \ (integer division). Note
                   that the Mod operator is, from the point of view of lexical syntax,
                   an identifier.
Apostrophe         The single quote is recognized as a separate token.
Ampersand          The ampersand is recognized as a separate token.
Numbers            Numbers may be integers with or without a leading plus or minus sign,
                   or floating-point numbers in the form <sign> <mantissaInteger>
                   . <mantissaDecimal> (e|E) <exponentSign> exponent.
Strings            Strings must be surrounded by double quotes. If they contain
                   double quotes, the inner double quotes must be repeated once.
                   Note that in addition to straight double quotes, Word "smart
                   quotes" may also be used.
Newline            A logical newline separates distinct statements. This is either
                   a colon (allowing multiple statements to occur on the same line)
                   or a carriage return and linefeed (or a linefeed by itself) not
                   preceded by a space and an underscore.
Parentheses        The left and right round parentheses characters.
Semicolon
Percent sign
Exclamation point
Pound sign
Dollar sign
Period
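
The token types in Table A-1 can be approximated with a regular-expression scanner. The Python sketch below is an illustration only and is not the scanner used by quickBasicEngine. The names TOKEN_SPEC, MASTER, and scan are invented, signs are left to the Operators class rather than attached to numbers, smart quotes and the space-and-underscore line continuation are not handled, and any character outside the table is reported as an error.

```python
import re

# One pattern per token type; order matters where patterns overlap
# (Number must be tried before the period in Punctuation).
TOKEN_SPEC = [
    ("String",      r'"(?:[^"]|"")*"'),          # inner quotes are doubled
    ("Number",      r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("Identifier",  r"[A-Za-z][A-Za-z0-9_]*"),   # includes Mod
    ("Newline",     r":|\r?\n"),                 # logical newline
    ("Operator",    r"[+\-*/\\]"),
    ("Punctuation", r"[()'&;%!#$.]"),
    ("Skip",        r"[ \t]+"),                  # blanks and tabs are ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Return a list of (token type, text) pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "Skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

for token in scan('Print "say ""hi""" & A1 + 2.5e3'):
    print(token)
```

Note that Mod comes out of this scanner as an Identifier, which matches the table: it is the parser, not the scanner, that later treats it as an operator.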


Keywords and System Functions


Table A-2 lists the names that cannot be used in source code to identify data
because their meaning is predetermined. Note the following:

• You may actually be able to get away with using these names in certain con-
texts because these names are checked in certain contexts and not others.

• Some names might be problematic even though they do not appear in this
list. This applies to names that are not listed but that, like Option, perform a syntactic role.

TIP The best policy is to use Hungarian names that start with an abbrevia-
tion (normally three characters long) for all data. Languages, including Basic,
that rely on keywords with identifier syntax have a slight inherent ambiguity
because the identifier syntax overlaps that of the keyword.

Table A-2. quickBasicEngine Keywords and System Functions

Abs         And         AndAlso     Apostrophe  As          Asc
Boolean     ByRef       ByVal       Byte        Ceil        Chr
Circle      Colon       Comma       Cos         Data        Dim
Do          Double      Else        End         EndIf       Eval
Exit        False       Floor       For         Function    GoSub
GoTo        If          Iif         Input       Int         Integer
Isnumeric   Lbound      Lcase       Left        Len         Let
Like        Log         Long        Loop        Max         Mid
Min         Next        Not         Or          OrElse      Print
Randomize   Read        Rem         Replace     Return      Right
Rnd         Screen      Sgn         Sin         Single      Step
Stop        String      Sub         Tab         Then        To
Trace       Trim        True        Until       Ubound      Ucase
Until       Variant     While       Wend


Parser Syntax (Backus-Naur Form)


The following shows the BNF of the quickBasicEngine.

NOTE Over and above the standard disclaimer of warranty concerning my
compiler as a whole, which is pretty sleazy but necessary in the time available,
I should also mention that a downside of the manual method of
production of parsers is that I may have made mistakes such that the following
syntax fails to correspond to the parser. Therefore, the following reference
material may contain errors. Please let me know if you find errors, and I will
fix them and make the corrections available from the Downloads section of
the Apress Web site (http://www.apress.com). I'm sorry to say that I can't
afford to offer you a reward for your help, besides my acknowledgment at the
Web site.

' --- Compiler input
compilerInput := sourceProgram | immediateCommand
' --- Immediate commands
immediateCommand := singleImmediateCommand (":" singleImmediateCommand)*
singleImmediateCommand := expression | explicitAssignment
' --- Source programs
sourceProgram := optionStmt
sourceProgram := sourceProgramBody
sourceProgram := optionStmt logicalNewline sourceProgramBody
optionStmt := Option ( "Base" ("0" | "1") ) | Explicit | Extension
sourceProgramBody := ( openCode | moduleDefinition ) +
openCode := statement [ logicalNewline sourceProgram ]
logicalNewline := Newline | Colon
statement := [ UnsignedInteger | identifier Colon ] statementBody
statementBody := ctlStatementBody | unconditionalStatementBody |
    assignmentStmt
ctlStatementBody := dim |
    doHeader |
    else |
    endIf |
    forHeader |
    forNext |
    if |
    whileHeader |
    loopOrWend
unconditionalStatementBody := circle |
    comment |
    data |


    end |
    exit |
    goSub |
    goto |
    input |
    print |
    randomize |
    read |
    return |
    screen |
    stop |
    trace
' --- The statements
' Assignment
assignmentStmt := explicitAssignment | implicitAssignment
explicitAssignment := Let implicitAssignment
implicitAssignment := lValue "=" expression
lValue := typedIdentifier [ "(" subscriptList ")" ]
subscriptList := expression [ Comma subscriptList ]
' Circle
circle := Circle ( expression Comma expression ) Comma expression
' Comment: note: NoNewLine is text that does not contain a newline
comment := Rem NoNewLine
comment := Apostrophe NoNewLine
comment := EmptyLine
' Data statement
data := Data constantList
constantList := constantValue [ Comma constantList ]
constantValue := number | string
number := [ sign ] unsignedNumber
sign := "+" | "-"
unsignedNumber := UnsignedInteger | UnsignedRealNumber
integer := [ sign ] UnsignedInteger
' Dim
dim := Dim dimDefinition
dimDefinition := identifier [ ( boundList ) ] [ asClause ]
asClause := As typeName
typeName := Boolean | Byte | Integer | Long | Single | Double | String | Variant
boundList := bound [ Comma boundList ]
bound := integer [ To integer ]
' Do loop header
doHeader := Do [ doCondition ]
' Do loop closure
doLoop := Loop [ doCondition ]


' Do condition
doCondition := While | Until expression
' else
else := Else
' End statement
end := End ' (followed immediately by newline)
' endIf
endIf := End If
endIf := EndIf
exit := Exit [ Do | For | While ]
' For header
forHeader := For lValue "=" expression To expression [ Step expression ]
' For next
forNext := Next lValue
' GoSub
goSub := GoSub ( UnsignedInteger | identifier | expression )
' GoTo
goto := GoTo ( UnsignedInteger | identifier | expression )
goto := UnsignedInteger
' If
if := If expression [ Then ] unconditionalStatementBody
if := If expression Then
' Input
input := Input lValueList
lValueList := lValue [ Comma lValue ]
' Loop or Wend
loopOrWend := Wend | ( Loop [ whileUntilClause ] )
whileUntilClause := ( WHILE | UNTIL ) expression
' Print
print := Print expressionList [ ";" ]
expressionList := expression [ Comma expressionList ]
' Randomize
randomize := Randomize
' Read data
read := Read lValueList
' Return from a GoSub
return := Return
' SCREEN n command (does nothing)
screen := Screen UnsignedInteger
' Stop
stop := Stop
' Trace
trace := "Trace Push"
trace := "Trace Off"


trace := "Trace Text"
    ( "Source" | "Memory" | "Stack" | "Inst" | "Object" | "Line" |
      UnsignedInteger | "NoBox" )*
trace := "Trace Headsup" ( "Inst" | "Line" | UnsignedInteger )*
trace := "Trace HeadsupText"
    ( "Source" | "Memory" | "Stack" | "Inst" | "Object" | "Line" |
      UnsignedInteger | "NoBox" )*
trace := Trace Pop
' While loop header
whileHeader := While expression
' Do loop closure
wend := Wend
' --- Expressions
expression := orFactor [ orOp expression ]
orOp := Or
orOp := OrElse
orFactor := andFactor [ andOp orFactor ]
andOp := And
andOp := AndAlso
andFactor := [ Not ] notFactor
notFactor := likeFactor [ notFactorRHS ]
notFactorRHS := Like likeFactor [ notFactorRHS ]
likeFactor := concatFactor [ likeFactorRHS ]
likeFactorRHS := "&" concatFactor [ likeFactorRHS ]
concatFactor := relFactor [ concatFactorRHS ]
concatFactorRHS := relOp relFactor [ concatFactorRHS ]
relFactor := addFactor [ relFactorRHS ]
relFactorRHS := relOp relFactor [ relFactorRHS ]
addFactor := mulFactor [ addFactorRHS ]
addFactorRHS := mulOp mulFactor [ addFactorRHS ]
mulFactor := powFactor [ mulFactorRHS ]
mulFactorRHS := powOp powFactor [ mulFactorRHS ]
powFactor := ("+" | "-")* term
term := unsignedNumber |
    string |
    lValue |
    True |
    False |
    functionCall |
    ( expression )
functionCall := functionName "(" expressionList ")"
functionName := Abs | Asc | Ceil | Chr | Cos | Eval |
    Evaluate | Floor | Int |


Iif Isnumeric Lbound 1 Lcase 1 Left 1 Len


Log Max 1 Min Mid 1 Replace 1 Right 1 Rnd
Run Sin 1 Sgn String 1 Tab 1
Trim 1 Ubound 1 Ucase
unsignedNumber := ( UnsignedRealNumber | UnsignedInteger )
                  [ numTypeChar ]
typedIdentifier := identifier [ typeSuffix ]
typeSuffix := numTypeChar | CurrencySymbol
numTypeChar := PERCENT | AMPERSAND | EXCLAMATION | POUNDSIGN
identifier := Letter LettersNumbersUnderscores
string := DoubleQuote AnythingExceptDoubleQuote DoubleQuote
relOp := "<" | ">" | "=" | "<=" | ">=" | "=" | "<>"
addOp := "+" | "-"
mulOp := "*" | "/" | "\" | "Mod"
powOp := "**" | "^"
' --- Subroutines and functions
moduleDefinition := subDefinition | functionDefinition
subDefinition := Sub identifier [ formalParameterList ]
                 openCode logicalNewline "End" [ "Sub" ]
functionDefinition := Function identifier [ formalParameterList ]
                      openCode logicalNewline "End" [ "Function" ]
formalParameterList := ( formalParameterListBody )
formalParameterListBody := formalParameterDef
                           [ "," formalParameterListBody ]
formalParameterDef := [ ByVal | ByRef ] identifier [ "()" ] asClause

Built-In Functions
Table A-3 lists the quickBasicEngine built-in functions and describes their use.

Table A-3: The quickBasicEngine Functions


Function Description
Abs(n) Returns the absolute value of the number n, which is n when n is
greater than or equal to 0, or -n when n is less than zero.
Asc(c) Returns the numeric code of the ASCII character c.
Ceil(n) Returns the smallest integer that is greater than or equal to n.
Note that when n is negative, this will still return the smallest
integer greater than or equal to n; for example, while ceil(2.05)
is 3, ceil(-2.05) is -2.
Chr(n) Returns the character with the ASCII value in n as a string.
Cos(x) Returns the cosine of x.

Eval(s) Evaluates the string s considered as an expression that is
acceptable to quickBasicEngine. The evaluation is performed
using the default properties of the quickBasicEngine class.
Evaluate(s) Evaluates the string s considered as an expression that is accept-
able to quickBasicEngine. The evaluation is performed using the
properties of the quickBasicEngine instance performing the
Evaluate function.

Floor(n) Returns the largest integer that is less than or equal to n. Note
that when n is negative, this will still return the largest integer
less than or equal to n; for example, while floor(2.05) is 2,
floor(-2.05) is -3.
Iif(a,b,c) Evaluates the expression in a. If the value is True (any number
other than zero), returns the value of the expression in b. If the
value is False (0), returns the value of the expression in c. Note
that whether a is True or False, this function will fully evaluate
the b and the c expressions.

Int(n) Returns the value of n, rounded to the closest integer.


Isnumeric(s) Returns True when s is any number; False otherwise.

Lbound(a) Returns the lower bound of the array a.


Lcase(s) Converts the string in s to lowercase.

Left(s, n) Returns the substring of n characters in s commencing at
position 1.
Len(s) Returns the length of the string s.
Log(s) Returns the natural logarithm of s.

Max(n, m) Returns the larger of n and m.

Mid(s, n, L) Returns the substring of characters in s commencing at position n
for a length of L. If L is omitted, returns the substring of characters
in s, commencing at position n and proceeding to the end of the
string.

Min(n, m) Returns the smaller of n and m.


Replace(s, t, r) In the string s, replaces all occurrences of the target string t with
the replacement string r.
Right(s, n) Returns the substring of n characters in s commencing at position
Len(s) - n + 1 and ending at the end of s.

Rnd Returns a random value in the interval 0..1.

Run(s) Where s consists of one or more executable commands, this function
runs these commands. Their output, if any, will consist of
print events. These print events will be available to the code that
executes the Run function. In addition, the value of this function
will be the value it leaves on the stack. If the stack on exit from this
function is empty, the value of this function will be Nothing. The
execution will use the options and settings of the quickBasicEngine
instance that runs the Run command.
Sin(x) Returns the sine of x.
Sgn(x) Returns the "signum" of x, where the signum of x is 1 when x > 0;
the signum of x is 0 when x = 0; and the signum of x is -1 when
x < 0.
String(c, n) Creates n copies of the character c.

Tab Returns the tab character.

Trim(s) Removes trailing and leading spaces from the string s.


Ubound(a) Returns the upper bound of the array a.
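
Several rows of Table A-3 depend on details that are easy to misread: the rounding direction of Ceil and Floor for negative numbers, the 1-based string positions used by Left, Mid, and Right, and the fact that Iif evaluates both branches. The following is a minimal sketch of those rules only. It is written in Python for illustration rather than in the VB.NET of quickBasicEngine, and every name in it (qb_ceil, qb_mid, noisy, and so on) is hypothetical. It assumes n is zero or positive in the string functions and that positions are 1 or greater.

```python
import math


def qb_ceil(n):
    # Smallest integer greater than or equal to n, for any sign of n.
    return math.ceil(n)


def qb_floor(n):
    # Largest integer less than or equal to n, for any sign of n.
    return math.floor(n)


def qb_sgn(x):
    # Signum: 1 when x > 0, 0 when x = 0, -1 when x < 0.
    return (x > 0) - (x < 0)


def qb_iif(a, b, c):
    # By the time this function runs, the caller has already evaluated
    # both b and c, which mirrors the "fully evaluate" note in the table.
    return b if a != 0 else c


def qb_left(s, n):
    # n characters commencing at position 1.
    return s[:n]


def qb_mid(s, n, length=None):
    # Position n is 1-based; an omitted length runs to the end of s.
    start = n - 1
    if length is None:
        return s[start:]
    return s[start:start + length]


def qb_right(s, n):
    # Commences at 1-based position Len(s) - n + 1.
    start = max(0, len(s) - n)
    return s[start:]


def noisy(log, tag, value):
    # Helper for demonstrating that both Iif branches are evaluated.
    log.append(tag)
    return value
```

Calling qb_iif(1, noisy(log, "b", "yes"), noisy(log, "c", "no")) with an empty list log returns "yes" but leaves both "b" and "c" in log, which is why an Iif branch with side effects deserves care.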

APPENDIX B

quickBasicEngine
Reference Manual
We have to be simple simply for lack of time
-Jacques Derrida

This appendix documents the properties and methods (known jointly as the
procedures) exposed by quickBasicEngine, as well as its references, with the exception
of the utility DLLs: utility, windowsUtilities, collectionUtilities, and zoom. Full
documentation of the utility DLLs is available in the source code for these tools.
We should be as simple as possible, but no simpler (as Al Einstein said) in
the time available, as the French philosopher Jacques Derrida implies.
This is the original design document for the compiler. When I sit down to
code, I first write a design document. It was kept up to date while coding, and
even after flooding my laptop with a Starbuck's vente.
This document describes the standards followed by each class and the pro-
cedures exposed by each class. The following classes are described:

• qbOp

• qbPolish

• qbScanner

• qbToken

• qbTokenType

• qbVariable

• qbVariableType

• quickBasicEngine


This appendix is useful if you need to understand the compiler in detail or
use its components. For example, the qbScanner object can be used to scan any
language that is lexically the same as the language supported by quickBasicEngine.

Class Standards
Properties of each class start with an uppercase letter; methods start with a
lowercase letter. Any method that does not otherwise return a value will return
True on success or False on failure.
All classes except qbOp and qbTokenType have state in the form of variables in
General Declarations that persist between procedures but go away when
the class is destroyed. Stateful (as opposed to stateless) classes can be usable or
not usable.
During the execution of the constructor procedure for the stateful class, it is
unusable. On successful completion of the constructor, the class object instance
becomes usable, and it remains usable until the class is disposed (or otherwise
terminated) or a serious internal error is found. Serious internal errors include
bugs in the code of the object, whether from errors in the original code or through
modification, and "object abuse" (the use of the object after a serious error has
been discovered and reported). When the object is not usable, most Public prop-
erties and methods will report an error when called and return a suitable default
value.
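
The usability rule above can be sketched in a few lines. This is an illustration in Python, not the VB.NET code of the suite; the class name StatefulObject, the Value property, and the errors list are hypothetical stand-ins, and the "suitable default" of 0 is an assumption chosen for the example.

```python
class StatefulObject:
    """Hypothetical stand-in for a stateful class that follows the usability rule."""

    def __init__(self, value):
        self._usable = False   # unusable while the constructor is running
        self.errors = []       # stands in for the suite's error reporting
        self._value = value
        self._usable = True    # usable once construction has completed

    @property
    def usable(self):
        return self._usable

    def mk_unusable(self):
        # Force the unusable state; always returns True.
        self._usable = False
        return True

    def _check_usable(self, procedure):
        # Report an error, rather than fail silently, on an unusable object.
        if self._usable:
            return True
        self.errors.append(procedure + ": object is not usable")
        return False

    @property
    def value(self):
        if not self._check_usable("Value"):
            return 0           # assumed "suitable default" for this sketch
        return self._value

    def dispose(self):
        if not self._check_usable("dispose"):
            return False
        self._value = None     # release state
        self._usable = False
        return True
```

The point of the pattern is that a caller who keeps using a broken object gets a reported error and a harmless default each time, instead of results computed from corrupt state.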
All classes implement an informal interface known as the core methodology.
It is informal because classes don't implement a file containing procedures in the
methodology; instead, they tend to implement the core procedures shown in
Table B-1 consistently.

Table B-1. Class Core Procedures


Procedure Description
About Shared, read-only property implemented to provide information
about the class.
ClassName Shared, read-only property implemented to provide the name of the
class.
dispose Method implemented to cleanly dispose all reference objects in the
object state and mark the object as unusable.
inspect Method implemented to test a series of assertions about object state,
and to raise an error condition and mark the object unusable when
any assertion fails.
mkUnusable Method implemented to mark the object as not usable.

Name Read-write property implemented to assign and return an object
instance name for identifying the object on debugging reports and
elsewhere. By default, the object name will be classNamennnn date
time, where nnnn is the object sequence number.
object2XML Method implemented to return the state of the object as an XML tag.
toString Method implemented to serialize part of the object's state and value.
Tag Read-write property implemented to assign and return user data that
is associated with the object in a specific application. Tag can be
a reference object. If so, it is treated as an honored guest by its host
object. While objects in this suite rather remorselessly destroy their
own reference variables when they are destroyed in a form of
electronic suttee, the Tag object isn't destroyed.
test Method implemented by some objects as a self-test. It runs a series of
tests on the object while placing the result in a strReport parameter
passed by reference. In most cases, the test methods will either create
an internal test object (so that tests do not disrupt the main object) or
provide the option to control this.
Usable Read-only property implemented to return True when the object is
usable; False otherwise.

The stateless object qbOp is of necessity fully threadable; multiple copies may
run in multiple threads, and all procedures are Shared (Static in C# terms). Other
than the qbOp object, the other objects are serially threadable. Multiple copies
may coexist in parallel threads, but the same copy cannot run non-Shared
methods in more than one thread at the same time. quickBasicEngine is stateful
but fully threadable.
Each serially threadable object organizes its state into a structure with the
name TYPstate and an instance of the TYPstate called USRstate. quickBasicEngine,
because it is fully threadable, organizes its state into an OBJstate object, which
contains the USRstate. This makes it much easier to lock the state using SyncLock.
Note that each method that doesn't otherwise need to return a value is none-
theless coded as a Boolean function, and returns True on success and False on
an error. Although this standard produces, at times, some strange code (such as
functions that always return True), it is maintained for consistency.
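
The state-in-an-object arrangement can be sketched as follows. The sketch is in Python for illustration, with threading.Lock standing in for VB.NET's SyncLock. The names UsrState, ObjState, Engine, bump, and run_in_threads are hypothetical and are not procedures of quickBasicEngine. The method bump also follows the Boolean-return standard: it has nothing to return, so it returns True.

```python
import threading


class UsrState:
    """Stands in for the USRstate structure: the plain state fields."""

    def __init__(self):
        self.counter = 0


class ObjState:
    """Stands in for OBJstate: one object that wraps the state and can be locked."""

    def __init__(self):
        self.usr_state = UsrState()
        self.lock = threading.Lock()


class Engine:
    """Hypothetical fully threadable object."""

    def __init__(self):
        self._obj_state = ObjState()

    def bump(self):
        # Every access to the state happens while holding the one lock.
        with self._obj_state.lock:
            self._obj_state.usr_state.counter += 1
        return True  # nothing else to return, so report success

    @property
    def counter(self):
        with self._obj_state.lock:
            return self._obj_state.usr_state.counter


def run_in_threads(engine, thread_count, calls_per_thread):
    # Call bump() on the same instance from several threads in parallel.
    def worker():
        for _ in range(calls_per_thread):
            engine.bump()

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Because every read and write of the state goes through one lock, the count after run_in_threads(engine, 8, 1000) is the full 8000, with no lost updates. A serially threadable class omits the lock and instead relies on the caller never to use one instance from two threads at once.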


qbOp
The qbOp stateless class identifies the operators supported by the non-CLR Nutty
Professor machine as a large enumerator, and it provides Shared conversion
tools for enumerator values.
qbOp includes references to utilities.DLL.
qbOp is stateless and is fully threadable. Multiple instances can run simulta-
neously in multiple threads, and multiple procedures may be executed in the same
instance in multiple threads.

Properties and Methods of qbOp


Table B-2 lists the properties and methods of the qbOp class.

Table B-2. qbOp Properties and Methods


Property/Method Description
Public Shared ReadOnly Property About As String
    Shared, read-only property that returns information about the class.

Public Shared ReadOnly Property ClassName As String
    Shared, read-only property that returns the class name qbOp.

Public Shared Function isJumpOp(ByVal enuOpcode As ENUop) As Boolean
    Shared method that returns True when the operator is a jump operator;
    False otherwise. Used to detect operators that include a label that
    the assembler must resolve.

Public Shared Function opCodeFromString(ByVal strOpcode As String) As ENUop
    Shared method that returns the opcode enumerator for the opcode, where
    strOpcode is the case-independent op name.

Public Overloads Shared Function opCodeToDescription(ByVal strOpcode As String) As String
    Shared method that returns the opcode's description, where strOpcode is
    an opcode specified as a string. The description will be in the format
    op(template): text. The template describes what the opcode requires on
    the stack. See the "Stack Template" section following this table.

Public Overloads Shared Function opCodeToDescription(ByVal enuOpcode As ENUop) As String
    Shared method that returns the opcode's description, where enuOpcode is
    an opcode specified as an opcode enumerator. The description will be in
    the format op(template): text. The template describes what the opcode
    requires on the stack. See the "Stack Template" section following this
    table.

Public Shared Function opCodeToStackTemplate(ByVal enuOpcode As ENUop) As String
    Shared method that returns the template of expected operands for this
    opcode, where enuOpcode is an opcode specified as an enumerator. See
    the "Stack Template" section following this table for a description of
    the template.

Public Shared Function opCodeToString(ByVal enuOpcode As ENUop) As String
    Shared method that returns the opcode's name only.

Stack Template

The template describes what an opcode requires on the stack. The template is
a string containing the comma-separated list of expected stack values, from
lower down in the stack to the top of the stack. The template is defined inside
the op description statement in the opCodeToDescription method.
Each stack value must be one of the following:

• x: Any qbVariable is permitted at this position.

• s: Any scalar qbVariable is permitted.

• n: Any numeric qbVariable is permitted.

• i: Any numeric integer qbVariable is permitted.

• u: Operator expects the utility stack frame: stack(top) is an operand count;
stack(top+1) is the name of a utility; stack(top+n+1) .. stack(top+2) are the
operands.

• <name>: Where name is the name of one of the values of the ENUvarType enu-
merator, this specifies that the stack value is restricted to the varType.


• a: An array index frame is expected at this location, in the form i(1),
i(2) ... i(n), Count, array, where i(n) is the index at dimension n, Count
is the number of preceding indexes, and array is a qbVariable with the
type array.
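
To make the template convention concrete, here is a minimal sketch of a checker for the four simplest codes (x, s, n, and i). It is written in Python for illustration and is not the engine's code: the function names are hypothetical, Python values stand in for qbVariables, and the u, a, and <name> codes are deliberately left out. The stack is a list whose last element is the top, and the template lists values from lower down to the top, as described above.

```python
def value_matches(code, value):
    # Python values stand in for qbVariables in this sketch.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if code == "x":    # any value
        return True
    if code == "s":    # any scalar (here: number, string, or Boolean)
        return isinstance(value, (int, float, str))
    if code == "n":    # any number
        return is_number
    if code == "i":    # any integer
        return is_number and isinstance(value, int)
    raise ValueError("template code not supported by this sketch: " + code)


def check_stack(stack, template):
    # The template is a comma-separated list, lowest stack entry first.
    codes = [code.strip() for code in template.split(",")] if template.strip() else []
    if len(stack) < len(codes):
        return False   # not enough operands on the stack
    top_entries = stack[len(stack) - len(codes):]
    return all(value_matches(code, value)
               for code, value in zip(codes, top_entries))
```

For example, check_stack([1, 2.5], "n,n") accepts a stack holding two numbers, while check_stack([3, "a"], "s,i") rejects a stack whose top entry is a string where an integer is required.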

qbPolish
The qbPolish class represents one instruction to our non-CLR Nutty Professor
machine.
References of qbPolish are qbOp.DLL, qbVariable.DLL, and utilities.DLL.
qbPolish is serially threadable. Multiple instances can run simultaneously in
multiple threads, but errors will result if one object's procedures run in multiple
threads and in parallel.

The qbPolish Instruction Data Model and State


The state of this class consists of an opcode, an operand, a comment, an index
back to the source code responsible for the instruction (as stored by the qbScanner
object inside qbParser), and the length of the source code.
The state of each qbPolish instance, which represents one instruction to the
Nutty Professor machine, consists of the following.

• strName: Object instance name

• booUsable: Object usability switch

• enuOpCode: Operation code (see Chapter 8 for a list of the supported opcodes)

• enuOperand: Operand, which should be a .NET scalar

• strComment: Commentary about this instruction, which is set by the
Comment property

• intStartIndex: Start index of the source code responsible for this instruction

• intLength: Length of the source code responsible for this instruction


qbPolish Inspection Rules


The following inspection rules are used by the inspect method as a check on
errors in the source code, whether as delivered or as changed, or due to object
abuse in the form of using the object after a serious user error has occurred:

• The object instance must be usable.

• The operation code can't be the Invalid enumerator value.

• The start index corresponding to the operation in the source code must be
1 or greater.

• The length of the source code corresponding to the operation must be 0
or greater.

• If the inspection fails, the object becomes unusable.

An internal inspection is carried out in the constructor (after the object con-
struction steps are complete) and in the dispose method (before the reference
objects in the state are disposed of).
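
The inspection rules translate almost directly into code. The sketch below is in Python for illustration, not the VB.NET of qbPolish. The class PolishInstruction and its field names are hypothetical, and the opcode names PUSH_LITERAL and ADD are placeholders, since the real opcode list appears in Chapter 8. Only INVALID corresponds to the Invalid enumerator value named in the rules.

```python
from dataclasses import dataclass
from enum import Enum


class OpCode(Enum):
    INVALID = 0
    PUSH_LITERAL = 1   # placeholder name, not taken from the book
    ADD = 2            # placeholder name, not taken from the book


@dataclass
class PolishInstruction:
    opcode: OpCode
    operand: object = None
    comment: str = ""
    start_index: int = 1   # index of the source code responsible
    length: int = 0        # length of the source code responsible
    usable: bool = True

    def inspect(self):
        """Apply the inspection rules; return (passed, report)."""
        failures = []
        if not self.usable:
            failures.append("object instance is not usable")
        if self.opcode is OpCode.INVALID:
            failures.append("operation code is the Invalid value")
        if self.start_index < 1:
            failures.append("start index is less than 1")
        if self.length < 0:
            failures.append("length is less than 0")
        if failures:
            self.usable = False   # a failed inspection makes the object unusable
            return False, "; ".join(failures)
        return True, "inspection passed"
```

Returning the report alongside the Boolean result plays the role of the by-reference report string in the original design.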

Properties and Methods of the qbPolish Class


Table B-3 lists the properties and methods of the qbPolish class.

Table B-3. qbPolish Properties and Methods


Property/Method Description
Public Shared ReadOnly Property About As String
    Shared, read-only property that returns information about this class.

Public Shared ReadOnly Property ClassName As String
    Shared, read-only property that returns the name of the class (qbPolish).

Public Property Comment As String
    Read-write property that can define and return comments about the
    operation suitable for the assembler listing.

Public Function dispose As String
    Method that disposes of the object and cleans up any reference objects
    in the heap. This method marks the object as unusable. For best
    results, use this method when you are finished using the object in code.

Public Function inspect(ByVal strReport As String) As Boolean
    Method that inspects the object, checking for errors that result from
    blunders in the source code of this class or object abuse, not simple
    user errors. The report parameter should be a string, passed by
    reference; it is assigned an inspection report. See the "qbPolish
    Inspection Rules" section preceding this table.

Public Function mkUnusable As Boolean
    Method that forces the object instance into the unusable state; it
    always returns True.

Public Property Name() As String
    Read-write property that returns and can set the name of the object
    instance, which will identify the object in error messages and on the
    XML tag that is returned by object2XML. The name defaults to
    qbPolishnnnn date time, where nnnn is a sequence number.

Public Overloads Function object2XML() As String
    Method that converts the state of the object to XML.

Public Overloads Function object2XML(ByVal booHeaderComment As Boolean) As String
    Optional overload of object2XML that controls the commenting of the
    XML strings that are returned: object2XML(False) returns XML with no
    header comment.

Public Overloads Function object2XML(ByVal booHeaderComment As Boolean, ByVal booLineComments As Boolean) As String
    Optional overload of object2XML that controls the commenting of the
    XML strings that are returned. The booHeaderComment parameter controls
    the generation of the block header comment. The booLineComments
    parameter controls the generation of a line of explanatory comment for
    each XML element.

Public Property Opcode() As ENUopcode
    Read-write property that returns and can assign the instruction's
    opcode as one of the names listed in Chapter 8. See also
    opcodeFromString and opcodeToString.

Public Function opcodeFromString(ByVal strOpcode As String) As Boolean
    Method that assigns the opcode from its name. It can be used instead
    of the Opcode property when the enumerator name is undefined in your
    project.

Public Overloads Shared Function opcodeToDescription As String
    Shared method that obtains the description of the opcode.

Public Function opcodeToString() As String
    Method that returns the opcode as a string. It can be used instead of
    the Opcode property when the enumerator name is undefined in your
    project.

Public Property Operand() As Object
    Read-write property that returns and can change the Polish operand.

Public Property TokenLength() As Integer
    Read-write property that returns and can change the length, in tokens,
    of the source code responsible for the Polish instruction.

Public Property TokenStartIndex() As Integer
    Read-write property that returns and can change the token index from 1
    of the source code responsible for the Polish instruction.

Public Function toString() As String
    Method that converts the Polish operation to a string in the format
    op operand: comment. The string is always suitable for display; in
    particular, the operand is converted to a number or a quoted string.

Public ReadOnly Property Usable() As Boolean
    Read-only property that returns True if the object instance is usable;
    False otherwise.
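
The display format produced by toString (op operand: comment) can be sketched as below. This is a Python illustration, not the qbPolish implementation. The function name polish_to_string is hypothetical, the opcode name used in the usage note is a placeholder, and two details are assumptions made for the sketch: embedded double quotes are doubled in the VB style, and the operand and comment are omitted when they are empty.

```python
def polish_to_string(opcode_name, operand=None, comment=""):
    # Strings are shown quoted; numbers and other values use their plain text.
    if operand is None:
        operand_text = ""
    elif isinstance(operand, str):
        # Assumption: embedded quotes are doubled, as in VB string literals.
        operand_text = '"' + operand.replace('"', '""') + '"'
    else:
        operand_text = str(operand)

    text = opcode_name
    if operand_text:
        text += " " + operand_text
    if comment:
        text += ": " + comment
    return text
```

With a placeholder opcode name, polish_to_string("opPushLiteral", 32, "Push constant") gives the text opPushLiteral 32: Push constant, and a string operand such as "Hi" is shown inside double quotes so that it cannot be mistaken for a number or an identifier.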

qbScanner
The qbScanner class scans input source code for the quickBasicEngine and pro-
vides, on demand, scanned source tokens and scanned lines of source code. This
class uses "lazy" evaluation, scanning the source code only when necessary and
when an unparsed token is requested.
References of qbScanner include collectionUtilities.DLL, qbToken.DLL,
qbTokenType.DLL, and utilities.DLL.


qbScanner is serially threadable. Multiple instances can run simultaneously
in multiple threads, but errors will result if one object's procedures run in
multiple threads and in parallel.
The qbScanner class is ICloneable and IComparable; see its clone and compareTo
methods in Table B-4.
The compareTo and normalize methods make format-independent compari-
son of source code possible. compareTo will ignore white space when comparing
two scanned source code strings, and normalize will reduce white space to a
standard form in preparation for comparing source code strings.
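
The idea behind compareTo and normalize can be shown with a toy tokenizer. The sketch is in Python for illustration. Its regular expression is a stand-in chosen for this example and is far simpler than the real qbScanner; among other things, it treats newlines as ordinary white space. The function names are hypothetical.

```python
import re

# Toy token pattern: quoted strings, identifiers, numbers, two-character
# relational operators, then any other single nonblank character.
TOKEN_PATTERN = re.compile(
    r'"[^"]*"|[A-Za-z_][A-Za-z0-9_]*|\d+(?:\.\d+)?|<=|>=|<>|\S'
)


def tokenize(source):
    return TOKEN_PATTERN.findall(source)


def normalize(source):
    # One space between tokens; nothing at the start or the end.
    return " ".join(tokenize(source))


def compare_to(source_a, source_b):
    # Equal when the token sequences match, whatever the white space.
    return tokenize(source_a) == tokenize(source_b)
```

Under these assumptions, normalize("Print   1+ 2") yields the text Print 1 + 2, and compare_to("a=b+1", "a = b  +  1") is True. White space inside a quoted string is part of the token, so two sources that differ only there do not compare equal.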

The qbScanner Data Model and State


The state of this class consists of raw source code, and a series of qbTokens
indexed commencing at the start of the input code and accounting for all char-
acters of source code, comments, and white space. See qbToken.vb for the data
model of the token itself.
The state of the scanner consists of the following:

• strName: Object instance name

• booUsable: Object usability switch

• strSourceCode: Input source code

• intLast: Index of the last token parsed or zero when no tokens have
been parsed

• objQBtoken(): Array of scanned qbTokens

• objQBtoken(): Array of pending qbTokens, maintained during the lookahead
scan (see Chapter 5)

• intLineNumber: Current line number

• colLineIndex: Collection that relates line numbers to character positions. Its
key is _lineNumber (underscore followed by a line number). Each entry
contains a subcollection with two items: item(1) is the line number, and
item(2) is the start index, from 1, of this line number.

• booScanned: Indicates whether the strSourceCode has been completely
scanned


qbScanner Inspection Rules


The following inspection rules are used by the inspect method as a check on errors
in the source code, whether as delivered or as changed, or due to object abuse in
the form of using the object after a serious user error has occurred:

• The object instance must be usable.

• Each token in both the array of scanned tokens and the array of pending
tokens must pass the inspect procedure of qbToken.

• The tokens in the scanned array must be in ascending order; gaps are
acceptable but not overlaps.

• No token's end index may point beyond the end of the source code in
either the scanned array or the pending array.

• The line number must be greater than or equal to 0.

• The format of the line number index collection must be
valid. This is a collection of three-item subcollections. Item(1) must be
a string containing the key of the index entry. Item(2) and item(3) must be
positive integers. Item(2) cannot be zero.

• If the (nonnull) code is fully scanned, the first token's start index should be
the same as the position of the first nonblank character in the source
code. The last token's end index should be the same as the position of the
last nonblank character.

• If the code is null and indicated as fully scanned, the scan count must
be empty.

An internal inspection is carried out in the constructor (after the object con-
struction steps are complete) and the dispose method (before the reference objects
in the state are disposed). Note that the dispose inspection may be suppressed
using the overload dispose(False).
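
The rules about token positions lend themselves to a short sketch. It is written in Python for illustration and covers only the ordering, bounds, and first/last-nonblank rules. Tokens are represented as hypothetical (start index, length) pairs with 1-based start indexes, which is an assumption about the data model rather than the actual qbToken layout, and the function name inspect_tokens is also hypothetical.

```python
def inspect_tokens(source, tokens, fully_scanned=True):
    """Return a list of rule violations; an empty list means the rules hold.

    tokens is a list of (start_index, length) pairs, start_index from 1.
    """
    failures = []
    previous_end = 0
    for start, length in tokens:
        end = start + length - 1
        # Ascending order: gaps are acceptable, overlaps are not.
        if start <= previous_end:
            failures.append("token at %d overlaps or precedes its predecessor" % start)
        # No token may end beyond the end of the source code.
        if end > len(source):
            failures.append("token at %d ends beyond the source code" % start)
        previous_end = max(previous_end, end)

    if fully_scanned and source.strip() and tokens:
        first_nonblank = len(source) - len(source.lstrip()) + 1
        last_nonblank = len(source.rstrip())
        if tokens[0][0] != first_nonblank:
            failures.append("first token does not start at the first nonblank character")
        last_start, last_length = tokens[-1]
        if last_start + last_length - 1 != last_nonblank:
            failures.append("last token does not end at the last nonblank character")

    if fully_scanned and not source and tokens:
        failures.append("null source code must have no scanned tokens")
    return failures
```

For the source text "  a = 1 " (two leading blanks, one trailing blank), the token list [(3, 1), (5, 1), (7, 1)] passes every check, while changing the first token to (3, 2) makes it overlap the token at position 4 or later and is reported.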


Properties, Methods, and Events of the qbScanner Class


Table B-4 lists the properties, methods, and events of the qbScanner class.

Table B-4. qbScanner Properties, Methods, and Events


Property/Method/Event Description
Public Shared ReadOnly Property About As String
    Shared, read-only property that returns information about this class.

Public Overloads Function checkToken(ByRef intIndex As Integer, ByVal strValueExpected As String, Optional ByVal intEndIndex As Integer = 0) As Boolean
    Method that checks the scanned tokens for strValueExpected. If it finds
    the expected value, it increments a token index. intIndex should be an
    Integer, passed by reference. The scan token at this index is checked.
    On success, this integer is incremented; on failure, it is unchanged.
    strValueExpected is compared to the source code, disregarding case
    differences. The optional parameter intEndIndex can be used to restrict
    the check to all tokens up to and including the token at the specified
    end index. See also checkTokenByTypeName.

Public Overloads Function checkToken(ByRef intIndex As Integer, ByVal enuTypeExpected As qbTokenType.qbTokenType.ENUtokenType, Optional ByVal intEndIndex As Integer = 0) As Boolean
    Method that checks the scanned tokens for the type in enuTypeExpected.
    If it finds the expected token type, it increments a token index.
    intIndex should be an Integer, passed by reference. The scan token type
    at this index is checked. On success, this integer will be incremented.
    The optional parameter intEndIndex can be used to restrict the check to
    all tokens up to and including the token at the specified end index.
    See also checkTokenByTypeName.

Public Overloads Function checkTokenByTypeName(ByRef intIndex As Integer, ByVal strTypeExpected As String, Optional ByVal intEndIndex As Integer = 0) As Boolean
    Method that checks the scanned tokens for the type named in
    strTypeExpected. If it finds the expected token type (identified using
    its name), it increments a token index. intIndex should be an Integer,
    passed by reference. The scan token type at this index is checked. On
    success, this integer is incremented. The optional parameter
    intEndIndex can be used to restrict the check to all tokens up to and
    including the token at the specified end index. See also checkToken.

Public Function clear As Boolean
    Method that clears the source code and resets the scan.

Public Function clone As qbScanner
    Method that makes a clone of the scanner object. The clone is
    guaranteed only to have the same source code that will tokenize to the
    same source code and contain the same white space patterns. The clone,
    when passed to the compareTo method as exposed by the source object,
    returns True. This method implements ICloneable.

Public Function compareTo(ByVal objScanner As qbScanner) As Boolean
    Method that compares the object instance with the scanner object passed
    in objScanner, returning True when the source code in the instance is
    identical, after tokenization, to the source code in objScanner. The
    source code in the instance may have a different white space pattern
    from the source code in objScanner. The qbScanner clone always produces
    an object that returns True when compared to the source. (All objects
    that compare to a given object are token-identical, but not all are
    clones, because a clone will be white-space-identical in addition to
    being token-identical.) This method implements IComparable.

Public Overloads Function dispose As String
    Method that disposes of the object and cleans up any reference objects
    in the heap. This method marks the object as unusable. This overload
    will always conduct an internal inspection of the object instance
    (using the inspect method), and an error is thrown if the inspection
    failed. For best results, use this method when you are finished using
    the object in code. See the next method for an overload that allows
    inspection to be skipped.

Public Overloads Function dispose(ByVal booInspect As Boolean) As String
    Method that disposes of the object and cleans up any reference objects
    in the heap. This method marks the object as unusable. This overload
    inspects the object instance, unless dispose(False) is used. For best
    results, use this method when you are finished using the object in code.

Public Function findRightParenthesis(ByVal intIndex As Integer, Optional ByVal intEndIndex As Integer = 0) As Integer
    Method that searches for a balancing right parenthesis at the scanner
    position in intIndex. On success, it returns the index of the token
    containing the right parenthesis. On failure, it returns one index past
    the last parenthesis with no other error indication. intIndex should
    normally point one character to the right of the left parenthesis to be
    balanced. The optional parameter intEndIndex can be used to restrict
    the search to all tokens up to and including the token at the specified
    end index.

Public Overloads Function findToken(ByVal intIndex As Integer, ByVal strValueExpected As String, Optional ByVal intEndIndex As Integer = 0) As Integer
    Method that searches the scanned tokens left to right, starting at
    intIndex, for the expected value in strValueExpected, ignoring case
    differences. If it finds the expected value, it returns the scan index
    of the token. If it does not find the expected value, it returns 0. The
    optional parameter intEndIndex can be used to restrict the search to
    the token up to and including the token at the specified end index.

Public Overloads Function findToken(ByVal intIndex As Integer, ByVal enuTypeExpected As qbTokenType.qbTokenType.ENUtokenType, Optional ByVal intEndIndex As Integer = 0) As Integer
    Method that searches the scanned tokens left to right, starting at
    intIndex, for the expected token type. If it finds the expected type,
    it returns the scan index of the token. If it does not find the
    expected type, it returns 0. The optional parameter intEndIndex can be
    used to restrict the search to the token up to and including the token
    at the specified end index.

Public Overloads Function findTokenByTypeName(ByVal intIndex As Integer, ByVal strTypeExpected As String, Optional ByVal intEndIndex As Integer = 0) As Integer
    Method that searches the scanned tokens left to right, starting at
    intIndex, for the expected type named in strTypeExpected, ignoring case
    differences. If it finds the expected type, it returns the scan index
    of the token. If it does not find the expected type, it returns 0. The
    optional parameter intEndIndex can be used to restrict the search to
    the token up to and including the token at the specified end index.

Public Function inspect(ByRef strReport As String) As Boolean
    Method that inspects the object. The report parameter should be a
    string, passed by reference; it is assigned an inspection report. See
    the "qbScanner Inspection Rules" section preceding this table.

Public Shared Function isInteger(ByVal strInstring As String) As Boolean
    Shared method that returns True when strInstring is an unsigned integer
    in the syntactical sense of containing no sign, no decimal part
    (including no 0 decimal part as in 1.0), and no exponent (including no
    meaningless exponent as in .1e0).

Public ReadOnly Property Line(ByVal intLine As Integer) As String
    Indexed, read-only property that returns the source code contained in
    the line numbered intLine (numbering starts at 1). Use of this property
    forces a complete scan. Continuation lines count as distinct lines.

Public ReadOnly Property LineCount As Integer
    Read-only property that returns the total number of lines in the source
    code. Use of this property forces a complete scan. Continuation lines
    count as distinct lines.

Public ReadOnly Property LineLength(ByVal intLine As Integer) As String
    Indexed, read-only property that returns the length of the source code
    contained in the line numbered intLine (numbering starts at 1). Use of
    this property forces a complete scan. Continuation lines count as
    distinct lines.

Public ReadOnly Property LineStartIndex(ByVal intLine As Integer) As String
    Indexed, read-only property that returns the character starting index,
    from 1, of the source code contained in the line numbered intLine
    (numbering starts at 1). Use of this property forces a complete scan.
    Continuation lines count as distinct lines.

Public Function mkUnusable As Boolean
    Method that forces the object instance into the unusable state; it
    always returns True.

Public Property Name() As String
    Read-write property that returns and can set the name of the object
    instance, which identifies the object in error messages and on the XML
    tag that is returned by object2XML. The name defaults to
    qbScannernnnn date time, where nnnn is a sequence number.

Public Overloads Function normalize() As String
    Method that returns the normalized form of the source code in the object instance (see
    footnote 1). It places one space between each token, and it removes all tokens at the
    start and end of the source code. It does not change the source code. The output of
    this method is not especially readable, and it actually inserts unneeded spaces between
    tokens. However, two normalized source code sequences will be character-identical,
    which means that normalization is a tool for comparing source code in different formats
    for logical identity.

Public Overloads Function object2XML(Optional ByVal booAboutComment As Boolean = True, Optional ByVal booStateComment As Boolean = True) As String
    Method that converts the state of the object to an XML string. The returned tag
    includes all source code and parsed tokens, so it may be unmanageably large for large
    source code files. See the next overload of this method for a way to truncate the
    source code and/or tokens. Two optional parameters are exposed:
    booAboutComment:=False suppresses a boxed comment at the start of the XML containing
    the value of this object's About property, and booStateComment:=False suppresses
    comments that describe each state value returned.

Public Overloads Function object2XML(ByVal intSourceTruncation As Integer, ByVal intTokenTruncation As Integer, Optional ByVal booAboutComment As Boolean = True, Optional ByVal booStateComment As Boolean = True) As String
    Method that converts the state of the object to an XML string. The returned tag
    includes all source code and all parsed tokens, so it may be unmanageably large for
    large source code files. Therefore, this overload allows a maximum source length to be
    specified in intSourceTruncation, and/or the maximum number of tokens to be specified
    in intTokenTruncation. Two optional parameters are exposed: booAboutComment:=False
    suppresses a boxed comment at the start of the XML containing the value of the object's
    About property, and booStateComment:=False suppresses comments that describe each
    state value returned.

1. Normalization shouldn't be confused with prettyprinting or packing, although it may save
   space.

Public ReadOnly Property QBToken(ByVal intIndex As Integer) As qbToken.qbToken
    Indexed, read-only property that returns the indexed scanned token as an object of type
    qbToken. It will cause a scan of tokens, up to and including the token at intIndex,
    when the token is not available.

Public Function reset() As Boolean
    Method that resets the scan. The reset method does not clear the source code; it merely
    undoes all parsing done prior to the reset. See also the clear method.

Public Overloads Function scan() As Boolean
    Method that resets the scanner object and scans all characters in the source code as
    set by the SourceCode property.

Public Overloads Function scan(ByVal strSourceCode As String) As Boolean
    Method that resets the scanner object, sets the SourceCode property to strSourceCode,
    and scans all characters.

Public Overloads Function scan(ByVal lngEndIndex As Long) As Boolean
    Method that scans existing source code from a previous scan position to lngEndIndex,
    which must be the Long precision end index for the scan (last character to be scanned,
    from 1). The scanner is not reset. Tokens are appended from the source code starting at
    1 or the end of the previous scan, until a token that ends at or after lngEndIndex is
    scanned or the end of the source code is scanned, whichever comes first. A previous
    scan position exists unless the reset method has been executed, in which case the scan
    will start at 1. If your end value is not Long, use CLng(end) to convert it to the
    required type to avoid confusion with the overload scan(intCount), which scans a
    specific number of tokens.

Public Overloads Function scan(ByVal lngStartIndex As Long, ByVal lngEndIndex As Long) As Boolean
    Method that scans existing source code from lngStartIndex to lngEndIndex. The scanner
    is not reset. Tokens are appended from the source code starting at lngStartIndex until
    a token that ends at or after lngEndIndex is scanned or the end of the source code is
    scanned.

Public Overloads Function scan(ByVal intCount As Integer) As Boolean
    Method that scans existing source code from a previous scan position until intCount
    tokens have been found or the end of the source code is reached. The scanner is not
    reset. Tokens are appended from the source code starting at the scan position until
    intCount tokens have been scanned. A scan position exists unless the reset method has
    been executed. Immediately after a reset, the scan position will be 1. If your count
    value is not an Integer, use CInt(count) to convert it to the required type, to avoid
    confusion with the overload scan(lngEndIndex), which scans to a specified end index.

Public Event scanErrorEvent(ByVal strMsg As String, ByVal intIndex As Integer, ByVal intLineNumber As Integer, ByVal strHelp As String)
    Event that occurs when an error is detected by the scanner. strMsg is the error
    message, intIndex is the character at which the error was detected, intLineNumber is
    the line number, and strHelp contains additional information (see footnote 2).

Public Event scanEvent(ByVal objQBtoken As qbToken.qbToken, ByVal intCharacterIndex As Integer, ByVal intLength As Integer, ByVal intTokenCount As Integer)
    Event that fires at completion of each successful scan of a token; useful for progress
    reporting. It passes the following to the delegate: objQBtoken is the token object (of
    type qbToken); intCharacterIndex is the character index, from 1, of the token;
    intLength is the length of the source code; and intTokenCount contains the number of
    tokens found so far, including this token.

Public ReadOnly Property Scanned() As Boolean
    Read-only property that returns True when the source code has been fully scanned; False
    otherwise.

Public Property SourceCode() As String
    Read-write property that returns and may be set to the source code for scanning.
    Assigning source code clears the array of tokens in the object state, but does not
    result in an immediate scan of the source code. Scanning occurs when the QBToken
    property is called and the token is not available.

Public Overloads Function sourceMid(ByVal intStartIndex As Integer) As String
    Method that returns the source code that commences at the token at intStartIndex (a
    token index, not a character index).

2. At this writing, the only error detected occurs when unrecognizable characters are found.

Public Overloads Function sourceMid(ByVal intStartIndex As Integer, ByVal intLength As Integer) As String
    Method that returns the source code that commences at the token at intStartIndex and
    contains intLength tokens.

Public Function test(ByRef strReport As String) As Boolean
    Method that tests the scanner and returns True when all tests are passed or False when
    any test fails. The by-reference string parameter strReport is set on success or
    failure to a test report. When the test fails, the object is marked unusable.

Public Shared ReadOnly Property TestString() As String
    Shared, read-only property that returns the test string used in the test method. This
    string tests all tokens for valid results.

Public Function token(ByVal intIndex1 As Integer) As String
    Method that returns the string value of the token indexed by intIndex1, where intIndex1
    is between 1 and TokenCount (see footnote 3).

Public ReadOnly Property TokenCount() As Integer
    Read-only property that returns the number of tokens. Calling TokenCount causes a
    complete scan of the source code.

Public Function tokenEndIndex(ByVal intIndex As Integer) As Integer
    Method that returns the character end index of the token at intIndex.

Public Function tokenLength(ByVal intIndex As Integer) As Integer
    Method that returns the length of the token at intIndex.

Public Function tokenLinenumber(ByVal intIndex As Integer) As Integer
    Method that returns the line number at which the token at intIndex starts.

Public Function tokenStartIndex(ByVal intIndex As Integer) As Integer
    Method that returns the character start index of the token at intIndex.

Public Function tokenType(ByVal intIndex As Integer) As qbTokenType.qbTokenType.ENUtokenType
    Method that returns the type of the token at intIndex as an enumerator of type
    ENUtokenType. See the "qbTokenType" section for the possible values of ENUtokenType
    enumerators.

Public Function tokenTypeAsString(ByVal intIndex As Integer) As String
    Method that returns the type of the token at intIndex as a string. See qbTokenType for
    the possible values of ENUtokenType enumerators, which convert directly to string
    values.

3. At this writing, this method will result in a full scan of the input source code.

Public Overloads Overrides Function toString() As String
    Method that converts all tokens into a string containing their serialized values
    separated by newlines. Each value will be in the form
    <type>@<startIndex>..<endIndex>:<lineNumber>:<sourceCode>.

Public Overloads Overrides Function toString(ByVal intStartIndex As Integer) As String
    Method that converts all tokens commencing with the token at intStartIndex into a
    string containing their serialized values separated by newlines. Each value will be in
    the form <type>@<startIndex>..<endIndex>:<lineNumber>:<sourceCode>.

Public Overloads Overrides Function toString(ByVal intStartIndex As Integer, intCount) As String
    Method that converts intCount tokens commencing with the token at intStartIndex into a
    string containing their serialized values separated by newlines. Each value will be in
    the form <type>@<startIndex>..<endIndex>:<lineNumber>:<sourceCode>.

Public ReadOnly Property Usable() As Boolean
    Read-only property that returns True if the object instance is usable; False otherwise.
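
The isInteger rule in Table B-4 is purely syntactic, so it can be illustrated with a short pattern match. The following Python sketch is not the qbScanner source: the function name is_unsigned_integer is hypothetical, and rejecting the empty string is an assumption that the table does not settle.

```python
import re

# Digits only: no sign, no decimal point, and no exponent.
_UNSIGNED_INTEGER = re.compile(r"[0-9]+")


def is_unsigned_integer(text: str) -> bool:
    """Return True when text consists solely of one or more decimal digits."""
    return _UNSIGNED_INTEGER.fullmatch(text) is not None


# The strings 1.0 and .1e0 denote whole or near-whole numbers in other
# contexts, but they fail the syntactic test because of the point and exponent.
samples = ["42", "+42", "1.0", ".1e0"]
results = {sample: is_unsigned_integer(sample) for sample in samples}
```

A signed value such as +42 fails here for the same reason the token table gives later: the sign is always scanned as a separate operator.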

qbToken
The qbToken class defines one scan token as used in quickBasicEngine, including
its start index, length, type, and its line number.
References of qbToken include qbTokenType.DLL and utilities.DLL.
qbToken is serially threadable. Multiple instances can run simultaneously in
multiple threads, but errors will result if one object's procedures run in multiple
threads and in parallel.
The qbToken class is ICloneable: see its clone method.

The Token Data Model


For our purposes, the token consists of the following information: the token type,
the start index (from 1) of the token in the source code, the length of the token,
and its line number. See the "qbTokenType" section for the token types.
The token data model does not include the value of the token, because this
would make the data structures in this class larger, by definition, than the source
code. Instead, the user code is expected to use the start index and the length to
get the raw source code.
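
The idea of a token that carries positions rather than text can be sketched as follows. This is a minimal Python illustration, not the qbToken class: the Token record and the token_text helper are hypothetical names, and the sample source line is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical mirror of the token data model: no token text is stored."""
    token_type: str
    start_index: int   # character position in the source code, counted from 1
    length: int
    line_number: int


def token_text(source: str, token: Token) -> str:
    """Recover the raw source text from the 1-based start index and the length."""
    begin = token.start_index - 1          # convert to Python's 0-based indexing
    return source[begin:begin + token.length]


source = "PRINT 42"
tokens = [
    Token("tokenTypeIdentifier", 1, 5, 1),
    Token("tokenTypeUnsignedInteger", 7, 2, 1),
]
texts = [token_text(source, token) for token in tokens]
```

Because each token stores only four small values, the token list stays compact no matter how long the individual identifiers or strings in the source code are.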

qbToken Inspection Rules


The following inspection rules are used by the inspect method as a check on
errors in the source code, whether as delivered or as changed, or due to object
abuse in the form of using the object after a serious user error has occurred:

• The object instance must be usable.

• The type must be a valid enumerator value, other than Invalid.

• The start index must be greater than or equal to zero.

• The length must be greater than or equal to zero.

• The line number must be zero or greater.

If the inspection fails, the object becomes unusable.


An internal inspection is carried out in the constructor (after the object con-
struction steps are complete) and the dispose method (before the reference
objects in the state are disposed). The dispose inspection may be suppressed
using the overload dispose(False).
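
The inspection rules above amount to a list of independent checks that together produce a pass/fail result and a report. The Python sketch below shows that shape under stated assumptions: inspect_token and VALID_TYPES are hypothetical names, the set of valid types is an illustrative subset, and the report wording is invented.

```python
# Illustrative subset of valid type names; Invalid is deliberately absent.
VALID_TYPES = {"tokenTypeIdentifier", "tokenTypeOperator", "tokenTypeUnsignedInteger"}


def inspect_token(usable, token_type, start_index, length, line_number):
    """Apply each inspection rule and return (passed, report)."""
    failures = []
    if not usable:
        failures.append("object instance is not usable")
    if token_type not in VALID_TYPES:
        failures.append("type is not a valid enumerator value")
    if start_index < 0:
        failures.append("start index is negative")
    if length < 0:
        failures.append("length is negative")
    if line_number < 0:
        failures.append("line number is negative")
    report = "; ".join(failures) if failures else "inspection passed"
    return (not failures, report)


passed, report = inspect_token(True, "tokenTypeIdentifier", 1, 5, 1)
failed, failure_report = inspect_token(True, "tokenTypeInvalid", -1, 5, 1)
```

Collecting every failure before returning, rather than stopping at the first, keeps the report useful when several rules are broken at once; whether the real inspect method does the same is not stated here.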

Properties and Methods of the qbToken Class


Table B-5 lists the properties and methods of the qbToken class.

Table B-5. qbToken Properties and Methods


Property/Method Description
Public Shared ReadOnly Property About As String
    Shared, read-only property that returns information about this class.

Public Shared ReadOnly Property ClassName As String
    Shared, read-only property that returns the name of this class (qbToken).

Public Function clone() As qbToken
    Method that creates a new and identical token object based on the instance (since
    tokens are ICloneable).

Public Overloads Function dispose As String
    Method that disposes of the object. This method marks the object as unusable (see
    footnote 4). This overload always conducts an internal inspection of the object
    instance (using the inspect method), and an error will be thrown if the inspection
    fails. For best results, use this method when you are finished using the object in
    code. See the next method for an overload that allows inspection to be skipped.

Public Overloads Function dispose(ByVal booInspect As Boolean) As String
    Method that disposes of the object. This method marks the object as unusable. This
    overload inspects the object instance unless dispose(False) is used. For best results,
    use this method when you are finished using the object in code.

Public Property EndIndex() As Integer
    Read-write property that returns and can be set to the ending index, from 1, of the
    token. Changing this property changes the length of the token. This property is
    calculated from the start index and length of the token.

Public Function fromString(ByVal strToString As String) As Boolean
    Method that sets the token to values created by the toString method. strToString must
    be in the format <type>@<startIndex>..<endIndex>:<lineNumber>. The line number, and
    the colon preceding the line number, are optional.

Public Function inspect(ByRef strReport As String) As Boolean
    Method that inspects the object. The report parameter should be a string, passed by
    reference; it is assigned an inspection report. See "qbToken Inspection Rules"
    preceding this table.

Public Property length As Integer
    Read-write property that returns and may be set to the token length.

Public Property lineNumber As Integer
    Read-write property that returns and may be set to the number of the line containing
    the token.

Public Function mkUnusable As Boolean
    Method that forces the object instance into the unusable state. It always returns True.

4. In the specific case of qbTokens, at this writing, there are no reference objects in the state for
cleanup. The dispose is provided for consistency, to allow for future growth and to mark the
object as unusable.

Public Property Name() As String
    Read-write property that returns and can set the name of the object instance, which
    identifies the object in error messages and on the XML tag that is returned by
    object2XML. The name defaults to qbTokennnnn date time, where nnnn is a sequence
    number.

Public Property StartIndex As Integer
    Read-write property that returns and may be set to the starting index, from 1, of the
    token in the source code. It may be set to 0, usually to indicate a nonexistent token.

Public Function object2XML As String
    Method that converts the state of the object to an XML string.

Public Property StartIndex() As Integer
    Read-write property that returns and may be set to the start index of the token in its
    source code (position numbering is from 1).

Public ReadOnly Property TokenType() As ENUtokenType
    Read-only property that returns the type of the token as an enumerator of type
    ENUtokenType. For a list of the supported types, see the "qbTokenType" section.

Public Function tokenTypeMatch(ByVal enuType As ENUtokenType) As Boolean
    Method that matches the token in the instance with enuType. It returns True when the
    token types are identical, or when the range of the instance is a part of the range of
    enuType. For example, if the instance is an unsigned real number and enuType is
    "unsigned integer", this method will return True.

Public Function toString() As String
    Method that converts the token state into a string containing its serialized value in
    the form <type>@<startIndex>..<endIndex>:<lineNumber>. This state can always be
    assigned to a token index using the fromString method.

Public Shared Property TypeCount
    Shared, read-only property that returns the number of distinct types defined,
    excluding null, invalid, and ampersandSuffix.

Public Shared Property TypeCountActual
    Shared, read-only property that returns the number of distinct types defined,
    including null, invalid, and ampersandSuffix.

Public Function typeFromString(ByVal strType As String) As Boolean
    Method that sets the type using strType, after leading and trailing blanks and case
    differences are ignored.

Public Shared Function typeToEnum(ByVal strType As String) As ENUtokenType
    Shared method that returns the distinct ENUtokenType identified by a case-insensitive
    name, from which leading and trailing blanks are removed. If the prefix "tokenType" is
    not provided in strType, it will be added.

Public Shared Function typeToIndex(ByVal strType As String) As Integer
    Shared method that returns the distinct index value type identified by a
    case-insensitive name in strType, from which leading and trailing blanks are removed.

Public Function typeToString(ByVal enuType As ENUtokenType) As Integer
    Method that returns the string value of the type assigned to the current instance.

Public ReadOnly Property Usable() As Boolean
    Read-only property that returns True if the object instance is usable; False otherwise.
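
The serialized form used by toString and fromString in Table B-5 can be exercised with a small round trip. The Python sketch below is an illustration only: the function names are hypothetical, the end index is assumed to be start index + length - 1, and the line number is assumed to default to 0 when it is omitted.

```python
import re

# <type>@<startIndex>..<endIndex>, optionally followed by :<lineNumber>
_TOKEN_PATTERN = re.compile(
    r"(?P<type>\w+)@(?P<start>\d+)\.\.(?P<end>\d+)(?::(?P<line>\d+))?"
)


def token_to_string(token_type, start_index, length, line_number):
    """Serialize a token; the end index is derived from start and length."""
    end_index = start_index + length - 1
    return f"{token_type}@{start_index}..{end_index}:{line_number}"


def token_from_string(text):
    """Parse the serialized form back to (type, start index, length, line number)."""
    match = _TOKEN_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a serialized token: {text!r}")
    start = int(match["start"])
    end = int(match["end"])
    line = int(match["line"]) if match["line"] is not None else 0  # assumed default
    return match["type"], start, end - start + 1, line


serialized = token_to_string("tokenTypeIdentifier", 7, 5, 2)
restored = token_from_string(serialized)
```

Storing the end index in the string while keeping the length in the object mirrors the description of the EndIndex property, which is calculated from the start index and length rather than held as separate state.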

qbTokenType
The qbTokenType class merely defines the token types recognized by the qbScanner
and qbToken classes.

Token Types

Table B-6 defines the token types in the ENUtokenType enumerator that is exposed
by the qbTokenType class.

Table B-6. Token Types

Type                          Description
tokenTypeAmpersand            Ampersand
tokenTypeApostrophe           Single quote
tokenTypeColon                Colon
tokenTypeComma                Comma
tokenTypeIdentifier           Identifier
tokenTypeNewline              Newline
tokenTypeOperator             Operator
tokenTypeParenthesis          Left or right parenthesis
tokenTypeSemicolon            Semicolon
tokenTypeString               String
tokenTypeUnsignedInteger      Unsigned integer (sign is always an op)
tokenTypeUnsignedRealNumber   Unsigned real number (sign is always an op)

tokenTypePercent              Percent
tokenTypeExclamation          Exclamation point
tokenTypePound                Pound sign
tokenTypeCurrency             Dollar sign
tokenTypePeriod               Period
tokenTypeNull                 Null value
tokenTypeInvalid              Invalid
tokenTypeAmpersandSuffix      Ampersand, preceded by an identifier
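
The name lookup that typeToEnum performs over these types (case-insensitive, blanks trimmed, "tokenType" prefix supplied when missing) can be sketched in Python. The enumeration below copies the names from Table B-6, but the member values are simply the descriptions, since the actual numeric values of ENUtokenType are not given here; TokenType and type_to_enum are hypothetical names.

```python
from enum import Enum


class TokenType(Enum):
    tokenTypeAmpersand = "Ampersand"
    tokenTypeApostrophe = "Single quote"
    tokenTypeColon = "Colon"
    tokenTypeComma = "Comma"
    tokenTypeIdentifier = "Identifier"
    tokenTypeNewline = "Newline"
    tokenTypeOperator = "Operator"
    tokenTypeParenthesis = "Left or right parenthesis"
    tokenTypeSemicolon = "Semicolon"
    tokenTypeString = "String"
    tokenTypeUnsignedInteger = "Unsigned integer"
    tokenTypeUnsignedRealNumber = "Unsigned real number"
    tokenTypePercent = "Percent"
    tokenTypeExclamation = "Exclamation point"
    tokenTypePound = "Pound sign"
    tokenTypeCurrency = "Dollar sign"
    tokenTypePeriod = "Period"
    tokenTypeNull = "Null value"
    tokenTypeInvalid = "Invalid"
    tokenTypeAmpersandSuffix = "Ampersand, preceded by an identifier"


_BY_LOWER_NAME = {member.name.lower(): member for member in TokenType}
_SPECIAL = {TokenType.tokenTypeNull, TokenType.tokenTypeInvalid,
            TokenType.tokenTypeAmpersandSuffix}


def type_to_enum(name: str) -> TokenType:
    """Look up a type by name, ignoring case and surrounding blanks."""
    key = name.strip().lower()
    if not key.startswith("tokentype"):
        key = "tokentype" + key          # supply the omitted prefix
    return _BY_LOWER_NAME[key]           # KeyError for an unknown name


type_count_actual = len(TokenType)
type_count = len([member for member in TokenType if member not in _SPECIAL])
```

The two counts correspond to the TypeCountActual and TypeCount properties of qbToken, which differ only in whether the null, invalid, and ampersandSuffix types are included.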

qbVariable
The qbVariable class represents the type, structure, and value of a QuickBasic
scalar, an n-dimensional QuickBasic array, or a user data type.
References of qbVariable include collectionUtilities.DLL, qbScanner.DLL,
qbTokenType.DLL, qbVariableType.DLL, and utilities.DLL.
This class implements IDisposable, ICloneable, and IComparable.
qbVariable is serially threadable. Multiple instances can run simultaneously
in multiple threads, but errors will result if one object's procedures run in multi-
ple threads and in parallel.

The qbVariable Data Model


Each type of QuickBasic variable has structure and data, as follows:

Scalar: Scalars are of type Boolean, Byte, Integer, Long, Single, Double, or
String. The structure of a scalar is just its type. Data is the data associated
with the variable. For a scalar, the data is represented by the correspond-
ing .NET type with two important exceptions: QuickBasic Integers are
represented by .NET Short integers, because .NET Integers are 32-bit,
while QuickBasic Integers are 16-bit. QuickBasic Longs are represented
by .NET Integers, because .NET Longs are 64-bit, while QuickBasic
Longs are 32-bit.


NOTE Mapping of QuickBasic variables to .NET variables is accurate unless
the variable is Single or Double. Single and double precision are not (at this
writing) mapped accurately, and Single and Double values will have a wider
range than the corresponding QuickBasic values. This means that numerical
results may differ when running old QuickBasic code using this object and
quickBasicEngine.

Variant: A Variant is a variable that contains (has a) nonvariant data item.
The structure of a Variant is the type (not including the value) of what it
contains. The structure of an ordinary Variant is "concrete," because it
cannot be specified unless the accompanying contained nonvariant
type is also specified. Variants actually contain an instance of this type
(qbVariable) in their state, which provides the type and the data of the
variant value. This qbVariable, however, is prevented from being itself
a Variant, but it is allowed to be an array.

Array: An Array is a collection of entries with a nonvariant type. The
structure of an array is its number of dimensions, and, in QuickBasic,
upper and lower bounds, which can be almost any positive or negative
numbers (see footnote 5). The structure of an array also includes the uniform
(what we refer to as orthogonal) type of all the data in the array. This type can be
an abstract Variant type that as a pure Variant type does not specify the
contained type, because this will vary in an array. One-dimensional array
data is represented by a collection of .NET objects. Two-dimensional arrays
are represented by a collection that contains one or more orthogonal mem-
ber collections (see footnote 6). In general, n-dimensional arrays are represented
by fully orthogonal balanced trees of subcollections.

NOTE The Array structure is sometimes referred to as a dope vector. Here, the
dope concept is generalized to use dope as a synonym for the structure of any
variable.

5. The only semi-useful ability to start an array at a lower bound, other than one, was dropped
by .NET.
6. These are orthogonal in the sense that each subcollection has an identical number of
members.


User Defined Type (UDT): A UDT is also represented as a collection of
entries. The structure of a UDT is the ordered collection of variables. In
QuickBasic, this collection cannot be nested. It cannot contain, directly,
definitions of further UDTs (as seen, for example, in Cobol). But it can
contain UDTs defined elsewhere. UDTs contain a collection of qbVariables
representing their components.

Unknown and Null: A variable can be an Unknown value or Null (repre-
sented by Nothing in .NET). The structure of an Unknown or Null variable
is just its "being-Unknown" or its "being-Nothing."
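
The representation of an n-dimensional array as nested, orthogonal subcollections with arbitrary lower bounds can be sketched in Python with nested lists. This is an illustration of the data model only, not the qbVariable implementation: make_array, get_entry, and is_orthogonal are hypothetical names, and the orthogonality check compares each subcollection with its siblings.

```python
def make_array(bounds, default=0):
    """Build nested lists; bounds is a list of (lower, upper) pairs, one per dimension."""
    (lower, upper), rest = bounds[0], bounds[1:]
    size = upper - lower + 1
    if not rest:
        return [default] * size
    return [make_array(rest, default) for _ in range(size)]


def get_entry(array, bounds, subscripts):
    """Fetch an entry using QuickBasic-style subscripts offset by each lower bound."""
    node = array
    for (lower, upper), subscript in zip(bounds, subscripts):
        if not lower <= subscript <= upper:
            raise IndexError(f"subscript {subscript} outside {lower}..{upper}")
        node = node[subscript - lower]
    return node


def is_orthogonal(node):
    """True when sibling subcollections all have the same number of members."""
    if not isinstance(node, list) or not node or not isinstance(node[0], list):
        return True   # a scalar or a one-dimensional collection
    width = len(node[0])
    return all(
        isinstance(child, list) and len(child) == width and is_orthogonal(child)
        for child in node
    )


# Two dimensions: the first runs 0..1 and the second runs 1..2.
bounds = [(0, 1), (1, 2)]
matrix = make_array(bounds)
matrix[1][0] = 7                      # stored position of subscripts (1, 1)
entry = get_entry(matrix, bounds, (1, 1))
```

Keeping the bounds separate from the nested data is what lets two arrays share the same stored values while differing in lowerBounds and upperBounds.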

Containment, Identity, and Isomorphism in qbVariable


The containedVariable, isomorphicVariable, and stringIdentical methods sup-
port comparison of variables.
The variable a is "contained in" the variable b solely by virtue of its underly-
ing type; if all potential values of a can be assigned to b, then a is contained in b.
(See the "qbVariableType" section for more information.)
The variable a is isomorphic to b when the type of a is contained in b, the
type of b is contained in a, and the values of a and b (after conversion to a string)
are the same.
The conversion to a string is performed by calling the toString method for
serializing the variable to type:value(s), and throwing away the material to the left
of the colon as well as the colon. (toString is described in the next section.) The
stringIdentical method will test two qbVariable objects for this type of identity.
In the case of scalar isomorphism, a and b will have identical type and iden-
tical value. But if a and b are arrays, they may differ in lowerBounds and
upperBounds, while retaining all other common properties.
UDTs are never contained or isomorphic (at this writing).
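
The three comparisons described above can be sketched for the integer scalar types. This Python example rests on assumptions that are not taken from the class itself: containment is modeled as numeric range inclusion, the Byte range is assumed to be 0 to 255, and all function names are hypothetical.

```python
# Assumed value ranges; Integer is 16-bit and Long is 32-bit as in the data model.
RANGES = {
    "Byte": (0, 255),
    "Integer": (-32768, 32767),
    "Long": (-2**31, 2**31 - 1),
}


def contained_in(type_a, type_b):
    """True when every potential value of type_a can be assigned to type_b."""
    low_a, high_a = RANGES[type_a]
    low_b, high_b = RANGES[type_b]
    return low_b <= low_a and high_a <= high_b


def split_expression(expression):
    """Split type:value at the first colon, as the string conversion describes."""
    type_text, _, value_text = expression.partition(":")
    return type_text.strip(), value_text.strip()


def string_identical(expr_a, expr_b):
    """Compare only the value parts, discarding the type and the colon."""
    return split_expression(expr_a)[1] == split_expression(expr_b)[1]


def isomorphic(expr_a, expr_b):
    """Mutual containment of the scalar types plus identical value strings."""
    type_a = split_expression(expr_a)[0]
    type_b = split_expression(expr_b)[0]
    return (contained_in(type_a, type_b) and contained_in(type_b, type_a)
            and string_identical(expr_a, expr_b))


same_values = string_identical("Array,Integer,0,3:1,2,3,4", "Array,Integer,1,4:1,2,3,4")
```

The array pair at the end shows why isomorphic arrays may differ in their bounds: the bounds live to the left of the colon, which the string comparison throws away.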

The fromString Expression Supported by qbVariable


The state of the variable is represented in an expression accepted by the
fromString method of this class and generated by the toString method, known
as the fromString expression.
The fromString expression contains both the type and values of the variable
in a string, in the form type: value.
If the type is present and the value is omitted, the variable will take on the
default contents for the type.
If the type is omitted and the value is specified, the type will default to the
narrowest QuickBasic type capable of containing the specified data. If the value
string is null or an asterisk, the type is Unknown. If the value string is a number or
quoted string, the type is the narrowest scalar QuickBasic type that can contain
the value. If the value string is in parentheses, contains a comma-separated list,
or both, the type is Array, and the array's entry type is determined by examining
the values in the array. If they all convert to a single type, the array's type is this
type. If they convert to more than one type, the array's type is Variant. The
type may not be omitted when a UDT is specified.
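
The defaulting rules for an omitted type can be sketched as a classifier over the value string. This Python example is a simplified illustration, not the qbVariable parser: infer_type is a hypothetical name, the Byte range of 0 to 255 is assumed, the array entry type is not examined, and the choice between Single and Double for real numbers is deliberately left undecided.

```python
import re

_INTEGER = re.compile(r"[+-]?[0-9]+")
_REAL = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?")


def _has_top_level_comma(text):
    """True when a comma appears outside double-quoted strings."""
    in_string = False
    for character in text:
        if character == '"':
            in_string = not in_string
        elif character == "," and not in_string:
            return True
    return False


def infer_type(value_text):
    """Return the type name implied by a value string when no type is given."""
    text = value_text.strip()
    if text == "" or text == "*":
        return "Unknown"
    if text.startswith("(") or _has_top_level_comma(text):
        return "Array"
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return "String"
    if text in ("True", "False"):
        return "Boolean"
    if _INTEGER.fullmatch(text):
        number = int(text)
        if 0 <= number <= 255:
            return "Byte"
        if -32768 <= number <= 32767:
            return "Integer"
        if -2**31 <= number <= 2**31 - 1:
            return "Long"
        return "Single or Double"
    if _REAL.fullmatch(text):
        return "Single or Double"
    raise ValueError(f"unrecognized value: {value_text!r}")


inferred = [infer_type(text) for text in ["1", "32767", "32768", "*", '"B"', "1,2,3"]]
```

The boundary between 32767 and 32768 is the one to watch: the first still fits a 16-bit QuickBasic Integer, while the second forces a Long.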

fromString Types
The type should be the variable type in the syntax supported by
qbVariableType.fromString and one of the following values, depending on the
overall type:

• For a scalar type, the type should be one of Boolean, Byte, Integer, Long,
Single, Double, or String.

• For a Variant, the type should be Variant,scalarType, Variant,(arrayType),
or Variant,(userDataType). The scalar type should be as described for a
scalar type. The array type or UDT should be in parentheses and as de-
scribed in the following items.

• For an Array, the type should be Array, type, bounds, where type is the name
of a scalar type, the keyword Variant, or a parenthesized UDT definition.
The type of variant arrays is specified "abstractly" and with no associated
scalar type.

• For a UDT, the type should be UDT,memberlist, where the member list
consists of one or more comma-separated and parenthesized member
definitions. Each definition in the member list has the parenthesized form
(name, type), where name is the member name and type is its type. The type
must be scalar, abstract Variant, or Array.

• For the Unknown type, the type should be Unknown.

• For the Null type, the type should be Null.
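
Because type definitions can be parenthesized and values can contain quoted strings, the colon that separates the type from the value is the first colon that lies outside both. The Python sketch below shows one way to find it; split_from_string is a hypothetical helper, not a method of qbVariable or qbVariableType, and it leaves the caller to decide whether a colon-free expression is a type or a value.

```python
def split_from_string(expression):
    """Split at the first colon outside parentheses and double-quoted strings.

    Returns (type_text, value_text); value_text is None when no such colon exists.
    """
    depth = 0
    in_string = False
    for position, character in enumerate(expression):
        if character == '"':
            in_string = not in_string      # doubled quotes toggle twice
        elif in_string:
            continue
        elif character == "(":
            depth += 1
        elif character == ")":
            depth -= 1
        elif character == ":" and depth == 0:
            return expression[:position].strip(), expression[position + 1:].strip()
    return expression.strip(), None


simple = split_from_string("Integer:4")
value_only = split_from_string(":32767")
nested = split_from_string("UDT,(a,Integer),(b,(udt,(c,Integer))):1,(udt,(c,Integer):2)")
```

An empty type part, as in the second call, corresponds to the colon-and-value form that changes a value without restating the type.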

fromString Values
The fromString expression value should specify the variable value(s). If the vari-
able is a scalar or a Variant that does not contain an array, the value may be the
scalar's value (compatible with its type) as True, False, a number, or a string,
quoted using Visual Basic's conventions.
Alternatively the variable may be in "decorated" form as type(value), where
the value is True, False, a number, or a string.
The variable may be represented as an asterisk. This will assign the appro-
priate default value for the type.
If the variable is an array, the value should be the list of array values. This is a
comma-separated list of scalar values (plain or decorated) for a one-dimensional
array. This is a comma-separated list of parenthesized rows for a two-dimensional
array. In general, this is a comma-separated list of array slices (arrays of one
dimension lower) for n-dimensional arrays. Each value in the array may option-
ally be followed by a repeat count in parentheses. The entry value will be repeated
until the end of the current slice or the indicated number of entries. The repeat
count may be an asterisk to repeat to the end of the current array slice.

NOTE When the variable is not otherwise known to be an array (when, for
instance, the variable type is omitted from the fromString expression), the use
of a repeat count will make the variable into an array.
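
The repeat-count rule for array values can be sketched as an expansion of one array slice. In this Python illustration the entries are assumed to be already parsed into (value, repeat) pairs, where repeat is None for a plain value, an integer count, or "*" for "to the end of the slice"; expand_slice is a hypothetical name, and raising an error for surplus plain values is an assumption.

```python
def expand_slice(entries, slice_length):
    """Expand (value, repeat) pairs into at most slice_length entries."""
    result = []
    for value, repeat in entries:
        remaining = slice_length - len(result)
        if repeat is None:
            if remaining <= 0:
                raise ValueError("more values than the slice can hold")
            count = 1
        elif repeat == "*":
            count = remaining                 # repeat to the end of the slice
        else:
            count = min(repeat, remaining)    # stop at the end of the slice
        result.extend([value] * count)
    return result


counted = expand_slice([(0, 3), (9, None)], 4)
to_end = expand_slice([(1, None), (5, "*")], 4)
clipped = expand_slice([(7, 10)], 3)
```

The third call shows the "whichever comes first" behavior: a repeat count larger than the slice is cut off at the slice boundary.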

For a UDT, the value should be the comma-separated list of member values.
Each member that is a scalar or the scalar value of a Variant member should be
its value in string form or in the decorated form type(value). Each member that
is an array should be the array's value, represented orthogonally (as described in
the previous paragraph) and enclosed in parentheses. Each member that is a UDT
should be the nested UDT specification, in parentheses.
For Unknown and Null types, the value (and its preceding colon) should not be
specified.
The syntax :value (colon and value without a type) may be used to change
the value of the variable without altering the type. The value must be compatible
with the existing type, unless the existing type is Unknown; in this case, the type will
be changed to the narrowest QuickBasic type capable of containing the value.
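
The colon-and-value rule can be sketched for integer values as a small state update. This Python example is restricted to the integer types and assumes a Byte range of 0 to 255; apply_value and narrowest_type are hypothetical names, and raising ValueError for an incompatible value is an assumption about how the error would surface.

```python
# (name, lowest value, highest value), ordered from narrowest to widest.
INTEGER_TYPES = [
    ("Byte", 0, 255),
    ("Integer", -32768, 32767),
    ("Long", -2**31, 2**31 - 1),
]


def narrowest_type(number):
    """Return the first (narrowest) type whose range holds the number."""
    for name, low, high in INTEGER_TYPES:
        if low <= number <= high:
            return name
    raise ValueError("value does not fit an integer type")


def apply_value(current_type, number):
    """Return the type of the variable after assigning :number to it."""
    if current_type == "Unknown":
        return narrowest_type(number)       # an Unknown variable takes on a type
    for name, low, high in INTEGER_TYPES:
        if name == current_type:
            if not low <= number <= high:
                raise ValueError(f"{number} is not compatible with {current_type}")
            return current_type             # the existing type is preserved
    raise ValueError("type not covered by this sketch")


kept = apply_value("Long", 32767)
chosen = apply_value("Unknown", 32767)
```

The two calls assign the same value but end with different types, because only the Unknown variable is free to take the narrowest fit.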

Examples of fromString Expressions


This section presents examples of fromString expressions with various types
and values.

TIP You can run the qbVariableTest executable (provided with the sample
code) and try each example. Type it in the text box at the top of the screen and
click Create to make sure the example creates the qbVariable object. Then
click the toString button to verify that the fromString expression converts to
the variable and type specified in the examples.


Integer:4

specifies a 16-bit integer containing the value four.

Variant,Integer:4

specifies a variant that contains a 16-bit integer containing the value four.

Array,Integer,0,3:1,2,3,4

specifies a one-dimensional integer array.

Array,Integer,1,2,1,2:(1,2),(1,2)

specifies a two-dimensional integer matrix.

Array,Variant,1,2:Integer(1),Long(2)

specifies a one-dimensional variant array, and it uses decoration to be specific
about the type.

32768

specifies a Long integer containing 32768.

:32767

assigns 32767 to a prespecified type. When set after the previous example, :32767
will preserve the type of Long integer. When assigned to an uninitialized variable,
:32767 creates a 16-bit integer.

Array,Byte,0,1

specifies a Byte array that contains the Byte default values of 0. The toString will
be Array,Byte,0,1:*.

Array,Byte,0,1:*,1

specifies a Byte array that contains the Byte default value of 0 followed by 1. The
toString will be Array,Byte,0,1:*,1.

Array,Variant,0,1,1,2:(32767,"B"),(32768,1)

specifies a Variant array. The toString will be Array,Variant,0,1,1,2:
(System.Int16(32767),System.String("B")),(System.Int32(32768),System.Byte(1)).
Note that values are decorated, because the array has variant entries.


UDT,(intMember01,Integer),(strMember02,Array,String,1,2),(typMember03,(udt,(intMember01,Integer))):1,("A","B"),(udt,(intMember01,Integer):1)

specifies a UDT containing an integer, a string array, and an inner UDT.

fromString Values Returned as Random Variables


Some of the qbVariable methods return a random variable, with random type
and value, as an expression that is valid input for the fromString method.
The fromString returned will have the following randomly selected
characteristics.

• With 10% probability, it will be Unknown.

• With 10% probability, it will be Null.

• With 20% probability, the fromString will represent a scalar, and with equal
subprobability this will be any of the types Boolean, Byte, Integer, Long,
Single, Double, or String.

• With 20% probability, it will be an array, and this array will contain a
variable that has 50% probability of being a Variant and will otherwise be
a random scalar.

• With 20% probability, it will be a UDT, and this UDT will randomly contain
1..10 scalars, Arrays, Variants, and UDTs. Each type will have 25% probability.

• With 20% probability, it will be a Variant, and this Variant will contain a
variable that has these type probabilities, with one exception: there is a 70%
probability that the variable will be a scalar, and no probability that the
variable will be a Variant.
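
The top-level probabilities listed above amount to a weighted choice among six kinds. The Python sketch below illustrates that selection and the equal-probability choice among scalar types; it is a model under stated assumptions, not the library's mkRandomVariable code. The names KIND_WEIGHTS, random_kind, and random_type_name are hypothetical, and the sketch does not generate array bounds, UDT members, or values.

```python
import random

# Top-level kinds and their probabilities in percent, as listed in the text.
KIND_WEIGHTS = {
    "Unknown": 10,
    "Null": 10,
    "Scalar": 20,
    "Array": 20,
    "UDT": 20,
    "Variant": 20,
}
SCALAR_TYPES = ["Boolean", "Byte", "Integer", "Long", "Single", "Double", "String"]

def random_kind(rng):
    """Pick a top-level kind using the documented weights."""
    kinds = list(KIND_WEIGHTS)
    weights = [KIND_WEIGHTS[kind] for kind in kinds]
    return rng.choices(kinds, weights=weights, k=1)[0]

def random_type_name(rng):
    """Pick a kind; for a scalar, pick one scalar type with equal probability."""
    kind = random_kind(rng)
    if kind == "Scalar":
        return rng.choice(SCALAR_TYPES)
    return kind

rng = random.Random(2004)  # fixed seed so that repeated runs give the same sequence
print([random_type_name(rng) for _ in range(5)])
```

Passing the random generator in as a parameter keeps the sketch repeatable under a fixed seed, which is convenient when random variables are used as test input.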

fromString BNF Syntax in qbVariable


The lexical syntax of fromString expressions matches that of the quickBasicEngine
itself: blanks can be freely used, and strings are delimited by double quotes (with
doubled double quotes representing, inside strings, the occurrence of a single
double quote).
Note that this object is responsible only for scanning and parsing the
fromStringValue, which contains the value(s) of the variable. Parsing of the
fromStringType occurs inside the qbVariableType object.


fromString := fromStringType
fromString := fromStringValue
fromString := fromStringWithValue
fromString := fromStringType COLON fromStringValue
fromString := COLON fromStringValue
fromStringType := baseType | udt
baseType := simpleType | variantType | arrayType
simpleType := [VT] typeName
typeName := BOOLEAN | BYTE | INTEGER | LONG | SINGLE | DOUBLE | STRING |
            UNKNOWN | NULL
variantType := abstractVariantType COMMA varType
varType := simpleType | (arrayType)
arrayType := [VT] ARRAY,arrType,boundList
arrType := simpleType | abstractVariantType | parUDT
parUDT := LEFTPARENTHESIS udt RIGHTPARENTHESIS
udt := [VT] UDT,typeList
typeList := parMemberType [ COMMA typeList ]
parMemberType := LEFTPAR MEMBERNAME,baseType RIGHTPAR
abstractVariantType := [VT] VARIANT
boundList := boundListEntry | boundListEntry COMMA boundList
boundListEntry := BOUNDINTEGER,BOUNDINTEGER
fromStringValue := ASTERISK | fromStringNondefault
fromStringNondefault := arraySlice [ COMMA fromStringValue ] *
arraySlice := elementExpression | ( fromStringNondefault )
elementExpression := element [ repeater ]
element := scalar | decoValue
scalar := NUMBER | VBQUOTEDSTRING | ASTERISK | TRUE | FALSE
decoValue := quickBasicDecoValue | netDecoValue
quickBasicDecoValue := QUICKBASICTYPE ( scalar )
netDecoValue := [ SYSTEM PERIOD ] IDENTIFIER
                LEFTPARENTHESIS ANYTHING RIGHTPARENTHESIS
repeater := LEFTPAR ( INTEGER | ASTERISK ) RIGHTPAR
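
The first step in handling the grammar above is to separate the type from the value at the COLON, taking care that a colon inside a quoted string is not mistaken for the separator. The Python sketch below shows one way to do this, together with the doubled-quote convention for strings. It is an illustration, not the qbVariable scanner, and the names split_from_string and unquote_vb_string are hypothetical. When no colon is present, the sketch cannot tell a bare type from a bare value; that decision belongs to the parser.

```python
def split_from_string(expression):
    """Split at the first colon that is not inside a double-quoted string.

    Returns (text_before_colon, text_after_colon). The second item is None
    when the expression contains no separating colon.
    """
    in_string = False
    for index, char in enumerate(expression):
        if char == '"':
            # A doubled quote toggles the flag twice, so the state is unchanged.
            in_string = not in_string
        elif char == ":" and not in_string:
            return expression[:index].strip(), expression[index + 1:].strip()
    return expression.strip(), None

def unquote_vb_string(token):
    """Strip the delimiters of a VB-style string and undouble embedded quotes."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a double-quoted string")
    return token[1:-1].replace('""', '"')

print(split_from_string('Array,Byte,0,1:*,1'))
print(split_from_string(':32767'))
print(split_from_string('String:"a:b"'))
print(unquote_vb_string('"say ""hi"""'))
```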


qbVariable Inspection Rules


The inspect method applies the following inspection rules as a check on errors
in the source code (whether as delivered or as changed) and on object abuse in
the form of using the object after a serious user error has occurred:

• The object instance must be usable.

• The variable type object objDope must pass its own inspection procedure.
It must be Unknown or an array type. If the dope is Unknown, the objValue
must be Nothing and the following tests are skipped.

• objValue must be one of the following:

Nothing (when the type of the variable is Unknown or Null)

One of the types that represents, in .NET, a QuickBasic type (Boolean,
Byte, Short, Integer, Single, Double, or String) when the type of the
variable is scalar

A collection: the type must be Array or UDT. If the type is Array, this
must be an orthogonal collection that contains a balanced structure of
elements representing an array. Each final element's type must either
match the nonvariant type in the variable's variableType, or, when the
variable's variableType is Variant, each final element's type must be the
.NET representation of a QuickBasic scalar. If it consists of subcollections,
each subcollection must be balanced and orthogonal. If the type
is UDT, the collection must consist exclusively of qbVariable objects.
Each must be a scalar, an Array, a Variant, or a UDT.

NOTE The collection must be orthogonal in that it must be a balanced tree.
To be a balanced tree, the collection must either contain 0..n scalars or 0..n
balanced subcollections.

A variant qbVariable that is either abstract (containing no value) or of
a scalar or UDT type (see note 7).

7. At this writing, qbVariable does not support variants that contain arrays, although the
fromString syntax allows their specification. This rule should be changed to allow variants
that contain arrays when code is added to fully support this feature.


• The toString serialization of the variable must create a clone of the
variable when used with fromString. However, Variants, Arrays, and UDTs are
not subject to this rule.

• The empirical dope of the variable must be consistent with its recorded
type. The empirical dope (the type as determined by examination of the
value) must be either the same as or contained in the type. Only scalars
are subject to this rule.

• If the variable is a Variant, its Variant type must match the type of its entry
as seen in the decorated value when the variable is serialized using
toString. For example, Variant,Byte:Integer(256) is not valid.
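
The balanced-tree requirement in the inspection rules above can be pictured with nested lists standing in for the collection. The Python sketch below accepts a collection that holds only scalars, or only balanced subcollections. It also requires sibling subcollections to share one shape; that is an assumption about what "orthogonal" means for a rectangular array and is not stated in the rule. The names is_collection, shape, and is_balanced are hypothetical, and the code is an illustration rather than the inspect method itself.

```python
def is_collection(item):
    """Treat lists and tuples as collections; everything else is a scalar."""
    return isinstance(item, (list, tuple))

def shape(collection):
    """Return the size at each nesting level, following the first entry."""
    if not is_collection(collection):
        return ()
    if len(collection) == 0:
        return (0,)
    return (len(collection),) + shape(collection[0])

def is_balanced(collection):
    """True for 0..n scalars, or 0..n balanced subcollections of equal shape."""
    if not is_collection(collection):
        return False
    if all(not is_collection(item) for item in collection):
        return True   # only scalars (or empty)
    if not all(is_collection(item) for item in collection):
        return False  # scalars mixed with subcollections
    same_shape = len({shape(item) for item in collection}) == 1
    return same_shape and all(is_balanced(item) for item in collection)

print(is_balanced([[1, 2], [3, 4]]))  # True
print(is_balanced([[1, 2], [3]]))     # False
print(is_balanced([1, [2]]))          # False
```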

Properties, Methods, and Events of qbVariable


Table B-7 lists the properties, methods, and events of the qbVariable class.

Table B-7. qbVariable Properties, Methods, and Events


Property/Method/Event
    Description

Public Shared ReadOnly Property About As String
    Read-only Shared property that returns information about the class.

Public Shared Function Class2XML As String
    Shared method that returns information about the class as an XML tag.

Public Shared ReadOnly Property ClassName As String
    Shared read-only property that returns the class name qbVariable.

Public Shared Function clearVariable As Boolean
    Method that clears the variable. If it is a scalar, it is set to the
    default appropriate to its type, which is False for Booleans, 0 for
    numeric types, and a null string for strings. If the variable is a
    Variant, it is set to the default appropriate to its contained type. If
    the variable is an Array, each entry is set to the appropriate default.
    If the variable is a UDT, each member is cleared according to its type.
    If the variable is Unknown or Null, no change is made. See also
    resetVariable.

Public Function clone As qbVariable
    Method that implements ICloneable. It creates a new qbVariable with
    identical type and value, returning it as the function value.

Public Function compareTo(ByVal objQBvariable2 As qbVariable) As Boolean
    Method that compares the object instance to objQBvariable2 and returns
    True when the type and value of the variables in both are the same;
    False otherwise. This method is a wrapper for the private compareTo_
    method, which implements IComparable.

Public Overloads Function containedVariable(ByVal objVariable2 As
qbVariable) As Boolean
    Method that returns True when the qbVariable object in objVariable2 is
    contained in the instance as described in the preceding section. If the
    object instance is a UDT or objVariable2 is a UDT, this method returns
    False.

Public Overloads Function containedVariable(ByVal objVariable2 As
qbVariable, ByRef strExplanation As String) As Boolean
    Method that returns True when the qbVariable object in objVariable2 is
    contained in the instance as described in the preceding section. If the
    object instance is a UDT or objVariable2 is a UDT, this method returns
    False. The strExplanation parameter is set to an explanation of why the
    containment relation is True or False.

Public Function derefMemberName(ByVal strName As String) As qbVariable
    Method that is valid only for variables that are UDTs. strName should be
    the name of a UDT member, and this method returns the qbVariable object,
    contained directly or indirectly in the overall instance, identified by
    strName. strName may be a simple member name. If it selects a member
    that is a UDT, strName may be simple, in which case, it returns the UDT.
    strName may also select submembers when periods separate names. For
    example, if a UDT contains UDT01, and UDT01 contains intVal, then this
    method returns the object corresponding to intVal when strName is
    udt01.intVal.

Public Sub dispose()
    Method that disposes of the heap storage associated with the object (if
    any) and marks the object as not usable. For best results, use this
    method (or disposeInspect) when you are finished with the object.

Public Function disposeInspect() As Boolean
    Method that disposes of the heap storage associated with the object (if
    any) and marks the object as not usable. For best results, use this
    method (or dispose) when you are finished with the object. This dispose
    method conducts a final object inspection. See "qbVariable Inspection
    Rules" preceding this table.

Public Property Dope() As qbVariableType
    Read-write property that returns and can change information about the
    variable as an instance of the class qbVariableType. The default Dope is
    the Unknown qbVariableType. It may be set to any qbVariableType.
    Changing this property usually clears the variable. If the variable is
    scalar or Variant, it is set to its appropriate default. If the variable
    is an Array or UDT, each entry is set to its default. However, when an
    array structure is changed to an isomorphic structure (same dimensions,
    identical element types, and same size at each dimension), setting Dope
    does not clear the array. Otherwise, the array is cleared.

Public Function empiricalDope() As String
    Method that returns a reconstruction of the variable's type (including
    array bounds when the variable is an array) from its data exclusively
    (see note 8). This reconstruction is returned as a string acceptable to
    the fromString method of qbVariableType. If the variable is not an
    array, the empiricalDope will be identical to the variable's dope. If
    the variable is an array, the empiricalDope will be isomorphic to the
    array dope of the variable; dimensions and bound sizes will be the same,
    as well as the entry type; but lowerBounds of the empiricalDope will
    be 0.

Public Function fromString(ByVal strFromstring As String) As Boolean
    Method that sets the type and the value of the variable to the value
    serialized in strFromstring. See the preceding section "The fromString
    Expression Supported by qbVariable" for the syntax requirements of
    strFromstring.

Public Function inspect(ByRef strReport As String) As Boolean
    Method that inspects the object instance for errors resulting from bugs
    in the original code, bugs in the code as changed, or object abuse in
    the form of using the object after a serious error has already occurred.
    An internal inspection is carried out when the object is constructed and
    inside the disposeInspect method. If the inspection fails, the object is
    marked unusable. See the preceding section "qbVariable Inspection
    Rules."

8. Since a valid instance contains type information, this method is primarily a curio, for internal
use and to clarify the concept of deriving a type from data only, which we need when changing
the data of an array without, unnecessarily, changing its structure.

Public Function isANumber() As Boolean
    Method that returns True when the object instance is of scalar, numeric
    type including Boolean; False otherwise. This method will return False
    for strings that contain numbers.

Public Function isAnUnsignedInteger() As Boolean
    Method that returns True when the variable is an unsigned integer of any
    scalar type. The variable can be Boolean (but not True, since this
    converts to a signed integer), a string, or a real number type, as long
    as its syntactical representation as a string is that of an unsigned
    integer.

Public Function isClear() As Boolean
    Method that returns True when the object instance contains the default
    value appropriate to its type; False otherwise.

Public Function isomorphicVariable(ByVal objVariable2 As qbVariable) As
Boolean
    Method that determines whether the variable objVariable2 is an isomorph
    of the variable in the instance. See the preceding "Containment,
    Identity, and Isomorphism in qbVariable" section for the rules of
    isomorphism.

Public Function isomorphicVariable(ByVal objVariable2 As qbVariable, ByRef
strExplanation As String) As Boolean
    Method that determines whether the variable objVariable2 is an isomorph
    of the variable in the instance according to the rules of the preceding
    section "Containment, Identity, and Isomorphism in qbVariable." This
    overload places an explanation of why objVariable2 is or is not an
    isomorph in its strExplanation parameter.

Public Function isScalar() As Boolean
    Method that returns True when the object instance represents a scalar
    variable or a Variant that contains a scalar value.

Public Shared Function mkRandomVariable() As String
    Shared method that returns a random variable with random type and value,
    as an expression that is valid input for the fromString method. See the
    preceding section "The fromString Expression Supported by qbVariable."

Public Function mkUnusable As Boolean
    Method that forces the object instance into the unusable state. It
    always returns True.

Public Shared Function mkVariable(ByVal strFromString As String) As
qbVariable
    Shared method that creates and returns a new qbVariable object with the
    specified type and value in strFromString. strFromString may be in the
    syntax type:value or the syntax value; but in the latter syntax, when
    the value is not a number, it must be quoted using Visual Basic
    conventions. This method may be used to make an array by explicitly
    specifying array type and members, as in
    mkVariable("Array,Integer,0,1:0,1"). Also see mkVariableFromValue.

Public Shared Function mkVariableFromValue(ByVal objValue As Object) As
qbVariable
    Shared method that creates and returns a new qbVariable object with the
    specified scalar value. The value operand may be any .NET scalar value
    of the type Boolean, Byte, Short, Integer, Long, Single, Double, or
    String. When the value is a string, it should not be quoted. This method
    cannot create an array. For example, mkVariableFromValue("0,1") creates
    a string. It also cannot create a variant; the qbVariable will instead
    have the narrowest possible scalar, nonvariant type. For example,
    mkVariableFromValue(32768) creates a Long integer. Also see mkVariable.

Public Event msgEvent(ByVal strMsg As String, ByVal intLevel As Integer)
    Event that provides general information. It exposes the strMsg and
    intLevel parameters. strMsg is a general information message. intLevel
    should contain a nesting level starting at 0 and is useful in indenting
    displays. To obtain the msgEvent, declare the qbVariable object
    WithEvents and write the event handler.

Public Property Name() As String
    Read-write property that returns and can set the name of the object
    instance, which identifies the object in error messages and on the XML
    tag that is returned by object2XML. The name defaults to qbVariablennnn
    date time, where nnnn is a sequence number. This property identifies the
    object instance. The VariableName property identifies its data.

Public Shared Function netValue2QBvariable(ByVal objNet As Object) As
qbVariable
    Shared method that converts the .NET object objNet to a qbVariable.
    objNet may be Nothing or a .NET scalar of type Boolean, Byte, Short,
    Integer, Long, Single, Double, or String. If objNet is Nothing, the
    Unknown qbVariable is created and returned. If objNet is a .NET scalar,
    it is converted to a string, which is used as the fromString expression
    to create a new qbVariable. Therefore, the qbVariable returned will have
    the narrowest QuickBasic type possible given the value of the .NET
    scalar. An error will occur if the .NET value cannot be assigned to a
    QuickBasic type, such as when the .NET value is an Integer beyond the
    Long precision of QuickBasic (-2^31..2^31-1).

Public Sub new
    Object constructor that creates the qbVariable and inspects its initial
    state.

Public Sub new(ByVal strFromString As String)
    Overloaded object constructor that creates the qbVariable and inspects
    its initial state. It sets the type and the value of the new qbVariable
    to the strFromString. For example,
    objQBvariable = New qbVariable("Integer:4") creates the variable with
    type Integer and value 4.

Public Overloads Function object2XML() As String
    Method that converts the state of the object to XML.

Public Event progressEvent(ByVal strActivity As String, ByVal strEntity As
String, ByVal intEntityNumber As Integer, ByVal intEntityCount As Integer,
ByVal intLevel As Integer, ByVal strComments As String)
    Event that indicates progress through a loop inside one of the stateful
    procedures of qbVariable. strActivity describes the activity or goal of
    the loop. strEntity identifies the entity being processed.
    intEntityNumber is the entity sequence number from 1. intEntityCount is
    the number of entities. intLevel is the nesting level of the loop
    (starting at 0). strComments may supply additional information about the
    processing in the loop. To obtain the progressEvent, declare the
    qbVariable object WithEvents and write the event handler. See also
    progressEventShared.

Public Shared Event progressEventShared(ByVal strActivity As String, ByVal
strEntity As String, ByVal intEntityNumber As Integer, ByVal intEntityCount
As Integer, ByVal intLevel As Integer, ByVal strComments As String)
    Event that indicates progress through a loop inside one of the stateless
    procedures of qbVariable. strActivity describes the activity or goal of
    the loop. strEntity identifies the entity being processed.
    intEntityNumber is the entity sequence number from 1. intEntityCount is
    the number of entities. intLevel is the nesting level of the loop
    (starting at 0). strComments may supply additional information about the
    processing in the loop. See also progressEvent.

Public Function stringIdentical(ByVal objValue2 As qbVariable) As Boolean
    Method that returns True when the value(s) of the instance is identical
    to the value(s) of the qbVariable object objValue2; False otherwise. The
    instance and objValue2 are converted to string format by the toString
    method. The type information (as well as the colon separator) is removed
    for the comparison.

Public Property Tag() As Object
    Read-write property that returns and can be set to user data that needs
    to be associated with the qbVariable instance. Tag can be a reference
    object. If so, when the object is destroyed, the Tag object is not
    destroyed.

Public Overloads Function test(ByRef strReport As String) As Boolean
    Method that runs tests on the object. It returns True to indicate
    success or False to indicate failure. The strReport parameter is set to
    a test report. The tests are carried out on an internal instance of the
    object, so their results do not affect the main instance.

Public Overloads Function test(ByRef strReport As String, ByVal booMkObject
As Boolean) As Boolean
    Method that runs tests on the object. It returns True to indicate
    success or False to indicate failure. The strReport parameter is set to
    a test report. The tests are carried out on an internal instance of the
    object if booMkObject is True; when booMkObject is False, the tests use
    the object instance and their results will affect the state of the
    object.

Public Function toDescription() As String
    Method that returns a description of the value consisting of the
    description of its type, followed by either "is empty" or "contains
    nondefault values."

Public Function toMatrix() As String
    Method that returns a multiline, multicolumn display of the array
    indexes and values suitable for display in a monospace font.

Public Function toString() As String
    Method that returns the type and value of the qbVariable, in the format
    described for fromString. If the variable is an array, the
    representation returned is packed, condensing series of identical
    elements using parenthesized repetition counts as described under
    fromString. The representation returns default values as asterisks if
    the value of a scalar variable contains the default value appropriate to
    its type, or each member in an array value contains the default. The
    variables in the output string are decorated (using type(value) syntax)
    when the variable type is either variant or variant array.

Public Function toStringTypeOnly() As String
    Method that returns only the type of the variable in the serialized
    format acceptable to fromString.

Public Function toStringTypeWithType() As String
    Method that returns a string in the form type, toString, where type is
    the string returned by toStringTypeOnly and toString is the string
    returned by toString.

Public ReadOnly Property UDTmember(ByVal objMemberID As Object) As
qbVariable
    Indexed, read-only property that returns the qbVariable object that
    corresponds to a member of a UDT. objMemberID may be an index between 1
    and the value of Dope.UDTmemberCount for the object, or it may be a
    member name. When objMemberID is a member name, it may address
    submembers of nested UDTs if the path to the submember is a series of
    period-separated member names.

Public ReadOnly Property Usable() As Boolean
    Read-only property that returns True if the object instance is usable;
    False otherwise.

Public Overloads Function value() As Object
    Method that returns the value of the qbVariable as long as the object
    instance represents a scalar value, Unknown, or Null or a Variant that
    contains a scalar value, Unknown, or Null.

Public Overloads Function value(ByVal intIndex As Integer) As Object
    Method that returns the value of the qbVariable when it is a
    one-dimensional array, at the entry indexed by intIndex.

Public Overloads Function value(ByVal intIndex1 As Integer, ByVal intIndex2
As Integer) As Object
    Method that returns the value of the qbVariable when it is a
    two-dimensional array, at the entry indexed by intIndex1 and intIndex2.

Public Overloads Function value(ByVal strID As String) As Object
    Method that returns the value of the qbVariable. If strID is a null
    string, the object instance must represent a scalar value, Unknown, or
    Null or a Variant with a value that is a scalar, Unknown, or Null. If
    strID is not a null string, it must be a comma-separated list of array
    indexes to access an array value or a UDT member name. If strID is a UDT
    member name, it may be a period-separated series of member names to get
    to UDT submembers.

Public Overloads Function valueSet(ByVal objValue As Object) As Boolean
    Method that assigns the .NET value objValue to the qbVariable as long as
    the object instance represents a scalar value, Unknown, or Null or a
    Variant that contains a scalar value, Unknown, or Null.

Public Overloads Function valueSet(ByVal objValue As Object, ByVal intIndex
As Integer) As Boolean
    Method that assigns the .NET value objValue to the qbVariable when the
    object instance represents a one-dimensional array, at the entry indexed
    by intIndex.

Public Overloads Function valueSet(ByVal objValue As Object, ByVal
intIndex1 As Integer, ByVal intIndex2 As Integer) As Boolean
    Method that assigns the .NET value objValue to the qbVariable when the
    object instance represents a two-dimensional array, at the entry indexed
    by intIndex1 and intIndex2.

Public Overloads Function valueSet(ByVal objValue As Object, ByVal strID As
String) As Boolean
    Method that assigns the .NET value objValue to the qbVariable. If strID
    is a null string, the object instance must represent a scalar value,
    Unknown, or Null or a Variant with a value that is a scalar, Unknown, or
    Null. If strID is not a null string, it must be a comma-separated list
    of array indexes to access an array value or a UDT member name. If strID
    is a UDT member name, it may be a period-separated series of member
    names to modify UDT submembers.

Public Property VariableName As String
    Read-write property that returns and can change the name of the
    variable. VariableName defaults to typennnn, such as int0001 or
    vntInteger0002. type is the three-character Hungarian prefix designating
    the variable type: boo, byt, int, lng, sgl, dbl, str, vnt, arr, typ,
    unk, or nul. For a variant or array, the prefix vnt or arr is followed
    by the propercase full name of the variant's contained type or the
    array's entry type. nnnn is the variable's sequence number. The name
    must conform to QuickBasic rules: from 1 to 31 characters long, start
    with a letter, and contain only letters, numbers, and the underscore.
    VariableName identifies the variable, and it changes when it has not
    been assigned except by default or the variable Dope (type) is changed
    in any way. See also the Name property.
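
The naming rules given for VariableName (1 to 31 characters, a leading letter, then only letters, numbers, and the underscore) translate directly into a pattern check. The Python sketch below is an illustration and not the library's validation code. The names is_valid_variable_name and default_variable_name are hypothetical, restricting letters to unaccented A to Z is an assumption, and the four-digit sequence number follows the typennnn form of the default name.

```python
import re

# One letter followed by up to 30 letters, digits, or underscores (1 to 31 total).
QB_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,30}")

def is_valid_variable_name(name):
    """Check a candidate name against the QuickBasic naming rules."""
    return QB_NAME_PATTERN.fullmatch(name) is not None

def default_variable_name(prefix, sequence_number):
    """Build a default name in the form typennnn, for example int0001."""
    return f"{prefix}{sequence_number:04d}"

print(default_variable_name("int", 1))
print(is_valid_variable_name("vntInteger0002"))
print(is_valid_variable_name("2fast"))
```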

qbVariableType
The qbVariableType class represents the type of a quickBasicEngine variable,
including support for an unknown type and Shared methods for relating .NET types to
QuickBasic types.
References of qbVariableType include collectionUtilities.DLL, qbScanner.DLL,
qbTokenType.DLL, and utilities.DLL.
qbVariableType is serially threadable. Multiple instances can run simultaneously
in multiple threads, but errors will result if one object's procedures run in
multiple threads and in parallel.
Note that this class implements IDisposable, ICloneable, and IComparable.

The Variable Types of the quickBasicEngine


The variable types supported by quickBasicEngine fall into these general classes:

• Scalars: Ordinary values with no structure. They can have the type
Boolean, Byte, Integer, Long, Single, Double, or String.

• Variants: Variables capable of containing variables including scalars and
even arrays.


• Arrays: Variables with 1..n dimensions. In QuickBasic and this implementation,
at each dimension, an array has flexible lower and upper bounds.

NOTE In QuickBasic and this implementation, variants cannot contain variants.
The elegance of such an idea is totally outweighed by its uselessness.
Flexible lower bounds are another nearly useless idea, but here they were
a part of both QuickBasic and Visual Basic up to .NET.

• User Data Types (UDTs): Variables that contain 1..n members, which may
be a mix of scalars, Variants, or Arrays but cannot be nested UDTs.

• Unknown: As its name implies, the type we don't know. In this implementation,
Unknown is assigned to the variable type in the constructor (see note 9).

• Null: A variable type primarily for assignment as the initial value of
a Variant value.

When an Array, Variant, or UDT is represented, this qbVariableType class also
contains the Variant's type, the Array type, and array dimensions or the collection
of member types. These are delegates within the main object.
Table B-8 shows the types exposed by the qbVariableType object.

Table B-8. Types Exposed by qbVariableType


Type
    Description

ENUvarType.vtBoolean
    True or False. The default is False.

ENUvarType.vtByte
    Unsigned integer in the range 0..255. The default is 0.

ENUvarType.vtInteger
    Integer in the range -32768..32767. This is different from VB .NET and
    like VB 6. The Integer is a Short in VB .NET. The default is 0.

ENUvarType.vtLong
    Integer in the range -2^31..2^31-1. This is different from VB .NET and
    like VB 6. The Long is an Integer in VB .NET. The default is 0.

ENUvarType.vtSingle
    Real number in the Single precision range of VB .NET. This is not fully
    compatible with Microsoft's old QuickBASIC. The default value of Single
    is 0.

9. In a planned future EGN implementation of a language for symbolic computation (which will
probably be called FOG), this will be used to actually calculate with mystery values.

ENUvarType.vtDouble
    Real number in the Double precision range of VB .NET. This is not fully
    compatible with Microsoft's old QuickBASIC. The default value of Double
    is 0.

ENUvarType.vtString
    String restricted to 64KB when quickBasicEngine is compiled with
    QUICKBASICENGINE_EXTENSION set to False. The string is restricted to the
    VB .NET string limit when this compile-time symbol is True. The default
    value of string is the null string.

ENUvarType.vtVariant
    Variant proto-object container for another value, which can be any type
    (including Array) except Variant itself. The default value of Variant is
    Nothing. Most variants will occur set to a contained type; but the
    abstract variant exists as a valid special state of this data type for
    variants inside arrays. This is a vtVariant variable type for which the
    Abstract property will return True.

ENUvarType.vtArray
    Array of any dimensionality. It has no default. Unlike the Variant, an
    abstract array is not supported. This is because an array in this
    implementation of QuickBasic is always an array of a definite type
    specified for each entry, including a Variant type, which is abstract.

ENUvarType.vtUDT
    UDT container for 1..n scalars, Variants and/or Arrays.

ENUvarType.vtUnknown
    Default special value. No default for this "default" is defined.

ENUvarType.vtNull
    Used primarily for certain Variants. It is a dummy, uninitialized value.
    There is no default.

Containment and Isomorphism of Variable Types


The containedType and isomorphicType methods of the qbVariableType class
ensure that one type can be converted safely to another type.
Type a is contained in type b when:

• Type a and type b are scalars (Boolean, Byte, Integer, Long, Single, Double, or
String) and all possible values of type a convert without error to type b.


• Type a and type b are arrays, and each dimension of type b contains the
same number of elements as the corresponding dimension of a, or more
elements. The array entry type of a is contained in the array entry type
of b according to this overall definition. Note that lowerBounds of a and b
may differ.

• a is a scalar, a Variant, or an Array, and b is a Variant.

Type a and type b are isomorphic types when a is contained in b, and b is
contained in a. Scalar types are isomorphic only when identical, but array
definitions may differ in lower bounds.
If either a or b is a UDT, Null, or Unknown, the types are never considered to
contain each other.
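
The containment rules above can be sketched as a small, self-contained Python model. This is not the book's VB .NET code: the tuple representation of types, the function names, and in particular the scalar ordering in SCALAR_RANK (Boolean up to String) are assumptions made for illustration, since the text only says that every value of a must convert to b without error. The sketch also assumes that arrays must have the same number of dimensions to be compared.

```python
# Hypothetical model: scalars and Variant are strings, an array is
# ("Array", entry_type, [(lower, upper), ...]), and a UDT is ("UDT", ...).
# The ranking below is an assumed "converts without error" order.
SCALAR_RANK = {"Boolean": 0, "Byte": 1, "Integer": 2, "Long": 3,
               "Single": 4, "Double": 5, "String": 6}
NEVER_CONTAINED = {"UDT", "Null", "Unknown"}


def kind(t):
    """Return the overall type name of a modelled type."""
    return t if isinstance(t, str) else t[0]


def contained(a, b):
    """True when type a is contained in type b under the three rules."""
    ka, kb = kind(a), kind(b)
    if ka in NEVER_CONTAINED or kb in NEVER_CONTAINED:
        return False                      # UDT, Null, Unknown never contain
    if kb == "Variant":
        return True                       # a is a scalar, Variant, or Array
    if ka in SCALAR_RANK and kb in SCALAR_RANK:
        return SCALAR_RANK[ka] <= SCALAR_RANK[kb]
    if ka == "Array" and kb == "Array":
        _, entry_a, bounds_a = a
        _, entry_b, bounds_b = b
        if len(bounds_a) != len(bounds_b):
            return False                  # assumed: ranks must match
        # Each dimension of b needs at least as many elements as a's;
        # lower bounds themselves are allowed to differ.
        roomy = all(hb - lb >= ha - la
                    for (la, ha), (lb, hb) in zip(bounds_a, bounds_b))
        return roomy and contained(entry_a, entry_b)
    return False


def isomorphic(a, b):
    """Types are isomorphic when each is contained in the other."""
    return contained(a, b) and contained(b, a)
```

Under this model, an Integer array with bounds 0..3 is isomorphic to one with bounds 1..4, because both have four elements and only the lower bounds differ.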

The fromString Expression Supported by qbVariableType


The state of the variable type is represented in an expression accepted by the
fromString method of this class and generated by the toString method, known as
the fromString expression.
The fromString expression contains the type of a variable as the overall type
name, extended (for Variants, Arrays, and UDTs) with additional type information:

• For a scalar type, the type should be one of Boolean, Byte, Integer, Long,
Single, Double, or String.

• For a Variant, the type should be Variant,scalarType, Variant,(arrayType),
or Variant,(userDataType). The scalar type should be as described in the
previous item. The array type or UDT should be in parentheses and as
described in the following items.

• For an Array, the type should be Array, type, bounds, where type is the name
of a scalar type, the keyword Variant, or a parenthesized UDT definition.
The type of variant arrays is specified "abstractly" and with no associated
scalar type.

• For a UDT, the type should be UDT, memberlist, where the member list
consists of one or more comma-separated and parenthesized member
definitions. Each definition in the member list has the parenthesized form
(name, type), where name is the member name and type is its type. The
type must be scalar, abstract Variant, or Array.

• For the Unknown type, the type should be Unknown.

• For the Null type, the type should be Null.

The following are some examples of fromString expressions with various types.

TIP You can run the qbVariableTypeTester executable (provided with the sample
code) and try each example. Type it in the text box at the top of the screen and
click Create Variable Type to make sure the example creates the qbVariableType
object. Then click the Describe button to verify that the fromString expression
converts to the variable type specified with the examples.

Integer

specifies the 16-bit integer type.

Variant, Integer

specifies a variant that contains a 16-bit integer.

Array,Integer,0,3

specifies a one-dimensional integer array.

Array, Integer, 1,2, 1,2

specifies a two-dimensional integer matrix.

UDT,(intMember01,Integer),(strMember02,Array,String,1,2),(typMember03,(udt,(intMember01,Integer)))

specifies a UDT containing an integer, a string array, and an inner UDT.
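
To see how the bounds in the array examples translate into dimensions, the following Python sketch pairs up a flat list of bounds and applies the size formula given for the BoundSize property (upperBound - lowerBound + 1). The function name bound_sizes and the list-of-integers input are assumptions for illustration only, not part of qbVariableType.

```python
import math


def bound_sizes(bounds):
    """Turn a flat [lower, upper, lower, upper, ...] list into per-dimension sizes."""
    if len(bounds) == 0 or len(bounds) % 2 != 0:
        raise ValueError("bounds must be a non-empty list of lower, upper pairs")
    sizes = []
    for lower, upper in zip(bounds[0::2], bounds[1::2]):
        if lower > upper:
            raise ValueError("lower bound may not exceed upper bound")
        sizes.append(upper - lower + 1)   # same formula as BoundSize
    return sizes


# Array,Integer,0,3 has one dimension of four elements.
one_dim = bound_sizes([0, 3])
# Array,Integer,1,2,1,2 has two dimensions of two elements each.
two_dim = bound_sizes([1, 2, 1, 2])
# Total number of array elements is the product of the dimension sizes.
total_cells = math.prod(two_dim)
```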

Cache Considerations for qbVariableType


The qbVariableType object avoids excessive parsing of fromString expressions to
create types by using a cache to save parsed types. The cache is a keyed Collection
in Shared storage. Each item of this collection contains the clone of a preexisting
variable type object; the key of each item is its fromString expression. The cache
will contain a maximum of 100 entries, and the oldest entries are dropped when
the cache is full.
Information about the cache, in the form of a list of its entries and its maxi-
mum size, is available with a special parameter of the object2XML method for
converting the object state to an XML tag. See the object2XML method for more
information.
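
The caching policy described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the book's implementation: the class name TypeCache is hypothetical, "oldest" is taken to mean oldest by insertion order, and clones are modelled with copy.deepcopy.

```python
import copy
from collections import OrderedDict


class TypeCache:
    """Keyed cache of parsed types that drops the oldest entry when full."""

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self._items = OrderedDict()       # key: fromString expression

    def __len__(self):
        return len(self._items)

    def get(self, from_string):
        """Return a clone of the cached type, or None on a cache miss."""
        cached = self._items.get(from_string)
        return None if cached is None else copy.deepcopy(cached)

    def put(self, from_string, parsed_type):
        """Store a clone of parsed_type, evicting the oldest entry if needed."""
        if from_string in self._items:
            return
        if len(self._items) >= self.max_entries:
            self._items.popitem(last=False)   # drop the oldest insertion
        self._items[from_string] = copy.deepcopy(parsed_type)
```

Storing and returning clones keeps callers from changing a cached type behind the cache's back, which is presumably why the real cache holds clones rather than shared references.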

fromString BNF Syntax in qbVariableType


The lexical syntax of fromString expressions matches that of the quickBasicEngine
itself: blanks can be freely used, and strings are delimited by double quotes (with
doubled double quotes representing, inside strings, the occurrence of a single
double quote).

typespecification := baseType | udt
baseType := simpleType | variantType | arrayType
simpleType := [VT] typeName
typeName := BOOLEAN | BYTE | INTEGER | LONG | SINGLE | DOUBLE | STRING |
            UNKNOWN | NULL
variantType := abstractVariantType COMMA varType
varType := simpleType | (arrayType)
arrayType := [VT] ARRAY,arrType,boundList
arrType := simpleType | abstractVariantType | parUDT
parUDT := LEFTPARENTHESIS udt RIGHTPARENTHESIS
udt := [VT] UDT,typeList
typeList := parMemberType [ COMMA typeList ]
parMemberType := LEFTPAR MEMBERNAME,baseType RIGHTPAR
abstractVariantType := [VT] VARIANT
boundList := boundListEntry | boundListEntry COMMA boundList
boundListEntry := BOUNDINTEGER,BOUNDINTEGER
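
A grammar like this maps naturally onto a recursive-descent parser. The Python sketch below is an illustration only, not the parser inside qbVariableType. The tuple results, the class and function names, and three relaxations are assumptions: a Variant not followed by a comma is treated as abstract, a Variant may wrap a parenthesized UDT (as the prose description allows), and a UDT member may be a parenthesized UDT (as in the book's example). Quoted strings are not handled.

```python
import re

TOKEN = re.compile(r"\s*(?:(-?\d+)|([A-Za-z_]\w*)|([(),]))")
SIMPLE = {"BOOLEAN", "BYTE", "INTEGER", "LONG", "SINGLE", "DOUBLE",
          "STRING", "UNKNOWN", "NULL"}

def tokenize(text):
    """Split an expression into (kind, value) tokens; blanks are skipped."""
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        number, word, punct = match.groups()
        if number is not None:
            tokens.append(("INT", int(number)))
        elif word is not None:
            tokens.append(("WORD", word))
        else:
            tokens.append((punct, punct))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, text):
        self.toks, self.i = tokenize(text), 0

    def peek(self, offset=0):
        j = self.i + offset
        return self.toks[j] if j < len(self.toks) else ("EOF", None)

    def take(self, kind):
        tok = self.peek()
        if tok[0] != kind:
            raise ValueError(f"expected {kind!r}, found {tok[0]!r}")
        self.i += 1
        return tok[1]

    def keyword(self):
        """Upper-cased word at the cursor, optional VT prefix removed; not consumed."""
        kind, value = self.peek()
        if kind != "WORD":
            return None
        name = value.upper()
        return name[2:] if name.startswith("VT") else name

    def simple_type(self):
        name = self.keyword()
        if name not in SIMPLE:
            raise ValueError(f"expected a simple type, found {self.peek()[1]!r}")
        self.i += 1
        return name.capitalize()

    def base_type(self):
        name = self.keyword()
        if name == "ARRAY":
            return self.array_type()
        if name != "VARIANT":
            return self.simple_type()
        self.i += 1
        if self.peek()[0] != ",":
            return ("Variant", None)          # assumed: abstract Variant
        self.take(",")
        if self.peek()[0] != "(":
            return ("Variant", self.simple_type())
        self.take("(")
        inner = self.udt() if self.keyword() == "UDT" else self.array_type()
        self.take(")")
        return ("Variant", inner)

    def array_type(self):
        if self.keyword() != "ARRAY":
            raise ValueError("expected Array")
        self.i += 1
        self.take(",")
        if self.peek()[0] == "(":
            self.take("("); entry = self.udt(); self.take(")")
        elif self.keyword() == "VARIANT":
            self.i += 1; entry = ("Variant", None)
        else:
            entry = self.simple_type()
        self.take(",")
        bounds = [self.bound_entry()]
        while self.peek()[0] == "," and self.peek(1)[0] == "INT":
            self.take(","); bounds.append(self.bound_entry())
        return ("Array", entry, bounds)

    def bound_entry(self):
        lower = self.take("INT"); self.take(",")
        return (lower, self.take("INT"))

    def udt(self):
        if self.keyword() != "UDT":
            raise ValueError("expected UDT")
        self.i += 1
        self.take(",")
        members = [self.member()]
        while self.peek()[0] == "," and self.peek(1)[0] == "(":
            self.take(","); members.append(self.member())
        return ("UDT", members)

    def member(self):
        self.take("("); name = self.take("WORD"); self.take(",")
        if self.peek()[0] == "(":             # assumed: nested UDT member
            self.take("("); member_type = self.udt(); self.take(")")
        else:
            member_type = self.base_type()
        self.take(")")
        return (name, member_type)

def parse(text):
    """Parse a complete fromString expression into nested tuples."""
    parser = Parser(text)
    result = parser.udt() if parser.keyword() == "UDT" else parser.base_type()
    parser.take("EOF")
    return result
```

Each grammar rule becomes one method, and the bound list and member list loops look one token ahead so that a comma followed by an integer continues the bounds while a comma followed by a parenthesis starts the next UDT member.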

qbVariableType Inspection Rules


The following inspection rules are used by the inspect method:

• The object instance must be usable.


• The type must be compatible with the contained type and, when the type is
Array, with the bounds. If the type is scalar (Boolean, Integer, Long, or String),
the contained type must be Nothing. If the type is Variant, the contained
type must be Null, a scalar, or Array. If the type is Array, the contained type
must be Null, a scalar, or a Variant. If the type is UDT, the contained type
must be a collection of scalar, Variant, or Array types.

• The type that is contained in the Variant or Array must pass its own
inspection; each type in a UDT must likewise pass its own inspection.

• When the object is cloned, the clone must return the same toString value
as the original object.

• When the fromString value of the object is used to set the value of a new
instance, the compareTo method must indicate that the original instance
and the new instance are identical.

An internal inspection is carried out when the object is constructed and inside
the disposeInspect method. If the inspection fails, the object is marked as unusable.
qbVariableType has the capability, supported by the optional parameter
booBasic of the inspect method, to carry out the default, extended inspection or
a basic inspection. If basic inspection is in effect, only the first three inspection
rules are applied.

Properties, Methods, and Events of qbVariableType


Table B-9 lists the properties, methods, and events of the qbVariableType class.

Table B-9. qbVariableType Properties, Methods, and Events


Property/Method/Event Description
Public Shared ReadOnly Property Shared, read-only property that returns information
About As String about the class.
Public ReadOnly Property Abstract Read-only property that returns True when the object
As String instance represents a variant array (see note 10).
Public Function arraySlice() As Method that creates a new variable type based on the
qbVariableType type of the instance, which must be an array. The new
type is created by removing the top-level lowerBound and
the upperBound. The returned variable type is an array of
one lower dimension, a Variant, or a scalar. When the
instance type is not an array, this method returns
a variable type of Unknown with no other error indication.
Public ReadOnly Property Read-only property that returns a null string when the
BoundList() As String instance type is not an array, or it returns the bounds of
the array type as a comma-delimited list of lower and
upper bounds. For each dimension in the array, the
expression lower, upper is returned.
Public ReadOnly Property BoundSize Indexed, read-only property that returns the size of an
(ByVal intDimension As Integer) array variable at dimension intDimension (upperBound - lowerBound
As Integer + 1). An error occurs when the variable is not an array
or the dimension is not defined.

10. Variant arrays do not have a contained type.

Public Shared Function changeArray Shared method that modifies an array bound in
Bound(ByVal strFromstring As a fromString expression, returning the modified
String, ByVal intRank As Integer, fromString expression. strFromstring is the fromString
ByVal intLU As Integer, ByVal expression of an array type, intRank is the dimension at
intChange As Integer) As String which the bound should be changed. intLU is 0 to mod-
ify the lower bound or 1 to modify the upper bound.
intChange is a positive value to increase the lower or
upper bound or a negative value to decrease the bound.

Public Shared ReadOnly Property Shared, read-only property that returns the class
ClassName name qbVariableType.

Public Shared Function clone As Method that implements ICloneable. It creates a new
qbVariableType qbVariableType with identical type information,
returning it as the function value (see note 11).

Public Overloads Function Method that compares the object instance to
compareTo(ByVal objQBvariableType2 objQBvariableType2 and returns True when the types in
As qbVariableType) As Boolean both are the same; False otherwise. This method is
a wrapper for the private compareTo_ method, which
implements IComparable.

Public Overloads Function compareTo Method that compares the object instance to objQBvariableType2
(ByVal objQBvariableType2 As and returns True when the types in both are the same;
qbVariableType, ByRef strExplanation False otherwise. This method is a wrapper for the pri-
As String) As Boolean vate compareTo_ method, which implements IComparable.
This overload places an explanation of why the types
are identical or different in its strExplanation parameter.
Public Shared Function contained Shared method that returns True when the type
Type(ByVal enuType1 As ENUvarType, identified by enuType1 is contained in the type identified
ByVal enuType2 As ENUvarType) As in enuType2; False otherwise. See the preceding section
Boolean "Containment and Isomorphism of Variable Types."

Public Shared Function contained Shared method that returns True when the type
Type(ByVal enuType1 As ENUvarType, identified by enuType1 is contained in the type identified
ByVal objType2 As qbVariableType) in objType2; False otherwise. See the preceding section
As Boolean "Containment and Isomorphism of Variable Types."

11. At this writing, the clone method runs slowly because it (1) serializes the type information using
toString into a fromString expression for the type, and (2) uses fromString on the new object to
parse and set the type. A Friend variant of clone is used internally. It copies the state directly, but
this has not been fully tested and is not ready for prime time. It should be fully tested as a
replacement for the original clone to make qbVariableType applications run faster.

Public Shared Function contained Shared method that returns True when the type
Type(ByVal objType1 As identified in objType1 is contained in the type identified
qbVariableType, ByVal enuType2 by enuType2; False otherwise. See the preceding section
As ENUvarType) As Boolean "Containment and Isomorphism of Variable Types."
Public Shared Function contained Shared method that returns True when the type
Type(ByVal objType1 As identified in objType1 is contained in the type identified
qbVariableType, ByVal objType2 in objType2; False otherwise. See the preceding section
As qbVariableType) As Boolean "Containment and Isomorphism of Variable Types."
Public Function containedTypeWith Method that returns True when the type identified by
State(ByVal enuType1 As ENUvarType, enuType1 is contained in the type identified in enuType2;
ByVal enuType2 As ENUvarType) As False otherwise. This method works the same way as
Boolean the corresponding overload of containedType, but it
creates the reusable containment matrix in the state of
the object using it, which will result in faster processing.

Public Function containedTypeWith Method that returns True when the type identified by
State(ByVal enuType1 As ENUvarType, enuType1 is contained in the type identified in objType2;
ByVal objType2 As qbVariableType) False otherwise. This method works the same way as
As Boolean the corresponding overload of containedType, but it
creates the reusable containment matrix in the state of
the object using it, which will result in faster processing.
Public Function containedTypeWith Method that returns True when the type identified in
State(ByVal objType1 As objType1 is contained in the type identified by
qbVariableType, ByVal enuType2 enuType2; False otherwise. This method works the same
As ENUvarType) As Boolean way as the corresponding overload of containedType,
but it creates the reusable containment matrix in the
state of the object using it, which will result in faster
processing.
Public Function containedTypeWith Method that returns True when the type identified in
State(ByVal objType1 As objType1 is contained in the type identified in objType2;
qbVariableType, ByVal objType2 As False otherwise. This method works the same way as
qbVariableType) As Boolean the corresponding overload of containedType, with the
difference that it creates the reusable containment
matrix in the state of the object using it, which will
result in faster processing.

Public Function containedTypeWith Method that returns True when the type identified in
State(ByVal objType1 As qbVariable objType1 is contained in the type identified in objType2;
Type, ByVal objType2 As qbVariable False otherwise. See the preceding section "Containment
Type, ByRef strExplanation As String) and Isomorphism of Variable Types." This method creates the
As Boolean reusable containment matrix in the state of the object
using it, which will result in faster processing. This
method also places an explanation of why the type is or
is not contained in strExplanation.

Public Overloads Function Method that returns the default value associated with
defaultValue As Object the type in the object instance. For scalars, this method
returns False (for the type Boolean), a null string (for the
type String), or 0 (for the numeric scalar types). For Variants,
Null types, and UDTs, this method returns Nothing. For
nonvariant arrays, this method returns the default value
of the array's entry type.

Public Overloads Shared Function Shared method that returns the default value associated
defaultValue(ByVal enuType As with the type specified in enuType. For scalars,
ENUvarType) As Object this method returns False (for the type Boolean), a null string
(for the type String), or 0 (for the numeric scalar types).
For Variants, Null types, and UDTs, this method returns
Nothing. enuType may not specify an array.
Public ReadOnly Property Read-only property that returns the number of dimen-
Dimensions() As Integer sions associated with an Array. This property returns 0
with no other error indication when the variable is not
an Array.
Public Sub dispose() Method that disposes of the heap storage associated with
the object (if any) and marks the object as not usable. For
best results, use this method (or disposeInspect) when
you are finished with the object.

Public Function disposeInspect() Method that disposes of the heap storage associated
As Boolean with the object (if any) and marks the object as not
usable. For best results, use this method (or dispose)
when you are finished with the object. This method
conducts a final object inspection. See the preceding
"qbVariableType Inspection Rules" section.
Public Function fromString(ByVal Method that sets the variable type to the value serialized
strFromstring As String) As Boolean in strFromstring. See the preceding section "The
fromString Expression Supported by qbVariableType"
for the syntax requirements of strFromstring.

Public Overloads Function Method that returns Nothing when the object instance
innerType() As qbVariableType type is Null, Unknown, or scalar. For a Variant or an Array,
this method returns the variable type of the value con-
tained in the Variant or the Array entry. This overload
of innerType may not be called when the object instance
type is a UDT.
Public Overloads Function Method that returns the type of a member of a UDT.
innerType(ByVal objIndex As Object) If the object instance type is not UDT, this method
As qbVariableType overload causes an error. objIndex may be the UDT
member sequence number or the member name. If it
is a member name, objIndex may identify submembers
by separating distinct member names with periods.
Public Function inspect(ByRef Method that inspects the object instance for errors. See
strReport As String, Optional the preceding section "qbVariableType Inspection Rules."
ByVal booBasic As Boolean = False) The optional booBasic parameter can be passed as True
As Boolean to suppress the extended inspection rules identified in
that section.
Public Function isArray() As Boolean Method that returns True for all array types; False
otherwise.
Public Overloads Function Method that returns True when objNetValue is the
isDefaultValue(ByVal objNetValue default value for the object instance's type; False
As Object) As Boolean otherwise.
Public Overloads Shared Function Shared method that returns True when objNetValue is
isDefaultValue(ByVal objNetValue the default value for the type identified in enuType;
As Object, ByVal enuType As False otherwise.
ENUvarType) As Boolean
Public Function isNull() As Boolean Method that returns True when the type is Null; False
otherwise.
Public Overloads Function Method that returns True when all values of the object
isomorphicType(ByVal objType2 instance type can be converted without loss of infor-
As qbVariableType) As Boolean mation or error to values of objType2 and all values of
objType2 can be converted to the object instance type.
For more on isomorphism, see the preceding section
"Containment and Isomorphism of Variable Types."

Public Overloads Function Method that returns True when all values of the object
isomorphicType(ByVal objType2 instance type can be converted without loss of infor-
As qbVariableType, ByRef mation or error to values of objType2 and all values of
strExplanation As String) objType2 can be converted to the object instance type.
As Boolean This overload of the isomorphicType method provides
an explanation of why the types are or are not isomorphic.

Public Function isScalar() Method that returns True when the type is scalar; False
As Boolean otherwise. This method returns True only when the type
is one of the scalar types Boolean, Byte, Integer, Long,
Single, Double, or String. This method returns False
for a variant with a scalar contained type.

Public Shared Function Shared method that returns True when the type identi-
isScalarType(ByVal enuType fied in enuType is scalar; False otherwise. This method
As ENUvarType) As Boolean returns True only when enuType is one of the scalar types
Boolean, Byte, Integer, Long, Single, Double, or String.

Public Shared Function Shared method that returns True when the type identi-
isScalarType(ByVal strType As fied by name in strType is scalar; False otherwise. This
String) As Boolean method returns True only when strType is one of the
scalar types Boolean, Byte, Integer, Long, Single, Double,
or String.
Public Function isUDT() As Boolean Method that returns True when the type is a UDT; False
otherwise.
Public Function isUnknown() Method that returns True when the type is Unknown;
As Boolean False otherwise.
Public Function isVariant() Method that returns True when the type is Variant;
As Boolean False otherwise.

Public Property LowerBound(ByVal Indexed, read-write property that returns and can change
intDimension As Integer) As Integer the lower bound of an array at the dimension d, which
starts at 1 for the major dimension. If the qbVariableType
isn't an array or d is invalid, an error occurs. The lower
bound may not be changed to a value that is greater
than the upper bound, because this would leave the
qbVariableType object in an invalid state. See redimension
for a method that can change the lower and upper bounds
of an array in one statement.

Public Overloads Shared Function Shared method that creates a random array specifier (as
mkRandomArray() As String its toString/fromString string in the form Array,<type>,
<dimensions>), with one to three random dimensions.
The array size at each dimension is restricted to 20
elements, and the lower and upper bounds of each
dimension are random, in the range -5..5. The array will
randomly be any one of the types Boolean, Byte, Integer,
Long, Single, Double, String, or Variant. If it is Variant,
it will randomly contain one of the scalar types.
Public Overloads Shared Function Shared method that creates a random array specifier (as
mkRandomArray(ByVal strScalarTypes its toString/fromString string in the form Array,<type>,
As String) As String <dimensions>). This overload works in the same
way as the previous one, but it allows the type to be
restricted to one of a comma-separated list of type
names made up of the types Boolean, Byte, Integer,
Long, Single, Double, String, or Variant.
Public Overloads Shared Function Shared method that creates a random variable type as
mkRandomDomain() As ENUvarType one of Boolean, Byte, Integer, Long, Single, Double,
String, Variant, Array, Unknown, or Null.
Public Overloads Shared Function Shared method that creates a random variable type as
mkRandomDomain(ByVal booScalar As one of Boolean, Byte, Integer, Long, Single, Double,
Boolean) As ENUvarType String, Variant, Array, Unknown, or Null. If its booScalar
parameter is True, this method won't return Variant,
Array, Unknown, or Null.
Public Overloads Shared Function Shared method that creates a random variable type as
mkRandomScalar() As String one of Boolean, Byte, Integer, Long, Single, Double, or
String.
Public Overloads Shared Function Shared method that creates a random variable type as
mkRandomScalar(ByVal strTypes As one of the types identified in the comma-delimited list
String) As String in strTypes.

Public Overloads Shared Function Shared method that creates a random scalar value in
mkRandomScalarValue() As Object .NET form as one of Boolean, Byte, (16-bit Short) Integer,
(32-bit) Long, Single, Double, or String.

Public Shared Function Shared method that returns a random QuickBasic type
mkRandomType() As String expressed as the fromString for that type. Each of Array,
scalar, Variant, and UDT is returned with 20% probability;
each of Unknown and Null is returned with 10% probability.
Public Shared Function Shared method that creates a random UDT specifier (as its
mkRandomUDT() As String toString/fromString string). This UDT will have 1..10
members. Each member will be a random type selected
with equal probability from scalar, Variant, Array, or UDT,
except that not more than five nested UDTs are allowed.
Public Shared Function Shared method that creates a random value of the Variant
mkRandomVariant() As String type, containing a nonvariant type. The Variant is
returned as a string in the form type (value), where type
is one of the QuickBasic scalar types Boolean, Byte,
Integer, Long, Single, Double, or String; and value is the
value, which will be True, False, a number, or a string.
This notation is referred to as decorated notation.
Public Shared Function Shared method that creates a Variant type (containing
mkRandomVariantValue() As String a random scalar value). Note that this method cannot
create a Variant that contains an array.
Public Shared Function Shared method that creates a new qbVariableType
mkType(ByVal enuType As ENUvarType) based on the ENUvarType enumerator enuType, which
As qbVariableType may not be an Array but can be a Variant.
Public Function mkUnusable Method that forces the object instance into the unusable
As Boolean state. It always returns True.

Public Property Name() As String Read-write property that returns and can set the name
of the object instance, which identifies the object in
error messages and on the XML tag that is returned by
object2XML. The name defaults to qbVariableTypennnn
date time, where nnnn is a sequence number.
Public Shared Function name2NetType Shared method that converts the system's name for a
(ByVal strSystemName As String) .NET type (such as System.Int32) to the generic name of
As String one of the .NET types used to support QuickBasic
variables (such as Integer). name2NetType will convert
System.Int64 to Integer.
Public Shared Function netType2Name Shared method that converts the generic name for a
(ByVal strNetType As String) As .NET type (such as Integer) to the system name of the
String .NET type (such as System.Int32).

Public Shared Function Shared method that returns the QuickBasic type used
netType2QBdomain(ByVal strType As to represent the .NET scalar value as an ENUvarType. See
String) As ENUvarType also qbDomain2NetType.

Public Shared Function Shared method that returns the narrowest QuickBasic
netValue2QBdomain(ByVal objValue type to which the .NET object in objValue converts
As Object) As ENUvarType without error, as an ENUvarType enumerator. This method
returns one of Byte, Integer, Long, Single, Double, or
String; it does not return Boolean, Array, or Variant. It
returns Null when the .NET value is Nothing. It returns
ENUvarType.vtUnknown (with no other error indication)
when the .NET object does not convert to any QuickBasic
type. See also netValueInQBdomain.

Public Shared Function Shared method that converts the .NET value in
netValue2QBvalue(ByVal objValue objValue to a .NET value in the narrowest .NET type
As Object) As Object that corresponds to a QuickBasic type.

Public Shared Function Shared method that converts the .NET value in objValue
netValue2QBvalue(ByVal objValue to the .NET type that corresponds to the QuickBasic
As Object, ByVal enuType As type specified in enuType.
ENUvarType) As Object

Public Shared Function Shared method that converts the .NET value in
netValue2QBvalue(ByVal objValue objValue to the .NET type that corresponds to the
As Object, ByVal strType As String) QuickBasic type identified by strType.
As Object
Public Overloads Function Method that returns True when the .NET value in
netValueInQBdomain(ByVal objValue objValue may be converted without error to the
As Object) As Boolean variable type in the object instance.
Public Overloads Shared Function Shared method that returns True when the .NET value
netValueInQBdomain(ByVal enuType in objValue may be converted without error to the
As ENUvarType, ByVal objValue As variable type identified by enuType.
Object) As Boolean

Public Shared Function Shared method that returns True when the .NET value
netValueIsScalar(ByVal objValue in objValue is one that can represent one of the
As Object) As Boolean QuickBasic scalar values.

Public Sub new Object constructor that creates the qbVariableType and
inspects its initial state.

Public Sub new(ByVal strFromString Overloaded object constructor that creates the
As String) qbVariableType and inspects its initial state. It sets
the type to the value of strFromString. For example,
objQBvariableType = New qbVariableType("Integer")
will create the variable type Integer.

Public Shared Function object2Type Shared method that returns the corresponding fromString
(ByVal objValue As Object) As String expression for the type of any .NET object. If objValue is
a scalar with a .NET scalar type that corresponds to a
QuickBasic type (Boolean, Byte, Short, Integer, Single,
Double, or String), that type is returned. If objValue is
a .NET Long but in the range -2^31..2^31-1, the Long
type is returned. If objValue is any other value, Unknown
is returned.

Public Overloads Function Method that converts the state of the object to XML. By
object2XML(Optional ByVal default, information concerning the variable type cache
booIncludeCache As Boolean = True) won't be included in the output XML, but the optional
As String parameter booIncludeCache may be passed as True to
include the serialized cache information. See the previous
section "Cache Considerations for qbVariableType."

Public Shared Function Shared method that converts the type identified by
qbDomain2NetType(ByVal enuDomain enuDomain to the .NET type that is used to contain values
As ENUvarType) As String of this type. The enuDomain cannot be Unknown, Null, or
Array. The .NET type is returned as the string name of
the type; it will be in the form System.type.

Public Function redimension(ByVal Method that redimensions an array type to the lower
intDimension As Integer, ByVal bound and upper bound specified. It doesn't allow the
intLowerBound As Integer, ByVal lower bound to be greater than the upper bound, but it
intUpperBound As Integer) As Boolean does avoid the problem that occurs when you need to
sequentially change the lower bound to a value that is
higher than the upper bound, and the upper bound to
a new valid value, or vice versa, using the LowerBound
and UpperBound properties.

Public Function scalarDefault() Method that returns the default value applicable to the
As Object type. If the type is Null or Unknown, it returns Nothing. If
the type is scalar, it returns the default for the scalar type.
If the type is array, it returns the default for the array entry.
If the type is concrete variant (a Variant with a known
embedded type), it returns the default for the embedded
type. If the type is abstract variant (a Variant with an
unknown embedded type), it returns Nothing.

Public ReadOnly Property Read-only property that returns the total number of
StorageSpace() As Integer abstract cells occupied by the variable type. Scalar and
Variant values occupy one cell. Arrays occupy the
number of cells equivalent to their total size in array
elements. Unknown values occupy 0 cells. Null values
occupy 1 cell. UDTs occupy the sum of space occupied
by their members.

Public Shared Function Shared method that converts a string type name to an
string2enuVarType(ByVal strInstring ENUvarType enumerator. strInstring is one of vtBoolean,
As String) As ENUvarType vtByte, vtInteger, vtLong, vtSingle, vtDouble, vtString,
vtVariant, vtArray, vtUDT, vtNull, or vtUnknown. The
prefix vt may be omitted, and the name is case-insensitive.

Public Property Tag() As Object Read-write property that returns and can be set to user
data that needs to be associated with the qbVariableType
instance. Tag can be a reference object. If so, the Tag ob-
ject isn't destroyed when the object is destroyed.

Public Overloads Function Method that runs tests on the object and returns True to
test(ByRef strReport As String) indicate success; False otherwise. The strReport
As Boolean reference parameter is set to a test report. The test consists of
four phases: a series of random fromString expressions
is built and used to create new qbVariableType
objects (see note 12), the defaultValue method is tested to make sure
it provides valid defaults, the type containment
methods are run for a series of known results, and finally,
the various "domain-mapping" methods (which convert
.NET values and types to QuickBasic values and types)
are tested. If the test fails, the object instance is marked
as not usable. A test method will be exposed by this object,
unless the compile-time symbol QBVARIABLETYPE_NOTEST
is set explicitly to True in the project properties for
qbVariableType (see note 13).

Public Shared ReadOnly Property Shared, read-only property that returns True if the
TestAvailable() As Boolean running version of qbVariableType was compiled
with the compile-time symbol QBVARIABLETEST_NOTEST
either omitted or set to False.

12. Of course, these new qbVariableType objects are inspected for validity when created.
13. Set the compile-time symbol QBVARIABLETYPE_NOTEST to True in the project to suppress the
generation of the test method.
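The cell-counting rule described for StorageSpace can be sketched in a few lines. This is an illustrative Python model, not the qbVariableType source: the dictionary layout (the "kind", "bounds", and "members" keys) and the function name storage_space are assumptions made only for this example.

```python
from math import prod

def storage_space(var_type):
    """Return the number of abstract cells for a (hypothetical) type record.

    var_type is a dict with a "kind" key; arrays carry "bounds" as a list of
    (lower, upper) pairs and UDTs carry "members" as a list of type records.
    """
    kind = var_type["kind"]
    if kind == "Unknown":
        return 0                      # Unknown values occupy 0 cells
    if kind == "Array":
        # Total size in array elements: product of the extent of each dimension
        return prod(upper - lower + 1 for lower, upper in var_type["bounds"])
    if kind == "UDT":
        # Sum of the space occupied by the members
        return sum(storage_space(member) for member in var_type["members"])
    return 1                          # scalars, Variant, and Null occupy 1 cell
```

For example, a UDT holding an Integer, an array bounded 1 to 10, and a Null member would occupy 1 + 10 + 1 = 12 cells under this model.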

Public Event testEvent(ByVal strDesc As String, ByVal intLevel As Integer)
Event that is fired during the execution of the test method for testing
stages and testing events. strDesc describes the stage or event, and intLevel
is a nesting level that starts at 0. To obtain this event, the qbVariableType
instance must be declared WithEvents, a handler for testEvent must be
supplied, and the compile-time symbol QBVARIABLETEST_NOTEST must be omitted
or set to False.

Public Event testProgressEvent(ByVal strDesc As String, ByVal strEntity As String, ByVal intEntityNumber As Integer, ByVal intEntityCount As Integer)
Event that is fired during the execution of the test method. It reports
progress inside loops. strDesc describes the loop goal. strEntity describes
the entity being processed in the loop. intEntityNumber is the number of the
entity. intEntityCount is the total number of entities. To obtain this event,
the qbVariableType instance must be declared WithEvents, a handler for
testProgressEvent must be supplied, and the compile-time symbol
QBVARIABLETEST_NOTEST must be omitted or set to False.

Public Overloads Function toContainedName() As String
Method that returns the name of the type contained in a Variant or the entry
type of an Array. If the type in the object instance does not represent a
Variant or an Array, this method returns a null string.

Public Overloads Function toContainedName(ByVal objIndex As Object) As String
Method that returns the name of the type contained in a UDT at the indexed
member, where objIndex may be the position (from 1) of the member or the name
of the member. Submembers of member UDTs can be accessed as a series of
period-separated names.

Public Function toDescription() As String
Method that returns a readable description of the variable type. For a
scalar, it returns the scalar type name, such as Integer. For an abstract
variant, it returns "Abstract Variant."14 For a concrete variant, it returns
"Variant containing desc," where desc is the description of the contained
type. For a one-dimensional array, it returns "1-dimensional array containing
n elements from m to p: each element has the type desc." For a
two-dimensional array, it returns "2-dimensional array with n rows (from m to
p) and q columns (from r to s): each element has the type desc." For an
n-dimensional array, it returns "n-dimensional array with these bounds: list:
each element has the type desc." For a UDT, it returns "UDT; list," where
list is a list of scalar, Array, and Variant descriptions. For an Unknown
type, it returns "Unknown type." For a Null type, it returns "Null type."

Public Function toName() As String
Method that returns the name of the variable type in the object instance.
Unlike toString, it returns only Variant, Array, or UDT.

Public Overrides Function toString() As String
Method that returns the type in the qbVariableType, in the format described
for fromString in the preceding section "The fromString Expression Supported
by qbVariableType." If the variable is an array, the representation returned
is packed, condensing series of identical elements using parenthesized
repetition counts. The representation returns default values as asterisks if
the value of a scalar variable contains the default value appropriate to its
type, or each member in an array value contains the default. The variables in
the output string are decorated (using type(value) syntax) when the variable
type is either Variant or Variant Array.

14. An abstract variant is a variant with an unknown contained type.

Public Function toStringVerify(Optional ByVal booVerify As Boolean = False) As String
Method that returns the type and value of the qbVariableType, in the format
described for fromString in the preceding section "The fromString Expression
Supported by qbVariableType." Setting toStringVerify(booVerify:=True) causes
the object to try to create an object using the synthesized toString, and the
object will then verify that the created object clones the original object.
If this test fails, an error is thrown and the object marks itself as
unusable.

Public ReadOnly Property UDTmemberCount() As Integer
Read-only property that returns the number of members when the object
instance represents a UDT. If the object instance does not represent a UDT,
an error occurs.

Public Function udtMemberDeref(ByVal strMemberName As String) As Integer
Method that "dereferences" strMemberName by converting it to the index (from
1) of the corresponding UDT member in the variable type instance. If the
object instance does not represent a UDT, an error occurs.

Public Function udtMemberName(ByVal objIndex As Object) As String
Method that "dereferences" objIndex, which is normally the index (from 1) of
a UDT member, by converting it to the member name. The index may also be the
actual member name, in which case this method performs no useful work.

Public Function udtMemberType(ByVal objIndex As Object) As qbVariableType
Method that "dereferences" objIndex, which will be either the index (from 1)
of a UDT member or a member name, and returns the type at the indexed UDT
member.

Public Property UpperBound(ByVal intDimension As Integer) As Integer
Indexed, read-write property that returns and can change the upper bound of
an array at the dimension intDimension, which starts at 1 for the major
dimension. If the qbVariableType isn't an array or intDimension is invalid,
an error occurs. UpperBound may not be changed to a value that is less than
LowerBound, because this would leave the qbVariableType object in an invalid
state. See redimension for a method that can change the lower and upper
bounds of an array in one statement.

Public ReadOnly Property Usable() As Boolean
Read-only property that returns True if the object instance is usable; False
otherwise.

Public Overloads Shared Function validFromString(ByVal strFS As String) As Boolean
Method that returns True when strFS contains a valid fromString expression;
False otherwise. Validity is determined by creating a qbVariableType, which
is then destroyed.

Public Overloads Shared Function validFromString(ByVal strFS As String, ByRef objTest As qbVariableType) As Boolean
Method that returns True when strFS contains a valid fromString expression;
False otherwise. All overloads of validFromString determine validity by
actually creating a qbVariableType, and this overload returns the variable
created.

Public ReadOnly Property VariableType() As ENUvarType
Read-only property that returns the type inside the object instance.
VariableType is the main variable type. See also VarType.

Public Shared ReadOnly Property VariableTypeList() As String
Shared, read-only property that returns a space-delimited list of all the
variable types supported: Boolean Byte Integer Long Single Double String
Variant Array Unknown Null.

Public ReadOnly Property VarType() As ENUvarType
Read-only property that returns the type contained inside the object instance
type. For Variants, this is the type of the Variant's value. For Arrays, it
is the entry type. For everything else, this property returns
ENUvarType.vtUnknown, with no other error indication. VarType is the
contained variable type. See also VariableType.

Public Shared Function vtPrefixAdd(ByVal strName As String) As String
Shared method that adds the vt prefix to strName, unless it is present, and
returns the result.15

Public Shared Function vtPrefixRemove(ByVal strName As String) As String
Shared method that removes the vt prefix from strName, when it is present,
and returns the result.
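The name handling described for vtPrefixAdd, vtPrefixRemove, and string2enuVarType can be modeled as follows. This is a minimal Python sketch rather than the library's VB.NET code. The function names, the case-insensitive detection of the vt prefix, and the use of plain strings in place of the ENUvarType enumerator are assumptions for illustration.

```python
# Type names as listed for string2enuVarType (without the vt prefix)
VAR_TYPES = ("Boolean", "Byte", "Integer", "Long", "Single", "Double",
             "String", "Variant", "Array", "UDT", "Null", "Unknown")

def vt_prefix_add(name):
    """Add the vt prefix unless it is already present."""
    return name if name.lower().startswith("vt") else "vt" + name

def vt_prefix_remove(name):
    """Remove the vt prefix when it is present."""
    return name[2:] if name.lower().startswith("vt") else name

def string_to_var_type(name):
    """Map a type name (prefix optional, any case) to its canonical vt name."""
    bare = vt_prefix_remove(name.strip()).lower()
    for type_name in VAR_TYPES:
        if type_name.lower() == bare:
            return "vt" + type_name
    raise ValueError("unknown variable type name: " + name)
```

Under these assumptions, string_to_var_type("udt") and string_to_var_type("vtUDT") both yield "vtUDT", mirroring the optional prefix and case-insensitivity described in the table.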

quickBasicEngine
The quickBasicEngine class does all scanning, parsing, and interpretation for this
version of QuickBasic. It may be dropped into a .NET application, and it will
provide the ability to evaluate immediate Basic expressions, as well as compile
and run Basic programs.
References of quickBasicEngine include collectionUtilities.DLL, qbOp.DLL,
qbPolish, qbScanner.DLL, qbToken, qbTokenType.DLL, qbVariable, qbVariableType,
and utilities.DLL.

15. Variable type enumerator names are prefixed by vt.

quickBasicEngine is fully thread-safe. See the documentation header in the
source code for instructions on modifying the source code while maintaining the
thread safety of this code.
Note that this class implements ICloneable and IComparable.

The quickBasicEngine Data Model and States


The state of the quickBasicEngine class consists of all source code for a program,
its scanned representation, and the names and the structured values of all vari-
ables found in the code. It also includes various processing details, such as the
trivial queue maintained for the legacy Read Data instruction.

NOTE No explicit parse tree is built, because this is unnecessary. Parse
information is available just-in-time through the parseEvent, which is fired
for each distinct grammar symbol.

At any time, an instance of this class is either usable or unusable, and is in
one of the following states:

• Ready to run: When the object is fully initialized, and after normal proce-
dures have terminated, it is Ready to run.

• Running (n threads): When the object is running normally, it is Running.
It may be simultaneously running procedures in more than one thread.

• Stopping: When the user has requested a stop, through the stopQBE
method, but threads are still running, the object is Stopping.

• Stopped: When the user has requested a stop, through the stopQBE
method, and no threads are still running, the object is Stopped.

The usability and running states, and the number of running threads, are
available through methods and properties.
At the end of a successful New constructor, the instance becomes usable and
Ready to run. A successful or failed execution of dispose makes the object unusable.
A serious internal error, such as failure to create a resource, or use of the object
after a serious error is reported, makes the object unusable. The mkUnusable method
may also be used to force the object into the unusable state. The Usable property
tells the caller whether the object is usable.

An unusable state makes run status moot, because an unusable instance won't
run. It should be disposed, a new instance should be created, and the processing
that created the error should not be repeated.
The stopQBE method places the object in a Stopping state immediately, and it
puts the object in a Stopped state when all running threads have terminated. While
an unusable object cannot be made usable, a Stopped object can be restored to
active duty using the resumeQBE method.
When a stop is requested, the state of the engine immediately becomes
Stopping as an atomic operation. Then the following occurs:

• Any executing For or Do loop in the engine, which issues loop events, is
exited as soon as the loop event is issued.

• If the interpreter is running, the interpreter is exited an undefined
amount of time after the object is stopped. This time is the interval between
the issuance of stopQBE and the point at which the interpreter arrives at
the head of its For loop.

• No Public procedure will execute, and all Public procedures will return
default values, until Stopped is set to False.

The resumeQBE method places the object in the Ready state if the object is
Stopped. The resumeQBE method has no effect when the object is already Ready or
is in the Running or Stopping states.
The getThreadStatus method returns the run status as one of ready, running,
stopping, or stopped.
When the quickBasicEngine is Running, the runningThreads method returns
the number of threads that are running methods and properties as a number
between 1 and n. When quickBasicEngine is Stopped or Ready, runningThreads
returns 0.
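The run-state rules in this section can be summarized as a small state machine. The Python sketch below is only a model of the documented transitions. The class and method names are hypothetical, it uses no locking (unlike the thread-safe engine), and the behavior of a stop request when no thread is running (moving straight to Stopped) is an assumption.

```python
class EngineRunState:
    """Toy model of the Ready / Running / Stopping / Stopped life cycle."""

    def __init__(self):
        self.usable = True
        self.status = "Ready"
        self.threads = 0

    def enter_procedure(self):
        """A thread starts a Public procedure; refused when stopping, stopped, or unusable."""
        if not self.usable or self.status in ("Stopping", "Stopped"):
            return False
        self.threads += 1
        self.status = "Running"
        return True

    def exit_procedure(self):
        """A thread finishes; the last thread out settles the state."""
        if self.threads == 0:
            return
        self.threads -= 1
        if self.threads == 0:
            self.status = "Stopped" if self.status == "Stopping" else "Ready"

    def stop(self):
        """Model of stopQBE: Stopping at once, Stopped when no threads remain."""
        self.status = "Stopping"
        if self.threads == 0:
            self.status = "Stopped"

    def resume(self):
        """Model of resumeQBE: only a Stopped object returns to Ready."""
        if self.status == "Stopped":
            self.status = "Ready"

    def running_threads(self):
        return self.threads
```

In this model, two threads entering, a stop request, and both threads exiting moves the status through Running, Stopping, and Stopped, and a later resume returns it to Ready.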

Inspection Rules of the quickBasicEngine


The following inspection rules are used by the inspect method:

• The object instance must be usable.

• The scanner object must pass its own inspection.

• The collection of qbPolish instructions must contain qbPolish objects
exclusively. If the collection contains fewer than 101 objects, each object
must pass the qbPolish.inspect inspection; if more than 100 objects exist, a
random selection of objects is inspected.

• The collection of qbVariable variables must conform to its expected
structure; see the source code for details.

• The subroutine and function index must conform to its expected struc-
ture; see the source code for details.

• The constant expression index must conform to its expected structure; see
the source code for details.

• If the inspection fails, the object becomes unusable.

An internal inspection is carried out in the constructor and inside the
dispose method.
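The rule for the qbPolish collection (inspect every object when there are 100 or fewer, otherwise inspect a random selection) can be sketched as below. This Python fragment is illustrative only. The function name, the inspect_one callback, and the sample_size default are hypothetical, since the text does not say how many objects the random selection contains.

```python
import random

def inspect_collection(items, inspect_one, limit=100, sample_size=10, rng=None):
    """Inspect all items when len(items) <= limit, else a random sample.

    inspect_one(item) returns True when the item passes its own inspection.
    sample_size is an assumed parameter; the source does not specify it.
    """
    rng = rng or random.Random()
    items = list(items)
    if len(items) <= limit:
        chosen = items
    else:
        chosen = rng.sample(items, min(sample_size, len(items)))
    return all(inspect_one(item) for item in chosen)
```

Sampling keeps the cost of an inspection bounded for large programs, at the price of possibly missing a damaged object in any single pass.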

Properties, Methods, and Events of quickBasicEngine


Table B-10 lists the properties, methods, and events of the quickBasicEngine class.

Table B-10. quickBasicEngine Properties, Methods, and Events


Property/Method/Event Description
Public Shared ReadOnly Property About As String
Shared, read-only property that returns information about the class.

Public Function assemble() As Boolean
Method that assembles the Polish tokens, replacing symbolic labels with
numeric addresses and removing comment lines inserted by the compiler.
Assembly is a two-pass process: pass one converts the dictionary mapping
labels to addresses, and pass two replaces the labels.

Public Function assembled() As Boolean
Method that returns True if the current source code has been assembled
already; False otherwise.

Public Property AssemblyRemovesCode() As Boolean
Read-write property that returns and may be set to True to remove remarks
inserted by the compiler and label statements inserted during assembly, or
False to suppress this removal. By default, this removal occurs. Setting this
property to False does not change the effect of QuickBasic code, only its
efficiency.

Public Shared ReadOnly Property ClassName
Shared, read-only property that returns the class name: quickBasicEngine.

Public Function clear() As Boolean
Method that resets the engine to a start state by ensuring all reference
variables are cleared. You don't need to execute it in the normal case, as
long as you use dispose to responsibly clean up the compiler.
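The two-pass assembly described for the assemble method follows a classic pattern, sketched here in Python as a general illustration and not as the engine's implementation. The opcode names ("label", "rem", "jump", "jumpz"), the tuple layout, and the zero-based addresses are all assumptions made for the example.

```python
JUMP_OPS = ("jump", "jumpz")   # hypothetical opcodes whose operand is a label

def assemble(ops):
    """Resolve symbolic labels to numeric addresses and drop labels and remarks."""
    # Pass one: record the address each label will have once
    # label and remark pseudo-operations are removed.
    addresses = {}
    address = 0
    for op in ops:
        if op[0] == "label":
            addresses[op[1]] = address
        elif op[0] != "rem":
            address += 1
    # Pass two: emit real operations, replacing label operands by addresses.
    assembled = []
    for op in ops:
        if op[0] in ("label", "rem"):
            continue
        if op[0] in JUMP_OPS:
            assembled.append((op[0], addresses[op[1]]))
        else:
            assembled.append(op)
    return assembled
```

Two passes are needed because a jump may refer to a label that is defined later in the code, so every label address must be known before any operand is replaced.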

Public Function clearStorage() As Boolean
Method that sets all variables in the current program to the default value
for their type.

Public Shared Function clone As quickBasicEngine
Method that implements ICloneable and returns a clone of the instance object.
The clone consists of identical code (including comments and white space
patterns) and identical run mode options, including optimization, but it may
not be in the same state as the cloned object. The clone is always in the
initial, unexecuted state. When the clone is passed to the compareTo method,
compareTo returns True.

Public Event codeRemoveEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal intOpIndex As Integer)
Event that is triggered whenever code is removed from the compiled set of
qbPolish tokens: objQBsender identifies the quickBasicEngine, and intOpIndex
identifies the index of the operation removed.

Public Shared Function codeType(ByVal strCode As String) As String
Shared method that returns the type of strCode as a string. immediateCommand
is returned when the code is a valid expression. program is returned when the
code is a valid executable program. Otherwise, invalid is returned.

Public Function compareTo(ByVal objQBE As quickBasicEngine.qbQuickBasicEngine) As Boolean
Method that compares the instance object to the quickBasicEngine identified
in objQBE, and returns True when objQBE clones the instance. objQBE clones
the instance when the source code of objQBE and that of the instance are
identical, except for white space, and all global options such as the
ConstantFolding property are the same. The compilation and assembly of the
two objects and their storage values may differ. The compareTo method
implements the IComparable interface.

Public Function compile() As Boolean
Method that compiles the source code to unassembled interpretive code. This
method won't proceed to assembly on a successful compile, but it will scan
code that has not been scanned already.

Public Function compiled() As Boolean
Method that returns True if the current source code has been compiled
already; False otherwise.

Public Event compileErrorEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strMessage As String, ByVal intIndex As Integer, ByVal intContextLength As Integer, ByVal intLinenumber As Integer, ByVal strHelp As String, ByVal strCode As String)
Event that is triggered by errors in the source code. objQBsender identifies
the quickBasicEngine reporting the error. strMessage is the error text, which
might have more than one line. intIndex is the index of the character at
which the error occurs. intContextLength is the length of the error's context
(the source code probably responsible for the error). intLineNumber is the
line number of the error. strCode is the line of source code at which the
error occurs.

Public Property ConstantFolding() As Boolean
Read-write property that returns and may be set to True or False, to control
constant folding. When ConstantFolding is True, all expressions and
subexpressions in the source code that consist exclusively of constants and
operators are evaluated by the compiler, not at runtime.16 When
ConstantFolding is False, all subexpressions in the source code that consist
exclusively of constants and operators are compiled normally to code.

Public Property DegenerateOpRemoval() As Boolean
Read-write property that returns and may be set to True or False to control
the removal of degenerate operations. When DegenerateOpRemoval is True, all
operations known at compile time to have no effect (including addition of the
constant 0 and multiplication by the constant 1) are removed. When
DegenerateOpRemoval is False, all degenerate operations generate code for
runtime evaluation.

Public Overloads Function dispose As String
Method that disposes of the object and cleans up any reference objects in the
heap. This method marks the object as unusable. This overload always conducts
an internal inspection of the object instance (using the inspect method), and
throws an error if the inspection fails. For best results, use this method
when you are finished using the object in code. See the next method for an
overload that allows inspection to be skipped.

Public Overloads Function dispose(ByVal booInspect As Boolean) As String
Method that disposes of the object and cleans up any reference objects in the
heap. This method marks the object as unusable. This overload inspects the
object instance unless dispose(False) is used. For best results, use this
method when you are finished using the object in code.

16. This can speed up runtime. For example, in a+1+1 when ConstantFolding is True, the
subexpression 1+1 is evaluated by the compiler. This example is contrived (as in stupid) but
many code and business rule generators may yield such contrived, stupid examples.
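Constant folding and degenerate operation removal, as described for the ConstantFolding and DegenerateOpRemoval properties, can be illustrated on a small expression tree. The Python sketch below is a general model of the two optimizations, not the compiler's code. The tuple-based tree, the fold function, and its keyword arguments are assumptions for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node, constant_folding=True, degenerate_op_removal=True):
    """Simplify a tree of (op, left, right) tuples, numbers, and variable names."""
    if not isinstance(node, tuple):
        return node                      # a constant or a variable name
    op, left, right = node
    left = fold(left, constant_folding, degenerate_op_removal)
    right = fold(right, constant_folding, degenerate_op_removal)
    is_const = lambda x: isinstance(x, (int, float))
    if constant_folding and is_const(left) and is_const(right):
        return OPS[op](left, right)      # evaluated at compile time
    if degenerate_op_removal:
        if op == "+" and is_const(right) and right == 0:
            return left                  # x + 0 has no effect
        if op == "+" and is_const(left) and left == 0:
            return right                 # 0 + x has no effect
        if op == "*" and is_const(right) and right == 1:
            return left                  # x * 1 has no effect
        if op == "*" and is_const(left) and left == 1:
            return right                 # 1 * x has no effect
    return (op, left, right)
```

With both options on, the tree for a + (1 + 1) becomes ("+", "a", 2), and the tree for (a + 0) * 1 reduces to the variable a alone.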

Public Shared ReadOnly Property EasterEgg() As String
Shared, read-only property that returns the quickBasicEngine's Easter egg of
dedicatory, ceremonial language and quotes, setting a dignified, high-toned
atmosphere.

Public Overloads Shared Function eval(ByVal strExpression As String) As qbVariable.qbVariable
Shared, "lightweight" method that evaluates the string strExpression as a
single expression in QuickBasic notation or as a series of statements
(separated by colons), followed by a colon and then a final expression. For
example, strExpression may be a series of Let assignment statements that set
variable values, followed by an expression. The value of the expression is
returned as a qbVariable object. This method creates a new quickBasicEngine
with all of the default values and default properties, and the evaluated
string is executed using default values and default properties. See also
evaluate.

Public Overloads Shared Function eval(ByVal strExpression As String, ByRef strLog As String) As qbVariable.qbVariable
Shared, "lightweight" method that evaluates the string strExpression as a
single expression in QuickBasic notation or as a series of statements
(separated by colons), followed by a colon and then a final expression. This
overload works like the previous overload, but it places the evaluation event
log in the strLog string, passed by reference. See the EventLog property for
details on the format of event logs. See also evaluate.

Public Overloads Shared Function eval(ByVal strExpression As String, ByRef booError As Boolean) As String
Shared, "lightweight" method that evaluates the string strExpression as a
single expression in QuickBasic notation or as a series of statements
(separated by colons), followed by a colon and then a final expression. The
value of the expression is returned as a string. The reference parameter
booError is set to True on any error (and a null string is usually returned).
booError is set to False on an error-free evaluation. The eval method creates
a new quickBasicEngine with all of the default values and default properties,
and the evaluated string is executed using default values and default
properties. See the EventLog property for details on the format of event
logs. See also evaluate.

Public Overloads Function evaluate(ByVal strExpression As String) As qbVariable.qbVariable
"Heavyweight" method that evaluates the string strExpression as a single
expression in QuickBasic notation or as a series of statements (separated by
colons), followed by a colon and then a final expression. For example,
strExpression may be a series of Let assignment statements that set variable
values, followed by an expression. The value of the expression is returned as
a qbVariable object. The evaluate method uses the current state of the object
running the evaluation. See also eval.

Public Sub evaluate(ByVal strExpression As String, ByRef objValue As qbVariable)
"Heavyweight" method that evaluates the string strExpression as a single
expression in QuickBasic notation or as a series of statements (separated by
colons), followed by a colon and then a final expression. For example,
strExpression may be a series of Let assignment statements that set variable
values, followed by an expression. The value of the expression is returned in
objValue as a reference parameter. The evaluate method uses the current state
of the object running the evaluation. This version of evaluate is provided
primarily to allow threads to easily use it through the standard call. See
also eval.

Public Overloads Shared Function evaluate(ByVal strExpression As String, ByRef strLog As String) As qbVariable.qbVariable
Shared, "heavyweight" method that evaluates the string strExpression as a
single expression in QuickBasic notation or as a series of statements
(separated by colons), followed by a colon and then a final expression. For
example, strExpression may be a series of Let assignment statements that set
variable values, followed by an expression. The value of the expression is
returned as a qbVariable object. The evaluate method uses the current state
of the object running the evaluation. This overload places the evaluation
event log in the strLog string, passed by reference. See the EventLog
property for details on the format of event logs. See also eval.

Public ReadOnly Property Evaluation() As qbVariable.qbVariable
Read-only property that returns the result of the most recent evaluate
method, or Nothing when no such result exists.

Public Function evaluationValue() As qbVariable.qbVariable
Method that returns the result of the most recent evaluate method, or Nothing
when no such result exists.

Public ReadOnly Property EventLog() As Collection
Read-only property that returns a Collection of items, each of which
represents an event raised by a RaiseEvent in the QuickBasic engine. The
event log is populated only when the EventLogging property is set to True.
This property is useful in finding events issued by a quickBasicEngine that
has not been defined using WithEvents. Each item in the event log is a
three-item subcollection: item(1) identifies the event, item(2) is the event
date and time, and item(3) is a list of the event operands. In item(3), each
operand is separated by a newline and is in the format name=value.

Public Overloads Function eventLog2ErrorList() As String
Method that returns a list suitable for display in a monospaced font (such as
Courier New) of all compiler and interpreter error events in the object
instance event log.

Public Overloads Shared Function eventLog2ErrorList(ByVal colEventLog As Collection) As String
Shared method that returns a list suitable for display in a monospaced font
(such as Courier New) of all compiler and interpreter error events in the
event log passed as colEventLog. colEventLog must be in the format described
for the EventLog property.

Public Overloads Function eventLogFormat() As String
Method that formats the event log in the object instance in a way best viewed
in a monospaced font such as Courier New, and returns the formatted log as a
string.

Public Overloads Function eventLogFormat(ByVal intStartIndex As Integer) As String
Method that formats the event log in the object instance in a way best viewed
in a monospaced font such as Courier New, and returns the formatted log as a
string. The returned log starts at intStartIndex.

Public Overloads Function eventLogFormat(ByVal intStartIndex As Integer, ByVal intCount As Integer) As String
Method that formats the event log in the object instance in a way best viewed
in a monospaced font such as Courier New, and returns the formatted log as a
string. The returned log starts at intStartIndex and contains at most
intCount entries.

Public Overloads Shared Function eventLogFormat(ByVal colEventLog As Collection) As String
Shared method that formats the event log passed as colEventLog in a way best
viewed in a monospaced font such as Courier New, and returns the formatted
log as a string.

Public ReadOnly Property EventLogging() As Boolean
Read-write property that returns and may be set to True or False to control
the generation of event logs.
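The event log layout described for the EventLog property (event identifier, date and time, and newline-separated name=value operands) lends itself to a simple formatter. The Python sketch below only imitates the idea behind eventLogFormat. The output layout, the function names, the use of tuples in place of VB Collections, and the 1-based start index are all assumptions, because the source does not specify them.

```python
from datetime import datetime

def parse_operands(operands):
    """Split the item(3) text into a dict of operand names and values."""
    result = {}
    for line in operands.splitlines():
        if line:
            name, _, value = line.partition("=")
            result[name] = value
    return result

def format_event_log(log, start=1, count=None):
    """Format (event, timestamp, operands) entries in aligned columns."""
    end = None if count is None else start - 1 + count
    entries = log[start - 1:end]
    width = max((len(entry[0]) for entry in entries), default=0)
    lines = []
    for name, stamp, operands in entries:
        pairs = ", ".join(f"{k}={v}" for k, v in parse_operands(operands).items())
        lines.append(f"{name.ljust(width)}  {stamp:%Y-%m-%d %H:%M:%S}  {pairs}")
    return "\n".join(lines)
```

Padding the event name to a common width is what makes such a listing line up in a monospaced font.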

Public Overloads Function getThreadStatus() As String
Method that returns the thread status, while discounting its own effect, as
one of the strings Initializing, Ready, Running, Stopping, or Stopped. The
value returned is nondeterministic, because while getThreadStatus discounts
its own effect, the status may change while it is executing.17 See also
runningThreads.

Public Function inspect(ByRef strReport As String, Optional ByVal booBasic As Boolean = False) As Boolean
Method that inspects the object instance for errors. An internal inspection
is carried out when the object is constructed and inside the dispose Inspect
method. If the inspection fails, the object is marked as unusable. See the
preceding section "Inspection Rules of the quickBasicEngine." The optional
booBasic parameter can be passed as True to suppress the extended inspection
rules.

Public Property InspectCompilerObjects() As Boolean
Read-write property that returns and may be set to True when objects created
by the compiler need to be inspected when disposed. Its default is False. Set
this option to True when testing the compiler and modifications as a way to
be sure that objects don't include buggy code. Setting this option will slow
the compiler down. When this option is True, the following compiler object
types will be inspected when they are disposed: the scanner, each variable
that is created during compilation and interpretation (including its type),
and the quickBasicEngine.

Public Function interpret() As Object
Method that interprets the compiled code (it will scan, compile, and assemble
the source code as needed). This method will return an Object. If the stack
is empty at the end of interpretation, this method returns True. If the stack
contains one entry at the end of interpretation, it returns that entry, which
will be a qbVariable. If the stack contains multiple entries at the end of
interpretation, it returns False. This method does QuickBasic input and
output by means of events. See interpretInputEvent and interpretPrintEvent
for details.

17. For this reason, getThreadStatus should be used for entertainment purposes only; for
example, to display the nondeterministic status in a GUI.
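The return rule described for interpret (True for an empty final stack, the entry itself for a single entry, False for several entries) can be shown with a toy stack machine. This Python sketch is not the engine's interpreter. The opcodes "push", "add", "mul", and "pop" and the function name run_polish are hypothetical, and plain numbers stand in for qbVariable objects.

```python
def run_polish(ops):
    """Run a tiny stack program and apply the documented final-stack rule."""
    stack = []
    for op in ops:
        if op[0] == "push":
            stack.append(op[1])
        elif op[0] == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op[0] == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op[0] == "pop":
            stack.pop()
        else:
            raise ValueError("unknown opcode: " + str(op[0]))
    if not stack:
        return True          # empty stack: statements ran, no value left
    if len(stack) == 1:
        return stack[0]      # exactly one entry: that entry is the result
    return False             # several entries: no single result
```

A caller that needs to tell a result apart from the True and False status values should check the type of the returned object, which in the engine would be a qbVariable for the single-entry case.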

Public Event interpretErrorEvent(ByVal objQBsender As QuickBasicEngine, ByVal strMessage As String, ByVal intIndex As Integer, ByVal strHelp As String)
Event that is triggered when an error occurs in the interpreter. objQBsender
identifies the quickBasicEngine. strMessage is the error message (which may
contain multiple lines). intIndex identifies the position of the Polish
instruction causing the error. strHelp may contain additional error
information.

Public Event interpretInputEvent(ByVal objQBsender As QuickBasicEngine, ByRef strChars As String)
Event that is triggered when an Input statement is executed and the
interpreter needs input. The event handler should usually prompt through the
GUI for input and place the input characters in strChars. objQBsender
identifies the quickBasicEngine that requires the input.

Public Event interpretPrintEvent(ByVal objQBsender As QuickBasicEngine, ByVal strOut As String)
Event that is triggered when a Print statement is executed. The event handler
should usually display the output string as-is, or the Print statement may be
in use to return results to a business rules interface.

Public Event interpretTraceEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal intIndex As Integer, ByVal objStack As Stack, ByVal colStorage As Collection)
Event that is triggered prior to each interpreter execution of each Polish
opcode. objQBsender identifies the quickBasicEngine. intIndex is the index of
the Polish opcode. objStack is the stack prior to executing the opcode.
colStorage is the variable collection prior to executing the opcode. The
Shared stack2String method is available for serializing the stack, and the
Shared storage2String method is available for serializing variable storage.

Public Event loopEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strActivity As String, ByVal strEntity As String, ByVal intNumber As Integer, ByVal intCount As Integer, ByVal intLevel As Integer, ByVal strComment As String)
Event that is triggered inside loops inside the quickBasicEngine. objQBsender
identifies the quickBasicEngine. strActivity identifies the loop activity.
strEntity identifies the entity being processed. intNumber identifies the
number of the current entity. intCount identifies the total number of
entities. intLevel identifies the nesting level starting at 0. strComment may
provide additional information about the loop.

Public Function mkUnusable As Boolean
Method that forces the object instance into the unusable state. It always
returns True.

Public Event msgEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strMessage As String)
Event that is triggered by general messages inside the quickBasicEngine.
objQBsender identifies the quickBasicEngine. strMessage is the message.

Public Function msilRun() As Object
    Method that translates the interpretive Nutty Professor code into MSIL, generates a simple dynamic assembly, and runs the code. This method returns the final stack value as a function value, which is a .NET value.¹⁸ If Polish operations exist that cannot be translated, this method throws an error and returns Nothing. If the MSIL code leaves an empty stack, this method throws an error and returns Nothing.

Public Property Name() As String
    Read-write property that returns and can set the name of the object instance, which will identify the object in error messages and on the XML tag that is returned by object2XML. The name defaults to quickBasicEnginennnn date time, where nnnn is a sequence number.

Public Sub new
    Object constructor that creates the quickBasicEngine and inspects its initial state.

Public Overloads Function object2XML(Optional ByVal booAboutComment As Boolean = False, Optional ByVal booStateComment As Boolean = True) As String
    Method that converts the state of the object to XML. By default, the About information of this class is not included in the returned tag, but it can be included using booAboutComment:=True. By default, line-by-line comments describing each state variable are included in the returned tag, but they can be suppressed using booStateComment:=False.

Public Event parseEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strGrammarCategory As String, ByVal booTerminal As Boolean, ByVal intSrcStartIndex As Integer, ByVal intSrcLength As Integer, ByVal intTokStartIndex As Integer, ByVal intTokLength As Integer, ByVal intObjStartIndex As Integer, ByVal intObjLength As Integer, ByVal strComment As String, ByVal intLevel As Integer)
    Event that is triggered when a terminal or nonterminal grammar category is parsed. objQBsender identifies the quickBasicEngine. strGrammarCategory identifies the grammar category. booTerminal is True (grammar category is a terminal) or False. intSrcStartIndex is the start index of the code corresponding to the grammar category as a character index from 1. intSrcLength is the character length of the code corresponding to the grammar category. intTokStartIndex is the start index of the code corresponding to the grammar category as a token index from 1. intTokLength is the token length of the code corresponding to the grammar category. intObjStartIndex is the start index of the output Polish code corresponding to the grammar category from 1. intObjLength is the length of the Polish code corresponding to the grammar category. strComment may contain additional information about the parse. intLevel is the parse nesting depth from 0.

18. At this writing, only a few Polish opcodes are translatable to MSIL.

Public Event parseFailEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strGrammarCategory As String)
    Event that is triggered when the parser has attempted to parse a grammar category and failed. This doesn't necessarily indicate an error. objQBsender identifies the quickBasicEngine. strGrammarCategory identifies the grammar category.

Public Event parseStartEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strGrammarCategory As String)
    Event that is triggered when the parser starts a parse attempt. objQBsender identifies the quickBasicEngine. strGrammarCategory identifies the grammar category.

Public ReadOnly Property PolishCollection() As Collection
    Read-only property that returns the collection of qbPolish objects corresponding to the compiled code. This collection may be Nothing if there is no compiled code.

Public Function reset() As Boolean
    Method that resets the quickBasicEngine.

Public Function resumeQBE() As Boolean
    Method that puts the quickBasicEngine in the Ready state when it is in the Stopped state. If the object is in any other state, resumeQBE has no effect and results in no error. For best results, clear the quickBasicEngine after resuming it.

Public Overloads Function run() As Boolean
    Method that runs the immediate command or program in the quickBasicEngine. The code will be scanned, compiled, and/or assembled as needed.

Public Overloads Function run(ByVal strRunType As String) As Boolean
    Method that runs the immediate command or program in the quickBasicEngine. The run type of immediateCommand or program may be specified in strRunType.

Public Function runningThreads() As Integer
    Method that returns the number of threads that are running procedures inside the quickBasicEngine as a number between 0 and n. Because the count includes the thread that calls runningThreads, the value will always be one or greater. The value returned is nondeterministic, because the status may change while the method is executing. See also getThreadStatus.

Public Function scan() As Boolean
    Method that scans the source code.

Public Event scanEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal objToken As qbToken.qbToken)
    Event that is triggered when the scanner has found the next token. objQBsender identifies the quickBasicEngine. objToken identifies the token.

Public Function scanned() As Boolean
    Method that returns True when the current source code has been scanned; False otherwise.

Public Function scanner() As qbScanner
    Method that returns the qbScanner object associated with the source code. If no scanner has been created, it may return Nothing.

Public Property SourceCode() As String
    Read-write property that assigns and returns the current source code. When it is assigned, storage is emptied of all variables, and the indicators showing that lexical analysis, compile, and assembly are complete are set to False.

Public Shared Function stack2String(ByVal objStack As Stack) As String
    Shared method that formats a stack of qbVariables such as the stack supplied by the interpretTraceEvent. The formatted stack is best viewed in a monospace font such as Courier New.

Public Function stopQBE As Boolean
    Method that puts the quickBasicEngine in the Stopped state when it is in the Ready or Running state. If the object is in the Stopped state already, it has no effect and results in no error. If the object is in the Running state, (1) any executing For or Do loop is exited as soon as the loop event is issued, (2) if the parser is running, the compiler is exited when the next grammar category is recognized, (3) if the interpreter is running, the interpreter is exited as soon as the interpreter's loop event is issued.

Public Shared Function Storage2String(ByVal colStorage As Collection) As String
    Shared method that formats a collection of qbVariables such as is supplied by the interpretTraceEvent as the interpreter storage. The formatted storage is best viewed in a monospace font such as Courier New.

Public Property Tag() As Object
    Read-write property that returns and can be set to user data that needs to be associated with the quickBasicEngine instance. It's a kind of post-it note. The Tag can be a reference object. If so, the Tag object is not destroyed when the object is destroyed.

Public Overloads Function test(ByRef strReport As String) As Boolean
    Method that runs tests on the object. It returns True to indicate success or False to indicate failure. The strReport reference parameter is set to a test report.

Public Overloads Function test(ByRef strReport As String, ByVal booEventLog As Boolean) As Boolean
    Method that runs tests on the object. It returns True to indicate success or False to indicate failure. The strReport reference parameter is set to a test report. The booEventLog parameter may be specified as True to get an event log inside the report.

Public Shared ReadOnly Property TestAvailable() As Boolean
    Shared read-only property that returns True if the running version of quickBasicEngine was compiled with the compile-time symbol QBVARIABLETEST_NOTEST either omitted or set to False.

Public Event testEvent(ByVal strDesc As String, ByVal intLevel As Integer)
    Event that is fired during the execution of the test method for testing stages and testing events. strDesc describes the stage or event, and intLevel is a nesting level that starts at 0. To obtain this event, the quickBasicEngine instance must be declared WithEvents, a handler for the testEvent must be supplied, and the compile-time symbol QBVARIABLETEST_NOTEST must be omitted or set to False.

Public Event testProgressEvent(ByVal strDesc As String, ByVal strEntity As String, ByVal intEntityNumber As Integer, ByVal intEntityCount As Integer)
    Event that is fired during the execution of the test method. It reports progress inside loops. strDesc describes the loop goal. strEntity describes the entity being processed in the loop. intEntityNumber is the number of the entity. intEntityCount is the total number of entities. To obtain this event, the quickBasicEngine instance must be declared WithEvents, a handler for the testEvent must be supplied, and the compile-time symbol QBVARIABLETEST_NOTEST must be omitted or set to False.

Public Event threadStatusChangeEvent(ByVal objQBsender As qbQuickBasicEngine)
    Event that is raised when the number of threads running quickBasicEngine code changes or the quickBasicEngine is stopped. objQBsender is the handle of the sender quickBasicEngine.

Public Event userErrorEvent(ByVal objQBsender As qbQuickBasicEngine, ByVal strDescription As String, ByVal strHelp As String)
    Event that is triggered when there is an error in using the procedures of this object, as opposed to an error in the QuickBasic source code. objQBsender identifies the quickBasicEngine. strDescription identifies the error (and it may contain more than one line). strHelp identifies additional help information.
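
The stopQBE, resumeQBE, and mkUnusable entries in this table describe transitions among the Ready, Running, Stopped, and unusable states. As a rough illustration only, the following Visual Basic .NET sketch models the stopQBE and resumeQBE transitions as pure functions. The EngineState enumeration and the AfterStop and AfterResume functions are hypothetical names invented for this sketch; they are not part of quickBasicEngine. Leaving the unusable state unchanged under stopQBE is also an assumption, since the table does not say what happens in that case.

```vb
Option Strict On
Imports System

Module EngineStateSketch

    ' Hypothetical enumeration: the table names these states but does not
    ' show how quickBasicEngine actually stores them.
    Enum EngineState
        Ready
        Running
        Stopped
        Unusable
    End Enum

    ' Transition described for stopQBE: Ready or Running becomes Stopped.
    ' Stopped is left unchanged (documented); Unusable is left unchanged
    ' (an assumption made for this sketch).
    Function AfterStop(ByVal state As EngineState) As EngineState
        Select Case state
            Case EngineState.Ready, EngineState.Running
                Return EngineState.Stopped
            Case Else
                Return state
        End Select
    End Function

    ' Transition described for resumeQBE: Stopped becomes Ready.
    ' Any other state is left unchanged and no error results.
    Function AfterResume(ByVal state As EngineState) As EngineState
        If state = EngineState.Stopped Then Return EngineState.Ready
        Return state
    End Function

    Sub Main()
        Dim state As EngineState = EngineState.Running
        state = AfterStop(state)
        Console.WriteLine(state.ToString())   ' prints "Stopped"
        state = AfterResume(state)
        Console.WriteLine(state.ToString())   ' prints "Ready"
    End Sub

End Module
```

Modeling the transitions as functions from state to state makes the "no effect and no error" cases explicit: a call in the wrong state simply returns its input unchanged.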


END START
-Fragment of IBM 1401 assembler code

This is the end.


-Jim Morrison, The Doors

The rest is silence.


-Shakespeare, Hamlet

Of that which we cannot speak of thereupon must we be silent.


-Wittgenstein

That's all folks.


-Porky Pig

In my end is my beginning.
-T. S. Eliot

374
Index
Symbols assemble method, quickBasicEngine , 214
assemblers, 205-218
.NET See under N history, 205-212
09/11, system implications, 284-285
machine language and, 206-207
macro assemblers, 210
A quickBasicEngine, 212
for statement, C language, 280-281
abstractmachdnes, 98-101 lazy evaluation example, 277
abstract variant types, 136, 143 removing comments and labels, 217
access rule enforcement, 43 assertions tested by
action object, CreditEvaluation, 250 qbScannerTest.inspect: 123-:126 .
addFactor, IntegerCalc, 38 assignment statements, BasIC, QuiCkBasIC
addFactorRHS procedure, 175-179 compilerdefirrltion, 71-72
addition operations, subtraction associative operators, 178-179
regarded as, 38-39 asterisks
ADO (Active Data Objects), 16 in regular expressions, 33, 95
Aho, Alfred et. al., Compilers: Principles, qbVariable defaults, 146
Techniques and Thols, 49, 90, 106, auto manufacturer example, 243-244
203,285
Algol programming language, 2, 12,
283-284 B
Algorithm Design Manua~ the, by backslashes
Stephen S Skiena, 137, 170 escaping metacharacters, 33, 35
algorithms in regular expressions, 96-97
Alan Thring on , 205 backtracking problem, 75
hash algorithms, 208 Backus, John, programming language
recursive-descent algorithms, 172 pioneer, 2, 12
aliasing and the C language, 5 Backus-Naur Form. See BNF
alternation stroke operator, 55 bankruptcy rule, benign contradiction
in regular expressions, 96 illustration, 258
ambiguity banks, Ogden Nash on, 247
in regular expressions, 98,131 Basic programming language, 5
intersecting nonterminals, 75-76 alleged deficiencies, 51-52
ampersand character, QuickBasic, 112 compilers for, 8, 9
Anatomy of a Compiler, by John A.N. suitability ofVB.NET for, 10
Lee,3,12 need for formal definitions, 69
And operator, lazy vs. busy And, benign contradictions, 257, 258-261
275-278 binary files, .NET, 24
AndAlso operator, 48, 79, 276-277 bison tool, 81, 113, 181
anticipatory scans, qbScanner, 116, 120 See also yacc
Appleman, Dan, 130,131 . blanks between tokens, 115
APR contradictions, credit evaluation block structured languages, 2, 4-5
application,258,261 Blunden, Bill, Software Exorcism:
Arithmetic, .NET and QuickBasic A Handbook for Debugging and
differences, 185 Optimizing Legacy Code, 211
array variables, QuickBasic, 138 BNF (Backus-Naur Form)
convertibility, 140 analysis using the bnfAnalyzer tool,
justification for qbVariable storage, 52-67
163 avoiding looping, 175
mapping to .NET objects, 150-151 Basic language and, 52
qbVariable. toString/fromString, 146 capabilities, 54
qbVariableType serialization, 143

375
Index

coding rules, 56 Programming Languages and Their


compared with regular expressions, RElation to Automata, by Jeff
100-101 Hopcroft, and Jeff Ullman, 100
design approach, 37-40 Programming Systems and
as a top-down process, 69 Languages, by Saul Rosen, 180
exercise in using, 90 The Psychology of Computer
guidelines for effective use, 81 Programming, by Gerald M.
indirect exposure to users, 82 Weinberg, 243, 265
language definition using, 283 Software Exorcism: A Handbook for
operators, 55 Debugging and Optimizing Legacy
parse results, representations, 39 Code, by Bill Blunden, 211
parsing phase use, 28, 35-44 The Trouble with Dilbert: How
QuickBasic compiler Corporate Culture Gets the Last
analysing, 77--81 Laugh, by Norman Solomon,
building, 68-76 245-246,266
quickBasicEngine, 290-294 Visual Basic.NET and the .NET Plat-
regular expression rules, 94 form, by Andrew Troelsen , 26, 229
requirements analysis based on, 52 What Not How: The Business Rules
resources on, 90 Approach to Application Develop-
syntax, 54-56 ment, by C.J. Date, 247, 266
outline derived from bnfAnalyzer, booUsable variable, 124
63 boxed comments, 64
qbVariable fromString bug in regular expressions, 103, 105
expressions, 331-332 built-in functions, quickBasicEngine,
qbVariablelYPe fromString 294-296
expressions, 343 business rules
tools, 36-37 assembly code for an example, 257
transformation to code, 40-44 capturing, as QuickBasic expressions,
valid specifications of, 39-40 200
validation, importance of, 81 contradictory and redundant rules,
bnfAnalyzer tool 257-263
About screen,58, effects on programs, 263
BNF analysis using, 52-67 credit evaluation application, 251-252
documenting languages, 283 documenting, 252-253
grammar test using, 57 implementing, 243--266
main screen, 58 logically sealed, 252
outline BNF syntax from, 63 real word examples, 243--245
Reference Manual Options screen, resources on, 265
59, 77 tiered architectures and, 246
regular expression syntax manual, 94 translating to QuickBasic, 250
status report, 88-90
technical notes, 82-90
tools, 85--88 c
bnfGrammar, bnfAnalyzer tool, 55 C programming language
books basis of CTS, 19
The Algorithm Design Manual, by external library use, 274
Stephen S Skiena, 137, 170 flaws in, 279-281
Anatomy ofa Compiler, by John A.N. and languages derived from, 5
Lee, 3, 12 redefinition capability, 269
The C Programming Language, by C Programming Language, The, by Brian
Brian Kernighan, 11 Kernighan, 11
Compilers: Principles, Techniques and C++ programming language
Tools, by Alfred Abo et. al. , 49, 90, popularity for writing compilers, 181
106,203,285 advantages ofVB.NET over, 10
A History ofModem Computing, by caching parsed fromString expressions,
Paul E. Ceruzzi, 206 160,343
Inside Microsoft .NET ILAssembler, Cain's amulet analogy, 173, 174
by Serge lidin, 229 caret character in regular expressions, 104
376
Index

case study. See credit evaluation collectionUtilities tool, 150


case-sensitivity, BNF identifiers, 56 colon, as delimiter, 145
Ceruzzi, Paul E., A History ofModem COLparse'Itee collection, bnfAnalyzer, 83
Computing, 206 string conversion, 85
characters sets colPolish collection, quickBasicEngine,
ASCII and Unicode, 279 msilRun_O function and, 238, 240
distinguished from strings, 97 colVariables collection,
in regular expressions, 35, 97 quickBasicEngine, 226
checkToken method, IntegerCalc, 42 COM, bnfAnalyzer tool and, 53
class libraries, .NET, 23-24 COM functions, Visual Basic, regular
cloneable objects, 142 expressions for, 101, 107
CLR (Common Language Runtime) comments
code generation to, 229-242 BNF, 56, 81
constant loading and push opera- Visual Basic, regular expressions for,
tions,240-241 106
introduced, 20 comparable objects, 142
mapping QuickBasic variables to, compiler generators
149-151 code generators and, 7
not an interpreter, 28, 44 yacc tool and, 37
portability and, 22-23 compiler_addFactorRHS procedure,
reliability, 23 example quickBasicEngine
resources on, 229, 242 procedure, 175-179
RPNand,44 compiler_binaryOpGeIL procedure,
CLS (Common Language Specification), 185, 186
20,26 compiler_checktoken_ procedure,
clsUtilities class, bnfAnalyzer, 87 terminal value checking, 174
Cobol programming language, 4 compiler_constantEval_ procedure,
digital switch usage example, 244-245 186
code generation compiler~enCode_ procedure,
for complete MSIL code, 242 complete MSIL code and, 242
Or operator and, 74 compiler_mulOp procedure, 177
quickBasicEngine,194-198 compiler_term_ procedure, multiply
examining the generated code, 197 factor check, 184
recursive-descent algorithms and, compilers
180-181 See also IntegerCalc; QuickBasic
code optimization compiler; quickBasicEngine
code optimization phase, 27 design, resources on, 132
constant folding technique, 184-185 developments by the Unix
credit evaluation application and, 263 community, 285
inline optimization, 187 history of the technology, 1-11
lazy evaluation, 186 resources on, 12-13
need for code generator wrapper, 178 Just In Time (JIT) compilation, 6-7, 20
optimizers in comercial compilers, 92 phases of operation, 27-28, 91-92
problems with stacks, 47, 48 predating assemblers, 206
quickBasicEngine,183-187 RISC machines, 132
code reuse, .NET Framework approach, Compilers: Principles, Techniques and
15 Tools, by Alfred Aho et. al. , 49, 90,
collection structure 106,203,285
qbScannerTest inspection, 125 computer languages. See programming
collections languages
alternatives to hash tables, 209 concatenation in regular expressions,
collections of, 126 96,98
compared with user types, 88 concrete variant types, 136, 141
conversion to strings, 84-85 conditional macro assemblers, 210
collections, .NET constant folding technique, 184-185
representing QuickBasic arrays as, 150 containedTypeO method,
collectionUtilities project, qbGUI, 189 qbVariableType, 161

377
Index

containment, QuickBasic data types, Date, C.]., What Not How: The Business
139,322,341 Rules Approach to Application
context ignored by regular expressions, Development, 247, 266
103 debugging
continuation lines, BNF, 56 C preprocessor example, 211
contradictory situations, credit linkage to source code and, 174-175,
evaluation application, 257-263 183
control structures and Thring- phone billing problem, 197
completeness, 99 regular expressions, 32, 105
convertibility, QuickBasic data types, stack code, 48
140-142 decorated notation, 85, 146
core object design approach, 121 default policy, credit evaluation
credit evaluation case study, 247-264 application, 252
credit evaluation application, 250-263 defaultValueO method, qbVariableType,
assessing an applicant's standing, 162
253-254 degenerate operations, 92
possible enhancements, 263-264 delegation, choice between inheritance
qbGUI view, 255, 257 and, 151-152
Show Basic code button, 254 device drivers, use with DLLs, 16
creditworthiness assessment, 248-249 differently abled programmers, 269
cross-platform operation and the CLR, digital switch usage example, 244-245
22-23 Dijkstra, Edsger W.
crs (Common Type Specification), 19,26 articles by, 12
curly braces in regular expressions, 95 on Cobol and Basic, 51
Customer Engineering Zone, on computer science as applied
quickBasicEngine, 198 math, 127
on GoTo statements, 2
on program evolution, 124
D programming "a radical novelty", 1
dashes in regular expressions, 97 Dilbert Factor, 245-246, 285
data, treating logic as , 246-247, 261, disjoint handles, 75-76, 81
263-264 dispatcher routine, quickBasicEngine,
data-driven processes, 270 232-234
data modeling dispose methods
See also object modeling exposing, 21-22
left to right scanning, 112 qbPolish object, 215
qbPolish class, 302 self-inspection and, 125
qbScanner object, 306 division operations, unsymmetrical
qbToken object, 316 operator problem, 75
qbVariable class, 321-322 DLLs (Dynamic linking Ubraries)
quickBasicEngine class, 360-361 code reuse and, 15
resources on, 170 DIl..Hell,17-19
data types Eleven Commandments of, 17
abuse of, CLR and, 23 documentation and language design, 283
QuickBasic variables, 134 .NET See under N
containment and convertibility, DotNetAssembly namespace, 235
139-142 due diligence, 245
serialization,142-148 Dvorak keyboards, network externality
data typing and acceptance, 268
debugging ease vs. efficiency, 272
empirical type determination, 146-147 E
importance of, 133
risks associated with new types, Easter egg, qbGUI, 190
273-274 efficiency and execution time formulae,
database tables compared to objects, 169 137
emulation, 219
entity multiplication in MIS programs,
271

378
Index

ENUtokenType enumerator, 320--321 GoTo commands, 2, 4, 12


ENUvarType objects, 140--141 error handling use, 71
error handling quickBasicEngine, need for an
quickBasicEngine, 202 assembler, 212
use of tine numbers and, 118 quickBasicEngine, rationale for use,
escaping regular expression 214
metacharacters, 33, 35, 96-97 grammar categories, BNF, 36
Esperanto, 283 bnfAnalyzer tool and, 55, 60--61
evaluate method, quickBasicEngine, reference manual view, 78-79
185-186 parse index position and, 42
event model, qbScanner, 128-130 grammar symbols, quickBasicEngine, 174
execution time formulae, 137 grammar test, BNF, using bnfAnalyzer, 57
explicit assignments, 70, 71 group theory and lazy evaluation, 186
expression function, IntegerCalc, 41-42
expression operators, Basic, 73-75
expressionRHS, IntegerCalc, 38 H
expressions hash tables, 208-209
definition in QuickBasic compiler, heaps
72-76 CLR, distinguished from stacks, 20
definition in QuickBasic reference quickBasicEngine object model and,
outline, 80 202
extemal facilities, checking success of, 45 'Hello world' program
effect ofrunning the compiler, 193
examining the generated code, 197
F opcode display, 182-183
fatal contradictions, 258, 262-263 viewing parsing and code generation,
feature bloat, 273-274 194-198
findRightParenthesis procedure, History ofModem Computing, A, by Paul
IntegerCalc, 40, 42 E.Ceruzzi,206
finite automata Hopcroft, Jeff and Ullman, Jeff,
regular expressions and, 100--101 Programming Languages and Their
relab conversions and, 106 Relation to Automata, 100
unsigned numbers and, 114 Hopper, Grace Murray, programming
floating-point numbers, 135-136 language pioneer, 2, 206
language design and, 278 Hungarian notation, 122
for statement, C language, 279-281 hash algorithms and, 208
formal definitions as language
requirement, 69
formatting the bnfAnalyzer reference I
manual, 64-67 ICloneable interface, 142
Forth programming language, 8, 180, 268 IComparable interface, 143
Fortran programming language, 2 identifiers
author's early experiences, 3-4 BNF, 56
forward jumps, GoTo commands in .NET
quickBasicEngine,212 BNF and regular expression
fromString expression, qbVariable class, requirements, 93
323-327 regular expression for start of, 97
fromString expression, q bVariableType Visual Basic
class, 342-343 length limits, III
fromString methods, See to String and regular expressions for, 106-107
fromString methods IDisposable interface, 21-22
IIf operator, 279
immediateCommand, bnfAnalyzer, 69-70
G implicit assignments, 70, 71
garbage collection, 21-22 indexing
genCode method, IntegerCalc, 42, 43, 45 characters in source code, 115
generic types, 209, 240 tokens in quickBasicEngine, 174
goals of language design, 267-270 infix expressions, contrasted with RPN, 44
379
Index

inheritance, choice between delegation intersecting nonterminals, 75-76, 81


and, 151-152 intIndex variable, quickBasicEngine,
inline optimization, 187 174,176
Inside Microsoft .NET IL Assembler, by isomorphism
Serge Udin, 229 qbVariable class, 323
inspect methods qbVariableType class, 341
qbPolish class, inspection rules, 303 iteration counts in regular expressions, 95
qbScannerTest,122-126
inspection rules, 307
usability, 124 J
qbToken class, inspection rules, 317 jump-style operations, 216
qbVariable,163 Just In TIme (JIT) compilation, 6-7, 20
checking type and value, 151 just in time variable allocation, 177
constraints on arrays, 150
inspection rules, 329-330
qbVariableType, 155-156 K
inspection rules, 344 Kemeny, John, Basic language inventor,
quickBasicEngine class, inspection 5-6
rules, 361-362 kernel operating systems, 275
variants and, 138 Kernighan, Brian, The C Programming
inspection Language, 11
report, bnfAnalyzer, 83-84 keyboard characters, regular
scan table, 88 expressions for, 106
scalability and, 156 keyboard design, 268
state, quickBasicEngine, 200 keyed collections and assemblers, 207
instance object, IL code generation, 241 keywords, quickBasicEngine, 289
IntegerCalc example compiler
enhancement to process real num-
bers,48-49 L
Interpreter region, 45 language design, 267-283
introduced, 29 backdoor problems, 273
Parser region and BNF design, 37 data typing, 273-274
regular expression example, 35 determining goals, 267-270
RPN box, More display, 47 documentation, 283
RPNuse,45 semantics, 270--279
simple epxression function, 41-42 string handling, 278-279
tokenization, 31 syntax, 279-282
intEndIndex variable, quickBasicEngine, language reference, quickBasicEngine,
177 287-296
interfaces, consistent, using virtual See also reference manuals
hardware, 16 laptops,pa~g,53,82
interpreted languages lazy evaluation, 186
design choice between compiled lazy vs. busy And and Or, 275-278
and, 272 lazy vs. busy Or, exercise on, 284
interpreter method, quickBasicEngine, Ldc_R8 opcode, CLR, 240-241
224 Lee, John AN., Anatomy ofa Compiler,
interpreterPrintEvent handler, 255 3,12
resolving contradictions, 260 left recursion, 74
interpreters, 6-7, 218-228 lexical analysis
CLR not an interpreter, 28 analyzer construction options, 32
Nutty Professor interpreter bnfAnalyzer output, 87
introduced, 219 IntegerCalc enhancement,49,
Visual Basic releases and, 11 qbScanne~109-120
interpreffixpression method, regular expressions and, 29-31
IntegerCalc, 45 theory, 92-101
enhancement to process real
numbers, 49

380
Index

lexical analyzers mkRandomType method,


bnfAnalyzer, recognition of terminals, qbVariableType, 162
62 Mono open .NET version, 271
QuickBasic compiler, 91-132 More display, IntegerCalc RPN box, 47
translating business rules to mreFactor, bnfAnalyzer tool, 55
QuickBasic, 254 MSIL (Microsoft Intermediate Language),
lexical rules, BNF, 56 24
lexical syntax, quickBasicEngine, 288 code optimization and, 183
lexx tool, 7, 49 quickBasicEngine code generation, 231
See also yacc MSILrun method, quickBasicEngine,
Lidin, Serge, Inside Microsoft .NET IL 231-232
Assembler, 229 msilRun_O function, quickBasicEngine,
life assurance exercise based on credit 235-238
evaluation application, 265 msilRun_QBOpcode2MSI~ procedure
line numbers complete MSIL code and, 242
error handling use, 118 multiple inheritance, 129
qbScannerTest inspection, 125 multiplication operators,
linked lists, 169 compiler_mulOp procedure, 177
loans, creditworthiness assessment for, multiply factors
248-249 compiler_addFactorRHS procedure,
local variable deleting problem with 178
assemblers, 208-209 compiler_tel1IL procedure, 184
location values, 72
Visual Basic nonterminals, 36
locking behavior, quickBasicEngine N
dispatcher routine, 234-235 Name property, stateful objects, 122
logic treated as data, 246--247, 261, namespaces required by
263-264 quickBasicEngine, 235
data-driven processes, 270 Nash, Ogden, on banks, 247
logical newlines, 71 negative numbers, QuickBasic
logically sealed business rules, 252 representation, 135-136
lookahead, QuickBasic compiler, 196 .NET Framework, 15-26
looping binary files, 24
avoiding in qbScanner, 115 class libraries, 23-24
avoiding in quickBasicEngine, 175 CLR and CLS introduced, 20
distinguished from recursion, 74 CTS introduced, 19
for statement, C language, 279-281 DLL Hell and, 19
loops, variable definition inside, 178 open version, 271
LValues,72 openness of, 242
Visual Basic nonterminals, 36 performance penalties, 24-25
support for QuickBasicEngine, 201
.NET languages
M specifying using BNF, 52
machine language and assembly .NET procedures
language,206--207 regular expressions for, 107
macro assemblers, 210 netValue2QBdomain method,
mapping QuickBasic variables to .NET qbVariableType, 162
objects, 148-151,321 network externality, 268-269
math and language design, 275-278 newlines, BNF, 56
math expressions, QuickBasic subset detection in bnfAlalyzer lexical
implemented for CLR, 230 analyzer, 62
memory availability and compilers for logical newlines, 71
Basic, 8, 9 newlines, Windows, 31
Mercator projections, 200 'Hello world' program, 198
metacharacters in regular expressions, regular expressions for, 107
32,95-97 nFactorial program
MIS programs and object orientation, 271 assembler and interpreter
illustration, 212-213
381
Index

execution by the interpreter, 224-225 objects


locking behavior, 235 defined by msilRUll.-O function,
performance effects of caching, 160 238-239
qbGUI results, 218 desire for formal existence, 167-168
nonprintable characters, qbScanner qbScanner classification, 121
treatment, 118-119 OB}state object, quickBasicEngine, 176
nonterminals, BNF objTag, qbVariable, 162
ambiguity of intersecting objTag, qbVariableType, 153
nonterminals, 75-76 objValue object, qbVariable, 162-163
bnfAnalyzer reference manual as one-trip expressions, 95, 105
XML,66 ontological moments, 245
bnfAnalyzer reference manual view, opcodes
60--61,78,79 feature bloat and, 274
definitions, 57 'Hello world' program display,
identifiers identified by case, 56 182-183
nested representation, 43 qbPolish object supported by the
nonterminal recognizers, 41 Nutty Professor interpreter,
terminals and, 36 220-224
typedIdentifier , 72 open code, source program bodies, 70-71
null data type, QuickBasic, 139 open standards, IEEE on floating-point
mapping to .NET objects, 151 math,278
null strings, 33 operating systems, kernel, 275
number scanning, qbScanner, 119-120 operators
number tokens, problems with signed Basic expressions
numbers, 113-114 associative operators, 178-179
numbers, QuickBasic representation, precedence, 72
135-136 BNF, 55
numeric codes, interpreter replacement supported by the Nutty Professor
of operators by, 208 interpreter, 220-224
Nutty Professor interpreter, 219-228 opPushLiteral instruction, 198
lazy vs. busy And and Or, 276-277 opRem and OpNop instructions, 214, 215
possible symbolic version, 264 optimization. See code optimization
testing the nFactorial program, Option statements, Basic, 70
224-225 Or operator
as a virtual machine, 181 lazy vs. busy Or, 275-278, 284
code generators and, 74
o OrElse operator, 48, 79, 276
outline representation, BNF parse, 39
objConstantValue parameter, 184, 187
object code optimization phase, 27-28
object modeling
p
qbScanner, 120-130 parentheses
QuickBasic, 133-170 balancing, 40, 76, 196
object orientation BNF operator precedence, 55
C language and, 5 BNF support, 76
language design choice, 271 in regular expressions, 96
nested UDTs and, 144 RPN and, 44
OO design, digression on, 167-168 parse events, quickBasicEngine, 177
QuickBasic compiler, 129-130 parse outline, 'Hello world' program,
variants and, 137 195
VB versions and, 52 parse trees, QuickBasic compiler, 196
object trace, nFactorial program, 227 See also tree representation
object2XML method parseExpression method, IntegerCalc,
qbScanner, 126-127 42
serialization compared to toString parser generators
methods, 142 BNF validation, 81
qbVariableType, 153-155 yacc tool as, 37
quickBasicEngine, 181 parser, bnfAnalyzer, status report, 89

parser, quickBasicEngine, 194-198 MSIL (Microsoft Intermediate


procedure structure, 175-176 Language), 24
syntax, 290-294 need for formal definitions, 69
tactical parsing, 196-197 Pascal language, 276
parsing operations PL/1 language, 4, 210, 269
dangers of ad hoc code, 108 Rexx language, 186, 273
top-down and bottom-up shortcomings of Cobol and Basic, 51
approaches, 172 Simula language, 271
parsing phase, 27 Turing-completeness test, 99
BNF and, 35-44 Visual Basic language, 9-11, 196
lexical analysis and parsing, 29-30 Programming Languages and Their
running in threads, 41 Relation to Automata, by Jeff
Pascal programming language, 276 Hopcroft, and Jeff Ullman, 100
performance Programming Systems and Languages,
improvement, by caching fromString by Saul Rosen, 180
results, 160 progress reporting
penalties, .NET Framework, 24-25 qbScanner event model, 128-130
phone billing reminiscence quickBasicEngine, 180
solution following business rules, Psychology of Computer Programming,
244-245 the, by Gerald M. Weinberg, 243,
tactical parsing illustration, 197 265
phone numbers, regular expressions for, PUFFT (Purdue University Fast Fortran
107 Translator), 6, 272
PL/1 programming language, 4, 210 push operations and CLR load
redefinition capability, 269 operations, 240-241
Plager, Max, 3, 206-207 pushLiteral opcode, qbPolish object,
plus signs in regular expressions, 95 183
Polish code. See qbPolish object; RPN pushStack method, IntegerCalc, 45
popStack method, IntegerCalc, 46
portability and the CLR, 22-23
preprocessors Q
code optimization and, 184 QbGUI testing interface
as macro assemblers, 210-211 credit evaluation application, 255,
unpopularity, 211 257
presentation, separation from logic, 129 lazy vs. busy And and Or, 276-277
printing, credit evaluation application, projects in solution architecture, 189
254-255 removing comments and labels, 216
Private methods, identifying, in source code changes and testing, 228
quickBasicEngine, 174 testing CLR generation, 230
productions, BNF testing GUI for the quickBasicEngine,
explicit and implicit assignments, 70 188
grammar category component, 55 testing the for statement in C,
procedures recognizing, 176 testing the for statement in C,
requiring different algorithms, 172 qbOp class, properties and methods,
in XML reference manual, 67 300-301
programming, "elimination" of, 264 qbPolish object, 181
programming languages basis of Nutty Professor interpreter,
See also assemblers; QuickBasic 220
Algol language, 2, 12, 283-284 basis of Nutty Professor interpreter,
Basic language, 5, 51-52, 69 instructions as XML, 199
block structured languages, 2, 4-5 introduced, 214
C language, 5, 269, 274, 279-281 properties and methods, 303-305
C++ language, 19, 181 table of supported opcodes, 220-224
Cobol language, 4, 51, 244-245 qbScanner object, 109-120
Forth language, 8, 180, 268 analyzing
Fortran language, 2-4 qbVariableType.toString/fromString,
interpreted languages, 272 144-145
language design, 267-283 anticipatory scans, 116, 120

data model and state, 306 'Hello world' program results, 193
event model, 128-130 lexical analyzer for, 91-132
'Hello world' program results, 193 parser and code generator, 171-202
object2XML method, 126-127 scanner implementation, 114-117
properties, methods and events, syntax, 51-82
308-315 syntax outline derived from
restoring nonstandard programs, 269 bnfAnalyzer, 80
scanner object model, 120-130 testing the complete compiler, 227-228
translating business rules to token types, 111-112
QuickBasic, 250, 252 QuickBasic compiler (Microsoft), 9
qbScannerTest tool QuickBasic language
inspect method, 122-126 abstract variable type model,
test method, 127-128 134-139
verify utility, 114 differences from Visual Basic, 196
qbToken object, qbScanner, 120-121 quickBasicEngine support for, 287
data model and state, 316 translating business rules to, 250, 254
properties and methods, 317-320 variable types, qbVariable data
qbVariable objects, 162-166 model, 321-322
data model, 321-322 variables, mapping to .NET objects,
evaluate method, quickBasicEngine 148-151
produces, 186 quickBasicEngine
fromString expression, 323-327 assemblers, 212
inspect method, 163 built-in functions, 294-296
objConstantValue parameter, 184 class standards and core
properties, methods and events, methodology, 298-299
330-339 CLR generation, 230-242
quickBasicEngine interpreter collection use, 209
method, 224 converting state to XML, 198-200
serialization and, 145-148 credit evaluation application use, 250
testing, 164-166 dynamic big picture, 189-190
valueSetO method, 164, 256 error taxonomy, 202
qbVariableTest.exe, 164-166 full BNF listing for, 290-294
qbVariableType objects, 152-162 grammar symbols as Booleans, 174
E~varTypecompared,140-141 inspecting the state, 200
fromString expression, 342-343 keywords and system functions, 289
inspect method, 155-156 language reference, 287-296
integers as, 141-142 lexical syntax, 288
object2XML method, qbScanner, namespace imports, 235
153-155 object overview, 188
properties, methods and events, reference manual, 297-373
345-359 simple interface, 191
serialization and, 143-148 strongly typed storage, 226
shared methods, 161-162 testing GUI, 188
state, 152-153 threading capability, 232
stress testing, 160-161 utility DLLs, 297
testing, 152-162 viewing parsing and code generation,
types exposed by, 340-341 194-198
qbVariableType procedure, qbScanner, quickBasicEngine class, 363
121-122 properties, methods and events,
qbVariableTypeTester.exe, 152-162 properties, methods and events,
quality control, 84 quoted strings, 57
time-to-market and, 130
QuickBasic compiler (author's)
analyzing the BNF, 77-81 R
architecture, 187-188 random variables, qbVariable
building the BNF, 68-76 fromString values as, 331
conceptual stages, 91-92


readability advantages relab tool, 34,101-105


BNF, over regular expressions, 100-101 regression test feature, 108-109
XML, 166-167 regular expression conversion to
real numbers VB.NET, 105-106
qbScanner treatment of, 119-120 regular expressions for common
regular expressions for, 49 tasks, 106-109
recursion distinguished from looping, 74 requirements definition
recursive-descent algorithms, 172-174 guidelines for BNF use, 81
automatic and manual approaches, programmers exceeding, 168
180-181 reserved names, quickBasicEngine, 289
refactoring objects, 130 reverse engineering exercise, regular
reference manuals expressions, 131
documenting languages, 283 Reverse Polish Notation. See RPN
quickBasicEngine, 297-373 Rexx programming language, 186, 273
language reference, 287-296 RISC (Reduced Instruction Set
reference manuals, bnfAnalyzer Computing), 47, 132, 274-275
BNF language syntax outline, 63 Rosen, Saul, Programming Systems and
content, 53 Languages, 180
display formats, 64, 78 RPN (Reverse Polish Notation), 44
functions operating on, 86-87 code optimization phase and, 28
setup for QuickBasic BNF, 77 credit evaluation application, 257
start of, 60-61 generated code for 'Hello World', 197
reference objects, disposal of, 21-22 network externality and acceptance,
Regex object, VB.NET, 95 268
registry use, qbGUI, 190 nFactorial program, 218
regression testing qbPolish object, 181
qbScanner, 118 Ruby form engine, 10
using relab, 108-109 rule manual, QuickBasic, 80
regular expression laboratory. See relab run method, quickBasicEngine, 185
tool
regular expression processors, 97-98
regular expressions S
ambiguity in, 131 scalar variables, QuickBasic, 134-136
BNF translation, 38 mapping to .NET objects, 149-150, 322
bug in, 103, 105 scan tables, bnfAnalyzer
compared with BNF, 100-101 functions operating on, 87-88
context insensitivity of, 103 inspection, 88
example, 35 scan test, qbScanner, 117-118
finite automata and, 100 scanEvent and scanErrorEvent,
included with relab, 106-109 qbScanner, 129
lexical analysis using, 29-31, 32-35, scanExpression, IntegerCalc, 49
93 scanned tokens inspection,
metacharacters, 95-97 qbScannerTest, 125
operators, and BNF operators, 55 scanner server concept, 28
parsing VB6, shortcomings, 107-108 scannec function, qbScanner, 117
real numbers and, 49 scanning phase
relab tool, 34 See also lexical analysis
resources, 131 introduced, 27
reverse engineering exercise, 131 lexical analysis, 29-31
rules of, expressed in BNF, 93 running in threads, 41
scanning phase and, 28 scanning tokens, quickBasicEngine,
testing using relab, 101-105 193-194
Turing machines and, 98-100 security of interpreted languages, 272
VB.NET conversion, using relab, semantics and language design, 270-279
105-106 sequence factors, BNF, 55
Visual Basic COM function example, serialization
101 qbVariable objects, 145-148
reification, 113 qbVariableType objects, 143-148

QuickBasic variable types, 142-148 stress testing


of tokens, 118 qbVariableTypeTester, 160-161
set operations and the credit evaluation credit evaluation application and, 264
application, 264 string2Object function, qbScanner,
shared methods 118-119
eval method, quickBasicEngine, 185 Stringbuilder object, System.Text, 19
qbVariableType object, 161-162 strings
shared variables, qbScanner object in BNF and VB, 57
model, 122 character encoding, 279
simplicity and language design, 274-275 collection conversion to, 84-85
Simula programming language, 271 in CTS and VB6, 19
Sinclair PC, 9 disposal not required, 22
Skiena, Stephen S., The Algorithm distinguished from characters sets, 97
Design Manual, 137, 170 identifier length and, 111
Software Exorcism: A Handbook for language design and, 278-279
Debugging and Optimizing Legacy structured programming
Code, by Bill Blunden, 211 See also block structured languages
Solomon, Norman, The Trouble with object orientation and, 168
Dilbert: How Corporate Culture Gets complexity and size, 243
the Last Laugh, 245-246, 266 subtraction operations, 38-39
source code unsymmetrical operator problem,
availability and compiler 74-75
development, 285 language design and, 279-282
delivery to customers, 210 syntax
qbScannerTest inspection, 126 unforeseen consequences of
source programs changing, 113
definition, bnfAnalyzer, 70 system functions, quickBasicEngine, 289
possible structures, 195 System.Reflection namespace, 235
special characters and BNF operators, System.Text namespace, Stringbuilder
57 object, 19
square brackets System.Windows.Forms namespace, 128
BNF operator precedence, 55-56
BNF optional symbols, 37
in regular expressions, 35, 97 T
stack frame, C language for statement, temporary variables, IntegerCalc, 46
282 terminals, BNF
stack template, qbOp class, nonterminals and, 36
quickBasicEngine, 301 bnfAnalyzer reference manual view,
stack underflow, 45-46 61-62
stacks bnfAnalyzer XML reference manual,
CLR, distinguished from heaps, 20 66
code optimization problems, 47, 48 identifiers identified by case, 56, 62
IntegerCalc use of, 45 test method, qbScannerTest, 127-128
LIFO stack use by opJumpZ, 214 testing
popularity of registers and, 180 qbVariableType object, 152-162
superiority to registers, 23 stress testing, 160-161
use by CLR and Nutty Professor QuickBasic compiler, 227-228
interpreter, 242 theory
start symbols, BNF, 61 group theory and lazy evaluation, 186
state, inspecting, quickBasicEngine, 200 lexical analysis, 92-101
stateless and stateful objects phases of compiler construction and,
qbScanner classification, 121 28
threadable objects, 122 resources on, 49
static evaluation and the credit threading
evaluation application, 263 capability, quickBasicEngine, 232, 299
storage locations, Visual Basic invention of, 8
assignments, 36 model, sequence number
incrementation, 122

Three Stooges, 104 U


token types
UDTs (User Data Types), 138-139
qbTokenType object, 320-321
introduced, 134
QuickBasic language, 111-112
array of, 116 mapping to .NET objects, 151
unsigned numbers, 113-114 object2XML method, qbScanner, 155
supported by quickBasicEngine, 288 qbVariableType serialization, 143-144
tokenization user types, compared with
collections, 88
IntegerCalc lexical analysis and, 30-31
purpose of lexical analysis, 110 Ullman, Jeff, Programming Languages
scanning phase, 27 and Their Relation to Automata, 100
tokens underscores
indexing in quickBasicEngine, 174 identifying private methods, 174
shared variable prefix, 122
qbScanner object model, 120
Unicode, 279
scanned tokens, qbScannerTest
unknown data type, QuickBasic, 139
inspection, 125
mapping to .NET objects, 151
scanning by qbScanner, 193
serialization, 118 unnecessary evaluation. See lazy
evaluation
whitespace between, 115
unsigned numbers, 113-114, 119-120
toString and fromString methods
unsymmetrical operator problem, 74-75
caching parsed fromString results, 160
qbVariable object, 142, 143 usability, qbScannerTest inspection, 124
BNF, 147-148 user data types. See UDT
usrRPN section, IntegerCalc, 45
qbVariableTest.exe and, 164-166
USRscanned table, bnfAnalyzer, 82
syntax, 145
utilities class, qbScannerTest, 114
qbVariableType object, 142, 143
utilities project, qbGUI, 189
subset of qbVariable's, 169
BNF definition, 144-145
XML format and, 167 V
toString method, qbVariable object
'Hello world' program, 198 validation rules, .NET, and the credit
nFactorial program, 227 evaluation application, 263
trace information, Nutty Professor valueSetO method, qbVariable, 164, 256
interpreter, 227 variables
tree representation, BNF parse, 39, 43, allocation in .NET and VB6, 177-178
83-84 empirical type determination, 146-147
bnfAnalyzer functions operating on, modification in qbVariable, 164
86-87 multiplicity of, before OOP, 124
bnfAnalyzer tool and, 55
need for complete and strong typing,
133
Troelsen, Andrew, Visual Basic.NET and
the .NET Platform, 26, 229 QuickBasic variable types, 134-139,
Trouble with Dilbert, The: How Corporate 321-322
Culture Gets the Last Laugh, by QuickBasic, mapping to .NET
Norman Solomon, 245-246, 266 objects, 148-151
Try.. Catch blocks random, returned by qbVariable, 331
IntegerCalc popStack method, 46 types supported by
IntegerCalc pushStack method, 45 quickBasicEngine, 339-340
msilRun_O function, 239-240 variant data type, 136-138
Turing machines and regular credit evaluation data compiled as,
expressions, 98-100 255
Turing, Alan, on rule formulation, 205, 219 introduced, 134
tutorials, 283 VBA (Visual Basic for Applications) and
two's complement notation, 135-136 the overuse of variants, 137
type conversion vbNewline function, 71
convertibility, QuickBasic types, verify utility, qbScannerTest, 114
140-142 vertical stroke alternation operator
msilRun_O function, 240 in BNF, 55
typedIdentifier nonterminal, 72 in regular expressions, 96


virtual hardware, Windows, 16 word size, and mappings to .NET


virtual machines objects, 149
CLR as, 20 word wrapping problems, 64
interpreters as, 219 wrapper for binary operations, 178
Nutty Professor interpreter as, 181
Visual Basic for Applications (VBA) and
the overuse of variants, 137 X
Visual Basic language, 9-11 XML format
converting regular expressions to, 34 bnfAnalyzer reference manual view,
CTS data types and, 19 64-67
QuickBasic differences from, 196 capturing types and values using,
VB5 as a compiled language, 11 166-167
VB6 as a bnfAnalyzer tool converting quickBasicEngine state to,
requirement, 53 198-200
Visual Basic.NET, regular expression nonprintable characters, 118-119
conversion to, 105-106 object2XML method, qbScanner,
Visual Basic.NET and the .NET Platform, 126-127, 153-155
by Andrew Troelsen, 26, 229 object2XML method,
Visual Studio.NET, 65 quickBasicEngine, 181-182
regular expressions for common qbVariable representation, 165-166
tasks, 107 qbVariableTypeTester.exe output, 159
resources on CLR and ILASM, 242 readability advantages, 166
von Neumann, John, 1-2, 206
y
W yacc program, 36-37
Web services, progress reporting, 129 as a code generator, 7, 76, 92, 181
Weinberg, Gerald M., The Psychology of resources on, 49
Computer Programming, 243, 265
What Not How: The Business Rules
Approach to Application Z
Development, by C.J. Date, 247, 266 zero-trip spaces, 105
white box tools, 7 ZIP codes, regular expressions for, 107
whitespace between tokens, 115 zoom project, qbGUI, 189
windowsUtilities project, qbGUI, 189 Zuse, Konrad, compiler pioneer, 206
Wirth, Niklaus, on recursive-descent
algorithms, 172
