Introduction
1.1 The diversity of languages
1.2 The software development process
1.3 Language design
1.4 Languages or systems?
1.5 The lexical elements
This chapter looks at the different stages involved in the development of software and
concludes that the main purpose of a programming language is to help in the construction of
reliable software. It also discusses how designers have tried to include expressive power,
simplicity and orthogonality in their languages whilst noting that pragmatic matters such
as implementation and error detection have a significant influence. We also consider the
distinction between a language and its development environment.
The basic low-level building blocks used in the construction of a language are
considered; that is, the character set, the rules for identifiers and special symbols, and how
comments, blanks and layout are handled.
1.1 The diversity of languages
Although over a thousand different programming languages have been designed by
various research groups, international committees and computer companies, most of
these languages have never been used outside the group which designed them while
others, once popular, have been replaced by newer languages. Nevertheless, a large
number of languages remain in current use and new languages continue to emerge.
This situation can appear very confusing to students who have mastered one language,
often Pascal, Delphi, C++ or Java, and perhaps have a reading knowledge of a couple
of others. They might well ask: “Does a lifetime of learning new languages await?”
Fortunately, the situation is not as bleak as it appears because, although two
languages may seem to be superficially very different, they often have many more
similarities than differences. Individual languages are not usually built on separate
principles; in fact, their differences are often due to quite minor variations in the same
principle.
The aim of this book is to consider the principal programming language concepts
and to show how they have been dealt with in various languages. We will see that by
studying these features and principles we can better understand why languages have
been designed in the way they have. Furthermore, when faced with a new language,
we can identify where the language differs from those we already know and where it
provides the same facilities disguised in a different syntax.
1.2 The software development process
A computer is a tool that solves problems by means of programs (or software) written
in a programming language. The development of software is a multi-stage process.
First, it is necessary to determine what needs to be done. Unfortunately, initial informal
user requirements are usually vague, inconsistent, ambiguous and incomplete. The
purpose of requirements analysis is to understand and clarify the requirements and
often involves resolving the conflicting views of different users.
The next stage is concerned with the production of a document, the specification,
which defines as accurately as possible the problem to be solved; in other words, it
determines what the system is to do. Requirements analysis and specification are the
most difficult tasks in software development.
Having defined what the system is to achieve, we then design a solution and implement
the design on a computer. It is only at the implementation stage that a programming
language becomes directly involved.
The aim of validation and verification is to show that the implemented solution
does what the users expect and satisfies the original specification. Although there has
been a lot of theoretical work on verification, or program proving, it is usually still
necessary to run the program with carefully chosen test data. But the problem with
program testing is that it can only show the presence of errors; it can never prove their
absence.
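The limits of testing can be illustrated with a small sketch. The Python function below is a hypothetical example, not taken from any particular system: it is deliberately wrong, yet it passes every test in a plausible-looking set of test data, and only a further input reveals the error.

```python
def is_leap_year(year: int) -> bool:
    """Deliberately incomplete: the century rules are ignored."""
    return year % 4 == 0


# A plausible but poorly chosen set of test data: every check passes.
chosen_tests = {1996: True, 2004: True, 2023: False, 2000: True}
all_passed = all(
    is_leap_year(year) == expected for year, expected in chosen_tests.items()
)

# One further input exposes the error that the tests missed:
# 1900 was not a leap year, but the function reports that it was.
counterexample_result = is_leap_year(1900)
```

Passing the chosen tests shows only that no error was found for those inputs; it says nothing about the inputs that were not tried.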
The final stage of software development, usually termed maintenance, covers two
quite distinct activities:
1. The correction of errors that were missed at an earlier stage but have been detected
after the program has been in active service.
2. Modification of the program to take account of additions or changes in the users'
requirements.
Although a programming language is only explicitly introduced during the implementation
stage, it has traditionally influenced the earlier stages of the process. Designers
are, for example, often aware of the implementation language to be used and bias their
designs to take account of the language's strong points.
Figure 1.1. The waterfall model of software development (stages shown include requirements analysis, specification, implementation and maintenance).
Development models
It is important to realise that the software development process is iterative, not sequential.
Therefore, knowledge gained at any one of the stages outlined can (and should) be
used to give feedback to earlier stages. The traditional approach is to treat the different
stages in the development process as being self-contained and this has led to the waterfall
model of software development shown in Figure 1.1.
However, there has been increasing acceptance of the idea that an incremental and
iterative approach is much more realistic. Central to this approach is the idea of risk
management. Every time we make a decision, there is the possibility that we get it
wrong. We therefore want to have continual feedback to show up possible errors because
the longer an error remains undetected, the more expensive it will be to put right. This
led to the spiral model shown in Figure 1.2 (Boehm, 1988). We start at the centre of the
spiral and go repeatedly through the different stages as our system is built incrementally.
Many modern languages are object-oriented and this has led to the creation of
object-oriented development methods. In other development methods, there is a clear
distinction in the techniques used in specification, design and implementation. However,
in object-oriented development, a problem can be understood and a solution designed
and then implemented using the same framework of a set of communicating objects.
The object-oriented development process is therefore well suited to an incremental and
iterative approach. At any given stage, different objects can be described at different
levels of abstraction. As the iterative development process continues, we incrementally
add more detail to the object descriptions.
Figure 1.2. Simplified Boehm spiral model (quadrants: determine objectives, alternatives, constraints; evaluate alternatives, identify and resolve risks; develop and verify; plan next phase).
The need for a notation in which a specification or design can be written down
has led to the development of both specification and program design languages. Such
languages are at a higher level of abstraction and give fewer details than implementation
languages. Many specification languages are mathematical in form and are amenable to
proof techniques. However, languages of this type are outside the scope of this book and
so we only look at what are conventionally considered to be implementation languages,
although some functional languages have been used as executable specification
languages.
Another approach is to use graphical notations to capture the requirements and
represent designs. Examples of diagrams that occur in many different development
methods are dataflow diagrams, entity relationship diagrams, state transition diagrams
and message sequence charts. A problem is that each development method can use its
own set of diagram notations which, although they are representing much the same thing,
can differ in detail. This situation is far from satisfactory as it can suggest differences
in the development process that do not really exist. In object-oriented development,
a standard notation called the Unified Modeling Language (UML) has been adopted
so that different methods now use a common notation (Booch et al., 1998). There are
several different kinds of diagram in UML, but the single most important is the class
diagram, which shows the classes involved in an object system and their associations.
An example class diagram is shown in Chapter 8.
The use of a systematic software development process has greatly influenced both
language design and how languages are used. For example, Pascal was designed to
support the ideas of structured programming. The problems of constructing large
systems and of program maintenance led to the introduction of language features that
allow large systems to be broken down into self-contained modules. Packages in Ada
and classes in object-oriented languages satisfy that need. It is clear, therefore, that
programming languages do not exist in a vacuum; rather, the design of modern languages
is a direct response to the needs and problems of the software development process.
1.3 Language design
Most widely used programming languages are imperative; examples are Fortran,
COBOL, C, C++, Pascal, Ada and Java. A program written in an imperative language
achieves its effect by changing the value of variables or the attributes of objects by
means of assignment statements. Until quite recently, most widely used imperative
languages were procedural; that is, their organisation was centred around the definition
of procedures. Many procedural languages have now been extended to include
object-oriented features (C by C++, Pascal by Delphi, Ada by Ada 95, COBOL by OOCOBOL,
Basic by Visual Basic) while other new purely object-oriented languages such as Eiffel
and Java have been designed. Object-oriented programs are organised as a set of objects
which communicate with one another through small strictly defined interfaces.
Other approaches to language design include functional languages (such as pure
Lisp and ML) and logic languages (such as Prolog). These alternative approaches are
dealt with in Chapters 9 and 10 respectively.
The primary purpose of a programming language is to support the construction
of reliable software. Hence, in most modern languages, type checking takes place at
compile time, which is a considerable help in catching logical errors before the program
is run. It is also important that a language is user friendly so that it is straightforward to
design, write, read, test, run, document and modify programs written in that language.
To understand how these objectives may be achieved, the issues of language design
can be divided into several broad categories:
• expressive power,
• simplicity and orthogonality,
• implementation,
• error detection and correction,
• correctness and standards.
Expressive power
A programming language with high expressive power enables solutions to be expressed
in terms of the problem being solved rather than in terms of the computer on which
the solution is to be implemented. Hence, the programmer can concentrate on problem
solving. Such a language should provide a convenient notation to describe both
algorithms and data structures in addition to supporting the ideas of structured programming
and modularisation.
Another aspect of expressive power is the number of types provided together with
their associated operations. Instead of providing a large number of built-in types, most
modern languages provide facilities, such as the Ada package or the C++ and Java class,
for defining new types, called abstract data types. Such languages can then provide a
wide range of predefined types by means of standard libraries which the programmer
can use to build new types for the problem in hand. When a language, together with
its standard libraries, does not include a suitable range of types and operations, then
the programmer generally has to provide these by declarations, thereby diverting
the programmer's attention to the lower-level aspects of solving the problem. Often,
languages may have high expressive power in some areas, but not in others; for example,
Ada has a range of numerical operations that give it expressive power for numerical
work, but it is less effective in data processing applications.
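As a minimal sketch of building a new type from library types, the following Python fragment defines a hypothetical `Money` type (the name and its fields are our own illustration) on top of the standard library's `Decimal`. The point is only that the program can then be written in terms of the problem (amounts of money) rather than in terms of raw numbers.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """A small abstract data type: a value plus the operations allowed on it."""

    amount: Decimal
    currency: str

    def __add__(self, other: "Money") -> "Money":
        # The type enforces a rule of the problem domain.
        if self.currency != other.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.amount + other.amount, self.currency)


price = Money(Decimal("19.99"), "GBP")
postage = Money(Decimal("3.50"), "GBP")
total = price + postage  # expressed in terms of the problem, not the machine
```

Users of `Money` need not know how the amount is represented; they see only the operations that the type provides.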
Also included under the heading of expressive power is readability; that is, the ease
with which someone familiar with the language can read and understand programs
written by other people. Readability is considerably enhanced by a well-designed
comment facility, and good layout and naming conventions. In practice, it should be
possible to write programs that can act, to an extent, as their own documentation, thereby
making maintenance and extension of the program much easier.
Simplicity and orthogonality
Simplicity implies that a language allows programs to be expressed concisely in a
manner that is easily written, understood and read. This objective is often underrated
by computer scientists, but is a high priority for non-professional programmers. The
success, first of Basic and then of Visual Basic, is an eloquent commentary on the
importance that users place on simplicity.
A simple language either avoids complexity or handles it well. Inherent in most simple
languages is the avoidance of features that most human programmers find difficult.
Simple languages should not allow alternative ways of implementing constructs nor
should they produce surprising results from standard applications of their rules. An
orthogonal language is one in which any combination of the basic language constructs is
allowed and so there are few, if any, restrictions or special cases. Examples of orthogonal
languages are Algol 68 and Smalltalk, which were both designed with the aim of keeping
the number of basic concepts as small as possible. The idea was that the resulting
language would be simple as it would only consist of combinations of features from a
small set of basic concepts.
There can, however, be a clash between the ideas of orthogonality and simplicity.
For example, Pascal, which is not orthogonal, is simpler to learn and use than Algol 68.
Where a new special construct is introduced in Pascal, the same effect is achieved in
Algol 68 by the combination of simpler existing constructs. As an example, Pascal
separates the notion of the type of a parameter from whether it is a value or a variable
parameter. (Details are given in Chapter 6.) Algol 68, on the other hand, combines both
pieces of information within the parameter type. Although the Algol 68 approach is
elegant and powerful, the more pragmatic approach (Wirth, 1975) taken in the design
of Pascal has led to a more understandable language.
What is generally agreed is that the use of constructs should be consistent; that is,
they should have a similar effect wherever they appear. This is an important design
principle for any language although it is obviously of great importance in an orthogonal
language which gets its expressive power from a large number of combinations of basic
concepts. Whether simplicity or orthogonality is the goal, once the basic constructs
are known, their combination should be predictable. This is sometimes called the law
of minimum surprise. However, again the importance of simplicity should not be
underestimated. In Java, the declaration int x; defines x to be an integer variable
while the declaration SomeClass x; defines x to be a reference to a SomeClass
object. We therefore have the same syntax meaning different things. Although this is
inconsistent, it can be argued that inventing new syntax to make the distinction clear
would have just complicated matters.
Implementation
Execution of a program written in an imperative language, such as Pascal, Ada or C++,
normally takes place by translating (compiling) the source program into an equivalent
machine code program. This machine code program is then executed. The ease with
which a language can be translated and the efficiency of the resulting code can be
major factors in a language's success. Large languages, for example, have an inherent
disadvantage in this respect because the compiler will, almost inevitably, be large, slow
and expensive.
An alternative to compiling a source program is to use an interpreter. An interpreter
can directly execute a source program, but what is more common is for a source program
to be translated into some intermediate form which is then executed by the interpreter.
The interpreter can be said to implement a virtual machine. Executing a program
under the control of an interpreter is much slower than running the equivalent machine
code program, but does give much more flexibility at run time. The added flexibility
is important in languages whose main purpose is symbolic manipulation rather than
numerical calculation. Examples of such languages are the string processing language
SNOBOL4, the object-oriented language Smalltalk, the functional language Lisp, the
logic language Prolog and the scripting language Perl.
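The idea of translating a program into an intermediate form that is then executed by an interpreter can be sketched in a few lines of Python. Everything here is a toy of our own devising: the source language is just postfix arithmetic, the instruction names `PUSH`, `ADD` and `MUL` are invented, and the "virtual machine" is a simple stack machine. It is not a description of any real implementation.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, Optional[int]]


def translate(source: str) -> List[Instruction]:
    """Translate a postfix expression such as '2 3 + 4 *' into instructions."""
    code: List[Instruction] = []
    for token in source.split():
        if token == "+":
            code.append(("ADD", None))
        elif token == "*":
            code.append(("MUL", None))
        else:
            code.append(("PUSH", int(token)))
    return code


def interpret(code: List[Instruction]) -> int:
    """Execute the intermediate form on a stack-based virtual machine."""
    stack: List[int] = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


program = translate("2 3 + 4 *")  # intermediate form, independent of the machine
result = interpret(program)       # (2 + 3) * 4
```

The intermediate form produced by `translate` could be executed on any computer for which an `interpret` function exists, which is the property exploited by virtual machines.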
The use of an interpreter also supports an interactive programming environment
in which programs may be developed incrementally. When developing a Lisp program,
for example, a programmer can interact directly with the Lisp interpreter and type in
the definition of functions followed by expressions which call these functions. The
expressions are immediately executed and the results made available. This allows the
early detection, and easy correction, of logical errors. Once the complete program has
been developed, it can be compiled so that it will run faster.
Java is an imperative language and so we would expect that it would normally
be compiled into machine code. However, that is not the case; Java programs are
interpreted. An exciting use of Java is to animate web pages. A person can download a
web page which contains a Java applet (a small application) and, using a Java-enabled
web browser such as Netscape or Internet Explorer, can run the applet. To achieve this,
it must be possible for a Java program to be translated on one computer and to run
on a different kind of computer and the easiest way of doing this is to translate Java
source programs into code for a Java virtual machine. Java-enabled browsers provide
interpreters for the Java virtual machine.
Some language designers, notably Wirth, the designer of Pascal and Modula-2, have
made many of their design decisions on the basis of the ease with which a feature can
be compiled and executed efficiently. One of the many advantages of having a close
working relationship between the language design and language implementation teams
is that the designers can obtain early feedback on constructs that are causing trouble.
Often, features that are difficult to translate are also difficult for human programmers
to understand. Algol 68 is a prime example of a language that had a lack of success due
to the fact that it was designed by a committee who largely ignored implementation
considerations, as they felt that such considerations would restrict the ability to produce
a powerful language. In contrast, the implementation of C, C++, Pascal and Java went
hand in hand with their design and the Ada design team was dominated by language
implementers.
However, it is necessary to achieve a proper balance between the introduction of
powerful new features and their ease of implementation. ISO Standard Pascal, for
example, has features, such as procedures being able to accept array parameters of
differing lengths, which were omitted from the original version of the language on the
grounds that they were too expensive to implement.
Error detection and correction
It is important that programs are correct and satisfy their original specification. However,
demonstrating that this is indeed the case is no easy matter. As most programmers still
rely on program testing as a means of showing that a program is error free, a good
language should assist in this task. It is therefore sensible for language designers to
include features that help in error detection and to omit features that are difficult to
check.
Ideally, errors should be found at compile time when they are easier to pinpoint and
correct. The later an error is detected in the software development process, the more
difficult it is to find and correct without destroying the program structure.
As an example of the importance of language design on error detection, consider the
original Fortran method of type declarations where the initial letter of a variable name
implicitly determines the type of the variable. Although this method is convenient and
greatly reduces the number of declarations required, it is in fact inherently unsound
since any misspelling of variable names is not detected at compile time and leads to
logical errors.
Conversely, explicit type declarations have the following advantages. Firstly, they
provide extra information that enables more checking to be carried out at compile time
and, secondly, they act as part of the program documentation.
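The danger of undeclared names can be sketched in Python, which, like early Fortran, creates a variable on first assignment. This is only an analogy: the function and class names are our own, and Python performs the second check at run time, not at compile time as a language with explicit declarations would.

```python
def total_with_typo(values):
    """A misspelt name silently introduces a new variable."""
    total = 0
    for value in values:
        totl = total + value  # misspelling of 'total': no error is reported
    return total              # the intended sum is never stored


class Counter:
    # An explicit list of permitted attribute names acts like a declaration.
    __slots__ = ("total",)

    def __init__(self):
        self.total = 0


counter = Counter()
try:
    counter.totl = 5          # the same misspelling is now rejected
    declared_check = "accepted"
except AttributeError:
    declared_check = "rejected"
```

In the first case the program runs to completion and returns the wrong answer; in the second, the extra information supplied by the declaration allows the mistake to be detected.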
Correctness and standards
The most exacting requirement of correctness is proving that a program satisfies its
original specification. With the major exception of purely functional languages, which
are amenable to mathematical reasoning, such proofs of correctness have not, as yet,
had a major influence on language design. However, the basic ideas of structured
programming do support the notion of proving the correctness of a program, as it
is clearly easier to reason about a program with high-level control structures than about
one with unrestricted goto statements.
To prove that a program is correct, or to reason about the meaning of a program,
it is necessary to have a rigorous definition of the meaning of each language construct.
(Methods for defining the syntax and semantics of a language are discussed in
Chapter 12.) However, although it is not difficult to provide a precise definition of the syntax
of a language, it is very difficult, if not impossible, to produce a full semantic definition,
and as far as most programmers are concerned it is unreadable anyway.
It is therefore vital in the early stages of a language's development to have an informal
description that is understandable by programmers. As in many aspects of computer
science, there needs to be a compromise between exactness and informality.
A programming language should also have an official standard definition to which all
implementers adhere. Unfortunately, this seldom happens as implementers often omit
features that are difficult to implement and add features that they feel will improve the
language. As a result, program portability suffers. The exception to this is Ada. An Ada
compiler must be validated using a specially constructed suite of test programs before
it can be called an Ada compiler. It is interesting that one of the aims of these tests is
to rule out supersets as well as subsets of the language. This is an excellent idea and it
is hoped that it will become the norm.
1.4 Languages or systems?
An important feature of many modern languages is that they support network
programming and the creation of graphical user interfaces (GUIs). These features are often
provided through libraries and so we have the question of whether they are part of a
language or part of its support environment. The problem is compounded by the fact
that GUIs and networking are often highly dependent on the facilities provided by the
operating system.
A major advantage of Java is that it provides support for GUIs and network
programming in a way that is independent of any particular operating system. Java has an
extensive set of standard libraries where the necessary facilities are defined in terms of
the Java Virtual Machine. As all Java programs make heavy use of these libraries, they
are regarded by Java programmers as an integral part of the language.
Languages such as Visual Basic and Delphi also provide these facilities through
an extensive set of libraries, but differ from Java in that they are closely tied to
a particular operating system, namely Microsoft Windows. This allows a close and
efficient integration between the language and operating system facilities. However, it
does do away with one of the major advantages of high-level languages which is that
they are machine independent. It also raises the question of when we are talking about
a new language and when we are talking about a new implementation of an existing
language.
There are many different implementations of C++, each of which provides its own set
of libraries for GUIs. It is therefore clear when we are talking about the C++ language
and when we are talking about a particular implementation. However, the GUI and
networking facilities of Visual Basic and Delphi form such a large part of the system
used by their programmers that they can claim to be new languages although they do
have Basic and Object Pascal respectively as their core. Moreover, Visual Basic and
Delphi both provide extensive visual development environments. One view is therefore
that they are not languages, but are system development environments. This lack of a
clear distinction between a language and its development environment will continue to
increase as support facilities become ever more sophisticated.
An important feature of programs that use graphical user interfaces is that they are
event driven. They wait for some user event such as the click of a mouse over the
representation of a button on the screen, handle that event and then wait for the next
event. This leads to a very different program structure from that provided
by traditional programming languages. Writing event driven programs is difficult, but is
dealt with in languages such as Java, Visual Basic and Delphi by most of the work being
done behind the scenes. This allows the programmer to work at a very high level of
abstraction and not worry about implementation details. With earlier languages, event
handling had to be explicitly programmed. This is therefore another example of where
the distinction between a language and its supporting environment has become blurred.
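The structure of an event driven program can be sketched as follows. This Python fragment is a simplified illustration under our own assumptions: the event names, the `on` decorator and the list of simulated events are all hypothetical, and a real GUI library would obtain events from the operating system and hide the loop from the programmer.

```python
from collections import deque

handlers = {}  # maps an event name to the function that handles it
log = []       # records what the handlers did


def on(event_name):
    """Register the decorated function as the handler for event_name."""
    def register(func):
        handlers[event_name] = func
        return func
    return register


@on("click")
def handle_click(target):
    log.append(f"clicked {target}")


@on("key")
def handle_key(target):
    log.append(f"typed {target}")


def run_event_loop(events):
    """Take each event in turn, dispatch it to its handler, then continue."""
    queue = deque(events)
    while queue:
        name, payload = queue.popleft()
        handler = handlers.get(name)
        if handler is not None:  # events with no handler are ignored
            handler(payload)


# Simulated user events; in a real system these arrive from the environment.
run_event_loop([("click", "OK button"), ("key", "a"), ("scroll", "down")])
```

The programmer writes only the handlers; the order in which they run is determined by the events, not by the textual order of the program.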
1.5 The lexical elements
The basic building blocks used in writing programs in a particular language are often
known as the lexical elements. This covers such items as the character set, the rules for
identifiers and operators, the use of keywords or reserved words, how comments are
written, and the manner in which blanks and layout are handled.
Character set
The character set can be thought of as containing the basic building blocks of a
programming language: letters, digits and special characters such as arithmetic operators
and punctuation symbols. Two different approaches were taken when deciding the
character set to be used in early languages. One is to choose all the characters deemed
necessary. This is the approach taken with APL and Algol 60, but it has the drawback
that either special input/output equipment has to be used or changes have to be made
to the published language when it is used on a computer.
The other approach is to use only the characters commonly available with current
input and output devices. Hence, the character set of early versions of Fortran was
restricted by the 64 characters available with punched cards while Pascal initially was
constrained by the character set available with the CDC 6000 series computer on which
it was first implemented.
Since the early 1970s, most input and output devices have supported internationally
accepted character sets such as ASCII (American Standard Code for Information
Interchange) and this has been reflected in the character sets of languages. The ASCII
character set has 128 characters of which 95 are printable; the remaining characters are
special control characters. The printable characters are the upper and lower case letters,
digits, punctuation characters, arithmetic operators and three different sets of brackets:
( ), [ ] and { }. Composite symbols are used to extend the range of symbols available.
Commonly used examples are the relational operators <= and >= and the assignment
operator := used in the Algol family of languages.
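The figures quoted for ASCII can be checked with a short Python sketch. It relies on the standard string method `isprintable`, which treats the space character as printable; the variable names are our own.

```python
# The 128 ASCII characters are those with codes 0 to 127.
ascii_chars = [chr(code) for code in range(128)]

printable = [ch for ch in ascii_chars if ch.isprintable()]
control_count = len(ascii_chars) - len(printable)

letters = [ch for ch in printable if ch.isalpha()]
digits = [ch for ch in printable if ch.isdigit()]
brackets_present = all(ch in printable for ch in "()[]{}")
```

Here `printable` contains 95 characters (including the space), leaving 33 control characters; there are 52 letters and 10 digits, and all three sets of brackets are present.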
More recently, the Unicode character set has been created to give a much larger
range of characters. Each Unicode character occupies 16 bits rather than the 8 used
with ASCII characters. Java uses the Unicode character set.
Identifiers and reserved words
The character set is the collection from which the symbols making up the vocabulary of
a programming language are formed. Clearly, a language needs conventions for grouping
characters into words so that names (usually known as identifiers in computing) can be
given to entities such as variables, constants, etc. (Naming conventions are discussed
in Chapter 3.)
Some of the words in a programming language are given a special meaning. Examples
of this are DO and GOTO in Fortran and begin, end and for in Pascal. Two methods
are used for including such words in a language. The method adopted by Fortran is to
allow such words to have their special well-defined meaning in certain contexts. The
words are then called keywords. This method was also adopted by the designers of PL/I
since it limited the number of special words that the programmer had to remember: the
scientific programmer using PL/I is unlikely to know all the business-oriented keywords,
while the business programmer is unlikely to know all the scientific keywords. However,
the drawback of this method is that the reader of a program written in a language with
keywords has the task of deciding whether a keyword is being used for its special
meaning or is an occurrence of an ordinary identifier. Furthermore, when an error
occurs due to the inadvertent use of an unknown keyword, it is not always clear when
a word has its special meaning, without consulting all the declarations.
The alternative method, used initially by COBOL and adopted by most modern
languages, is to restrict the use of such words to their special meaning. The words are
then called reserved words. The advantage of the reserved word method is best seen
in languages like Pascal and C++ where the number of such words is quite small.
In COBOL, however, the number of reserved words is much larger (over 300)
and so the programmer has the task of remembering a large number of words that
must not be used for such things as variables. As well as reserved words, languages
often have predefined identifiers. These are ordinary identifiers that have been given
an initial definition by the system, but which may be redefined by the programmer.
Examples, in Pascal, are the predefined type Integer and the input and output
procedures read and write. In languages such as Ada and Java, a large number
of identifiers are defined in the standard libraries. Such identifiers can be redefined in
programs.
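The difference between reserved words and predefined identifiers can be observed in Python, used here purely as an illustration. Note that Python's own documentation calls its reserved words "keywords". The helper `can_be_redefined` is our own: it asks the compiler whether an assignment to the name is syntactically legal, without actually performing the assignment.

```python
import keyword


def can_be_redefined(name: str) -> bool:
    """Return True if 'name = 1' compiles, i.e. the name is not reserved."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
        return True
    except SyntaxError:
        return False


# For each word: (is it reserved?, may the programmer redefine it?)
results = {
    name: (keyword.iskeyword(name), can_be_redefined(name))
    for name in ("for", "while", "print", "len")
}
```

The words `for` and `while` are reserved and cannot be given a new meaning, whereas `print` and `len` are predefined identifiers: they have an initial definition, but a program is free to redefine them.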
In Algol 60 programs, reserved words are written in a different typeface, either
underlined or bold face, depending on the situation. The drawback of this is that many
input devices cannot cope with underlined words, so less attractive alternatives, such as
writing reserved words in quotes, had to be used. In handwritten versions of programs
in Pascal and Ada, for example, reserved words are often underlined so that they stand
out while in books they are often printed in boldface. In the version presented to a
compiler, however, they are typed in the same way as ordinary identifiers.
Comments
Almost all languages allow comments, thereby making the program more readily
understood by the human reader. Such comments are, however, ignored by the compiler.
In early languages such as Fortran, which has a fixed format of one statement per line,