SOFTWARE-PRACTICE AND EXPERIENCE, VOL. 8, 85-97 (1978)

Economic Range Checks in Pascal


J. WELSH
Department of Computer Science, Queen’s University, Belfast BT7 1NN, N. Ireland

SUMMARY
A Pascal implementation is described which exploits the information provided by subrange
type declarations to minimize the run-time checking involved in detecting range violations.
An evaluation of its performance is given, and some possible modifications are discussed.
KEY WORDS Pascal Error checks Run-time checks

INTRODUCTION
Effective use of a high-level programming language requires that every program violation
of the language rules is detected. While most violations are detected at compile time, some
are not, and are commonly detected at run time, or not at all. Since each run-time check
implies some degradation of a program’s performance throughout its lifetime, language
designers and implementors must strive to maximize the proportion of language rule
violations which are detected prior to run time.
That a program obeys the language rules is of course a prerequisite of its ‘correctness’
in the more general sense. The techniques now being developed for proving a program’s
correctness by formal manipulation can also be used to prove that it obeys some of the
language rules not enforced by a compiler. Suzuki and Ishihata¹ have described an automatic
verifier which checks the correctness of array accesses in a Pascal program, by constructing,
and proving the validity of, assertions for each expression occurring as an array subscript.
By using the verifier as a preprocessor to a compiler the need for run-time checking of
array subscripts can be eliminated.
One of the advantages of the system which Suzuki and Ishihata stress is that it requires
no assistance from the programmer in the form of assertions or loop invariants for subscript
variables. However, programmers are quite accustomed to providing a limited implicit
form of assertion about a variable’s value, by declaring its type. Each variable declaration
is an assertion that the variable will only take a certain set of values, and be subjected to a
certain set of operations, throughout its scope. The language implementation must check
that the validity of this assertion is maintained at each occurrence of the variable, but
having done so it may exploit the asserted properties in other ways. This is the basis of
conventional type checking.
With built-in scalar types such as integer or real the implicit assertions provided by
variable declaration are too weak to assist in checking the range of array subscripts at
compile time. In Pascal, however, subrange and enumerated scalar types greatly increase
the precision with which the programmer may assert the range of values to be taken by a
variable. As Fischer and Leblanc have noted,² systematic exploitation of these stronger
assertions should enable a significant reduction, or even the total elimination, of the run-time
checks which a one-pass compiler is forced to generate. This paper describes an existing
compiler which implements an optimized range-checking facility on this basis, and
evaluates its performance on a number of test programs put forward by other authors.

0038-6644/78/0108-0085 $01.00 Received 25 September 1977
© 1978 by John Wiley & Sons, Ltd.

THE IMPLEMENTATION
The optimized range-checking facility has been incorporated in the Mk 2 Pascal compiler
for ICL 1900 series computers, whose general characteristics have been outlined elsewhere.³
This is achieved within the one-pass compilation scheme by constructing
- a declared range for each scalar variable or function occurring in the program,
- an implied range for each expression which computes a scalar value,
- a required range for each occurrence of a scalar expression which is subject to a range
  restriction by the language (or implementation) rules.
In practice every scalar expression is subject to an implementation restriction: that overflow
should not occur during its computation. The detection and reporting of arithmetic overflow
are thus handled as a particular case within the more general range analysis logic.
Comparison of the implied and required ranges for each expression shows whether its
value will always fall within the required range or whether some run-time check is necessary.
The calculation of implied ranges can also contribute to the optimization of the code
generated for the productive operations of the program, as the following more detailed
discussion shows.

Declared ranges
The declared range of a variable or function is trivially obtained by a compiler, since it is
derived directly from the declared type which the compiler must record anyhow. The range
is conveniently represented as a Pascal record of the form
range = record min, max: machine integer end;
where machine integer is some type suitable for representing all representations used for
scalar values on the target machine (usually the standard integer representation itself).
The compiler records one such range for each data type recognized in the program being
compiled.
On some machines the inclusion of ranges of real values within such a representation
poses problems since the floating-point representation used for real numbers differs in bit
length or interpretation from that used for integers. However, exact range analysis for real
arithmetic is impractical in any case since the inaccuracies of floating-point arithmetic may
lead to unexpected range violations being reported. An acceptable facility for real range
checking would have to incorporate some user-defined tolerance in range enforcement, a
facility for which Pascal’s subrange notation is inadequate. For both these reasons real
arithmetic was excluded from all range analysis in the 1900 Pascal compiler.

Implied ranges
The implied range of an expression is obtained by
(a) associating with each primary operand (variable, constant or function designator) a
range which is either the declared range (in the case of a variable or function) or a
degenerate range with coincident upper and lower bounds (in the case of a constant).
The declared range is a true description of the run-time value of a variable or function
provided it has been previously assigned and the assignment has been subject to an
appropriate range check. The latter condition is easily ensured by checking all
assignments throughout the program. The problem of unassigned variables is
considered further in the final section of this paper;
(b) calculating a range for the result of each operation from the corresponding ranges of
its operands and the operation itself. For example, the range r of an addition a + b,
where a and b have ranges ra, rb respectively, would be given by
r.min := ra.min + rb.min;
r.max := ra.max + rb.max;
In programming this calculation the compiler writer must be aware of the possibility of
overflow at two levels.
1. The range calculated may go beyond the limiting values representable on the target
machine, indicating that run-time overflow is possible (and that it should be checked
for). Where a bound value exceeds its limit in this way it may be replaced by the limit
value itself in subsequent calculations, since this is the bound on the significant run-
time values with which execution may proceed. Other values may arise in a run-time
operation which causes overflow but their significance will be truncated by the overflow
check which is presumably applied. A suitable compile-time representation for a
calculated implied range is thus
implied range = record
may overflow: boolean;
min, max: machine integer
end;
where the bounds min and max apply only to those values generated without the
occurrence of overflow.
2. The compile-time arithmetic which calculates an implied range is itself liable to over-
flow, an occurrence which the compiler writer must avoid at all costs. In practice this
can be done without resorting to multilength arithmetic, provided the integer range
at compilation is greater than or equal to that of the target machine. (For most
compilers, which run on their own target machine, the two ranges coincide.) Given a
set of compile-time arithmetic operations which either return the result of a specified
operation or an indication that overflow would have occurred in carrying it out, the
implied range calculations as described above are easily programmed. (Such overflow
avoiding arithmetic is required in any case if the compiler attempts to ‘fold’ the code
generated for expressions with constant operands.)
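The two levels of overflow described above can be sketched as follows. This is only an illustrative sketch in Python, not the compiler's own code: the 24-bit limits, the names `ImpliedRange`, `clamp` and `add_ranges`, and the field names `lo` and `hi` (standing for the paper's min and max) are all assumptions. Python integers are unbounded, so the compile-time arithmetic here cannot itself overflow; a compiler written in a language with fixed-width integers would need the overflow-avoiding arithmetic that point 2 describes.

```python
from dataclasses import dataclass

# Assumed target limits (a 24-bit two's-complement word), for illustration only.
MACHINE_MIN = -(2 ** 23)
MACHINE_MAX = 2 ** 23 - 1


@dataclass(frozen=True)
class ImpliedRange:
    may_overflow: bool  # True if computing the value may overflow at run time
    lo: int             # the paper's "min"
    hi: int             # the paper's "max"


def clamp(bound):
    """Replace a bound that exceeds a machine limit by the limit itself."""
    return max(MACHINE_MIN, min(MACHINE_MAX, bound))


def add_ranges(ra, rb):
    """Implied range of a + b, given the implied ranges of a and b."""
    lo = ra.lo + rb.lo
    hi = ra.hi + rb.hi
    overflow = lo < MACHINE_MIN or hi > MACHINE_MAX
    return ImpliedRange(ra.may_overflow or rb.may_overflow or overflow,
                        clamp(lo), clamp(hi))
```

For example, adding the constant 1 to a value whose range reaches `MACHINE_MAX` yields a range whose upper bound stays at `MACHINE_MAX` with `may_overflow` set, which tells the code generator that an overflow check is needed.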
Calculation of the implied ranges for the built-in scalar operations provided by Pascal,
viz. the operators +, -, *, div and mod and the functions sqr, abs, succ, pred, ord and chr,
is readily programmed in a similar manner to that outlined for + above, though the
appropriate calculation for div and mod requires careful consideration.
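As a rough illustration of what "a similar manner" might look like for a few of these operations, the sketch below computes implied ranges for subtraction, multiplication and abs. Ranges are represented as plain (min, max) tuples and overflow handling is omitted for brevity; the function names are hypothetical and the paper does not give these calculations explicitly.

```python
def sub_range(a, b):
    """Range of x - y for x in a, y in b: smallest minus largest, and vice versa."""
    return (a[0] - b[1], a[1] - b[0])


def mul_range(a, b):
    """Range of x * y: the extremes always occur at a pair of operand bounds."""
    corners = [x * y for x in a for y in b]
    return (min(corners), max(corners))


def abs_range(a):
    """Range of abs(x) for x in a."""
    lo, hi = a
    if lo >= 0:
        return (lo, hi)
    if hi <= 0:
        return (-hi, -lo)
    return (0, max(-lo, hi))  # the range straddles zero
```

Multiplication needs all four corner products because the signs of the operands determine which pair of bounds gives the smallest and largest result.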
The availability of implied ranges for their operands enables some immediate optimization
of the code generated for *, div and mod on some machines. The integer multiplication
operation provided on machines such as the 1900 series produces a double-length result
which must be reduced to single length to obtain an acceptable integer value. When the
result is positive this reduction can be achieved by discarding the more significant of the
register pair in which the double-length product is held, an operation which involves no
code. However, if the result may be negative its reduction involves either transferring the
sign bit from the more significant to the less significant register, or performing a double-
length ‘arithmetic shift’ to leave the correctly signed result in the more significant register.
By examining the implied range of the result the compiler can avoid this additional
adjustment code when the result is shown to be positive.
The Axiomatic Definition of Pascal⁴ restricts its definition of the div operator to
non-negative operands, and defines mod as a corresponding remainder operation. However,
the Pascal Report,⁵ which is the binding definition on implementors, imposes no such
restriction on the operands of div (which is defined simply as ‘division with truncation’)
and gives no explicit definition of mod at all. Implementors therefore must choose either
to adopt the limitations of the Axiomatic Definition as an implementation restriction or to
implement some definition of div and mod for negative operands consistent with the
Pascal Report. The latter course had already been chosen for 1900 Pascal. For operands
yielding a negative result ‘division with truncation’ presumably means rounding towards
zero, which in turn implies the remainder is negative also. Unfortunately this definition of
div and mod does not coincide with the division operation provided on some machines
and compilers must generate additional instructions to test the sign of the quotient and
adjust the result accordingly. Examination of the implied ranges of the operands enables
this additional code to be suppressed by the compiler whenever possible. It also enables
faster shift and mask instructions to be used for div and mod when the second operand is
a constant power of 2, an optimization only permissible when the first operand is non-
negative.
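The restriction on the shift and mask optimization can be seen in a small experiment. The sketch below is an illustration only: `trunc_div` and `trunc_mod` model division with rounding towards zero as described above, and `may_use_shift_and_mask` is a hypothetical compile-time test on the implied range of the first operand. Python's `>>` is an arithmetic shift and `&` a two's-complement mask, which is assumed here to resemble the machine instructions in question.

```python
def trunc_div(a, b):
    """Integer division rounding towards zero ('division with truncation')."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def trunc_mod(a, b):
    """Remainder matching trunc_div; it takes the sign of the first operand."""
    return a - b * trunc_div(a, b)


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def may_use_shift_and_mask(first_operand_range, divisor):
    """Shift/mask code is valid only for a non-negative first operand."""
    lo, _hi = first_operand_range
    return is_power_of_two(divisor) and lo >= 0
```

With these definitions, -7 div 2 is -3 and -7 mod 2 is -1, whereas shifting -7 right by one place gives -4 and masking with 1 gives 1. For a non-negative operand such as 7 the two methods agree.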

Required ranges
Implementations impose a required range on the result of all scalar expressions, through
the limitations of the representation used for scalar values on the target machine. A more
stringent restriction applies to the result of an expression e appearing in the following
Pascal contexts :
(i) as a value to be assigned to a variable v of subrange type, either
by a direct assignment v := e,
by being passed as a value parameter to a formal v, or
by a for statement for v := ...;
(ii) as an index of an indexed variable A[e];
(iii) as a member in a set membership operation, e.g. e in S;
(iv) as a selector in a case statement, case e of ... .
In cases (i) to (iii) the required range is determined directly by the declared type of the
variable v, by the index type of A, or by the base type of S. In case (iv) the required range is
determined implicitly by the values labelling the limbs of the case statement. These values
do not necessarily form a continuous range but in practice the occurrence of missing inner
values is detected by a different implementation technique to that used for values outside
the implied range, which constitute a conventional range-checking situation.
Given an implied range for each expression occurring in a Pascal program and a required
range determined by its context, the need for any run-time checking of the expression’s
value is determined by direct comparison of the two. If the implied range lies wholly within
the required range no check is needed. If the two ranges do not overlap an error can be
flagged at compile time! Otherwise some run-time check is necessary and inspection shows
what combination of upper bound, lower bound and overflow checking is required to give
the most efficient check possible.
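The comparison just described might be sketched as follows. The function name, the tuple representation of ranges and the use of an exception for the compile-time error are assumptions made for illustration.

```python
class CompileTimeRangeError(Exception):
    """Raised when the implied and required ranges cannot overlap."""


def required_checks(implied, required, may_overflow=False):
    """Return the set of run-time checks needed for one expression occurrence.

    implied and required are (min, max) tuples; an empty set means that
    no run-time check need be generated.
    """
    i_lo, i_hi = implied
    r_lo, r_hi = required
    if i_hi < r_lo or i_lo > r_hi:
        raise CompileTimeRangeError("value can never lie in the required range")
    checks = set()
    if i_lo < r_lo:
        checks.add("lower")
    if i_hi > r_hi:
        checks.add("upper")
    if may_overflow:
        checks.add("overflow")
    return checks
```

For instance, an index with implied range 0..100 into an array indexed 1..100 needs only a lower bound check, while an index with implied range 1..50 into the same array needs none.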
For assignment, parameter passing, array indexing and the set operations the check code
is readily incorporated within the existing code sequences for these operations. However,
the for and case statements provide greater opportunities for code optimization when
embedding the necessary checks.

For statements
The Pascal Report is not very explicit as to whether the control variable in a for statement
must be separately declared, but most implementations have required that it be so. It is
possible therefore that the variable is declared of a subrange type in which case the values to
be assigned to it by the for statement must fall within the required range. To enforce this
requirement it is clearly more efficient if the range of values to be taken by the control
variable is checked in a single operation before looping begins, rather than checking the
incremented value on each iteration. The code generated for the statement
for v := e1 to e2 do S
where v has a declared range rv and e1, e2 have implied ranges r1, r2 may be equivalent to the
following Pascal code, though the temporary variables temp1, temp2 may represent transient
register contents:
temp1 := e1;
temp2 := e2;
check overflow if necessary;
if temp1 <= temp2 then
begin
  check that temp1 >= rv.min if necessary;
  check that temp2 <= rv.max if necessary;
  repeat
    v := temp1;
    S;
    temp1 := succ(v)
  until temp1 > temp2
end
This code form has the following notable properties.
(i) Whether any run-time check code is necessary is determined at compile time by the
implied ranges r1, r2 calculated for initial and final expressions e1, e2, and the declared
range rv of the control variable v.
(ii) Any run-time check code required lies wholly outside the loop code itself, so
minimizing the resultant degradation of loop execution speed.
(iii) Given the loop entry condition (temp1 <= temp2) it is sufficient to check temp1 against
the lower bound and temp2 against the upper bound, to ensure a valid range of values
for v.
(iv) Values of e1 and e2 which preclude execution of the loop body are not subjected to any
check, other than a possible overflow check necessary to ensure their own validity.
(v) The loop entry test itself, if temp1 <= temp2 then, may be excluded if r1.max <= r2.min.
This is frequently the case, not only when both e1 and e2 are constant, but also when
one is a variable and the other a constant which is a limiting value of the variable’s
range, e.g.
var J: 1..n; ... for v := 1 to J do ...
The location of all checking code outside the loop body itself implies one restriction-
those for loops which avoid assigning an out-of-range value to the control variable only by
jumping out of the loop body will be rejected at loop entry. Since such loops are contrary
to the special purpose for which the for construct is provided this restriction seems an
acceptable price to pay for loop efficiency with error security.
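The scheme for for statements can be imitated in a short sketch. Here `loop_checks` stands for the compile-time decision and `checked_for` for the run-time behaviour of the generated code when both bound checks are present. Both names, and the tuple representation of ranges, are assumptions for illustration.

```python
def loop_checks(r1, r2, rv):
    """Compile-time decision: which run-time tests the loop prologue needs."""
    return {
        "entry_test": r1[1] > r2[0],   # excluded when r1.max <= r2.min
        "lower_check": r1[0] < rv[0],  # temp1 >= rv.min is not guaranteed
        "upper_check": r2[1] > rv[1],  # temp2 <= rv.max is not guaranteed
    }


def checked_for(e1, e2, rv, body):
    """Run-time behaviour of the generated code, with both checks present."""
    temp1, temp2 = e1, e2
    if temp1 <= temp2:
        # All checking happens once, before the loop is entered.
        if temp1 < rv[0] or temp2 > rv[1]:
            raise ValueError("for-loop control variable out of range")
        while True:            # repeat ... until temp1 > temp2
            v = temp1
            body(v)
            temp1 = v + 1      # succ(v)
            if temp1 > temp2:
                break
```

Note that `checked_for(50, 0, (1, 10), body)` performs no check and no iteration, as in property (iv), whereas `checked_for(5, 11, (1, 10), body)` is rejected at loop entry even if the body would have jumped out before v reached 11.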

Case statements
The normal method of implementing a case statement is to create in the object code a
table of jump instructions of the form
array [casemin..casemax] of jump instruction
where casemin and casemax are the minimum and maximum values of the labels which
appear on the case limbs, and each jump instruction leads to the object code for the case
limb with the corresponding label value. Executing the case statement involves indexing
this array with the selector value and obeying the jump instruction selected. The run-time
occurrence of missing label values between casemin and casemax is detected at no extra cost
by inserting an error jump at the corresponding points in this table. Detecting the occurrence
of values outside the range involves a range check analogous to that used for program-
defined arrays. However, the use of a range-checking code sequence is not necessarily the
most effective way to implement it. If the implied range of the selector expression e exceeds
the required range, casemin..casemax, by a reasonably small amount it is more efficient,
and not much more expensive in object program storage, to use a jump table of the form
array [implied range] of jump instruction
with the additional entries at either end filled with error jumps. The resultant checked case
statement executes with the same efficiency as its unchecked equivalent and the additional
overhead in object program storage depends only on the amount by which the implied
range exceeds the required range casemin..casemax.
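A minimal model of such an extended jump table is sketched below. The limbs are represented by strings rather than jump instructions, and the names `build_jump_table` and `select` are hypothetical. The sketch assumes that the implied range of the selector covers casemin..casemax.

```python
ERROR = "error"


def build_jump_table(limbs, implied):
    """Build a table indexed over the selector's implied range.

    limbs maps each case label to its limb; missing inner labels and the
    extra entries at either end are filled with the error jump.
    """
    lo, hi = implied
    return lo, [limbs.get(value, ERROR) for value in range(lo, hi + 1)]


def select(base, table, selector):
    """Index the table directly, with no separate range check."""
    return table[selector - base]
```

With limbs labelled 1, 2 and 4 and a selector whose implied range is 0..5, the table has six entries, three of which are error jumps, and selection needs no explicit bounds test.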

AN EVALUATION
Introduction of the range-checking system described into the Mk 2 1900 Pascal compiler
involved an increase in the compiler’s object code size of less than 3 per cent and a decrease
in its compilation speed of less than 1 per cent, so the compile-time cost of the optimized
range checking is not very significant.
The cost of the run-time checking which it generates has been measured for a number
of test programs introduced by other authors. Each program tested was run in four forms,
as follows:
RUN 1 in which all variables were declared as imprecisely as possible, i.e. making no
use of subrange or enumerated types, and generation of run-time checks was
suppressed;
RUN 2 in which the program was as in Run 1, but generation of run-time checks was
requested;
RUN 3 in which all variables were declared as precisely as possible, making full use of
subrange and enumerated types, and generation of run-time checks was
suppressed;
RUN 4 in which the program was as in Run 3, but generation of run-time checks was
requested.
For each run the length of object code produced by the program compiled, and the time
taken for its execution, were measured. In Runs 1 and 2 the system obtained no assistance
from the imprecise declarations used. Run 1 thus provides a base measure of the
unoptimized unchecked code produced, while Run 2 gives a measure of the additional cost of
run-time checking without range assistance. Run 3 then shows any optimization of the
unchecked code which range assistance enables, and Run 4 shows the reduced cost of
run-time checking with the same assistance.
The first programs tested were those for matrix multiplication and sorting used by Wirth
in testing the original Pascal compiler.⁶ The matrix program used was
program matrix;
const N = 50;
var I, J, K: integer; S: real;
    A, B: array [1..N, 1..N] of real;
begin
  for I := 1 to N do
    for J := 1 to N do A[I,J] := 1.0;
  for I := 1 to N do
    for J := 1 to N do
    begin
      S := 0;
      for K := 1 to N do S := A[I,K]*A[K,J] + S;
      B[I,J] := S
    end
end.
The program was used exactly as shown for Runs 1 and 2. In Run 2 the compiler
introduced eight range checks, one for each subscript occurrence, and one overflow check
for the real arithmetic performed in the innermost loop. These resulted in a 60 per cent
increase in object code length and a 167 per cent increase in execution time!
For Runs 3 and 4 the declaration of I, J, K was rewritten as
I, J, K: 1..N;
Run 3 gave identical results to Run 1, but in Run 4 the compiler was now able to exclude
all of the subscript range checks and introduced no new checks on the assignment of the
subrange variables by the for loops. The only remaining run-time check was that for real
overflow and the resultant increases in code length and execution time were 3 per cent and
4 per cent respectively.
Wirth’s second test program, for sorting, is shown below.
program sort;
const N = 1000;
var I, J, K, M: integer;
    A: array [1..N] of integer;
begin
  for K := 1 to N do A[K] := K;
  for I := 1 to N - 1 do
  begin
    K := I; M := A[I];
    for J := I + 1 to N do
      if A[J] > M then begin K := J; M := A[J] end;
    A[K] := A[I]; A[I] := M
  end
end.
Its use of index variables and for loops is similar to that of the matrix multiplication
program but it does illustrate the calculation of implied ranges for simple expressions, and
an opportunity for code optimization through range analysis.
Runs 1 and 2 were carried out with the program as shown. Run 2 introduced seven
subscript range checks and an arithmetic overflow check for the expression I + 1. These
resulted in a 61 per cent increase in object code length and a 96 per cent increase in execution
time compared with Run 1.
For Runs 3 and 4 the declaration of I, J, K was rewritten as
I: 1..N - 1; (* or equivalent valid Pascal *)
J: 2..N;
K: 1..N;
In Run 3 the compiler was able to exclude the entry test on the inner for statement
giving a decrease of 6 per cent in object code length but a negligible decrease in execution
time compared with Run 1.
In Run 4 the compiler was able to exclude all subscript range checks and the overflow
check, and did not have to introduce any new checks on the values assigned to the subrange
variables (the latter would not have been the case had all three variables been declared of
type 1..N). Thus Run 4 gave exactly the same results as Run 3, i.e. a slight improvement
on the unchecked program in Run 1.
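The Run 4 outcome for the sort program can be reproduced by hand with a few lines of range arithmetic. The sketch below is a reconstruction under assumptions, not the compiler's code: ranges are (min, max) tuples, and `add` and `checks` are hypothetical helpers.

```python
N = 1000


def add(a, b):
    """Implied range of a sum (overflow ignored; N + 1 is far below any limit)."""
    return (a[0] + b[0], a[1] + b[1])


def checks(implied, required):
    """Bound checks still needed when implied is compared with required."""
    needed = set()
    if implied[0] < required[0]:
        needed.add("lower")
    if implied[1] > required[1]:
        needed.add("upper")
    return needed


I, J, K = (1, N - 1), (2, N), (1, N)   # declared ranges used for Runs 3 and 4
INDEX = (1, N)                         # index type of A

first = add(I, (1, 1))                 # implied range of I + 1
last = (N, N)                          # the constant N

report = {
    "A[I]": checks(I, INDEX),
    "A[J]": checks(J, INDEX),
    "A[K]": checks(K, INDEX),
    "K := I": checks(I, K),
    "K := J": checks(J, K),
    "inner for: lower bound check": first[0] < J[0],
    "inner for: upper bound check": last[1] > J[1],
    "inner for: entry test": first[1] > last[0],
}
```

Every entry of `report` comes out empty or false: I + 1 has implied range 2..N, which is exactly the declared range of J, and its maximum does not exceed the final value N, so the entry test on the inner for statement can be dropped as well.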
The results for these two programs are highly satisfactory-in effect all run-time checking
of ranges (other than an unavoidable real overflow check) is eliminated. Although the
programs were not designed to test such a system they are clearly favourable to it, in the
simplicity of the subscripting and subscript calculation which they use. Many array manipu-
lation programs share this property, but some do not. It was felt therefore that the system
should be tested on programs which carry out subscript or bound scalar calculations in a
more complex manner. Such programs are used by Suzuki and Ishihata to demonstrate
their automatic bound checker. The fact that their checker was able to verify the subscripts
completely makes these programs a very direct challenge to the much simpler strategy
incorporated in the 1900 Pascal compiler.
One test used by Suzuki and Ishihata is a procedure which applies a binary search to a
sorted table of integers, as shown below.
type
  table = array [1..100] of integer;
procedure binarysearch1 (var A: table;
                         key: integer;
                         var middle: integer);
var
  low, high: integer;
begin
  high := 100; low := 1;
  while low <= high do
  begin
    middle := (low + high) div 2;
    if A[middle] = key
    then low := high + 1
    else if A[middle] > key
    then high := middle - 1
    else low := middle + 1
  end;
  if A[middle] <> key
  then middle := 0
end;
The procedure was tested in a program which created a sorted table of the even numbers
from 2 to 200 and then searched it for all numbers, odd and even, from 1 to 201.
Runs 1 and 2 were carried out with the procedure exactly as shown. Run 2 introduced three
subscript range checks and four overflow checks, resulting in a 39 per cent increase in
object code length and a 56 per cent increase in execution time compared with Run 1.
For Runs 3 and 4 the declarations of low, high and middle were replaced by
low: 1 .. 101;
high: 0 .. 100;
middle: 0 .. 100;
In Run 3 the compiler was now able to use a single shift instruction for the div 2
operation, which resulted in a 7 per cent decrease in object code length and an 8 per cent
decrease in execution time compared with Run 1.
In Run 4 the compiler eliminated all overflow checks but retained three partial subscript
checks (of the lower bound only) and introduced a similar partial check on the assignment
high := middle - 1. As a result Run 4 still showed a 19 per cent increase in code length and a
17 per cent increase in execution time compared with Run 3.
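The checks retained in Run 4 can be derived by applying the range arithmetic to each statement of the procedure. The sketch below does so under the declarations low: 1..101, high: 0..100 and middle: 0..100. The helper names are hypothetical and overflow is ignored, since every bound is far below any plausible machine limit.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def div2(r):
    """Range of x div 2 with truncation; truncating division is monotonic."""
    def half(x):
        return x // 2 if x >= 0 else -((-x) // 2)
    return (half(r[0]), half(r[1]))


def checks(implied, required):
    needed = set()
    if implied[0] < required[0]:
        needed.add("lower")
    if implied[1] > required[1]:
        needed.add("upper")
    return needed


low, high, middle = (1, 101), (0, 100), (0, 100)
INDEX, ONE = (1, 100), (1, 1)

report = {
    "middle := (low + high) div 2": checks(div2(add(low, high)), middle),
    "A[middle]": checks(middle, INDEX),
    "low := high + 1": checks(add(high, ONE), low),
    "high := middle - 1": checks(sub(middle, ONE), high),
    "low := middle + 1": checks(add(middle, ONE), low),
}
```

Only `A[middle]` (which occurs three times in the procedure) and `high := middle - 1` yield a non-empty result, and in both cases it is the lower bound alone, in agreement with the partial checks reported above.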
This slightly disappointing result is due to the fact that the variable parameter ‘middle’
plays a dual role in Suzuki and Ishihata’s procedure. As an output parameter it carries a
result in the range 0 .. 100 back to the calling program, but within the search loop itself
it is used as a working variable, taking values in the range 1 .. 100 only. This distinction
can be made explicit by rewriting the procedure as shown.
procedure binarysearch2 (var A: table;
                         key: integer;
                         var position: integer);
var
  low, high, middle: integer;
begin
  high := 100; low := 1;
  while low <= high do
  begin
    middle := (low + high) div 2;
    if A[middle] = key
    then low := high + 1
    else if A[middle] > key
    then high := middle - 1
    else low := middle + 1
  end;
  if A[middle] = key
  then position := middle
  else position := 0
end;
It is noteworthy that this procedure will be more efficient than the original on most
implementations since the main loop body is expressed in terms of a directly accessible local
variable in place of an indirectly accessed variable parameter. Other possible improvements
in its efficiency are irrelevant to the present purpose.
Runs 1, 2, 3 and 4 were repeated for this revised procedure. Run 1 showed a 3 per cent
decrease in code length and a 10 per cent decrease in execution time compared with Run 1
of the original procedure. Run 2 again showed a 41 per cent increase in code length and a
62 per cent increase in execution time over Run 1.
For Runs 3 and 4, however, it was now possible to write the declarations of middle, low
and high as
middle: 1 .. 100;
low: 1 .. 101;
high: 0 .. 100;
Run 3 again showed a 7 per cent improvement in code length and a 9 per cent improvement
in execution time compared with Run 1, through the improved code for div 2.
In Run 4, however, the compiler was able to eliminate all of the checks retained in Run 4
of the original procedure, but had to introduce one new partial check, on the assignment
middle := (low + high) div 2. The result was a 4 per cent increase in object code length and a
6 per cent increase in execution time compared with Run 3, or a net decrease of 4 per cent
in code length and 3 per cent in execution time compared with Run 1.
It is interesting to note that the residual run-time check in Run 4 is retained only to
detect the execution of the assignment middle := (low + high) div 2 with low = 1 and
high = 0, a combination which their declared ranges permit. However, the occurrence of
this combination at this particular point in the program is clearly excluded by the
preceding while condition low <= high, a fact which standard program proving techniques
would easily detect. The possibility of including a limited exploitation of such locally
implied ranges for variables within a one-pass compilation scheme is discussed further in
the following section.
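The effect of such a locally implied range on the residual check can be illustrated as follows. This is a speculative sketch only: `narrow_le` is a hypothetical helper which tightens two ranges at a point where the first value is known not to exceed the second, and nothing here is taken from the compiler described.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def div2(r):
    """Range of x div 2 for non-negative bounds."""
    return (r[0] // 2, r[1] // 2)


def checks(implied, required):
    needed = set()
    if implied[0] < required[0]:
        needed.add("lower")
    if implied[1] > required[1]:
        needed.add("upper")
    return needed


def narrow_le(a, b):
    """Ranges of x and y at a point where x <= y is known to hold."""
    return (a[0], min(a[1], b[1])), (max(b[0], a[0]), b[1])


low, high, middle = (1, 101), (0, 100), (1, 100)   # declarations for binarysearch2

# Using the declared ranges only, as the compiler described does:
declared_only = checks(div2(add(low, high)), middle)

# Using ranges narrowed by the while condition low <= high:
low_in_loop, high_in_loop = narrow_le(low, high)
with_condition = checks(div2(add(low_in_loop, high_in_loop)), middle)
```

With declared ranges alone the implied range of the assigned value is 0..100, so a lower bound check remains. Inside the loop both variables narrow to 1..100, the implied range becomes 1..100, and no check is needed.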
As a final, more general test the system was run on the Mk 2 compiler itself which is
again a Pascal program. The compiler makes little use of arrays and subscripted variables,
but depends heavily on set operations and case selection, which the preceding test programs
did not illustrate. The compiler had already been written with a systematic use of subrange
types to make explicit the perceived ranges of variables. It was impractical to replace these
declarations by ‘imprecise’ type declarations so the measurements taken in Runs 1 and 2
of the test programs could not be obtained. However, measurements corresponding to
Runs 3 and 4 were obtained by having the existing compiler compile itself with the
generation of run-time checks requested and then repeating this process using the compiler
so produced. The checks introduced caused a 3 per cent increase in the object code length
of the compiler and an 8 per cent increase in the time taken to compile itself. In addition,
the latter process detected several range violations which were due to misconceptions by
the compiler programmers of the range of values actually taken by variables, and which
would otherwise have remained undetected!
The results for all the programs evaluated are summarized in Tables I and II, which
show the significant ratios of object code lengths and execution times measured in the runs
described.

Table I. Object code length ratios

Program                 Run 2/Run 1   Run 3/Run 1   Run 4/Run 3   Run 4/Run 1

Matrix multiplication   1.60          1.00          1.03          1.03
Sorting                 1.61          0.96          1.00          0.96
Binary search 1         1.39          0.93          1.19          1.11
Binary search 2         1.41          0.93          1.04          0.96
Mk 2 compiler           -             -             1.03          -

Table II. Execution-time ratios

Program                 Run 2/Run 1   Run 3/Run 1   Run 4/Run 3   Run 4/Run 1

Matrix multiplication   2.67          1.00          1.04          1.04
Sorting                 1.96          1.00          1.00          1.00
Binary search 1         1.56          0.92          1.17          1.08
Binary search 2         1.62          0.91          1.06          0.97
Mk 2 compiler           -             -             1.08          -

CONCLUSIONS
For the programs used in the evaluation the range analysis scheme proved highly successful,
either eliminating the need for run-time checks entirely or reducing their run-time cost
to a level which might be tolerated throughout most programs’ productive lifetime.
For some environments, such as real-time programming, it may be argued that the
presence of any run-time check in a program is unacceptable, since no adequate response
to its detecting an error can be imagined. An alternative option which a range-checking
compiler might provide is to print out a list of all program points at which a run-time check
should have been introduced. It is then the programmer’s responsibility to convince
himself that the check is truly unnecessary. Provided the list of check points produced is
reasonably small such a facility might prove a cheap but positive aid to achieving program
correctness.
The major shortcoming of the system described is its behaviour in the presence of
unassigned variables. The detection of unassigned variables is a logically distinct problem
from that of range violations, and one which few implementors choose to tackle. However,
if the declared range of a variable is adopted as its implied range at a point where the
variable is or may be unassigned then the compiler may suppress a run-time check which
would otherwise reject the arbitrary value which the variable’s storage location might contain.
This suppression may in turn lead to an arbitrary overwriting of a program location in
the case of an unchecked array subscript, or to an arbitrary transfer of control in the case
of an unchecked case index. The loss of this fortuitous detection of unassigned variables
by range checking is not in itself important since many others may pass undetected in any
case, but the possibility of a consequent random overwrite or loss of control is disastrous if
postmortem diagnosis of the program’s failure is required.
This ‘ill behaviour’ of a program which uses an unassigned variable could be prevented
by a modification of the range analysis scheme described earlier. A one-pass compiler could
easily divide the accessible variables at any point in a program into three classes.
1. Those definitely unassigned, i.e. those locals for which no possibility of assignment
has so far been encountered in the scope.
2. Those definitely assigned, i.e. those locals and non-locals for which a local assignment
is inevitably executed before reaching this point.
3. Those possibly assigned, i.e. other non-locals and those locals whose prior assignment
may be by-passed by some alternative control path, or depend on some other run-time
occurrence not perceptible to a one-pass compiler.
Any attempt to access the value of a variable in class (1) should be reported as an error at
compile time. For a variable in class (2) the compiler may safely adopt the declared range
as a description of the value accessed. However, for a variable in class (3) it may not, and
should instead adopt an implied range which reflects the complete set of bit patterns which
the storage allocated to the variable is capable of holding, since access of the (unassigned)
variable may yield any of these. By determining the need for run-time checks on the basis
of this implied range the compiler will generate all checks necessary to ensure the good
behaviour of the program should an unassigned access actually occur at run time.
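The three-class rule can be sketched as below. This is an illustrative Python model under stated assumptions, not the compiler's implementation: the names `AssignState`, `implied_range` and `needs_check` are invented here, ranges are plain `(low, high)` tuples, and the storage range assumes that a variable occupies one 24-bit two's-complement word.

```python
from enum import Enum


class AssignState(Enum):
    UNASSIGNED = 1  # class (1): no possibility of assignment seen so far
    ASSIGNED = 2    # class (2): an assignment is inevitably executed first
    POSSIBLY = 3    # class (3): the assignment may have been by-passed


class CompileError(Exception):
    pass


# Assumption: one 24-bit two's-complement word per variable.
WORD_RANGE = (-(2 ** 23), 2 ** 23 - 1)


def implied_range(declared, state, storage=WORD_RANGE):
    """Choose the range the compiler may assume for an accessed variable."""
    if state is AssignState.UNASSIGNED:
        # Class (1): report the access as an error at compile time.
        raise CompileError("value of an unassigned variable is accessed")
    if state is AssignState.ASSIGNED:
        # Class (2): the declared range safely describes the value.
        return declared
    # Class (3): any bit pattern the storage can hold may be read.
    return storage


def needs_check(implied, required):
    """A run-time check is needed unless implied lies within required."""
    return not (required[0] <= implied[0] and implied[1] <= required[1])


# A subscript declared 1..100 indexing an array with bounds 1..100.
print(needs_check(implied_range((1, 100), AssignState.ASSIGNED), (1, 100)))
print(needs_check(implied_range((1, 100), AssignState.POSSIBLY), (1, 100)))
```

The first call prints `False` (no check is generated) and the second prints `True`, because the whole-word range is not contained in 1..100.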
The number of additional checks generated by this modification would depend on the
number of class (3) variable occurrences which are subject to range requirements. While a
considerable proportion of all variables occurring in programs fall into class (3), in
practice many of the variables which occur in range-critical contexts, e.g. as subscripts or
case indices, prove to be of class (2), i.e. they are simple, local variables which are either
assigned at entry to their scope or controlled by a for loop in the area of use. It may be
therefore that the number of additional run-time checks introduced by this modified system
would be small for many programs. In the test programs used to evaluate the 1900 Pascal
system all variables occurring as array subscripts fall into class (2) except for one occurrence
of the variable ‘middle’ outside the loop of the binary search procedure. The effect of the
proposed modification would thus be negligible for these programs.
This modification would recover the ‘good behaviour’ guarantee which programs obtain
from a conventional run-time checking system. However, it must be remembered that the
proportion of unassigned variables actually detected by the system remains arbitrary. It is
essential that further advances in language design or hardware design should make
guaranteed detection of unassigned variables an economic reality.
A more ambitious extension of the range analysis scheme described would be one which
deduces local implied ranges for variables from the statements compiled. For example, if an
assignment statement v := e is encountered where the expression e has an implied range
which is less than the declared range of v then this implied range could be associated with
v until a point is reached at which the assignment statement does not necessarily represent
the last executed assignment to v.
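One way such bookkeeping might look is sketched below in Python. The `RangeTable` class and its method names are hypothetical, and ranges are `(low, high)` tuples. The sketch assumes that an assignment whose expression range exceeds the declared range is guarded by a run-time check, so that afterwards the variable lies in the intersection of the two ranges.

```python
class RangeTable:
    """Tracks a local implied range per variable alongside its declared range."""

    def __init__(self, declared):
        self.declared = dict(declared)  # variable name -> (low, high)
        self.local = {}                 # variable name -> narrower (low, high)

    def current(self, v):
        # The local implied range, if one is recorded, else the declared range.
        return self.local.get(v, self.declared[v])

    def assign(self, v, expr_range):
        """Record v := e; return True if the assignment needs a run-time check."""
        dlo, dhi = self.declared[v]
        elo, ehi = expr_range
        needs_check = not (dlo <= elo and ehi <= dhi)
        # After a (possibly checked) assignment, v lies in the intersection.
        lo, hi = max(dlo, elo), min(dhi, ehi)
        if lo <= hi and (lo, hi) != (dlo, dhi):
            self.local[v] = (lo, hi)
        else:
            self.local.pop(v, None)
        return needs_check

    def forget(self, v):
        # Call where the recorded assignment is no longer necessarily the
        # last one executed, e.g. where control paths join.
        self.local.pop(v, None)


table = RangeTable({"i": (0, 100)})
print(table.assign("i", (1, 10)))  # False: 1..10 lies within 0..100
print(table.current("i"))          # (1, 10)
table.forget("i")
print(table.current("i"))          # (0, 100)
```

Discarding the entry in `forget` falls back to the declared range, which is why programmer-declared ranges still matter wherever control paths merge.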
The techniques required to record and discard such implied ranges are no different from
those used in many compilers to associate variable values with register contents during code
generation. For some programs such implied ranges would entirely override the declared
ranges supplied by the programmer, making careful choice of the latter unnecessary. The
matrix multiplication test is one such program. For others, however, programmer-declared
ranges would still play a vital role, providing an invariant range assertion which is otherwise
imperceptible to a one-pass compiler. The binary search program illustrates such a case.
Implied ranges for variables could also be deduced from the Boolean conditions tested
by control statements in the program. For example, in the binary search program the
condition for the while loop
    low ≤ high
taken in conjunction with the declared ranges
    1 ≤ low ≤ 101
    0 ≤ high ≤ 100
implies more restrictive ranges at entry to the loop body, viz.
    1 ≤ low ≤ 100
    1 ≤ high ≤ 100
As noted earlier, these ranges would exclude the need for the single run-time check
introduced by the 1900 compiler in Run 4 of the second search program.
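The narrowing step for a condition of the form a ≤ b can be sketched in a few lines of Python. The function name `narrow_le` is an assumption made for illustration and ranges are `(low, high)` tuples. The example applies it to the binary search condition low ≤ high, with low declared 1..101 and high declared 0..100.

```python
def narrow_le(a, b):
    """Given that a <= b is known to hold, narrow both ranges.

    a and b are (low, high) tuples for the two operands. The upper bound
    of a cannot exceed that of b, and the lower bound of b cannot be less
    than that of a.
    """
    a_lo, a_hi = a
    b_lo, b_hi = b
    new_a = (a_lo, min(a_hi, b_hi))
    new_b = (max(b_lo, a_lo), b_hi)
    return new_a, new_b


low, high = narrow_le((1, 101), (0, 100))
print(low, high)  # (1, 100) (1, 100)
```

If the condition can never hold, a returned range has its lower bound above its upper bound. A fuller treatment would recognize that case as unreachable code.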
However, the techniques required for deducing implied ranges from Boolean expressions
lie well beyond those normally found in one-pass compilers. It is an open question whether
their introduction would significantly improve the program verification capabilities of such
compilers, or whether in doing so the intrinsic advantages of one-pass compilation would
be lost.
To summarize, the 1900 Pascal compiler shows that the cost of run-time checking can
be dramatically reduced by a simple range-analysis system. Future advances in formal
program verification may remove all responsibility for error checking from code-generating
compilers. In the meantime there is much that can be achieved, and much still to be
investigated, even within the confines of a one-pass compilation scheme.

REFERENCES
1. N. Suzuki and K. Ishihata, ‘Implementation of an array bound checker’, Internal Report of Department
of Computer Science, Carnegie-Mellon University (1976).
2. C. N. Fischer and R. J. Leblanc, ‘Efficient implementation and optimisation of run-time checking in
Pascal’, ACM Sigplan Notices, 12, 3, 19-24 (1977).
3. J. Welsh, ‘Two ICL 1900 Pascal compilers’, in Pascal, the Language and Its Implementation (ed.
D. W. Barron), Wiley (in press).
4. C. A. R. Hoare and N. Wirth, ‘An axiomatic definition of the programming language Pascal’, Acta
Informatica, 2, 335-355 (1973).
5. K. Jensen and N. Wirth, ‘Pascal-user manual and report’, Lecture Notes in Computer Science, 18,
Springer Verlag (1974).
6. N. Wirth, ‘The design of a Pascal compiler’, Software-Practice and Experience, 1, 4, 309-333 (1971).
