ICSE 2009: Automatic Dimension Inference and Checking for Object-Oriented Programs
This paper introduces UniFi, a tool that attempts to automatically detect dimension errors in Java programs. UniFi infers dimensional relationships across primitive type and string variables in a program, using an inter-procedural, context-sensitive analysis. It then monitors these dimensional relationships as the program evolves, flagging inconsistencies that may be errors. UniFi requires no programmer annotations, and supports arbitrary program-specific dimensions, thus providing fine-grained dimensional consistency checking. UniFi exploits features of object-oriented languages, but can be used for other languages as well. We have run UniFi on real-life Java code and found that it is useful in exposing dimension errors. We present a case study of using UniFi on nightly builds of a 19,000 line code base as it evolved over 10 months.

1. Introduction

Dimensional analysis is a simple and well-understood way of checking physics equations for consistency. However, programming languages have poor support for checking dimensional consistency within programs. The work reported in this paper is motivated by the many software errors we have created, debugged, or otherwise encountered that could have been caught by a simple dimension or unit check.

1.1 Dimensions in Programs

While dimensions in physics are traditionally associated with physical quantities like mass, length and time, program variables have dimensions in a broader sense. They represent sizes, dates, colors, IDs, counts, positions, masks, ports, flags, states, file names, host names, addresses, properties, messages, and so on. Consider the following constructor declaration in the Java library class java.awt.event.MouseWheelEvent:

    MouseWheelEvent(Component source,
        int id, long when, int modifiers,
        int x, int y, int clickCount,
        boolean popupTrigger,
        int scrollType, int scrollAmount,
        int wheelRotation);

The interface of this method alone involves several variables with independent dimensions, such as id, modifier, time, coordinate, number of clicks and scroll type. While even moderately sized programs have hundreds of dimensional associations, there is no protection provided to the programmer to keep unrelated dimensions from interacting, or to ensure that they are combined only in dimensionally sound ways.

Dimensions are currently coarsely simulated with programming language types, which do not provide adequate granularity. For instance, an integer variable representing a network port can be interchanged with another integer representing a graphics color, and a string variable holding a file name may be used in place of a host name; in neither case will a conventional type checker complain. To get the benefit of dimensional analysis, the programmer would have to go through a cumbersome process of defining custom types for each dimension associated with a program, and to specify legal ways in which they may interact. This process is especially difficult and unnatural for values of primitive types (e.g. integers and floating point numbers) and strings, which is why we focus exclusively on these types in this work.

Programmers often do have an implicit understanding about dimensions in the program, as indicated by the fact that variable names often refer to dimensions. However, without automatic dimension checking, it is easy for programmers to create obvious errors, for example due to poor naming choices, or in parameter passing where the correspondence is positional.

Prior research aimed at enforcing dimension checking in programming languages usually proposes the addition of syntactic and type-checking support for some form of units and/or dimensions to the language, and expects programmers to annotate their programs with extra information [1, 10, 19].
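The granularity problem described earlier in this section (a port interchanged with a color, a file name used in place of a host name) can be made concrete with a small Java sketch. The class and method names below are hypothetical and chosen only for illustration. The point is that the second call compiles without any diagnostic, even though both arguments carry the wrong dimension.

```java
// Hypothetical illustration; DimensionMixup and describeEndpoint are not from the paper.
final class DimensionMixup {
    /** Formats an endpoint; the parameter types say nothing about dimensions. */
    static String describeEndpoint(String hostName, int port) {
        return hostName + ":" + port;
    }

    public static void main(String[] args) {
        int port = 8080;                  // dimension: network port
        int color = 0xFF0000;             // dimension: RGB color
        String hostName = "example.org";  // dimension: host name
        String fileName = "notes.txt";    // dimension: file name

        // Intended call: host name and port.
        System.out.println(describeEndpoint(hostName, port));

        // Dimensionally wrong call: a file name and a color are passed,
        // yet the compiler accepts it because the types are String and int.
        System.out.println(describeEndpoint(fileName, color));
    }
}
```

A dimension inference in the style described in this paper would place `port` and `color` in different equivalence classes until a call like the second one unifies them.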
Authorized licensed use limited to: Hong Kong University of Science and Technology. Downloaded on May 21,2024 at 08:44:47 UTC from IEEE Xplore. Restrictions apply.
RULE     PROGRAM STATEMENT     INFERENCE
LOAD     x1 = x2.f;            $\delta_{x_1} = \delta_f$
STORE    x1.f = x2;            $\delta_f = \delta_{x_2}$
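The LOAD and STORE rules above amount to unifying dimension variables, which can be sketched with a union-find structure. The following is a minimal illustration under stated assumptions, not UniFi's actual implementation: the class `DimensionUnifier`, its method names, and string keys such as `"Socket.port"` are hypothetical. Fields are keyed by class and field name, reflecting the assumption that a field has one dimension across all instances.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch; class and method names are illustrative, not UniFi's API.
final class DimensionUnifier {
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the representative of the equivalence class containing v. */
    String find(String v) {
        parent.putIfAbsent(v, v);
        String p = parent.get(v);
        if (!p.equals(v)) {
            p = find(p);
            parent.put(v, p); // path compression
        }
        return p;
    }

    /** Records the constraint that a and b share one dimension. */
    void unify(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    /** LOAD rule: x1 = x2.f gives dim(x1) = dim(f). */
    void load(String x1, String field) {
        unify(x1, field);
    }

    /** STORE rule: x1.f = x2 gives dim(f) = dim(x2). */
    void store(String field, String x2) {
        unify(field, x2);
    }

    boolean sameDimension(String a, String b) {
        return find(a).equals(find(b));
    }

    public static void main(String[] args) {
        DimensionUnifier u = new DimensionUnifier();
        u.store("Socket.port", "p");   // s.port = p;
        u.load("q", "Socket.port");    // q = s.port;
        u.store("Pixel.color", "c");   // px.color = c;
        System.out.println(u.sameDimension("p", "q")); // true
        System.out.println(u.sameDimension("p", "c")); // false
    }
}
```

In this sketch `p` and `q` end up in one class because both flow through the same field, while `c` stays separate until some statement relates it to them.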
but not their absolute semantics. For example, if two variables are added, we can infer that the variables must have the same dimension, though we do not know a precise name for that dimension. Note that a variable can also be inferred to be dimensionless, i.e. a pure number. For example, if a program contains a statement such as $n = n^2$ or $m = m/n$, then n must necessarily be dimensionless for the program to be well-typed.

In general, it is possible for dimensions to form a hierarchy, in a way that is analogous to a class hierarchy. For example, it may be legal to add the number of apples to the number of oranges, if the result is supposed to be the number of fruits. However, in this paper, we assume a flat hierarchy for dimensions, i.e., there are no subtyping relationships between dimensions. This choice simplifies our analysis and keeps the number of false-positive warnings low.

2.1 The Program Model

Our inference algorithm is designed to find dimension relationships between scalar and array variables of strings and primitive types in Java: int, long, byte, char, short, boolean, float and double. Though it handles the full Java language, for the sake of brevity, we will use a simplified language as shown in Table 1. The table provides a summary of the intraprocedural inference rules; how we handle methods is described in Section 2.3.

We model dimensions in a program as follows:

• Each local variable x has dimension $\delta_x$. Every use of a constant is treated as a separate local variable. (As a preprocessing step, we rename logically different variables that happen to use the same local variable slot, using an analysis similar to deriving the static single assignment form [16].)

• Fields, static or instance, are monomorphic. Field f is assumed to have the same dimension $\delta_f$ in all instances of a class.
• All objects created at allocation site l are treated as having the same dimension $\delta_l$. A multi-dimensional array is modeled as an array of arrays. The dimension of an array a, $\delta_a$, has two related components:

    1. Dimension of the elements, $\delta_{a[]}$. All elements in an array are assumed to have the same dimension.
    2. Dimension of the length of the array, $\delta_{|a|}$.

• Methods can be polymorphic, allowing parameters in different invocations to take on different dimensions. The dimension summary of method m is expressed as constraints between class fields and parameters $\delta_{m[i]}$, where $\delta_{m[i]}$ represents the dimension of the ith parameter, and $\delta_{m[0]}$ is the dimension of the return value.

2.2 Intraprocedural Inference Rules

Loads, stores, and assignments generate unification constraints between the dimensions of the source and destination variables. For array element access, the dimension of the array index is unified with the length of the array. This can be useful to detect bugs where a dimensionally incorrect variable is used to index an array.

Addition, subtraction, and comparison are all operators whose operands are expected to have the same dimension, whereas addition, subtraction and negation produce results of the same dimension as the operands. The result of a remainder operator shares the same dimension as its first operand.

Multiply and divide operations are special from a dimensional perspective because they produce results with composite dimensions. We refer to such constraints as composite dimensional constraints. Given a statement $x_1 = x_2 \times x_3$, we can infer that $\delta_{x_1} = \delta_{x_2} \times \delta_{x_3}$. Though not detailed in Table 1 for purposes of brevity, UniFi also handles the semantics of the java.lang.Math library methods such as sqrt (the dimension of the return value is the square root of the parameter), and abs, floor, round, min, max (the dimension of the return value is the same as the parameter). The pow method generates the appropriate constraint if the exponent is a simple compile-time rational constant, otherwise it is ignored.

Our dimension inference algorithm often provides constraints we would normally associate with units, rather than dimensions. (Units are multiple scales for measuring the same dimension, e.g. foot and meter are both units of the dimension length.) Given the program statement:

    yen = dollars * exchangeRate;

our analysis infers that the dimension of exchangeRate is the dimension of yen divided by the dimension of dollars. If we later encounter an erroneous statement:

    dollars = yen;

the analysis will infer that yen must have the same dimension as dollars, and exchangeRate must therefore be dimensionless. As will be discussed in Section 3, a change in dimensions across versions causes UniFi to raise a warning of a potential error.

A popularly cited example related to units errors is the crash of the $125 million Mars Climate Orbiter due to a failure to convert between metric and English units in software [13]. However, not all unit errors can be caught using our approach, especially in cases where the wrong scaling factor is used.

2.3 Interprocedural Analysis

In the following, we first describe an additional inference rule for methods, and then the inference algorithm itself.

2.3.1 Subtyping Constraints

The Liskov Substitution Principle in object-oriented programming requires that the contract of a method remain the same in all subtype implementations [12]. This implies the constraint that the dimensions associated with each method's corresponding parameters and return value remain the same as well. A stricter form of this constraint would be that a method in a subtype must have covariant parameters and contravariant return types. However, since we use a flat hierarchy, we simply require that the parameters and return values have the same dimension.

In practice, we can just substitute a reference to method T.m with a reference to T'.m where T' is the most generic class or interface of T that contains the same method interface m. If there are multiple most generic interfaces, then we generate constraints to unify the respective parameters and return values of m in all those interfaces. The constraints inferred due to each method body implementing a given interface all update the constraints at the interface.

2.3.2 Polymorphic Dimension Inference

From our early attempts of using a monomorphic dimension inference algorithm, we have found programs to have at least a few dimensionally polymorphic methods. Thus, it is important to have a context-sensitive dimension inference algorithm to avoid conflating dimensions from different call sites to polymorphic methods.

To achieve context sensitivity, our algorithm summarizes the effect of methods with dimensional constraints between input parameters and return values, and other global dimensions. These method summaries can involve unification as well as composite dimensional constraints. These summaries are applied at each call site. Our algorithm computes
the summaries of the methods iteratively until convergence is reached.

2.4 UniFi Usage

A user supplies each run of UniFi with a set of Java class files. Class files may be compiled with or without debug information; if debug information is present, it is used to print accurate local variable and method parameter names. Since UniFi analysis runs directly on bytecode, the class files analyzed could potentially be generated from multiple source languages like Java or Java Server Pages. Methods in external libraries are treated as black boxes unless explicitly included in the set of classes to be analyzed.

With every variable, UniFi tracks all the constraints associated with it, and with every constraint, it tracks the point in the code that created the constraint. This information is essential for explaining intelligibly to the user why UniFi inferred the dimensional constraints that it did.

The UniFi GUI lets users graphically browse the results of an analysis run. The user can view all the inferred dimensions, and the variables in each one of them. The GUI also correlates constraints with source code and lets the user query why two variables were inferred to have the same dimension.

3 Comparing Inferences Across Programs

We now describe how UniFi compares two sets of dimensional constraints inferred by the algorithm described in the previous section. There are three aspects to this comparison. The first is to identify common dimension variables in the two sets. The second is to check whether these variables form the same equivalence classes. The third is to check whether composite dimensional constraints are equivalent in the two versions.

3.1 Identifying Common Variables

To enable comparison of inference results across two different programs or two versions of the same program, we first need to find correspondence between variables across the two programs. A simple heuristic that we adopt in our current implementation is the following. We match fields with the same fully qualified name. We match method parameters, return values and local variables by the full method signature and position of the parameter or local variable. More robust association mechanisms that allow, for example, for systematic re-factoring of code are possible.

3.2 Comparing Unification Constraints

Once we have identified variables that are common in the two program versions, we check if the dimensions for these variables form the same equivalence classes in both programs. If not, this means that some set of variables that were in different equivalence classes in one set of results were unified in the other; this is reported as a potential dimensional consistency violation. We initially expected that errors would be detected with unification constraints mainly when dimensions that were previously independent in a program were subsequently unified. However, as described later in Section 4, we also uncover errors when dimensions that were in the same equivalence classes became independent in a subsequent version.

When a dimension error is reported, we find it useful to let users query in our GUI the smallest set of unifications that caused two variables to share the same dimension. The user can investigate these unifications and correlate each unification with the point in the source code that caused it.

3.3 Comparing Composite Dimension Constraints

Recall that the constraint generation phase sets up a system of composite dimension constraints involving different dimensions. UniFi converts each constraint to the form:

    $\delta_1^{e_1} \times \delta_2^{e_2} \times \ldots \times \delta_n^{e_n} = 1$

where $\delta_1, \ldots, \delta_n$ are all the dimensions in the program and each exponent $e_i$ is a rational number. For example, the statement $E = m \times c^2$ is converted to the constraint:

    $\delta_E \times \delta_m^{-1} \times \delta_c^{-2} = 1$.

It is useful to reduce this system of constraints to a canonical form, in order to enable simple comparison of constraints across the two versions of the program. A second benefit is that we can ensure that the results of the analysis can be presented to programmers consistently, without being perturbed by extraneous issues like the order in which program statements are processed.

We derive a canonical form for composite dimension constraints by expressing dependent dimensions in terms of other, independent dimensions. Independent dimensions are just those that are not expressed in terms of others (similar to the fundamental S.I. units in physics). However, our choice for selecting which dimensions to consider independent can be somewhat arbitrary. For example, the constraint:

    $\delta_E \times \delta_m^{-1} \times \delta_c^{-2} = 1$

could result in the derivation of any of the following equations, based on which two of the three dimensions involved are considered independent:
    $\delta_E = \delta_m \times \delta_c^2$
    $\delta_c = (\delta_E / \delta_m)^{1/2}$
    $\delta_m = \delta_E / \delta_c^2$

We call the expression for a dependent dimension $\delta_v$ in terms of a composition of other, independent dimensions a formula for $\delta_v$.

To derive canonical formulas for dependent variables, we first define a priority order for selection of independent dimensions. Elements with higher priority are preferred as independent dimensions to elements with a lower priority. Our goal is to derive a set of formulas for dependent variables in terms of other independent variables that are all of higher priority.

As the first step, we rewrite each constraint, replacing each dimension with the highest priority dimension in the same equivalence class. Next, we reduce our system of constraints to canonical formulas using a Gaussian elimination style algorithm. The algorithm is described in Figure 1. It takes in a set of constraints and a priority ordering. It successively eliminates lower priority dimensions from the constraint system by replacing them with formulas composed of higher priority dimensions. It then back-substitutes each of the dimensions in each formula, this time going from the highest priority dimension downwards. This ensures that dependent dimensions are expressed as a function of independent dimensions.

To compare two sets of composite dimension constraints, we adopt a priority order for independent dimensions in each program such that the dimensions of variables in common are ordered after those not in common. The common dimensions are also sorted to ensure they have a consistent order for both constraint systems. We then use the algorithm in Figure 1 to generate formulas in each set of constraints. The formulas for the dependent dimensions that are common are expressed in terms of other common dimensions when possible. Given these canonical formulas in both versions, it is simple to check whether the formulas for the common dependent variables are the same in both versions. If they are different, a possible dimension error is flagged. An error can also be flagged if a variable that has a non-empty dimensional formula in one version is flagged as dimensionless in the other.

4. UniFi Case Study

In this section, we report results of running UniFi over the codebase of bddbddb, an open source program analysis toolset. This toolset is implemented in Java and is hosted at the public open source repository SourceForge. We ran UniFi retrospectively over daily snapshots across 10 months (from October 2004 to July 2005) when this project was under active development. We chose this project because we knew this project had evolved significantly during this period and we had easy access to the developer of the project for analyzing the results generated by UniFi.

We built the main trunk of the repository as it existed each night of this period. There were a total of 292 successful builds. For each successful build (except the first), we compared UniFi results with the results from the previous build and reported differences in dimensional relationships. All dates in this section are in Year-Month-Date format.

4.1 Codebase Statistics

Table 2 provides statistics about this codebase as of 2004-10-01 and as of 2005-07-30. Typical analysis runtime for each inference run was between 20 and 25 seconds on a 2.2GHz Intel CPU with 4 GB physical memory, and a 512MB JVM heap size. The row listing "number of distinct method interfaces" is different from the number of method bodies analyzed because multiple methods could map to the

Inputs:
    1. A list of dimensions D: $\delta_1, \ldots, \delta_n$ in increasing order of priority.
    2. A set of constraints $C = \{c_1, c_2, \ldots, c_m\}$, where $c_i$ is
           $\prod_{1 \le j \le n} \delta_j^{e_{ij}} = 1$
       and each exponent $e_{ij}$ is a rational number.

Outputs:
    1. A set of dependent dimensions $D' \subseteq D$.
    2. A set of constraints $C' = \{c'_i \mid \delta_i \in D'\}$, where
       $c'_i$ is $\delta_i = F_{\delta_i}$, and
       $F_{\delta_i}$ is of the form $\delta_{i+1}^{e_{i+1}} \times \ldots \times \delta_n^{e_n}$.

Algorithm:
    for j = 1 to n do
        pick some $c_i \in C$ such that $e_{ij} \ne 0$
        if no such $c_i$ exists
            continue to the next j
        $F_{\delta_j} \leftarrow (\prod_{1 \le k \le n, k \ne j} \delta_k^{e_{ik}})^{-1/e_{ij}}$
        remove $c_i$ from C
        foreach $c \in C$ do
            rewrite c replacing $\delta_j$ with $F_{\delta_j}$
            update exponent matrix e to reflect the exponents in the rewritten constraint
    for j = n down to 1 do
        if $F_{\delta_j}$ is defined
            rewrite $F_{\delta_j}$ replacing $\delta_k$ with $F_{\delta_k}$,
                where k = j + 1, ..., n and $F_{\delta_k}$ is defined
            add $\delta_j$ to $D'$ and the constraint $\delta_j = F_{\delta_j}$ to $C'$

Figure 1. Algorithm to canonicalize composite dimension constraints
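The canonicalization procedure of Figure 1 can be sketched in Java by treating each constraint as a row of rational exponents, so that multiplying dimension expressions becomes adding exponent vectors. This is a minimal sketch under stated assumptions, not UniFi's code: the class `Canonicalizer`, the helper type `Q`, and the second constraint in `main` (v = E / m) are hypothetical, and dimensions are indexed from 0 in increasing priority order.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the Figure 1 procedure; all names here are hypothetical.
final class Canonicalizer {
    /** Minimal exact rational number, kept in lowest terms with a positive denominator. */
    static final class Q {
        static final Q ZERO = new Q(0, 1);
        final long num, den;
        Q(long n, long d) {
            if (d < 0) { n = -n; d = -d; }
            long g = gcd(Math.abs(n), d);
            num = n / g;
            den = d / g;
        }
        private static long gcd(long a, long b) { return b == 0 ? Math.max(a, 1) : gcd(b, a % b); }
        boolean isZero() { return num == 0; }
        Q plus(Q o) { return new Q(num * o.den + o.num * den, den * o.den); }
        Q times(Q o) { return new Q(num * o.num, den * o.den); }
        Q div(Q o) { return new Q(num * o.den, den * o.num); }
        Q negate() { return new Q(-num, den); }
        @Override public String toString() { return den == 1 ? Long.toString(num) : num + "/" + den; }
    }

    /** Converts integer exponent rows into rational rows. */
    static Q[][] rows(long[][] exps) {
        Q[][] e = new Q[exps.length][];
        for (int i = 0; i < exps.length; i++) {
            e[i] = new Q[exps[i].length];
            for (int j = 0; j < exps[i].length; j++) e[i][j] = new Q(exps[i][j], 1);
        }
        return e;
    }

    /**
     * e[i][j] is the exponent of dimension j in constraint i (product of powers = 1).
     * Returns f, where f[j] holds the exponents of the formula for dimension j,
     * or null if dimension j is independent.
     */
    static Q[][] canonicalize(Q[][] e, int n) {
        List<Q[]> remaining = new ArrayList<>();
        for (Q[] row : e) remaining.add(row.clone());
        Q[][] f = new Q[n][];
        for (int j = 0; j < n; j++) {            // eliminate lowest priority first
            Q[] pivot = null;
            for (Q[] c : remaining) if (!c[j].isZero()) { pivot = c; break; }
            if (pivot == null) continue;         // no constraint mentions dimension j
            remaining.remove(pivot);
            f[j] = new Q[n];
            for (int k = 0; k < n; k++)
                f[j][k] = (k == j) ? Q.ZERO : pivot[k].div(pivot[j]).negate();
            for (Q[] c : remaining) substitute(c, j, f[j]);
        }
        for (int j = n - 1; j >= 0; j--) {       // back-substitute from highest priority
            if (f[j] == null) continue;
            for (int k = j + 1; k < n; k++)
                if (f[k] != null) substitute(f[j], k, f[k]);
        }
        return f;
    }

    /** Rewrites row in place, replacing dimension j by the formula fj. */
    static void substitute(Q[] row, int j, Q[] fj) {
        Q a = row[j];
        if (a.isZero()) return;
        row[j] = Q.ZERO;
        for (int k = 0; k < row.length; k++) row[k] = row[k].plus(a.times(fj[k]));
    }

    /** Renders each defined formula as "name = other^exp ...". */
    static String format(String[] names, Q[][] f) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < f.length; j++) {
            if (f[j] == null) continue;
            sb.append(names[j]).append(" =");
            for (int k = 0; k < names.length; k++)
                if (!f[j][k].isZero()) sb.append(' ').append(names[k]).append('^').append(f[j][k]);
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = {"v", "E", "m", "c"};   // increasing priority
        Q[][] e = rows(new long[][] {
            {0, 1, -1, -2},   // E * m^-1 * c^-2 = 1, i.e. E = m * c^2
            {1, -1, 1, 0},    // v * E^-1 * m = 1, i.e. v = E / m (made-up constraint)
        });
        System.out.print(format(names, canonicalize(e, names.length)));
    }
}
```

With the priority order v < E < m < c, the sketch treats m and c as independent and derives formulas for E and v in terms of them; reordering the dimensions so that c has the lowest priority yields the square-root form of the formula for c instead.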
                                          2004-10-01    2005-07-30
Classes                                   179           226
Interfaces                                14            14
Method bodies                             1,376         1,801
Distinct method interfaces                889           1,215
Binary jar file size (KB)                 359           463
Lines of code (non-blank, non-comment)    14,379        19,119

Table 2. Codebase statistics

Type of Dimension Variable    Count
Field                         258
Local variable                877
Constant                      2,244
Method parameter              552
Method return value           393
Result of multiply/divide     102
Array element                 129
Array length                  407
Total                         4,962

Table 3. Frequency of dimension variables

All three errors were independently discovered by the developer and fixed at some point after they were introduced.

Bug 1: A code update on 2004-11-07 caused UniFi to issue a warning saying that the previously independent dimension variables associated with the fields NO_CLASS_SCORE and NO_CLASS in the class FindBestDomainOrder (in package net.sf.bddbddb) were now unified. The field NO_CLASS has the same dimension as a group of variables referring to a class identifier; NO_CLASS is used as a special default value. The NO_CLASS_SCORE field has the same dimension as the score of a class, which is unrelated to a class identifier; once again, NO_CLASS_SCORE is a special default value for the score of a class.

In method tryNewGoodOrder2 in this class, the following initialization code was introduced:

    ...
    double vScore=NO_CLASS,aScore=NO_CLASS,
           dScore=NO_CLASS;
    double vClass=NO_CLASS,aClass=NO_CLASS,
           dClass=NO_CLASS;
    ...
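The kind of report issued for Bug 1 can be illustrated with a small sketch of the equivalence-class comparison from Section 3.2. This is a naive pairwise comparison under stated assumptions, not UniFi's algorithm: the class `PartitionDiff`, the class labels such as `"classId"`, and the premise that vScore already shared a dimension with NO_CLASS_SCORE before the update are all hypothetical. Each map assigns a variable the label of its equivalence class in one version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: compares equivalence classes of common variables across two versions.
final class PartitionDiff {
    /** Reports pairs of common variables whose "same dimension" status changed. */
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        TreeSet<String> names = new TreeSet<>(before.keySet());
        names.retainAll(after.keySet());          // variables present in both versions
        List<String> common = new ArrayList<>(names);
        List<String> reports = new ArrayList<>();
        for (int i = 0; i < common.size(); i++) {
            for (int j = i + 1; j < common.size(); j++) {
                String a = common.get(i);
                String b = common.get(j);
                boolean was = before.get(a).equals(before.get(b));
                boolean is = after.get(a).equals(after.get(b));
                if (!was && is) reports.add("newly unified: " + a + ", " + b);
                if (was && !is) reports.add("newly separated: " + a + ", " + b);
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        // Version before the update: class identifiers and scores are independent.
        Map<String, String> before = new HashMap<>();
        before.put("NO_CLASS", "classId");
        before.put("NO_CLASS_SCORE", "score");
        before.put("vScore", "score");

        // Version after the update: "double vScore = NO_CLASS" merges the two classes.
        Map<String, String> after = new HashMap<>();
        after.put("NO_CLASS", "merged");
        after.put("NO_CLASS_SCORE", "merged");
        after.put("vScore", "merged");

        for (String report : diff(before, after)) {
            System.out.println(report);
        }
    }
}
```

The pairwise scan is quadratic in the number of common variables and is only meant to show what is being compared; it also reports the reverse case, where variables that shared a class become independent.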
    public Stratify(Solver solver) {
        this.solver = solver;
        this.NOISY = solver.NOISY;
        this.nodes = new HashMap();
        this.emptyRelationNode =
            getRelationNode(null);
    }

The copy of TRACE was re-introduced in a bug fix a few days later on 2005-06-11.

Bug 3 (not detected by UniFi): This bug illustrates a limitation of UniFi: a dimension error was introduced in new code. Since UniFi implicitly assumes dimensions associated with variables that it has not seen before to be correct, it did not generate a warning. Manual inspection of dimensions associated with new variables may therefore be useful for detecting these kinds of errors. Our attention was focused on this example because UniFi correctly detected the change in dimension relationships when the bug was fixed. The error was as follows:

The class net.sf.bddbddb.order.BaggedId3 has 2 fields: numClasses and NUM_TREES, which have logically different dimensions. The dimensions for these fields were being incorrectly merged due to the following loop in method distributionForInstance():

    double[] distribution =
        new double[numClasses];
    ... //compute sum and initialize
    ... // distribution array
    for (int i=0; i<NUM_TREES; ++i)
        distribution[i] /= sum;

The for loop above should have run for the range [0..numClasses-1] instead of [0..NUM_TREES-1]. The effect of this bug was to unify the dimensions of numClasses and NUM_TREES via the variable i and the length of the distribution array; the bug fix correctly caused UniFi to report that these variables were now in different equivalence classes.

This bug was introduced in new code on 2004-11-09; it was fixed on 2004-11-11.

In our experiments with UniFi on bddbddb, no dimensional difference reports were issued which involved compound constraints due to multiply/divide operations; all reports were related to unification of dimension variables. This probably reflects the fact that bddbddb does not perform many multiply/divide operations, except a few for statistical reporting.

5 Discussion

In this section, we discuss our overall experience in using UniFi on bddbddb as well as on other projects.

Types of Errors: Many errors due to dimensional inconsistency tend to be relatively simple errors, similar to the errors found by conventional type checking. However, we have also come across situations where a dimensional error has taken several days to debug. This is typical, for example, when the range of the incorrect value is similar to that of the correct value, such as a small integer. The use of the incorrect value may not lead immediately to an obvious failure like a crash, and may even cause silent data corruption.

False Positives: We have not yet made significant efforts to reduce the false positive rate because, in practice, the absolute number of false positives is small and has not been problematic. Running on nightly builds, UniFi typically issues a warning only once in a few days, and even when the report does not point out a bug, it highlights new relationships between variables in the code in interesting ways. One reason we see for false positives is when a field is declared in a class, but its functionality is initially left unimplemented. When the field is eventually used, UniFi emits a warning because the dimension of the field merges with other dimensions in the program.

Another reason for false positives is when a constant primitive type field switches between being declared final and non-final. A final declaration causes the javac compiler to remove the reference to the field and replace its use with a compile time constant, making the field itself unused at the bytecode level. We expect that it should be fairly easy to eliminate or de-prioritize these kinds of false positives.

A more interesting source of false positives (or lost precision) is due to our treatment of all elements of an array having the same type. Some coding patterns, most notably those involving reflection, assemble different kinds of variables (e.g. strings) into a single array, thereby merging distinct dimensions. A possible workaround for this specific pattern is to try and assign different dimension variables if an array is only accessed with constant indices.

Dimensionally-inconsistent code: We find that some methods, most notably those that override Object.hashCode() or implement Comparable.compareTo(), intentionally perform dimensionally inconsistent computations. A common idiom for hashCode() is to compute an arbitrary function of the object's fields. A common idiom for compareTo() is the following:

    public int compareTo (Object o) {
        SomeClass other = (SomeClass) o;
        if (this.field1 != other.field1)
            return (this.field1 - other.field1);
        return this.field2 - other.field2;
    }

This idiom compares two objects of a type by first comparing a major field (field1) and then comparing a minor
field (field2) if the major fields are equal. However, it conflates the dimensions of field1 and field2 by unifying both with the dimension of the return value of the compareTo method. Therefore, UniFi always excludes method bodies for both of these special methods from its analysis.

Variable Naming: One side effect we have observed during analysis of UniFi results is that browsing the variables with a given dimension sometimes points to a poor choice of variable names. While generic local variable names like i and n commonly appear in different dimension classes, we also see cases where a variable's dimension is correct, but the variable is named misleadingly, reflecting confusion in the mind of the programmer. Poor naming can mislead the reader of the code about the semantics of the variable and cause a potential software maintenance problem.

6 Related Work

UniFi belongs to a general class of tools that attempt to acquire specifications automatically by mining existing software, either statically or dynamically. These approaches exploit the property that programs are often mostly correct, and can thus be a useful source of specifications. The inferred specifications can be used to check for errors in many ways: for example, by detecting inconsistencies within the specifications, verifying the code statically against the specifications, or checking for violations of invariants in dynamic runs of the program [2, 5, 9, 11, 20]. In UniFi's case, we attempt to use existing code to derive program specific dimensions and the relationships between them.

Prior approaches to enforcing dimensional consistency in software depend on programmers providing annotations or modifying programs in some way. Fortress is a research programming language from Sun Microsystems that provides support for units and dimensions in an object-oriented setting by extending the syntax and semantics of the Java programming language [1]. Van Delft proposes another extension to Java to support dimensions [19]. The Xelda system checks dimensional correctness of spreadsheets and found bugs in several scientific computing spreadsheets accompanying a textbook [3]. Osprey is a type-checking system for C that tries to limit the programmer burden by requiring annotations on some set of variables, but inferring dimensions on others [10]. However, Osprey is limited to checking dimensions that are a function of a fixed set of units, like the S.I. units.

Like UniFi, the Fortress, Osprey and Xelda systems mentioned above use abelian groups to represent dimensional algebra constraints related to multiplication and division, and employ techniques similar to Gaussian elimination to solve such constraints. Unlike UniFi, which can potentially be used automatically to detect dimension errors, these three systems depend on some form of user annotation to seed the type system. Further, only Fortress exploits subtyping relationships in object-oriented languages.

Lackwit is a tool that performs polymorphic type inference on a C program to identify variables that are constrained to have the same representation [14]. This information is used as an aid to performing software maintenance. However, Lackwit does not generalize its type inference to support dimensions and does not support arithmetic operators like addition and multiplication.

Some prior work attempts to infer dimensional consistency at runtime. Guo et al. attempt to infer abstract types by instrumenting a program and detecting interactions at run time [8]. Their built-in interactions do not handle multiply and divide constraints using abelian groups. Petty's proposed approach for Fortran aims to ensure dimensional consistency (mainly for the S.I. units) by using a modified real type to carry dimensional information [15].

Another body of related work uses the concept of type qualifiers. The Cqual and Jqual frameworks (for C and Java programs respectively) provide generalized type inference and checking using type qualifiers. They let programmers assign qualifiers to type declarations, and describe how the type qualifiers interact with the operators of the language [6, 7]. UniFi's dimensions can be viewed as a particular class of type qualifiers, although UniFi does not support subtyping relationships between dimensions. However, UniFi targets the specific domain of dimensions, and therefore embeds constraint rules and solvers specific to this domain. Uses of Cqual so far have been in the domain of constant variable inference [6] and taint propagation [17]; Jqual has been applied to enhance type checking with respect to native (JNI) code and for detecting immutable variables. JavaRI is another system that attempts to use type checking to verify immutability properties of annotated Java programs [18]. UniFi's general approach of inferring dimensions on one version of the program and using the results to check other versions of the program may be useful with other kinds of type qualifiers as well.

7 Future Work

As mentioned earlier, there are many different situations in which it may be useful to compare dimension inference results. More work is needed to gain experience with UniFi in these situations. More experience is also needed with scientific applications where there is an abundance of multiply and divide relationships.

One promising approach is to use UniFi to infer dimensions on the implementation of popular Java libraries. After manual review and assignment of human-friendly names, the inferred dimensions can be output as type annotations
using the JSR-308 syntax. This will offer a considerable amount of documentation to users of these libraries and a way for compile-time checkers to detect dimension errors.

We could also extend UniFi to precompute summaries of methods in popular libraries, so that the effect of these methods can be accurately applied to programs that invoke them, without the need to analyze the libraries along with every program.

Finally, we are interested in exploring dimensional consistency in the context of hardware programs written in languages like Verilog. Hardware programs have even less protection than software in terms of type checking, and inferring dimensions in a sea of bits may be a useful way to find inconsistencies in the design.

We plan to release our UniFi implementation in open source form. Additional information and screenshots of the UniFi GUI are available at the website: http://cs.stanford.edu/~hangal/unifi.html.

8 Conclusions

We have shown that the UniFi approach is a practical way to bootstrap the use of dimensional analysis into the software development process. Dimension checking is useful for much more than scientific code; it is valuable in all types of code because programs manipulate many different kinds of values that have dimensions associated with them.

9 Acknowledgments

We thank Rajit Badgandi for working on early parts of the UniFi implementation, John Whaley for letting us use bddbddb as a test case and providing us feedback on the bugs, Christopher Unkel for useful discussions and the anonymous reviewers for valuable feedback. This work is supported in part by the National Science Foundation under TRUST grant #0424422 and a Stanford graduate student fellowship.

References

[1] E. Allen, D. Chase, V. Luchangco, J.-W. Maessen, and G. L. Steele, Jr. Object-oriented units of measurement. In OOPSLA ’04: Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 384–403. ACM, 2004.
[2] G. Ammons, R. Bodík, and J. R. Larus. Mining specifications. In POPL ’02: Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 4–16. ACM, 2002.
[3] T. Antoniu, P. A. Steckler, S. Krishnamurthi, E. Neuwirth, and M. Felleisen. Validating the unit correctness of spreadsheet programs. In ICSE ’04: Proceedings of the 26th International Conference on Software Engineering, pages 439–448. IEEE Computer Society, 2004.
[4] A. Buckley and M. D. Ernst. Java Specification Request-308: Annotations on Java Types. http://jcp.org/en/jsr/detail?id=308.
[5] D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: a general approach to inferring errors in systems code. In SOSP ’01: Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, pages 57–72. ACM, 2001.
[6] J. S. Foster, M. Fähndrich, and A. Aiken. A theory of type qualifiers. In PLDI ’99: Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, pages 192–203. ACM, 1999.
[7] D. Greenfieldboyce and J. S. Foster. Type qualifier inference for Java. In OOPSLA ’07: Proceedings of the 22nd annual ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications, pages 321–336. ACM, 2007.
[8] P. J. Guo, J. H. Perkins, S. McCamant, and M. D. Ernst. Dynamic inference of abstract types. In ISSTA 2006, Proceedings of the 2006 International Symposium on Software Testing and Analysis, pages 255–265, July 18–20, 2006.
[9] S. Hangal and M. S. Lam. Tracking down software bugs using automatic anomaly detection. In ICSE ’02: Proceedings of the 24th International Conference on Software Engineering, pages 291–301. ACM, 2002.
[10] L. Jiang and Z. Su. Osprey: a practical type system for validating dimensional unit correctness of C programs. In ICSE ’06: Proceedings of the 28th International Conference on Software Engineering, pages 262–271. ACM, 2006.
[11] T. Kremenek, P. Twohey, G. Back, A. Ng, and D. Engler. From uncertainty to belief: inferring the specification within. In OSDI ’06: Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pages 161–176. USENIX Association, 2006.
[12] B. H. Liskov and J. M. Wing. A behavioral notion of subtyping. ACM Trans. Program. Lang. Syst., 16(6):1811–1841, 1994.
[13] T. Mars Climate Orbiter Mishap Investigation Board. Phase 1 Report. ftp://ftp.hq.nsa.gov/pub/pao.reports/1999/MCO report.pdf.
[14] R. O’Callahan and D. Jackson. Lackwit: a program understanding tool based on type inference. In ICSE ’97: Proceedings of the 19th International Conference on Software Engineering, pages 338–348. ACM, 1997.
[15] G. W. Petty. Automated computation and consistency checking of physical dimensions and units in scientific programs. Software: Practice and Experience, 31(11):1067–1076, 2001.
[16] B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global value numbers and redundant computations. In POPL ’88: Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 12–27. ACM, 1988.
[17] U. Shankar, K. Talwar, J. S. Foster, and D. Wagner. Detecting format string vulnerabilities with type qualifiers. In SSYM’01: Proceedings of the 10th Conference on USENIX Security Symposium. USENIX Association, 2001.
[18] M. S. Tschantz and M. D. Ernst. Javari: Adding reference immutability to Java. In Object-Oriented Programming Systems, Languages, and Applications (OOPSLA 2005), pages 211–230, October 18–20, 2005.
[19] A. van Delft. A Java extension with support for dimensions. Software: Practice and Experience, 29(7):605–616, 1999.
[20] J. Whaley, M. C. Martin, and M. S. Lam. Automatic extraction of object-oriented component interfaces. In ISSTA ’02: Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 218–228. ACM, 2002.