The Language Features and Architecture of B-Prolog
Neng-Fa Zhou
Department of Computer and Information Science
CUNY Brooklyn College & Graduate Center
[email protected]
submitted 5th October 2009; revised 1st March 2010; accepted 21st February 2011
Abstract
B-Prolog is a high-performance implementation of the standard Prolog language with sev-
eral extensions including matching clauses, action rules for event handling, finite-domain
constraint solving, arrays and hash tables, declarative loop constructs, and tabling. The
B-Prolog system is based on the TOAM architecture which differs from the WAM mainly
in that (1) arguments are passed old-fashionedly through the stack, (2) only one frame is
used for each predicate call, and (3) instructions are provided for encoding matching trees.
The most recent architecture, called TOAM Jr., departs further from the WAM in that
it employs no registers for arguments or temporary variables, and provides variable-size
instructions for encoding predicate calls. This paper gives an overview of the language
features and a detailed description of the TOAM Jr. architecture, including architectural
support for action rules and tabling.
KEYWORDS: Prolog, logic programming system
1 Introduction
Prior to the first release of B-Prolog in 1994, several prototypes had been de-
veloped that incorporated results from various experiments. The very first proto-
type was based on the Warren Abstract Machine (WAM) (Warren 1983) as imple-
mented in SB-Prolog (Debray 1988). In the original WAM, the decision on which
clauses to apply to a call is made solely on the basis of the type and sometimes
the main functor of the first argument of the call. This may result in unneces-
sary creation of choice points and repeated execution of common unification op-
erations among clauses in the predicate. The first experiment, inspired by the
Rete algorithm used in production rule systems (Forgy 1982), aimed at improv-
ing the indexing scheme of the WAM. The results from that experiment included
an intermediate language named matching clauses and a Prolog machine named
TOAM (Tree-Oriented Abstract Machine) which provided instructions for encod-
ing tries called matching trees (Zhou et al. 1990). Several other proposals had been
made with the same objective (Van Roy et al. 1987; Hickey and Mudambi 1989;
Kliger and Shapiro 1990), but these proposed schemes had the drawback of possi-
bly generating code of exponential size for certain programs.
The WAM was originally designed for both software and hardware implemen-
tations. In the WAM, arguments are passed through argument registers so that
hardware registers can be exploited in native compilers and hardware implementa-
tions. In an emulator-based implementation, however, passing arguments through
registers loses its advantage since registers are normally simulated. The second
experiment, which took place during 1991-1994, was to have arguments passed old-
fashionedly through the stack as in DEC-10 Prolog (Warren 1977). The result from
that experiment was NTOAM (Zhou 1994). In this machine, only one frame is used
for each predicate call which stores a different set of information depending on the
type of the predicate. This architecture was later refined and renamed to ATOAM
(Zhou 1996b).
During the past fifteen years since its first release, B-Prolog has undergone several
major extensions and refinements. The first extension was to introduce a new type
of frame, called a suspension frame for delayed calls (Zhou 1996a). In WAM-based
systems, delayed calls are normally stored as terms on the heap (Carlsson 1987).
The advantage of storing delayed calls on the stack rather than on the heap is that
context switching is light. It is unnecessary to allocate a frame when a delayed call
wakes up and deallocate it when the delayed call suspends again. This advantage
is especially important for programs where calls wake up and suspend frequently,
such as constraint propagators (Zhou 2006).
A delay construct like freeze is too weak for implementing constraint solvers.
New constructs, first delay clauses (Meier 1993; Zhou 1998) and then action rules
(Zhou 2006), were introduced into B-Prolog. While these new constructs give signif-
icantly more modeling power, they required only minor changes to the architecture:
for action rules, one extra slot was added into a suspension frame for holding events.
The action rule language serves well as a powerful and yet efficient intermedi-
ate language for compiling constraints over finite-domain variables. A constraint is
compiled into propagators defined in action rules that maintain some sort of con-
sistency for the constraint. The availability of fine-grained domain events facilitates
programming AC-4 like propagation algorithms (Zhou et al. 2006). As propagators
are stored on the stack as suspension frames, allocation of frames is not needed
to activate propagators and hence context switching among propagators becomes
faster.
Another major extension was tabling. Unlike OLDT (Tamaki and Sato 1986)
and SLG (Chen and Warren 1996) which rely on suspension and resumption of
subgoals to compute fixed points, the tabling mechanism, called linear tabling
(Zhou et al. 2001; Zhou et al. 2008), implemented in B-Prolog relies on iterative
computation of top-most looping subgoals to compute fixed points. Linear tabling is
simpler, easier to implement, and more space-efficient than SLG, but a naive imple-
mentation may not be as fast due to the necessity of re-computation. Optimization
techniques have been developed to make linear tabling competitive with SLG in time
efficiency by significantly reducing the cost of re-computation (Zhou et al. 2008).
For tabled predicates, a new type of frame was introduced into the architecture. Re-
cently, the tabling system has been modified to support table modes, which facilitate
describing dynamic programming problems (Guo and Gupta 2008; Zhou et al. 2010).
The PRISM system (Sato 2009) has been the main driving force for the design and
implementation of the tabling system in B-Prolog.
In 2007, B-Prolog’s abstract machine was replaced by a new one named TOAM
Jr. (Zhou 2007). This switch improved the speed of B-Prolog by over 60% on the
Aquarius benchmarks (Van Roy 1990). The old machine ATOAM, like the WAM,
has a very fine-grained instruction set in the sense that roughly each symbol in
the source program is mapped to one instruction. This fine granularity is a big
obstacle to fast interpretation due to the high dispatching cost commonly seen in
abstract machine emulators. The new machine TOAM Jr. uses no temporary regis-
ters at all and provides variable-size specialized instructions for encoding predicate
calls. In WAM-based systems, similar efforts have also been made to specialize
and merge instructions to reduce the cost of interpretation (Santos Costa 1999;
Demoen and Nguyen 2000; Nässén et al. 2001; Morales et al. 2005).
The memory manager of B-Prolog has also been improved recently. B-Prolog
employs an incremental copying garbage collector (Zhou 2000) based on the one
proposed for the WAM by Older and Rummell (Older and Rummell 1992). Be-
cause of the existence of suspension frames on the stack, the garbage collector also
reclaims space taken by unreachable stack frames. The memory manager automat-
ically expands the stacks and data areas before they overflow, so applications can
run with any initial setting for the spaces as long as the overall demand for memory
can be met.
This paper overviews in Section 2 the language features of B-Prolog, gives in
Section 3 a detailed description of TOAM Jr., the architecture B-Prolog has evolved
into after nearly two decades, and summarizes in Section 4 the changes made to
the memory architecture for supporting action rules and tabling. The reader is
referred to (Zhou 2006) for a detailed description of architectural support for action
rules and to (Zhou et al. 2008) for a detailed description of the extension of the
architecture for tabling.
B may create choice points. When the event pattern E together with the enclosing
braces is missing, an action rule degenerates into a determinate matching clause.
A set of built-in events is provided for programming constraint propagators and
interactive graphical user interfaces. For example, ins(X) is an event that is posted
when the variable X is instantiated. A user program can create and post its own
events and define agents to handle them. A user-defined event takes the form of
event(X, O) where X is a variable, called a suspension variable, that connects the
event with its handling agents, and O is a Prolog term that contains the information
to be transmitted to the agents. The built-in post(E) posts the event E. In the next
subsection, we show the events provided for programming constraint propagators.
Consider the following examples:
echo(X),{event(X,Mes)}=>writeln(Mes).
ping(T),{time(T)} => writeln(ping).
The agent echo(X) echoes whatever message it receives. For example,
?-echo(X),post(event(X,hello)),post(event(X,world)).
outputs the message hello followed by world. The agent ping(T) responds to time
events from the timer T. Each time it receives a time event, it prints the message
ping. For example,
?-timer(T,1000),ping(T),repeat,fail.
creates a timer that posts a time event every second and creates an agent ping(T)
to respond to the events. The repeat-fail loop makes the agent perpetual.
The action rule language has been found useful for programming coroutining
such as freeze, implementing constraint propagators (Zhou 2006), and develop-
ing interactive graphical user interfaces (Zhou 2003). Action rules have been used
by (Schrijvers et al. 2006) as an intermediate language for compiling Constraint
Handling Rules and by (Zhou et al. 2011) to compile Answer Set Programs.
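As a further illustration, the following is a sketch of how a freeze-style coroutine can be written directly in action rules, assuming only the ins(X) event described above (B-Prolog's built-in freeze/2 may be implemented differently). While X is a free variable the agent sleeps waiting for ins(X); once X is instantiated, the first rule no longer applies and the second rule calls the delayed goal G.
my_freeze(X,G),var(X),{ins(X)} => true.
my_freeze(X,G) => call(G).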
2.3 CLP(FD)
Like many Prolog-based finite-domain constraint solvers, B-Prolog’s finite-domain
solver was heavily influenced by the CHIP system (van Hentenryck 1989). The first
fully-fledged solver was released with B-Prolog version 2.1 in March 1997. That
solver was implemented with delay clauses (Zhou 1998). During the past decade,
the action rule language has been extended to support a rich class of domain events
(ins(X), bound(X), dom(X,E), and dom_any(X,E)) for programming constraint
propagators (Zhou et al. 2006) and the system has been enriched with new domains
(Boolean, trees, and finite sets), global constraints, and specialized fast constraint
propagators. Recently, the two built-ins in/2 and notin/2 have been extended to
allow positive and negative table (also called extensional) constraints (Zhou 2009).
The following program solves the SEND + MORE = MONEY puzzle. The
call Vars in 0..9 is a domain constraint, which narrows the domain of each of
the variables in Vars down to the set of integers from 0 through 9.
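A minimal sketch of such a model is shown below; it assumes the built-ins all_different/1 and labeling/1 in addition to the domain and arithmetic constraints described here, and the names may differ slightly from the actual built-ins.
send_more_money(Vars) :-
    Vars = [S,E,N,D,M,O,R,Y],
    Vars in 0..9,                       % domain constraint
    all_different(Vars),
    S #\= 0, M #\= 0,                   % no leading zeros
    1000*S+100*E+10*N+D + 1000*M+100*O+10*R+E #=
        10000*M+1000*O+100*N+10*E+Y,
    labeling(Vars).                     % search for an assignment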
primal_dual(Xi,I,DualVarVector),var(Xi),
{dom_any(Xi,J)}
=>
arg(J,DualVarVector,Yj),
domain_set_false(Yj,I).
primal_dual(Xi,I,DualVarVector) => true.
Each time a value J is excluded from the domain of Xi, the value I must be excluded
from the domain of Yj, where Yj is the Jth variable in DualVarVector. We need to
exchange primal and dual variables and create a propagator for each dual variable
as well. Therefore, in total 2 × n propagators are needed.
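For illustration only, the 2 × n propagators might be set up by a plain recursive loop such as the following, where Primal and Dual are structures holding the primal and dual variables and all predicate names other than primal_dual/3 are hypothetical:
post_channeling(Primal,Dual,N) :-
    post_agents(Primal,Dual,N,1),      % n propagators watching primal variables
    post_agents(Dual,Primal,N,1).      % n propagators watching dual variables

post_agents(_Vec,_Other,N,I) :- I > N, !.
post_agents(Vec,Other,N,I) :-
    arg(I,Vec,Xi),
    primal_dual(Xi,I,Other),
    I1 is I+1,
    post_agents(Vec,Other,N,I1).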
Thanks to the employment of action rules as the implementation language, the
constraint solving part of B-Prolog is relatively small (3800 lines of Prolog code and
4500 lines of C code, including comments and empty lines) but its performance is
very competitive with other CLP(FD) systems (Zhou 2006). Moreover, the action
rule language is available to the programmer for implementing problem-specific
propagators.
nth(I,L,E) :- E @= L[I].
Note that, for the array access notation A[I], while it takes constant time to access
the Ith element if A is a structure, it takes O(I) time when A is a list.
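For example (array indices are 1-based), the following queries illustrate the notation:
?- A = f(a,b,c), X @= A[2].     % constant time: A is a structure
X = b

?- L = [a,b,c], Y @= L[3].      % O(I) time: L is a list
Y = c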
outputs four tuples (a,1), (a,2), (b,1), and (b,2). The base foreach call has
the form:
foreach(E1 in D1, ..., En in Dn, LocalVars, Goal)
where E1 in D1 is called an iterator (E1 is called the pattern and D1 the collection
of the iterator), Goal is a callable term, and LocalVars (optional) specifies a list
of variables in Goal that are local to each iteration. The pattern of an iterator is
normally a variable but it can be any term; the collection of an iterator is a list of
terms, and the notation B1..Step..B2 denotes the list of numbers B1, B1+Step,
B1+2*Step, ..., B1+k*Step, where B1+k*Step is the last element that does not
cross over B2. The notation L..U is a shorthand for L..1..U. The foreach call
means that for each permutation of values E1 ∈ D1, ..., En ∈ Dn, the instance
of Goal is executed after the local variables are renamed.
In general, a foreach call may also take as an argument a list of accumulators
that can be used to accumulate values from each iteration. With accumulators,
we can use foreach to describe recurrences for computing aggregates. Recurrences
have to be read procedurally. For this reason, we adopt the list comprehension
notation for constructing lists declaratively. A list comprehension takes the form:
[T : E1 in D1, ..., En in Dn, LocalVars, Goal]
where LocalVars (optional) specifies a list of local variables and Goal (optional) is a
callable term. This construct means that for each permutation of values E1 ∈ D1,
..., En ∈ Dn, if the instance of Goal with renamed local variables is true, then T
is added into the list. A list of this form is interpreted as a list comprehension if it
occurs as an argument of a call to ’@=’/2 or in arithmetic constraints.
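For example, the following query collects the even numbers between 1 and 10; the comprehension is interpreted as such because it occurs as an argument of ’@=’/2:
?- Xs @= [X : X in 1..10, X mod 2 =:= 0].
Xs = [2,4,6,8,10]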
Calls to foreach and list comprehensions are translated into tail-recursive pred-
icates. For example, the call Xs @= [X : (X,_) in Ps] is translated into
dummy(Ps, L, []), Xs @= L
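The generated predicate traverses Ps and builds the result list tail-recursively. A plausible definition of the helper (the actual generated code may differ in its details) is:
% dummy(Collection,List,Tail): List holds the first components of
% the pairs in Collection and ends in Tail.
dummy([],L,L).
dummy([(X,_)|Ps],L0,L) :-
    L0 = [X|L1],
    dummy(Ps,L1,L).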
As can be seen in this example, there is little or no penalty to using these loop
constructs compared with using recursion.
The loop constructs considerably enhance the modeling power of CLP(FD). The
following gives two programs for the N-queens problem to illustrate different uses
of the loop constructs. Here is the first program:
queens(N,Qs):-
length(Qs,N),
Qs in 1..N,
foreach(I in 1..N-1, J in I+1..N,
(Qs[I] #\= Qs[J],
abs(Qs[I]-Qs[J]) #\= J-I)).
The call queens(N,Qs) creates a list Qs of N variables (one variable for each column),
declares the domain of each of the variables to be 1..N, and generates constraints
to ensure that no two queens are placed in the same row or the same diagonal.
The following program models the problem with Boolean constraints.
bool_queens(N,Qs):-
new_array(Qs,[N,N]),
Vars @= [Qs[I,J] : I in 1..N, J in 1..N],
Vars in 0..1,
foreach(I in 1..N, % one queen in each row
sum([Qs[I,J] : J in 1..N]) #= 1),
foreach(J in 1..N, % one queen in each column
sum([Qs[I,J] : I in 1..N]) #= 1),
foreach(K in 1-N..N-1, % at most one queen in each left-down diag
sum([Qs[I,J] : I in 1..N, J in 1..N, I-J=:=K]) #=< 1),
foreach(K in 2..2*N, % at most one queen in each left-up diag
sum([Qs[I,J] : I in 1..N, J in 1..N, I+J=:=K]) #=< 1).
The call Vars @= [Qs[I,J] : I in 1..N, J in 1..N] extracts the variables from
matrix Qs into list Vars. List comprehensions are used in aggregate constraints. For
example, the constraint sum([Qs[I,J] : J in 1..N]) #= 1 means that the sum
of the Ith row of the matrix is equal to 1.
The foreach construct of B-Prolog is different from the loop constructs provided
by ECLiPSe (Schimpf 2002). Syntactically, foreach in B-Prolog is a variable-length
call in which only one type of iterator, namely E in D, is used for iteration, and
an extra argument is used for accumulators if needed. In contrast, ECLiPSe pro-
vides a built-in, called do/2, and a base iterator, named fromto/4, from which
six types of iterators are derived for describing various kinds of iteration and ac-
cumulation. In addition, in B-Prolog variables in a loop are assumed to be global
unless they are declared local or occur in the patterns of the iterators (global-by-
default). In contrast, in ECLiPSe variables are assumed to be local unless they are
declared global (local-by-default). From the programmer’s perspective, the necessity
of declaring variables is a burden in both approaches, and neither approach is uniformly
better than the other. Nevertheless, small loops tend to have fewer local variables
than global ones, and for them global-by-default tends to impose less of a burden
than local-by-default. For example, while the two N-queens programs shown above
contain no declaration of local variables, in ECLiPSe the variables N and Qs would
have to be declared global. Large loop bodies, however, may require declaration
of more local variables than global ones, but my personal opinion is that large
loop bodies should be put in separate predicates for better readability. From the
implementation perspective, ECLiPSe’s local-by-default can be easily implemented
by goal expansion while B-Prolog’s global-by-default requires analysis of variable
scopes. B-Prolog issues warnings for occurrences in loop goals of singleton variables
including anonymous variables.
Semantically, B-Prolog’s iterators are matching-based while ECLiPSe’s iterators
are unification-based. In B-Prolog, iterators never change collections unless the goal
of the loop changes them explicitly. In contrast, in ECLiPSe variables in collections
can be changed during iterations even if the goal does not touch on the variables.
This implicit change of variables in collections may make loops less readable.
2.6 Tabling
Tabling has been found increasingly important not only for helping beginners write
workable declarative programs but also for developing real-world applications such
as natural language processing, model checking, and machine learning.
B-Prolog implements a tabling mechanism, called linear tabling (Zhou et al. 2008),
which is based on iterative computation of looping subgoals rather than suspension
of them to compute the fixed points. The PRISM system (Sato and Kameya 2001),
which heavily relies on tabling, has been the main driving force for the design and
implementation of B-Prolog’s tabling system.
The idea of tabling is to memorize the answers to tabled calls and use the answers
to resolve subsequent variant calls. In B-Prolog, as in XSB, tabled predicates are
declared explicitly by declarations in the following form:
:-table P1/N1, ..., Pk/Nk.
For example, the following tabled predicate defines the transitive closure of a rela-
tion as given by edge/2.
:-table path/2.
path(X,Y):-edge(X,Y).
path(X,Y):-path(X,Z),edge(Z,Y).
With tabling, any query to the program is guaranteed to terminate as long as the
term sizes are bounded.
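For example, with the hypothetical edge/2 facts below, which contain a cycle, the query path(a,X) returns the answers X = b, X = a, and X = c (in some order) and then terminates:
edge(a,b).
edge(b,a).
edge(b,c).

?- path(a,X).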
By default, all the arguments of a tabled call are used in variant checking and
all answers are tabled for a tabled predicate. B-Prolog supports table modes, which
allow the system to use only input arguments in variant checking and table answers
selectively. The table mode declaration
:-table p(M1,...,Mn):C.
directs the system on how to do tabling on p/n, where C, called a cardinality limit,
is an integer which limits the number of answers to be tabled, and each Mi is a
mode which can be min, max, + (input), or - (output). An argument with the mode
min or max, called optimized, is assumed to be output. If the cardinality limit C
is 1, it can be omitted together with the preceding ’:’. In the current implementation, only
one argument can be optimized. Since an optimized argument is not required to be
a number and the built-in @</2 is used to select answers with the minimum or maximum
value, multiple values can be optimized (for example, by packing them into a compound term).
The system uses only input arguments in variant checking, disregarding all output
arguments. After an answer is produced, the system tables it unconditionally if the
cardinality limit is not yet reached. When the cardinality limit has been reached,
however, the system tables the answer only if it is better than some existing answer
in terms of the argument with the min or max mode. In this case, the new answer
replaces the worst answer in the table.
Mode-directed tabling in B-Prolog was motivated by the need to scale up the
PRISM system (Sato and Kameya 2001; Sato 2009; Zhou et al. 2010) for handling
large data sets. For a given set of possibly incomplete observed data, PRISM col-
lects all explanations for these data using tabling and estimates the probability
distributions by conducting EM learning (Dempster et al. 1977) on these explana-
tions. For many real-world applications, the set of explanations may be too large
to be completely collected even in compressed form. Mode-directed tabling allows
for collecting a subset of explanations.
Mode-directed tabling is in general very useful for declarative description of dy-
namic programming problems (Guo and Gupta 2008). For example, the following
program encodes Dijkstra’s algorithm for finding a path with the minimum weight
between a pair of nodes.
:-table sp(+,+,-,min).
sp(X,Y,[(X,Y)],W) :-
edge(X,Y,W).
sp(X,Y,[(X,Z)|Path],W) :-
edge(X,Z,W1),
sp(Z,Y,Path,W2),
W is W1+W2.
The table mode states that only one path with the minimum weight is tabled for
each pair of nodes.
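For example, with the hypothetical weighted graph given by the edge/3 facts below, only a minimum-weight path between a and c is tabled and returned:
edge(a,b,1).
edge(b,c,2).
edge(a,c,5).

?- sp(a,c,Path,W).
Path = [(a,b),(b,c)]
W = 3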
Nobukuni Kino, originally for his K-Prolog system, and was ported to B-Prolog.
This bi-directional interface makes it possible for Java applications to use Prolog
features such as search and constraint solving, and for Prolog applications to use
Java resources such as networking, GUI, database, and concurrent programming
packages.
PRISM (Sato 2009): This is an extension of Prolog that integrates logic pro-
gramming, probabilistic reasoning, and EM learning. It allows for the description
of independent probabilistic choices and their logical consequences in general logic
programs. PRISM supports parameter learning. For a given set of possibly in-
complete observed data, PRISM can estimate the probability distributions to best
explain the data. This power is suitable for applications such as learning parameters
of stochastic grammars, training stochastic models for gene sequence analysis, game
record analysis, user modeling, and obtaining probabilistic information for tuning
system performance. PRISM offers incomparable flexibility compared with specific
statistical models such as Hidden Markov Models (HMMs), Probabilistic Context-
Free Grammars (PCFGs), and discrete Bayesian networks. PRISM is a product of
the PRISM team at Tokyo Institute of Technology led by Taisuke Sato.
CGLIB (Zhou 2003): This is a constraint-based high-level graphics library de-
veloped for B-Prolog. It supports over twenty types of basic graphical objects and
provides a set of constraints, including non-overlap, grid, table, and tree constraints,
that facilitate the specification of layouts of objects. The constraint solver of B-
Prolog serves as a general-purpose and efficient layout manager, which is signifi-
cantly more flexible than the special-purpose layout managers used in Java. The
library uses action rules available in B-Prolog for creating agents and programming
interactions among agents or between agents and users. CGLIB is supported in the
Windows version only.
Logtalk (Moura 2009): This is an extension of Prolog developed by Paulo Moura
that supports object-oriented programming. It runs with several Prolog systems.
Thanks to Paulo Moura’s effort, Logtalk has been made to run with B-Prolog
seamlessly. Logtalk can be used as a module system on top of B-Prolog.
The LP/MIP interface: B-Prolog provides an interface to LP/MIP (linear pro-
gramming and mixed integer programming) packages such as GLPK and CPLEX.
With the declarative loop constructs, B-Prolog can serve as a powerful modeling
language for LP/MIP problems.
append([],Ys,Ys).
append([X|Xs],Ys,[X|Zs]):-
append(Xs,Ys,Zs).
This program is translated equivalently into the following matching clauses with no
assumption on modes of arguments:
append(Xs,Ys,Zs),var(Xs) =>
    append_aux(Xs,Ys,Zs).
append([],Ys,Zs) =>
    Ys=Zs.
append([X|Xs],Ys,Zs) =>
    Zs=[X|Zs1],
    append(Xs,Ys,Zs1).

append_aux(Xs,Ys,Zs) ?=>
    Xs=[],
    Ys=Zs.
append_aux(Xs,Ys,Zs) =>
    Xs=[X|Xs1],
    Zs=[X|Zs1],
    append(Xs1,Ys,Zs1).
The B-Prolog compiler does not infer modes but makes use of modes supplied by
the programmer to generate more compact canonical-form programs. For example,
with the mode declaration
:-mode append(+,+,-).
the append predicate is translated into the following canonical form:
append([],Ys,Zs) =>
Ys=Zs.
append([X|Xs],Ys,Zs) =>
Zs=[X|Zs1],
append(Xs,Ys,Zs1).
The compiler does not check modes at compile time or generate code for verifying
modes at runtime.
3.1.3 Registers
The following registers are used to represent the machine status (see Figure 1):
The HB register, which also exists in the WAM, is an alias for B->H. It is used in
checking whether or not a variable needs to be trailed. When a free variable is
bound, if it is a heap variable older than HB or a stack variable older than B, then
it is trailed.
A1..An: Arguments
AR: Parent frame pointer
CP: Continuation program pointer
BTM: Bottom of the frame
TOP: Top of the frame
Y1..Ym: Local variables
where BTM points to the bottom of the frame, i.e., the slot for the first argument
A1, and TOP points to the top of the frame, i.e., the slot just next to that for the
last local variable Ym. The BTM slot was not in the original ATOAM (Zhou 1996b).
This slot was introduced to support garbage collection and event-driven action rules
which require a new type of frame, called a suspension frame (Zhou 2006). The AR
register points to the AR slot of the current frame. Arguments and local variables
are accessed through offsets with respect to the AR slot.
It is the caller’s job to place the arguments and fill in the AR and CP slots. The
callee fills in the BTM and TOP slots.
A choice point contains, besides the slots in a determinate frame, four slots lo-
cated between the TOP slot and local variables:
The CPF slot stores the program pointer to continue with when the current branch
fails. The slot H points to the top of the heap and T points to the top of the trail
stack when the frame was allocated. When a variable is bound, it must be trailed if
it is older than B or HB. When execution backtracks to the latest choice point, the
bound variables trailed on the trail stack between T and B->T are set back to free,
the machine status registers H and T are restored, and the program pointer P is set
to be B->CPF.
The original ATOAM presented in (Zhou 1996b) had another type of frame,
called non-flat, for determinate predicates that have non-flat or deep guards. This
frame was abandoned since it is difficult for the compiler to extract non-flat guards
to take advantage of this feature.
3.1.5 Assertions
The following assertions always hold during execution:
2. No older stack slot can reference a younger stack slot and no older heap
variable can reference a younger heap variable.
3. No slot in a frame can reference another slot in the same frame.
Assertions 1 and 2 are also enforced by the WAM. The third assertion is needed
to make dereferencing the arguments of a last call unnecessary when the current
frame is reused. To enforce this assertion, when two terms being unified are stack
variables, the unification procedure globalizes them by creating a new heap variable
and letting both stack variables reference it.
3 Arguments have positive offsets and local variables have negative offsets.
Control:
    allocate_det(i1,i2)
    allocate_nondet(i1,i2)
    return
    fork(l)
    cut
    fail

Branch:
    jmpn_constant(y, l_var, l_fail, a)
    jmpn_struct(y, l_var, l_fail, f/n, y1, ..., yn)
    switch_on_cons(y, l_nil, l_var, l_fail, y1, y2)
    hash(y, i, (val1,l1), ..., (vali,li), l_var, l_fail)

Unify:
    unify_constant(y, a)
    unify_value(y1, y2)
    unify_struct(y, f/n, z1, ..., zn)
    unify_list(y, i, z1, ..., zi, zi+1)

Move:
    move_struct(y, f/n, z1, ..., zn)
    move_list(y, i, z1, ..., zi, zi+1)

Call:
    call(p/n, z1, ..., zn)
    last_call(i, p/n, z1, ..., zn)
Example
The following shows a canonical-form program and its compiled code:
% p ?=> true.
% p => true.
p/0: allocate_nondet(0,8)
fork(l1)
return
l1: cut
return
Since the predicate is nondeterminate and there is no local variable, the allocated
frame contains 8 slots reserved for saving the machine status.
unify_list and unify_struct instructions must dereference a tagged operand if
the operand is not a first-occurrence variable and globalize it if the dereferenced
term is a stack variable.
The following shows an example.
The argument slot with offset 1 allocated to the variable F is reused for L and later
also for X. Since L is a first-occurrence variable, it is encoded as the tagged operand
v(1). The variable X occurs twice in L=[X,X,a]. The first occurrence is encoded
as v(1) and the second one is encoded as u(1). The tagged operand c(a) encodes
the constant element a and the operand c([]) encodes the empty tail of the list.
call(p/n, z1, ..., zn) {
    for each zi (i = 1, ..., n) do
        *TOP-- = value of zi;
    parent_ar = AR;
    AR = TOP;
    AR->AR = parent_ar;
    AR->CP = P;
    P = entrypoint(p/n);
}
After passing the arguments to the callee’s frame, the instruction also sets the AR
and CP slots of the frame, and lets the AR register point to the frame.
The value of each tagged operand zi is computed as follows. If it is v(k), then
the value is the address of the frame slot with offset k (the slot is initialized to be a
free variable), unless k is 0, in which case the value is the content of the TOP
register. If it is u(k), then the value is the content of the frame slot with offset k.
Otherwise, the value is zi itself, which is a tagged constant.
A last_call instruction encodes the last call in the body of a determinate clause
or a clause in a nondeterminate predicate that contains cuts. For a nondeterminate
clause in a nondeterminate predicate that does not contain cuts, the last call is
encoded as a call instruction followed by a return instruction. Unlike the call
instruction which always allocates a new frame for the callee, the last call instruc-
tion reuses the current frame if it is a determinate frame or a choice point frame
whose alternatives have been cut off. The last call instruction takes an integer,
called layout bit vector, which tells what arguments are misplaced and hence need
to be rearranged into proper slots in the callee’s frame when the current frame is
reused. There is a bit for each argument and the argument needs to be rearranged
if its bit is 1.4
The following steps are taken to reuse the current frame: Firstly, all the misplaced
arguments that are tagged u are copied out to a temporary frame. Because of the
enforcement of assertion 3, it is unnecessary to fully dereference stack slots, but
free variables in the frame must be globalized since otherwise unrelated arguments
may be wrongly aliased. Constants and first-occurrence variables in the arguments
are not touched in this step. Secondly, if the arity of the current frame is different
from the arity of the last call, the AR and CP slots are moved. Thirdly, all misplaced
arguments are moved into the frame for the callee. For u-tagged arguments, the
values in the temporary frame are used instead of the old ones because the old
values may have been overwritten by other values. Finally, the AR register is set to
be AR+(AR->BTM)-n.
For example,
4 In the actual implementation, an integer with 28 bits is used for a layout vector. If the last call
has more than 28 arguments, then the last-call optimization is abandoned.
move_struct(y(-1),f/2,u(3),u(2)) % S=f(X,Y)
call(q/1,u(-1)) % q(S)
last_call(0b1011,r/4,u(1),u(2),u(3),v(0))
The binary literal ’0b1101’ is the layout bit vector for the last call which indicates
that all the arguments except for the second one (Y) are misplaced. The variable W
is a singleton variable in the clause and is encoded as v(0).
5 A variable with offset 0 is never stored in the current frame. Recall that the slot with offset 0
stores the pointer to the parent frame.
u(i), and c(a)). Obviously, reckless introduction of specialized instructions will result
in an explosion of the emulator size, and even in performance degradation, depending
on the platform.
A specialized instruction carries the number and the types of its operands in
its opcode. An instruction, named unify_cons(y, z1, z2), is introduced to replace
unify_list instructions that have two operands. The unify_cons instruction is further
specialized so that no operand is tagged.
Specialized instructions are introduced for unify_struct instructions that have up to two
arguments, so that no operand is tagged. Any unify_struct instruction that has more than
two arguments is translated into a specialized instruction for the first two arguments
followed by unify_arg instructions. In this way, no operand is tagged.
For the call instruction, specialized instructions of the form call_k_u (k =
1, ..., 9) are introduced, which carry k initialized variables as operands in addition
to the predicate symbol. Specialized instructions are also introduced for often-
occurring call patterns such as u, v, and uv.
Specialized versions of the last_call instruction are introduced that carry in-
dices of misplaced arguments explicitly as operands. In general, a specialized in-
struction for a last call takes the form last_call_k(i1, ..., ik, p/n, z1, ..., zn), where
the integers i1, ..., ik are indices of misplaced arguments that need to be rear-
ranged. The currently implemented abstract machine has three specialized instruc-
tions (k = 0, 1, 2). Further specialized instructions are used to encode tail-recursive
calls.
The same idea can be applied to merged instructions of unify and cut. Consider
the merged instruction unify_constant_cut(y, a). If y is a free variable, then cut
can be performed before y is bound to a. In this way, unnecessary trailing of y can
be avoided.
3.5 Discussion
Compiling a high-level language into an abstract or virtual machine has become a
popular implementation method, which has traditionally been adopted by compilers
for Lisp and Prolog, and recently made popular by implementations of Java and
Microsoft .NET. One of the biggest issues in designing an abstract machine concerns
whether to have arguments passed through registers or stack frames. Stack-based
abstract machines are more common than register-based machines as exemplified
by the Java Virtual Machine and Microsoft Intermediate Language.
One of the biggest advantages of passing arguments through stack frames over
through registers is that instructions for procedure calls need not take destinations
of arguments explicitly as operands. This leads to more compact bytecode and less
interpretation overhead as well. For historical reasons, most Prolog systems are
based on the WAM, which is a register machine, except for B-Prolog which is based
on a stack machine called ATOAM. Even ATOAM retains registers for temporary
variables.
For Prolog, a register machine such as the WAM does have its merits even when
registers are normally simulated. Firstly, no frame needs to be created for determi-
nate binary programs. Secondly, registers are represented as global variables in C
and the addresses of the variables can be computed at load time rather than run
time. Thirdly, in some implementations a register never references a stack slot, and
hence when building a compound term on the heap the emulator need not derefer-
ence a component if it is stored in a register. In a highly specialized abstract machine
such as the one adopted in Quintus Prolog (according to (Nässén et al. 2001)), the
registers an instruction manipulates can be encoded as part of the opcode rather
than taken explicitly as operands. In this way, if the emulator is implemented in
an assembly language to which hardware registers are directly available, abstract
machine registers can be mapped to native registers.
Nevertheless, using registers has more cons than pros for Prolog emulators. Firstly,
as mentioned above, instructions for procedure calls have to carry destination reg-
isters as operands which results in less compact code. Secondly, long-lived data
stored in registers have to be saved in stack frames and loaded later when they
are used. In Prolog, variables shared by multiple chunks6 or multiple clauses are
long-lived. Thirdly, the information in a frame cannot be easily reused by the
last call if the clause of the frame contains multiple chunks. Extra efforts are
needed to reuse frames for such clauses (Demoen and Nguyen 2008a; Meier 1991).
Finally, registers make it more expensive to interpret tagged operands and harder
6 A chunk consists of a non-inline call preceded by inline calls. The head of a clause belongs to
the chunk of the first non-inline call in the body.
This TOAM architecture has been extended to support action rules (Zhou 2006)
and tabling (Zhou et al. 2008). This section overviews the changes to the memory
architecture.
The STATE slot indicates the current state of the frame, which can be start, sleep,
woken, or end. The suspension frame enters the start state immediately after it is
created and remains in it until the call is suspended for the first time or it is ended
because no action rule is applicable. Normally a suspension frame transits to the
end state through the sleep and woken states, but it can transit to the end state
directly if its call is never suspended. The EVENT slot stores the most recent event
that activated the call. The REEP slot stores the program pointer to continue when
the call is activated. The PREV slot stores the pointer to the previous suspension
frame.
Consider, for example, the following predicate:
x_in_c_y_ac(X,Y,C),var(X),var(Y),
{dom(Y,Ey)}
=>
Ex is C-Ey,
domain_set_false(X,Ex).
x_in_c_y_ac(X,Y,C) => true.
frame is copied, and the copy and the original frame are connected to the active
chain, each holding one of the events in the EVENT slot.
With suspension frames on the stack, the active chain is no longer chronological.
Figure 3 illustrates such a situation. The frames f1 and f2 are suspension frames,
f3 is the latest choice point frame, and f4 is a determinate frame. The execution of
f4 was interrupted by an event that woke up f1 and f2. The snapshot depicts the
moment immediately after the two woken frames were added into the active chain
and f2 became the current active frame.
Placing delayed calls as suspension frames on the stack makes context switch-
ing light. It is unnecessary to allocate a frame when a delayed call wakes up
and deallocate it when the delayed call suspends again. Nevertheless, the non-
chronologicality of the active chain on the stack requires run-time testing to de-
termine if the current frame can be deallocated or reused. Moreover, unreachable
frames on the stack need to be garbage collected (Zhou 2000). A different scheme
has been proposed which stores the WAM environments for delayed calls on the
heap (Demoen and Nguyen 2008b), but this scheme also complicates memory man-
agement.
table, a pointer to the answer table for the subgoal, a pointer to the strongly con-
nected component (SCC) to which the subgoal belongs, a word that indicates the
state of the subgoal (e.g., whether the subgoal is complete, whether the subgoal is
a looping one, and whether the answer table has been updated during the current
round of evaluation). The answers in the answer table constitute a chain with a
dummy answer sitting in the front. In this way, answers can be retrieved one by
one through backtracking.
The frame, called a tabled frame, for a subgoal of a tabled predicate contains the
following two slots in addition to those stored in a choice point frame:
The SubgoalTable slot points to the subgoal table entry, and the CurrentAnswer slot
points to the most recently consumed answer. The next unconsumed answer can
be reached from this reference.
When a tabled predicate is invoked by a subgoal, a tabled frame is pushed onto
the stack. The subgoal table is looked up to see if a variant of the subgoal exists.
If so, the SubgoalTable slot is set to point to the entry and CurrentAnswer is set
to point to the first answer in the answer table (recall that the first answer is a
dummy). If the state of the entry is complete, the subgoal only consumes existing
answers one by one through backtracking. If the state of the entry is not complete,
the subgoal is resolved using clauses if it appears for the first time and using existing
answers if it has occurred before in the current round of evaluation. If no variant of
the subgoal exists in the subgoal table, then an entry is allocated and the subgoal
is resolved using clauses.
After all clauses are tried on a tabled subgoal, a test is performed to see if the
subgoal is complete. A subgoal is complete if it has never occurred in a loop, or it
is a top-most looping subgoal and none of the subgoals in its SCC has obtained any
new answer during the current round of evaluation. The execution of a top-most
looping subgoal is iterated until it becomes complete. When a top-most looping
subgoal becomes complete, all the subgoals in its SCC become complete as well.
As can be seen, the change to the architecture is minimal for supporting lin-
ear tabling. Unlike in the implementations of SLG, no effort is needed to preserve
states of tabled subgoals and the garbage collector is kept untouched in linear
tabling. Linear tabling is more space efficient than SLG since no stack frames
are frozen for tabled subgoals. Nevertheless, linear tabling without optimization
could be computationally more expensive than SLG due to the necessity of re-
computation (Zhou et al. 2008).
5 Final Remarks
This paper has surveyed the language features of B-Prolog and given a detailed
description of TOAM Jr. with architectural support for action rules and tabling.
B-Prolog has strengths and weaknesses. The competitive Prolog engine, the cutting-
edge CLP(FD) system, and the efficient tabling system are clear advantages of
B-Prolog. With them, B-Prolog serves well the core application domains such as
constraint solving and dynamic programming. We will further strengthen B-Prolog
as a tool for these applications. Future work includes parallelizing action rules
for better performance in constraint solving and improving the tabling system to
enhance the scalability of B-Prolog for large-scale machine-learning applications.
The shortcomings of B-Prolog are also obvious. The lack of certain functionalities,
such as a module system, native interfaces to database and networking libraries,
and support for Unicode, increasingly hinders the adoption of B-Prolog in many
other application domains. Adding these features is also part of the
future work.
Acknowledgements
Very early experiments were conducted while the author was a PhD student at
Kyushu University during 1988-1991. The first working system and the versions up
to 4.0 were built while the author was with Kyushu Institute of Technology dur-
ing 1991-1999. Most recent improvements and enhancements have been conducted
at Brooklyn College of the City University of New York. The B-Prolog system is
indebted to many people in the logic programming community. I wish to express
my gratitude to Taisuke Sato and Yoshitaka Kameya for their support, encour-
agement, and for propelling this work forward. Their PRISM system has been a strong driving force for
recent improvements in the tabling system and memory management. The extensive
feedback from the PRISM team greatly helped enhance the robustness of the sys-
tem. Special thanks are also due to Bart Demoen for his intensive scrutiny of both
the design and the implementation of B-Prolog, Yi-Dong Shen for his cooperation
on linear tabling, and Paulo Moura and Ulrich Neumerkel for helping make the core
part of B-Prolog more compatible with the ISO standard. Thanks also go to the
anonymous referees and the editors, Maria García de la Banda and Bart Demoen,
for their detailed comments and guidance on the presentation. B-Prolog-related
projects have received numerous grants from various funding organizations, most
recently from AIST, CISDD, PSC CUNY, and NSF.
References
Carlsson, M. 1987. Freeze, indexing, and other implementation issues in the WAM. In
Proceedings of the International Conference on Logic Programming (ICLP). 40–58.
Chen, W. and Warren, D. S. 1996. Tabled evaluation with delaying for general logic
programs. Journal of the ACM 43, 1, 20–74.
Debray, S. K. 1988. The SB-Prolog System, Version 3.0. SUNY Stony Brook.
Demoen, B. and Nguyen, P.-L. 2000. So many WAM variations, so little time. In
Proceedings of the International Conference on Computational Logic (CL). LNAI, vol.
1861. 1240–1254.
Demoen, B. and Nguyen, P.-L. 2008a. Environment reuse in the WAM. In Proceedings
of the International Conference on Logic Programming (ICLP). 698–702.
Demoen, B. and Nguyen, P.-L. 2008b. Two WAM implementations of action rules. In
Proceedings of the International Conference on Logic Programming (ICLP). 621–635.
Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood from
incomplete data via the EM algorithm. Journal of the Royal Statistical Society,
Series B 39, 1, 1–38.
Forgy, C. L. 1982. Rete: A fast algorithm for the many pattern/many object pattern
match problem. Artificial Intelligence 19, 17–37.
Guo, H.-F. and Gupta, G. 2008. Simplifying dynamic programming via mode-directed
tabling. Softw., Pract. Exper. 38, 1, 75–94.
Hickey, T. J. and Mudambi, S. 1989. Global compilation of Prolog. Journal of Logic
Programming 7, 3, 193–230.
Kliger, S. and Shapiro, E. Y. 1990. From decision trees to decision graphs. In Pro-
ceedings of the North American Conference on Logic Programming (NACLP). 97–116.
Maier, D. and Warren, D. S. 1988. Computing with Logic: Logic Programming with
Prolog. The Benjamin/Cummings Publishing Company.
Meier, M. 1991. Recursion versus iteration in Prolog. In Proceedings of the International
Conference on Logic Programming (ICLP). 157–169.
Meier, M. 1993. Better late than never. In ICLP-Workshop on Implementation of Logic
Programming Systems. 151–165.
Mohr, R. and Henderson, T. C. 1986. Arc and path consistency revisited. Artificial
Intelligence 28, 225–233.
Morales, J. F., Carro, M., Puebla, G., and Hermenegildo, M. V. 2005. A gen-
erator of efficient abstract machine implementations and its application to emulator
minimization. In Proceedings of the International Conference on Logic Programming
(ICLP). 21–36.
Moura, P. 2009. From plain Prolog to Logtalk objects: Effective code encapsulation and
reuse. In Proceedings of the International Conference on Logic Programming (ICLP).
23.
Nässén, H., Carlsson, M., and Sagonas, K. F. 2001. Instruction merging and spe-
cialization in the SICStus Prolog virtual machine. In Proceedings of the International
Conference on Principles and Practice of Declarative Programming (PPDP). 49–60.
Older, W. J. and Rummell, J. A. 1992. An incremental garbage collector for WAM-
based Prolog. In Proceedings of the Joint International Conference and Symposium on
Logic Programming (JICSLP). 369–383.
Ramakrishnan, I., Rao, P., Sagonas, K., Swift, T., and Warren, D. 1998. Efficient
access mechanisms for tabled logic programs. Journal of Logic Programming 38, 31–54.
Sagonas, K. and Swift, T. 1998. An abstract machine for tabled execution of fixed-
order stratified logic programs. ACM Transactions on Programming Languages and
Systems 20, 3, 586–634.
Santos Costa, V. 1999. Optimizing bytecode emulation for Prolog. In Proceedings of
the International Conference on Principles and Practice of Declarative Programming
(PPDP). LNCS 1702, 261–277.
Santos Costa, V., Sagonas, K. F., and Lopes, R. 2007. Demand-driven indexing of
Prolog clauses. In Proceedings of the International Conference on Logic Programming
(ICLP). 395–409.
Sato, T. 2009. Generative modeling by PRISM. In Proceedings of the International
Conference on Logic Programming (ICLP). 24–35.
Sato, T. and Kameya, Y. 2001. Parameter learning of logic programs for symbolic-
statistical modeling. Journal of Artificial Intelligence Research, 391–454.
Zhou, N.-F., Shen, Y.-D., Yuan, L., and You, J. 2001. Implementation of a linear
tabling mechanism. Journal of Functional and Logic Programming 2001(1), 1–15.
Zhou, N.-F., Takagi, T., and Ushijima, K. 1990. A matching tree oriented abstract ma-
chine for Prolog. In Proceedings of the International Conference on Logic Programming
(ICLP). 159–173.
Zhou, N.-F., Wallace, M., and Stuckey, P. J. 2006. The dom event and its use in
implementing constraint propagators. Technical report TR-2006013, CUNY Computer
Science.