
1.1.rete - A Fast Algorithm For The Many Pattern, Many Object Pattern Match Problem

This document describes the Rete algorithm, an efficient method for matching a large number of patterns against a large number of objects. It was developed for use in production system interpreters. The algorithm works by storing partial match results in a network of nodes. This allows it to efficiently determine which patterns are satisfied by the current set of objects in working memory with minimal re-checking as objects are modified. The document explains the basic concepts of the algorithm and how objects and patterns should be represented to allow efficient implementation of the algorithm. It also provides details of a specific implementation.


ARTIFICIAL INTELLIGENCE 17

Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem*
Charles L. Forgy
Department of Computer Science, Carnegie-Mellon University,
Pittsburgh, PA 15213, U.S.A.
Recommended by Harry Barrow
ABSTRACT
The Rete Match Algorithm is an efficient method for comparing a large collection of patterns to a large
collection of objects. It finds all the objects that match each pattern. The algorithm was developed for use in
production system interpreters, and it has been used for systems containing from a few hundred to more
than a thousand patterns and objects. This article presents the algorithm in detail. It explains the basic
concepts of the algorithm, it describes pattern and object representations that are appropriate for the
algorithm, and it describes the operations performed by the pattern matcher.
1. Introduction
In many pattern/many object pattern matching, a collection of patterns is
compared to a collection of objects, and all the matches are determined. That
is, the pattern matcher finds every object that matches each pattern. This kind
of pattern matching is used extensively in Artificial Intelligence programs
today. For instance, it is a basic component of production system interpreters.
The interpreters use it to determine which productions have satisfied condition
parts. Unfortunately, it can be slow when large numbers of patterns or objects
are involved. Some systems have been observed to spend more than nine-
tenths of their total run time performing this kind of pattern matching [5]. This
*This research was sponsored by the Defense Advanced Research Projects Agency (DOD),
ARPA Order No. 3597, monitored by the Air Force Avionics Laboratory under Contract
F33615-78-C-1551.
The views and conclusions contained in this document are those of the author and should not be
interpreted as representing the official policies, either expressed or implied, of the Defense
Advanced Research Projects Agency or the US Government.
Artificial Intelligence 19 (1982) 17-37
0004-3702/82/0000-0000/$02.75 © 1982 North-Holland
18 C.L. FORGY
article describes an algorithm that was designed to make many pattern/many
object pattern matching less expensive. The algorithm was developed for use in
production system interpreters, but since it should be useful for other
languages and systems as well, it is presented in detail.
This article attends to two complementary aspects of efficiency: (1) designing
an algorithm for the task and (2) implementing the algorithm on the computer.
The rest of Section 1 provides some background information. Section 2
presents the basic concepts of the algorithm. Section 3 explains how the objects
and patterns should be represented to allow the most efficient implementations.
Section 4 describes in detail a very fast implementation of the algorithm.
Finally, Section 5 presents some of the results of the analyses of the algorithm.
1.1. OPS5
The methods described in this article were developed for production system
interpreters, and they will be illustrated with examples drawn from production
systems. This section provides a brief introduction to the language used in the
examples, OPS5. For a more complete description of OPS5, see [6].
A production system program consists of an unordered collection of If-Then
statements called productions. The data operated on by the productions is held
in a global data base called working memory. By convention, the If part of a
production is called its LHS (left-hand side), and its Then part is called its RHS
(right-hand side). The interpreter executes a production system by performing
the following operations.
(1) Match. Evaluate the LHSs of the productions to determine which are
satisfied given the current contents of working memory.
(2) Conflict resolution. Select one production with a satisfied LHS; if no
productions have satisfied LHSs, halt the interpreter.
(3) Act. Perform the actions in the RHS of the selected production.
(4) Goto 1.
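The four steps above can be sketched as a loop. The Python below is only an illustration: the dictionary-based production format, the function name run, the max_cycles guard, and the "take the first satisfied production" conflict resolution strategy are all assumptions made for this example and are not part of OPS5. Note that the match step here re-evaluates every LHS on every cycle, which is exactly the cost the rest of the article sets out to avoid.

```python
def run(productions, working_memory, max_cycles=100):
    """Repeat match, conflict resolution, and act until no LHS is satisfied."""
    fired = []
    for _ in range(max_cycles):
        # (1) Match: collect the productions whose LHS is satisfied.
        conflict_set = [p for p in productions if p["lhs"](working_memory)]
        # (2) Conflict resolution: halt if nothing is satisfied, else pick one.
        if not conflict_set:
            break
        chosen = conflict_set[0]  # placeholder strategy (assumption)
        # (3) Act: the RHS changes working memory.
        chosen["rhs"](working_memory)
        fired.append(chosen["name"])
        # (4) Go to 1: the loop repeats.
    return fired


# Hypothetical toy production: decrement a counter until it reaches zero.
countdown = {
    "name": "Countdown",
    "lhs": lambda wm: wm["counter"] > 0,
    "rhs": lambda wm: wm.update(counter=wm["counter"] - 1),
}

wm = {"counter": 3}
trace = run([countdown], wm)
```

After the call, trace lists the productions in the order they fired, and the interpreter has halted because no LHS is satisfied any longer.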
OPS5 working memories typically contain several hundred objects, and each
object typically has between ten and one hundred associated attribute-value
pairs. An object together with its attribute-value pairs is called a working
memory element. The following is a typical, though very small, OPS5 working
memory element; it indicates that the object of class Expression which is
named Expr17 has 2 as its first argument, '*' as its operator, and X as its second
argument.
(Expression ↑Name Expr17 ↑Arg1 2 ↑Op * ↑Arg2 X)
The ↑ is the OPS5 operator that distinguishes attributes from values.
The LHS of a production consists of a sequence of patterns; that is, a
sequence of partial descriptions of working memory elements. When a pattern
P describes an element E, P is said to match E. In some productions, some of
THE RETE MATCH ALGORITHM 19
the patterns are preceded by the negation symbol, -. An LHS is satisfied when
(1) Every pattern that is not preceded by - matches a working memory
element, and
(2) No pattern that is preceded by - matches a working memory element.
The simplest patterns contain only constant symbols and numbers. A pattern
containing only constants matches a working memory element if every constant
in the pattern occurs in the corresponding position in the working memory
element. (Since patterns are partial descriptions, it is not necessary for every
constant in the working memory element to occur in the pattern.) Thus the
pattern
(Expression ↑Op * ↑Arg2 0)
would match the element
(Expression ↑Name Expr86 ↑Arg1 X ↑Op * ↑Arg2 0)
Many non-constant symbols are available in OPS5 for defining patterns, but the
two most important are variables and predicates. A variable is a symbol that
begins with the character '<' and ends with the character '>'; for example, <X>. A
variable in a pattern will match any value in a working memory element, but if a
variable occurs more than once in a production's LHS, all occurrences must match
the same value. Thus the pattern
(Expression ↑Arg1 <VAL> ↑Arg2 <VAL>)
would match either of the following
(Expression ↑Name Expr9 ↑Arg1 Expr23 ↑Op * ↑Arg2 Expr23)
(Expression ↑Name Expr5 ↑Arg1 0 ↑Op - ↑Arg2 0)
but it would not match
(Expression ↑Name Expr8 ↑Arg1 0 ↑Op * ↑Arg2 Expr23)
The predicates in OPS5 include = (equal), <> (not equal), < (less than), >
(greater than), <= (less than or equal), and >= (greater than or equal). A
predicate is placed between an attribute and a value to indicate that the value
matched must be related in that way to the value in the pattern. For instance,
(Expression ↑Op <> *)
will match any expression whose operator is not *. Predicates can be used with
variables as well as with constant values. For example, the following pattern
(Expression ↑Arg1 <LEFT> ↑Arg2 <> <LEFT>)
will match any expression in which the first argument differs from the second
argument.
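The matching rules for constants, variables, and predicates can be illustrated with a small Python sketch. Everything about the representation is an assumption made for the example: an element is a (class, attribute dictionary) pair, a variable is a string such as "<VAL>", a predicate test is a (predicate, operand) tuple, and the function name match is hypothetical. As a simplification, a predicate other than = applied to a variable that has not yet been bound is treated as a failed match.

```python
import operator

PREDICATES = {"=": operator.eq, "<>": operator.ne, "<": operator.lt,
              ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def is_var(x):
    """Variables are written as strings such as "<VAL>" in this sketch."""
    return isinstance(x, str) and len(x) > 2 and x[0] == "<" and x[-1] == ">"


def match(pattern, element, bindings=None):
    """Return the variable bindings if pattern matches element, else None."""
    p_class, p_tests = pattern
    e_class, e_attrs = element
    if p_class != e_class:
        return None
    bindings = dict(bindings or {})
    for attr, test in p_tests.items():
        value = e_attrs.get(attr)  # a missing attribute reads as None (NIL)
        pred, operand = test if isinstance(test, tuple) else ("=", test)
        if is_var(operand):
            if operand not in bindings:
                if pred != "=":
                    return None  # simplification: unbound variable under a predicate
                bindings[operand] = value  # first occurrence binds the variable
                continue
            operand = bindings[operand]  # later occurrences must agree
        try:
            if not PREDICATES[pred](value, operand):
                return None
        except TypeError:  # e.g. ordering a symbol against a number
            return None
    return bindings


e9 = ("Expression", {"Name": "Expr9", "Arg1": "Expr23", "Op": "*", "Arg2": "Expr23"})
e5 = ("Expression", {"Name": "Expr5", "Arg1": 0, "Op": "-", "Arg2": 0})
e8 = ("Expression", {"Name": "Expr8", "Arg1": 0, "Op": "*", "Arg2": "Expr23"})

same_args = ("Expression", {"Arg1": "<VAL>", "Arg2": "<VAL>"})
not_times = ("Expression", {"Op": ("<>", "*")})
differ = ("Expression", {"Arg1": "<LEFT>", "Arg2": ("<>", "<LEFT>")})
```

With these definitions, same_args matches e9 and e5 but not e8, not_times matches only e5, and differ matches only e8. A successful match can return an empty dictionary, so callers should compare the result against None rather than test its truth value.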
The RHS of a production consists of an unconditional sequence of actions.
The only actions that need to be described here are the ones that change
working memory. MAKE builds a new element and adds it to working
memory. The argument to MAKE is a pattern like the patterns in LHSs. For
example,
(MAKE Expression ↑Name Expr1 ↑Arg1 1)
will build an expression whose name is Expr1, whose first argument is 1, and
whose other attributes all have the value NIL (the default value in OPS5).
MODIFY changes one or more values of an existing element. This action takes
as arguments a pattern designator and a list of attribute-value pairs. The
following action, for example
(MODIFY 2 ↑Op NIL ↑Arg2 NIL)
would take the expression matching the second pattern and change its operator
and second argument to NIL. The action REMOVE deletes elements from
working memory. It takes pattern designators as arguments. For example
(REMOVE 1 2 3)
would delete the elements matching the first three patterns in a production.
An OPS5 production consists of (1) the symbol P, (2) the name of the
production, (3) the LHS, (4) the symbol -->, and (5) the RHS, with everything
enclosed in parentheses. The following is a typical production.
(P Time0x
(Goal ↑Type Simplify ↑Object <X>)
(Expression ↑Name <X> ↑Arg1 0 ↑Op *)
-->
(MODIFY 2 ↑Op NIL ↑Arg2 NIL))
1.2. Work on production system efficiency
Since execution speed has always been a major issue for production systems,
several researchers have worked on the problem of efficiency. The most
common approach has been to combine a process called indexing with direct
interpretation of the LHSs. In the simplest form of indexing, the interpreter
begins the match process by extracting one or more features from each working
memory element, and uses those features to hash into the collection of
productions. This produces a set of productions that might have satisfied LHSs.
The interpreter examines each LHS in this set individually to determine
whether it is in fact satisfied. A more efficient form of indexing adds memory to
the process. A typical scheme involves storing a count with each pattern. The
counts are all zero when execution of the system begins. When an element
enters working memory, the indexing function is executed with the new
element as its only input, and all the patterns that are reached have their
counts increased by one. When an element leaves working memory, the index
is again executed, and the patterns that are reached have their counts
decreased by one. The interpreter performs the direct interpretation step only
on those LHSs that have non-zero counts for all their patterns. Interpreters
using this scheme--in some cases combined with other efficiency measures--
have been described by McCracken [8], McDermott, Newell, and Moore [9],
and Rychener [10].
The algorithm that will be presented here, the Rete Match Algorithm, can be
described as an indexing scheme that does not require the interpretive step.
The indexing function is represented as a network of simple feature recog-
nizers. This representation is related to the graph representations for so-called
structured patterns. (See for example [2] and [7]). The Rete algorithm was first
described in 1974 [3]. A 1977 paper [4] described some rather complex
interpreters for the networks of feature recognizers, including parallel inter-
preters and interpreters which delayed evaluation of patterns as long as
possible. (Delaying evaluation is useful because it makes it less likely that
patterns will be evaluated unnecessarily.) A 1979 paper [5] discussed simple but
very fast interpreters for the networks. This article is based in large part on the
1979 paper.
2. The Rete Match Algorithm---Basic Concepts
In a production system interpreter, the output of the match process and the
input to conflict resolution is a set called the conflict set. The conflict set is a
collection of ordered pairs of the form
(Production, List of elements matched by its LHS)
The ordered pairs are called instantiations. The Rete Match Algorithm is an
algorithm for computing the conflict set. That is, it is an algorithm to compare a
set of LHSs to a set of elements in order to discover all the instantiations. The
algorithm can efficiently process large sets because it does not iterate over the
sets.
2.1. How to avoid iterating over working memory
A pattern matcher can avoid iterating over the elements in working memory by
storing information between cycles. The step that can require iteration is
determining whether a given pattern matches any of the working memory
elements. The simplest interpreters determine this by comparing the pattern to
the elements one by one. The iteration can be avoided by storing, with each
pattern, a list of the elements that it matches. The lists are updated when
working memory changes. When an element enters working memory, the
interpreter finds all the patterns that match it and adds it to their lists. When an
element leaves working memory, the interpreter again finds all the patterns
that match it and deletes it from their lists.
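The bookkeeping just described can be sketched in a few lines of Python. The class name MatchLists, the dictionary representation of elements and patterns, and the restriction to constant-only patterns are assumptions made for the illustration. The sketch still loops over all patterns on every change; avoiding that second iteration is the subject of Section 2.2.

```python
def matches(pattern, element):
    """Constant-only pattern: every attribute-value pair must occur in the element."""
    return all(element.get(attr) == value for attr, value in pattern.items())


class MatchLists:
    """Stores, with each pattern, the list of elements it currently matches."""

    def __init__(self, patterns):
        self.patterns = patterns                      # name -> pattern
        self.lists = {name: [] for name in patterns}  # name -> matched elements

    def add(self, element):
        """An element enters working memory."""
        for name, pattern in self.patterns.items():
            if matches(pattern, element):
                self.lists[name].append(element)

    def remove(self, element):
        """An element leaves working memory."""
        for name, pattern in self.patterns.items():
            if matches(pattern, element):
                self.lists[name].remove(element)

    def is_matched(self, name):
        # Answered from the stored list; working memory is never scanned.
        return bool(self.lists[name])


memory = MatchLists({
    "goal": {"class": "Goal", "Type": "Simplify"},
    "times": {"class": "Expression", "Op": "*"},
})
goal = {"class": "Goal", "Type": "Simplify", "Object": "Expr19"}
memory.add(goal)
goal_matched_after_add = memory.is_matched("goal")
memory.remove(goal)
goal_matched_after_remove = memory.is_matched("goal")
```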
Since pattern matchers using the Rete algorithm save this kind of in-
formation, they never have to examine working memory. The pattern matcher
can be viewed as a black box with one input and one output.
(Changes to Working Memory)
↓
Black Box
↓
(Changes to the Conflict Set)
The box receives information about the changes that are made to working
memory, and it determines the changes that must be made in the conflict set to
keep it consistent. For example, the black box might be told that the element
(Goal ↑Type Simplify ↑Object Expr19)
has been added to working memory, and it might respond that production
Time0x has just become instantiated.
2.1.1. Tokens
The descriptions of working memory changes that are passed into the black
box are called tokens. A token is an ordered pair of a tag and a list of data
elements. In the simplest implementations of the Rete Match Algorithm, only
two tags are needed, + and - . The tag + indicates that something has been
added to working memory. The tag - indicates that something has been
deleted from working memory. When an element is modified, two tokens are
sent to the black box; one token indicates that the old form of the element has
been deleted from working memory, and the other that the new form of the
element has been added. For example, if
(Expression ↑Name Expr41 ↑Arg1 Y ↑Op + ↑Arg2 Y)
was changed to
(Expression ↑Name Expr41 ↑Arg1 2 ↑Op * ↑Arg2 Y)
the following two tokens would be processed.
<- (Expression ↑Name Expr41 ↑Arg1 Y ↑Op + ↑Arg2 Y)>
<+ (Expression ↑Name Expr41 ↑Arg1 2 ↑Op * ↑Arg2 Y)>
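A token can be modelled as a pair of a tag and a tuple of data elements. The short Python sketch below shows how additions, deletions, and modifications translate into tokens; the function name tokens_for_change and the positional tuple layout of the element (class, Name, Arg1, Op, Arg2) are assumptions made for the example.

```python
def tokens_for_change(old=None, new=None):
    """Describe one working memory change as a list of (tag, data) tokens."""
    tokens = []
    if old is not None:
        tokens.append(("-", (old,)))  # the old form leaves working memory
    if new is not None:
        tokens.append(("+", (new,)))  # the new form enters working memory
    return tokens


# The modification discussed above: Arg1 changes from Y to 2, Op from + to *.
old = ("Expression", "Expr41", "Y", "+", "Y")
new = ("Expression", "Expr41", 2, "*", "Y")
modify_tokens = tokens_for_change(old=old, new=new)
```

A modification yields the - token first and the + token second; a plain addition or deletion yields a single token.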
2.2. How to avoid iterating over production memory
The Rete algorithm avoids iterating over the set of productions by using a
tree-structured sorting network or index for the productions. The network,
which is compiled from the patterns, is the principal component of the black
box. The following sections explain how patterns are compiled into networks
and how the networks perform the functions of the black box.
2.2.1. Compiling the patterns
When a pattern matcher processes a working memory element, it tests many
features of the element. The features can be divided into two classes. The first
class, which could be called the intra-element features, are the ones that
involve only one working memory element. For an example of these features,
consider the following pattern.
(Expression ↑Name <N> ↑Arg1 0 ↑Op + ↑Arg2 <X>)
When the pattern matcher processes this pattern, it tries to find working
memory elements having the following intra-element features.
- The class of the element must be Expression.
- The value of the Arg1 attribute must be the number 0.
- The value of the Op attribute must be the atom +.
The other class of features, the inter-element features, results from having a
variable occur in more than one pattern. Consider Plus0x's LHS.
(P Plus0x
(Goal ↑Type Simplify ↑Object <N>)
(Expression ↑Name <N> ↑Arg1 0 ↑Op + ↑Arg2 <X>)
--> ...)
The intra-element features for the second pattern are listed above. A similar
list can be constructed for the first pattern. But in addition to those two lists,
the following inter-element feature is necessary because the variable <N> occurs
twice.
- The value of the Object attribute of the goal must be equal to the value of
the Name attribute of the expression.
The pattern compiler builds a network by linking together nodes which test
elements for these features. When the compiler processes an LHS, it begins
with the intra-element features. It determines the intra-element features that
each pattern requires and builds a linear sequence of nodes for the pattern.
Each node tests for the presence of one feature. After the compiler finishes
with the intra-element features, it builds nodes to test for the inter-element
features. Each of the nodes has two inputs so that it can join two paths in the
network into one. The first of the two-input nodes joins the linear sequences
for the first two patterns, the second two-input node joins the output of the
first with the sequence for the third pattern, and so on. The two-input nodes
test every inter-element feature that applies to the elements they process.
Finally, after the two-input nodes, the compiler builds a special terminal node
to represent the production. This node is attached to the last of the two-input
nodes. Fig. 1 shows the network for Plus0x and the similar production Time0x.
Note that when two LHSs require identical nodes, the compiler shares parts of
the network rather than building duplicate nodes.
2.2.2. Processing in the network
The root node of the network (at the top in Fig. 1) is the input to the black box.
This node receives the tokens that are sent to the black box and passes copies
of the tokens to all its successors. The successors of the top node, the nodes to
perform the intra-element tests, have one input and one or more outputs. Each
node tests one feature and sends the tokens that pass the test to its successors.
The two-input nodes compare tokens from different paths and join them into
bigger tokens if they satisfy the inter-element constraints of the LHS. Because
of the tests performed by the other nodes, a terminal node will receive only
tokens that instantiate the LHS. The terminal node sends out of the black box
the information that the conflict set must be changed.
For an example of the operation of the nodes, consider what happens in the
network in Fig. 1 when the following two elements are put into an empty
working memory.
(Goal ↑Type Simplify ↑Object Expr17)
(Expression ↑Name Expr17 ↑Arg1 0 ↑Op * ↑Arg2 X)
First the token
<+ (Goal ↑Type Simplify ↑Object Expr17)>
is created and sent to the root of the network. This node sends the token to its
successors. One of the successors (on the right in Fig. 1) tests it and rejects it
because its class is not Expression. This node does not pass the token to its
successor. The other successor of the top node accepts the token (because its
class is Goal) and so sends it to its successor. That node also accepts the token
(since its type is Simplify), and it sends the token to its successors, the
two-input nodes. Since no other tokens have arrived at the two-input nodes,
they can perform no tests; they must just store the token and wait.
When the token
<+ (Expression ↑Name Expr17 ↑Arg1 0 ↑Op * ↑Arg2 X)>
is processed, it is tested by the one-input nodes and passed down to the right
input of Time0x's two-input node. This node compares the new token to the
earlier one, and finding that they allow the variable to be bound consistently, it
creates and sends out the token
<+ (Goal ↑Type Simplify ↑Object Expr17)
(Expression ↑Name Expr17 ↑Arg1 0 ↑Op * ↑Arg2 X)>
(P Plus0x
(Goal ↑Type Simplify ↑Object <N>)
(Expression ↑Name <N> ↑Arg1 0 ↑Op + ↑Arg2 <X>)
--> ...)
(P Time0x
(Goal ↑Type Simplify ↑Object <N>)
(Expression ↑Name <N> ↑Arg1 0 ↑Op * ↑Arg2 <X>)
--> ...)
[Network diagram. The root node, "Distribute the tokens.", has two successors.
Left path: "Is the element class Goal?", then "Is the value of the Type Simplify?".
Right path: "Is the element class Expression?", then "Is the value of the Arg1 0?",
which branches to "Is the value of the Op +?" and "Is the value of the Op *?".
The left path and the Op + branch feed a two-input node, "Join the elements in
which the value of the Object attribute from the left is equal to the value of the
Name attribute from the right.", followed by the terminal node "Report that
production Plus0x is satisfied." The left path and the Op * branch feed an
identical two-input node followed by the terminal node "Report that production
Time0x is satisfied."]
FIG. 1. The network for Plus0x and Time0x.
When its successor, the terminal node for Time0x, receives this token, it adds
the instantiation of Time0x to the conflict set.
2.2.3. Saving information in the network
As explained above, the black box must maintain state information because it
must know what is in working memory. In simple Rete networks all such state
is stored by the two-input nodes. Each two-input node contains two lists called
its left and right memories. The left memory holds copies of the tokens that
arrived at its left input, and the right memory holds copies of the tokens that
arrived at its right input. The tokens are stored as long as they are useful. The
next section explains how the nodes determine when the tokens are no longer
useful.
2.2.4. Using the tags
The tag in a token indicates how the state information is to be changed when
the token is processed. The + and - tokens are processed identically except:
- The terminal nodes use the tags to determine whether to add an instantiation to
the conflict set or to remove an existing instantiation. When a + token is
processed, an instantiation is added; when a - token is processed, an
instantiation is removed.
- The two-input nodes use the tags to determine how to modify their internal
memories. When a + token is processed, it is stored in the internal memory;
when a - token is processed, a token with an identical data part is deleted.
- The two-input nodes use the tags to determine the appropriate tags for the
tokens they build. When a new output is created, it is given the tag of the token
that just arrived at the two-input node.
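The three rules above, together with the left and right memories of Section 2.2.3, can be put together in a small Python sketch of a two-input node and a terminal node. The class name JoinNode, the callback used in place of a successor link, and the positional layout of elements (class name at index 0) are assumptions made for the illustration. As a simplification, the node tests only one pair of values, taken from the last element of the left token and the single element of the right token.

```python
class JoinNode:
    """Two-input node that joins tokens agreeing on one pair of values."""

    def __init__(self, left_index, right_index, on_output):
        self.left_index = left_index    # position tested in the left element
        self.right_index = right_index  # position tested in the right element
        self.left_memory = []
        self.right_memory = []
        self.on_output = on_output      # stands in for the successor node

    def _consistent(self, left, right):
        return left[-1][self.left_index] == right[0][self.right_index]

    def receive(self, side, tag, data):
        if side == "left":
            own, other = self.left_memory, self.right_memory
        else:
            own, other = self.right_memory, self.left_memory
        if tag == "+":
            own.append(data)   # store the new token
        else:
            own.remove(data)   # delete the token with an identical data part
        for partner in other:
            left, right = (data, partner) if side == "left" else (partner, data)
            if self._consistent(left, right):
                # The output token carries the tag of the token that just arrived.
                self.on_output(tag, left + right)


conflict_set = []


def terminal_time0x(tag, data):
    """Terminal node: add or remove an instantiation of Time0x."""
    instantiation = ("Time0x", data)
    if tag == "+":
        conflict_set.append(instantiation)
    else:
        conflict_set.remove(instantiation)


# Object is index 2 of a Goal; Name is index 1 of an Expression.
join = JoinNode(left_index=2, right_index=1, on_output=terminal_time0x)
goal = ("Goal", "Simplify", "Expr17")
expr = ("Expression", "Expr17", 0, "*", "X")

join.receive("left", "+", (goal,))
join.receive("right", "+", (expr,))
after_add = list(conflict_set)
join.receive("right", "-", (expr,))
after_delete = list(conflict_set)
```

After the two + tokens arrive, the conflict set holds one instantiation of Time0x; the - token for the expression retracts it by following the same path through the node.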
2.3. Completing the set of node types
The network in Fig. 1 contained four kinds of nodes: the root node, the
terminal nodes, the one-input nodes, and the two-input nodes. Certainly one
could define many more kinds of nodes, but only a few more are necessary to
have a complete and useful set. In fact, only two more kinds of nodes are
necessary to interpret OPS5.
A second kind of two-input node is needed for negated patterns (that is,
patterns preceded by -). The new two-input node stores a count with each
token in its left memory. The count indicates the number of tokens in the right
memory that allow consistent variable bindings. The tokens in its right memory
contain the elements that match the negated pattern, or, more precisely, the
tokens contain the elements that have the intra-element features that the
negated pattern requires. The node allows the tokens with a count of zero to
pass.
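A possible reading of the negated two-input node is sketched below in Python. The text specifies only the count stored with each left token and the rule that tokens with a count of zero pass; how the counts are maintained when right tokens arrive or leave, and the retraction and re-assertion of left tokens when a count moves away from or back to zero, are assumptions of this sketch. The names NotNode, receive_left, and receive_right are hypothetical, and elements are positional tuples with the class name at index 0.

```python
class NotNode:
    """Two-input node for a negated pattern, with a count per left token."""

    def __init__(self, left_index, right_index, on_output):
        self.left_index = left_index
        self.right_index = right_index
        self.left_memory = []    # entries are [data, count]
        self.right_memory = []
        self.on_output = on_output

    def _consistent(self, left, right):
        return left[-1][self.left_index] == right[0][self.right_index]

    def receive_left(self, tag, data):
        if tag == "+":
            count = sum(self._consistent(data, r) for r in self.right_memory)
            self.left_memory.append([data, count])
        else:
            entry = next(e for e in self.left_memory if e[0] == data)
            self.left_memory.remove(entry)
            count = entry[1]
        if count == 0:
            self.on_output(tag, data)  # only tokens with a zero count pass

    def receive_right(self, tag, data):
        if tag == "+":
            self.right_memory.append(data)
        else:
            self.right_memory.remove(data)
        for entry in self.left_memory:
            if not self._consistent(entry[0], data):
                continue
            if tag == "+":
                entry[1] += 1
                if entry[1] == 1:   # count left zero: retract (assumption)
                    self.on_output("-", entry[0])
            else:
                entry[1] -= 1
                if entry[1] == 0:   # count back to zero: pass again (assumption)
                    self.on_output("+", entry[0])


out = []
node = NotNode(left_index=2, right_index=1,
               on_output=lambda tag, data: out.append((tag, data)))
goal = ("Goal", "Simplify", "Expr17")
expr = ("Expression", "Expr17", 0, "*", "X")

node.receive_left("+", (goal,))    # no blocking expression yet: passes
node.receive_right("+", (expr,))   # a blocking expression appears
node.receive_right("-", (expr,))   # the blocking expression is removed
```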
The last node type that needs to be defined is a variant of the one-input
nodes described earlier. Those nodes tested working memory elements for
constant features (testing, for example, whether a value was equal to a given
atomic symbol). The new one-input nodes compare two values from a working
memory element. These nodes are used to process patterns that contain two or
more occurrences of a variable. The following, for example, would require one
of these nodes because <X> occurs twice.
(Expression ↑Arg1 <X> ↑Op + ↑Arg2 <X>)
3. Representing the Network and the Tokens
This section describes representations for tokens and nodes that allow very fast
interpreters to be written.
3.1. Working memory elements
The representation chosen for the working memory elements should have two
properties.
- The representation should make it easy to extract values from elements because
every test involves extracting one or more values.
- The representation should make it easy to perform the tests once the values are
available.
To make extracting the values easy, each element should be stored in a
contiguous block in memory, and each attribute should have a designated index
in the block. For example, if elements of class Ck had seventeen attributes, A1
through A17, they should be stored as blocks of eighteen values. The first value
would be the class name (Ck). The second value would be the value of attribute
A1. The third would be the value of attribute A2, and so on. The particular
assignment of indices to attributes is unimportant; it is important only that each
attribute have a fixed index, and that the indices be assigned at compile time.
This allows the compiler to build the indices into the nodes. Thus instead of a
node like the following:
Is the value of the Status attribute Pending?
the compiler could build the node
Is the value at location 8 Pending?
With this representation, each value can be accessed in one memory reference,
regardless of the number of attributes possessed by an element.
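The idea of resolving attribute names to fixed indices when the network is built can be shown with a short Python sketch. The table ATTRIBUTE_INDEX, the particular index assignment, and the helper names make_element and compile_constant_test are assumptions made for the example; a tuple stands in for the contiguous block of memory.

```python
# Index assignment fixed when patterns are compiled; position 0 holds the class name.
ATTRIBUTE_INDEX = {
    "Expression": {"Name": 1, "Arg1": 2, "Op": 3, "Arg2": 4},
}


def make_element(cls, **attrs):
    """Store an element as one block; unset attributes default to None (NIL)."""
    index = ATTRIBUTE_INDEX[cls]
    block = [None] * (len(index) + 1)
    block[0] = cls
    for attr, value in attrs.items():
        block[index[attr]] = value
    return tuple(block)


def compile_constant_test(cls, attr, constant):
    """Build a one-input test with the attribute index already resolved."""
    i = ATTRIBUTE_INDEX[cls][attr]  # looked up once, when the node is built
    return lambda element: element[i] == constant


element = make_element("Expression", Name="Expr17", Arg1=2, Op="*", Arg2="X")
is_times = compile_constant_test("Expression", "Op", "*")
is_plus = compile_constant_test("Expression", "Op", "+")
```

The compiled test performs a single indexed access per element and never consults the attribute name at match time.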
To make the tests inexpensive, the representation should have explicit type
bits. One obvious way to represent a value is to use one word for the type and
one or more words for the value proper. But more space-efficient
representations are also possible. For example, consider a production system language
that supports three data types, integers, floating point numbers, and atoms. A
representation like the following might be used: One word would be allocated
to each value. For integers and atoms, the low order sixteen (say) bits would
hold the datum and the seventeenth bit would be a type bit. For floating point
numbers, the entire word would be used to store a normalized floating point
number. A floating point number would be recognized by having at least one
non-zero in the high order bits.
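The following Python sketch shows one way such a tagged word could work on a thirty-two bit word. Several details are assumptions not stated in the text: IEEE 754 single precision is used as the floating point format, a clear type bit means integer and a set type bit means atom, atoms are represented by small integer identifiers, and only non-negative sixteen-bit integers are handled. Under these assumptions zero cannot be stored as a floating point number, because its bit pattern has no non-zero high order bit.

```python
import struct

DATUM_MASK = 0xFFFF       # low order sixteen bits hold the datum
TYPE_BIT = 1 << 16        # the seventeenth bit distinguishes integer from atom
HIGH_MASK = 0xFFFE0000    # the fifteen bits above the type bit


def encode_int(n):
    if not 0 <= n <= DATUM_MASK:
        raise ValueError("integer does not fit in sixteen bits")
    return n  # type bit clear (assumed to mean integer)


def encode_atom(atom_id):
    if not 0 <= atom_id <= DATUM_MASK:
        raise ValueError("atom identifier does not fit in sixteen bits")
    return TYPE_BIT | atom_id  # type bit set (assumed to mean atom)


def encode_float(x):
    word = struct.unpack(">I", struct.pack(">f", x))[0]
    if not word & HIGH_MASK:
        raise ValueError("word would be mistaken for an integer or atom")
    return word


def decode(word):
    """Classify a word by its high order bits, then by its type bit."""
    if word & HIGH_MASK:
        return ("float", struct.unpack(">f", struct.pack(">I", word))[0])
    kind = "atom" if word & TYPE_BIT else "int"
    return (kind, word & DATUM_MASK)
```

For example, decode(encode_int(42)) gives ("int", 42) and decode(encode_float(1.5)) gives ("float", 1.5), with the type decided by two mask operations on a single word.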
3.2. The network
This section explains how to represent nodes in a form similar to von Neumann
machine instructions. This representation was chosen because it allows the
network interpreter to be organized like the interpreters for conventional von
Neumann architectures.
3.2.1. An assembly language notation
To make it easier to discuss the representation for the nodes, an assembly
language notation is used below. A one-input node like
Is the value of location 8 Pending?
becomes
TEQA 8,Pending
The T, which stands for test, indicates that this is a one-input node. The EQ
indicates that it is a test for equality. (It is also necessary to have NE for not
equals, LT for less than, etc.) The A indicates the node tests data of type
atom. (There is also a type N for integer values, a type F for floating point, and
a type S for comparing two values in the same working memory element.)
Two-input nodes are indicated by lines like the following.
L001 AND (2) = (1)
L001 is a label. AND indicates that this is a two-input node for non-negated
patterns. The sequence (2) = (1) indicates that the node compares the second
value of elements from the left and the first value of elements from the right;
the = indicates that it performs a test for equality. The terminal nodes contain
the type TERM and the name of the production. For example
TERM Plus0x
As will be explained below, the ROOT node is not needed in this
representation.
3.2.2. Linearizing the network
To make the nodes like the instructions for a von Neumann machine, it is
necessary to eliminate the explicit links between nodes. Many of the explicit
links can be eliminated simply by linearizing the network, placing a node and
its successor in contiguous memory locations. However, since some nodes have
more than one successor, and others (the two-input nodes) have more than one
predecessor, linearizing is not sufficient in itself: two new node types must be
defined to replace some of the links. The first of the new nodes, the FORK, is
used to indicate that a node has more than one successor. The FORK node
contains the address of one of the successors. The other successor is placed
immediately after the FORK. For example, the FORK in the following
indicates that the node L003 has two successors.
L003 TEQA 0,Expression
     FORK L004
     TEQA 3,+
L004 TEQA 3,*
The other new node type, the MERGE, is used where the network has to grow
back together, that is, before two-input nodes. The two-input node is placed
after one of its predecessors (say its left predecessor) and the MERGE is
placed after the other. The MERGE, which contains the address of the
two-input node, functions much like an unconditional jump. Fig. 2 shows the
effect of the linearization process; it contains the productions from Fig. 1 and
the linearized network for their LHSs.
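To show how FORK and MERGE replace explicit links, the Python sketch below walks one working memory element through the one-input part of a linearized network. It is a sketch under several assumptions: the instruction tuples, the function name run_one_input, the use of a stack to remember the alternative successor recorded by a FORK, and the single untyped TEQ test (standing in for TEQA and its siblings) are choices made for this example. The program is modelled on the notation of this section; the placement of the MERGE instructions after the Op tests is assumed from the description above, and the Arg1 test is omitted as in the FORK example.

```python
def run_one_input(program, labels, element):
    """Return the (input side, two-input node label) pairs one element reaches."""
    reached = []
    stack = [0]  # addresses still to be visited; execution starts at the root
    while stack:
        pc = stack.pop()
        while pc < len(program):
            op, *args = program[pc]
            if op == "FORK":
                stack.append(labels[args[0]])  # visit the other successor later
            elif op == "TEQ":
                index, constant = args
                if element[index] != constant:
                    break                      # test failed: abandon this path
            elif op == "MERGE":
                reached.append(("right", args[0]))  # jump to the two-input node
                break
            elif op == "AND":
                reached.append(("left", args[0]))   # entered from its predecessor
                break
            pc += 1
    return reached


program = [
    ("FORK", "L003"),          # 0  ROOT
    ("TEQ", 0, "Goal"),        # 1
    ("TEQ", 1, "Simplify"),    # 2
    ("FORK", "L002"),          # 3
    ("AND", "L001"),           # 4  L001
    ("TERM", "Plus0x"),        # 5
    ("AND", "L002"),           # 6  L002
    ("TERM", "Time0x"),        # 7
    ("TEQ", 0, "Expression"),  # 8  L003
    ("FORK", "L004"),          # 9
    ("TEQ", 3, "+"),           # 10
    ("MERGE", "L001"),         # 11 (assumed placement)
    ("TEQ", 3, "*"),           # 12 L004
    ("MERGE", "L002"),         # 13 (assumed placement)
]
labels = {"L001": 4, "L002": 6, "L003": 8, "L004": 12}

goal = ("Goal", "Simplify", "Expr17")
expr = ("Expression", "Expr17", 0, "*", "X")
goal_targets = run_one_input(program, labels, goal)
expr_targets = run_one_input(program, labels, expr)
```

The goal element reaches the left inputs of both two-input nodes, while the expression with operator * reaches only the right input of L002, matching the walk-through of Section 2.2.2.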
3.2.3. Representing the nodes in memory
This section shows how the nodes could be represented on a computer which
has a thirty-two bit word length. The thirty-two bit word length was chosen
because it is typical of today's computers; the precise word length is not
critical, however. Since the network can be rooted at a FORK (see the example
in Fig. 2) it is not necessary to have an explicit root node for the network.
Hence only seven classes of nodes are needed: FORKs, MERGEs, the two
kinds of one-input nodes, the two kinds of two-input nodes, and the terminal
nodes.
FORKs and MERGEs could be represented as single words. Six bits could
be used for a type field (that is, a field to indicate what the word represents)
and the remaining twenty-six bits could be used for the address of the node
pointed to. FORKs and MERGEs would thus be represented:
I I I
I TYPE [ ADDRESS I
I I I
I_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1_1
( 6 b i t s ) ( 26 b i t s )
Both kinds of one-input nodes could be represented as single words that are
divided into three fields. The first field would hold the type of the node. The
second field would hold the index of the value to test. The third field would
hold either a constant or a second index. The bits in a word could be allocated
as follows.
    +--------+------------+---------------------+
    |  TYPE  |   INDEX    |  CONSTANT or INDEX  |
    +--------+------------+---------------------+
     (6 bits)  (10 bits)        (16 bits)
A sixteen-bit field is required to represent an integer or an atom using the
format of Section 3.1. Since a floating point number cannot be represented in
sixteen bits, in nodes that test floating point numbers, this field would hold not
the number, but the address of the number.
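The two single-word layouts above can be illustrated with ordinary shifts and masks. The following Python sketch assumes the field positions shown in the diagrams (type in the high-order six bits); the function names pack_pointer_node, pack_one_input and unpack_one_input are hypothetical and chosen only for this example.

```python
def _check(value, bits, name):
    """Reject values that do not fit in an unsigned field of `bits` bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits: {value}")
    return value


def pack_pointer_node(node_type, address):
    """FORK or MERGE word: 6-bit type, 26-bit address."""
    return (_check(node_type, 6, "type") << 26) | _check(address, 26, "address")


def pack_one_input(node_type, index, operand):
    """One-input node word: 6-bit type, 10-bit index, 16-bit constant or index."""
    return (
        (_check(node_type, 6, "type") << 26)
        | (_check(index, 10, "index") << 16)
        | _check(operand, 16, "operand")
    )


def unpack_one_input(word):
    """Return the (type, index, operand) fields of a one-input node word."""
    return (word >> 26) & 0x3F, (word >> 16) & 0x3FF, word & 0xFFFF


word = pack_one_input(3, 1, 0x1234)
print(hex(word), unpack_one_input(word))
```

Range checks are included because a value that overflows its field would silently corrupt the neighbouring field.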
    (P Plus0x
        (Goal ↑Type Simplify ↑Object <N>)
        (Expression ↑Name <N> ↑Arg1 0 ↑Op + ↑Arg2 <X>)
        --> ...)
    (P Time0x
        (Goal ↑Type Simplify ↑Object <N>)
        (Expression ↑Name <N> ↑Arg1 0 ↑Op * ↑Arg2 <X>)
        --> ...)
    ROOT  FORK L003             ; Root node of the network
          TEQA 0, Goal          ; Is the element class Goal?
          TEQA 1, Simplify      ; Is the Type Simplify?
          FORK L002
    L001  AND (2) = (1)         ; Two-input node for Plus0x
          TERM Plus0x           ; Report Plus0x is satisfied
    L002  AND (2) = (1)         ; Two-input node for Time0x
          TERM Time0x           ; Report Time0x is satisfied
    L003  TEQA 0, Expression    ; Is the element class Expression?
          TEQN 2, 0             ; Is the Arg1 0?
          FORK L004
          TEQA 3, +             ; Is the Op +?
          MERGE L001
    L004  TEQA 3, *             ; Is the Op *?
          MERGE L002
FIG. 2. A compiled network.
The terminal nodes could also be stored in single words. These nodes contain
two fields, the usual type field plus a longer field for the index or address of the
production that the node represents.
    +--------+----------------------------+
    |  TYPE  |         PRODUCTION         |
    +--------+----------------------------+
     (6 bits)          (26 bits)
The length of a two-input node would depend on the number of value pairs
tested by the node. Each node could have one word of basic information plus
one word for each value pair. The first word would contain a type field, a
pointer to the memory for the left input, a pointer to the memory for the right
input, and a field indicating how many tests are performed by the node. The
bits in the word could be allocated as follows.
    +--------+---------+------------------+------------------+
    |  TYPE  |  COUNT  |  MEMORY POINTER  |  MEMORY POINTER  |
    +--------+---------+------------------+------------------+
     (6 bits) (4 bits)      (11 bits)          (11 bits)
The word for each test would contain three fields. Two fields would hold the
indices of the two elements to test. The remaining field would indicate the test
to perform; that is, it would indicate whether the node is to test for equality of
the two elements, for inequality, or for something else. The bits in the word
might be allocated as follows.
    +--------+-------------+-------------+
    |  TEST  |    INDEX    |    INDEX    |
    +--------+-------------+-------------+
     (6 bits)   (13 bits)     (13 bits)
Note that the index fields here are longer than the index fields in the one-input
nodes. This is necessary because the indices in the two-input nodes must
designate elements in the tokens as well as values in the elements.
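A variable-length two-input node can be sketched as a list of words: one header followed by one word per test. The Python below is a hedged illustration; the function names, the choice of which memory-pointer field holds the left memory, and the type and test codes used in the example (4 for AND, 0 for equality) are assumptions, since the text leaves them open.

```python
def pack_two_input(node_type, left_memory, right_memory, tests):
    """Return the words of a two-input node: a header plus one word per test.

    `tests` is a list of (test_code, left_index, right_index) triples.
    Assumed header layout: type (6) | count (4) | left (11) | right (11).
    """
    count = len(tests)
    if count >= 1 << 4:
        raise ValueError("the 4-bit COUNT field allows at most 15 tests")
    if not (0 <= left_memory < 1 << 11 and 0 <= right_memory < 1 << 11):
        raise ValueError("memory pointers must fit in 11 bits")
    words = [(node_type << 26) | (count << 22) | (left_memory << 11) | right_memory]
    for test_code, left_index, right_index in tests:
        if not (0 <= left_index < 1 << 13 and 0 <= right_index < 1 << 13):
            raise ValueError("indices must fit in 13 bits")
        # Test word layout: test (6) | index (13) | index (13).
        words.append((test_code << 26) | (left_index << 13) | right_index)
    return words


def unpack_header(word):
    """Return (type, count, left_memory, right_memory) from a header word."""
    return (word >> 26) & 0x3F, (word >> 22) & 0xF, (word >> 11) & 0x7FF, word & 0x7FF


# Hypothetical encoding of "AND (2) = (1)" from Fig. 2.
node = pack_two_input(4, 5, 6, [(0, 2, 1)])
print(len(node), unpack_header(node[0]))
```

The node's total length is COUNT + 1 words, which is how an interpreter can step over it to reach the successor.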
3.3. The tokens
This section describes a space-efficient representation for tokens. This
representation is not suitable for all interpreters; it requires the interpreter to
process only one working memory change at a time, and it requires that certain
parts of the network be traversed depth first. Fortunately, these are not serious
restrictions. The simplest way to perform the match is to process one token at a
time, traversing the entire network depth first. Section 4 describes an
interpreter that operates in this manner.
If the interpreter operates this way, then it can use a stack to represent its
tokens. When a token has to be built, first the tag for the token is pushed onto
the stack, and then the working memory elements are pushed onto the stack in
order. When tokens have to be extended (a very common operation; see the
code in Section 4) the additional working memory elements are just pushed
onto the stack.
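The stack discipline described above can be sketched as follows. This is a minimal Python illustration, assuming a list as the stack; the class name TokenStack and its methods build, extend, mark and restore are hypothetical names introduced for the example.

```python
class TokenStack:
    """Hypothetical stack-based token store: tag first, then elements in order."""

    def __init__(self):
        self.items = []

    def build(self, tag, elements):
        """Build a token: push the tag, then the working memory elements."""
        self.items.append(tag)
        self.items.extend(elements)

    def extend(self, element):
        """Extend the current token by pushing one more element."""
        self.items.append(element)

    def mark(self):
        """Remember the current top so a later extension can be undone."""
        return len(self.items)

    def restore(self, mark):
        """Drop everything pushed since `mark` was taken."""
        del self.items[mark:]


ts = TokenStack()
ts.build("+", [("Goal", "Simplify", "e1")])
saved = ts.mark()
ts.extend(("Expression", "e1", 0, "+", "x"))
extended = list(ts.items)
ts.restore(saved)
print(extended, ts.items)
```

Saving and restoring the top of the stack is cheap, which is what makes depth-first traversal a natural fit for this representation.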
The one-input nodes will be more efficient if they do not use this stack. Since
all the one-input nodes will process the same working memory element (the
element that was just added to or deleted from working memory), the element
should be made easily available. The element could be copied into a dedicated
location in memory, or the address of the element could be loaded into a
dedicated base register. Either of these would make it possible for the
one-input nodes to access the element without going through the stack.
3.4. The interpreter's state
In addition to the stack for tokens, the interpreter must maintain another stack
for its state information. One reason for the stack is to allow the interpreter to
find its way about in the network. When the interpreter passes a FORK, it
pushes the pointer it does not follow onto the stack. Then when it reaches the
end of a path it pops a pointer from the stack and follows it. Another reason
for the stack is to provide a place for the two-input nodes to keep their local
information. As will be seen in the next section, the two-input nodes sometimes
have to suspend themselves while their successors are processed. The stack
holds the information that is needed to resume processing the two-input nodes.
4. The Network Interpreter
This section provides a concrete description of the operations performed by the
network interpreter. One node from each class has been selected, and the code
to interpret the nodes has been written. It might be noted that since the code
sequences are short and simple, they could easily be written in microcode.
The code is written in a PASCAL-like language which has literal labels and
field extraction. Field extraction is indicated by putting two numbers within
angle brackets; the first number is the index of the high order bit in the field,
and the second number is the index of the low order bit. The assumption will
be made that the bits are numbered from right to left, with the low order bit
being bit zero. Thus the expression SELF<31:26> indicates that the high order
six bits of the value of the variable SELF are to be extracted and right justified.
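The field-extraction notation maps directly onto a shift and a mask. As a small Python sketch (the function name extract_field is a hypothetical stand-in for the angle-bracket notation):

```python
def extract_field(word, high, low):
    """Return bits high..low of `word`, right justified (bit 0 is the low order bit)."""
    if high < low:
        raise ValueError("high bit index must not be below low bit index")
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


# A one-input node word with type 3, index 1 and constant 0x1234.
SELF = 0x0C011234
print(extract_field(SELF, 31, 26), extract_field(SELF, 25, 16), hex(extract_field(SELF, 15, 0)))
```

Here extract_field(SELF, 31, 26) plays the role of SELF<31:26>.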
The main loop of the interpreter is very simple: the interpreter fetches the
next node from memory and dispatches on its type field. Let the segment of
memory that holds the nodes be called NODE_MEMORY and let the pointer
to the current node be called NC. The main loop is then:
    MAIN: SELF := NODE_MEMORY[NC];
          CASE SELF<31:26> OF         !Type field is high order 6 bits
            0: GOTO FORK;
            1: GOTO MERGE;
            2: GOTO TERM;
            3: GOTO TEQA;
          END;
The node is copied into the variable SELF so that the node programs can
examine it. The assignment of numbers to the various node types is arbitrary.
Goto's are used instead of procedure calls so that these examples make all the
state of the interpreter explicit, rather than hidden in PASCAL's stack.
TEQA is typical of the one-input nodes for testing constants. If the segment
of memory that holds the working memory element being processed is called
CURRENT, then TEQA is as follows.
    TEQA: TEMP := CURRENT[SELF<25:16>];   !Get the word pointed to
                                          !by the index field
          IF (TEMP<31:16> = 0) AND        !Test type bits
             (TEMP<15:0> = SELF<15:0>)    !Test value
          THEN GOTO SUCC
          ELSE GOTO FAIL;
Either SUCC or FAIL is executed after each one-input node. SUCC is
executed when the test succeeds, and FAIL is executed when the test fails.
SUCC increments the node counter to point to the next node.
    SUCC: NC := NC + 1;
          GOTO MAIN;
FAIL tries to get a node from the stack of unprocessed nodes; if it cannot, it
halts the match. Assuming the stack is named NS and the pointer to the top of
the stack is called NSTOP, the code is:
    FAIL: IF NSTOP < 0 THEN GOTO EXIT_MATCH;
          NC := NS[NSTOP];
          NSTOP := NSTOP - 1;
          GOTO MAIN;
The one-input nodes for comparing pairs of values are similar to the other
one-input nodes. TEQS is typical of these nodes.
    TEQS: IF CURRENT[SELF<25:16>] = CURRENT[SELF<9:0>]
          THEN GOTO SUCC
          ELSE GOTO FAIL;
FORK pushes an address onto NS and then passes control to the following
node.
    FORK: NSTOP := NSTOP + 1;
          NS[NSTOP] := SELF<25:0>;
          GOTO SUCC;
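The interplay of MAIN, TEQA, SUCC, FAIL and FORK can be seen in a compact Python sketch. It is an illustration under simplifying assumptions, not the article's interpreter: nodes are tuples rather than packed words, the type-bit check in TEQA is omitted, TERM just records a name, and the program and names (match, PROGRAM, the reported strings) are hypothetical.

```python
def match(program, current):
    """Run a linearized one-input network over one working memory element."""
    nc = 0            # pointer to the current node (NC)
    ns = []           # stack of node addresses still to be processed (NS)
    satisfied = []    # names reported by TERM nodes

    while True:                                   # MAIN
        op, *args = program[nc]
        if op == "FORK":
            ns.append(args[0])                    # remember the path not taken
            nc += 1                               # SUCC
            continue
        if op == "TEQA":
            index, constant = args
            if index < len(current) and current[index] == constant:
                nc += 1                           # SUCC
                continue
        elif op == "TERM":
            satisfied.append(args[0])
        else:
            raise ValueError(f"unknown node type: {op}")
        # FAIL: resume at a pending node, or halt the match if none is left.
        if not ns:
            return satisfied
        nc = ns.pop()


PROGRAM = [
    ("FORK", 3),
    ("TEQA", 0, "Goal"),
    ("TERM", "is-goal"),
    ("TEQA", 0, "Expression"),
    ("FORK", 7),
    ("TEQA", 3, "+"),
    ("TERM", "is-plus"),
    ("TEQA", 3, "*"),
    ("TERM", "is-times"),
]

print(match(PROGRAM, ["Expression", "e1", 0, "+", "x"]))
```

Because each FORK pushes only one address, the stack never holds more entries than there are FORKs on the current path.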
A two-input node must be able to determine whether it was reached over its
left input or its right input. This can be indicated to the node by a global
variable which usually has the value LEFT, but which is temporarily set to
RIGHT when a MERGE passes control to a two-input node. If this global
variable is called DIRECTION, the code for the MERGE is:
    MERGE: DIRECTION := RIGHT;
           NC := SELF<25:0>;
           GOTO MAIN;
The two kinds of two-input nodes are very similar, so only AND is shown
here. In order not to obscure the more important information, some details of
the program are omitted. The code does not show how the variables are tested,
nor does it show how tokens are added to and removed from the node's
memories. Assuming the token stack is called TS and the pointer to the top
element is called TSTOP, the program is as follows.
    !Control can reach this point many times during the processing
    !of a token. The node needs to update its state and put
    !information on NS only once, however.
    !
    AND:   IF NS[NSTOP] <> NC                     !If the state is not in NS
           THEN                                   !  Then put it there
             BEGIN
               NSTOP := NSTOP + 4;
               NS[NSTOP] := NC;
               NS[NSTOP-1] := DIRECTION;
               NS[NSTOP-2] := MEMORYCONTENTS(OPPOSITE(DIRECTION));
               NS[NSTOP-3] := TSTOP;
               MODIFY_MEMORY(DIRECTION);          !Store the token
               DIRECTION := LEFT;                 !Reset to the default
             END;
    !
    !Go process the tokens
    !
           IF NS[NSTOP-1] = RIGHT THEN GOTO RLOOP
           ELSE GOTO LLOOP;
    !
    !Compare the token to the elements in the right memory
    !
    LLOOP:
    !Fall out of the loop when the test succeeds so that
    !the successors of this node can be activated
           REPEAT
             TEMP := NEXT_POSITION(NS[NSTOP-2]);
             IF TEMP = NIL                        !If right memory is empty
             THEN                                 !  Then clean up and exit
               BEGIN
                 TSTOP := NS[NSTOP-3];
                 NSTOP := NSTOP - 4;
                 GOTO FAIL;
               END
           UNTIL PERFORM_AND_TEST(TEMP, LEFT);
    !Extend the token
           TSTOP := NS[NSTOP-3] + 1;  TS[TSTOP] := TEMP;
    !Prepare NS so that control will return to this node
           NSTOP := NSTOP + 1;  NS[NSTOP] := NC;
    !Pass control to the successors of this node
           NC := NC + SELF<25:22> + 1;  GOTO MAIN;
    !
    !Compare the token to the elements in the left memory
    !
    RLOOP:
           This is similar to LLOOP.
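The essential behaviour of an AND node, leaving aside the suspend-and-resume bookkeeping on NS, is to store the arriving token in its own memory and compare it against everything in the opposite memory. The Python sketch below illustrates only that behaviour, for additions to working memory; it deliberately replaces the stack-based control flow above with a plain loop, and the class AndNode, its activate method and the test-triple format are assumptions made for the example.

```python
class AndNode:
    """Simplified two-input node: tokens are tuples of working memory elements."""

    def __init__(self, tests):
        # Each test is (left_element, left_field, right_field): the named field
        # of the given element of the left token must equal the named field of
        # the single element arriving on the right input.
        self.tests = tests
        self.left_memory = []
        self.right_memory = []

    def _passes(self, left, right):
        return all(left[e][f] == right[0][g] for e, f, g in self.tests)

    def activate(self, direction, token):
        """Store `token`, then return the extended tokens for the successors."""
        if direction == "LEFT":
            self.left_memory.append(token)
            pairs = [(token, right) for right in self.right_memory]
        else:
            self.right_memory.append(token)
            pairs = [(left, token) for left in self.left_memory]
        return [left + right for left, right in pairs if self._passes(left, right)]


# "AND (2) = (1)" from Fig. 2: Goal field 2 (Object) equals Expression field 1 (Name).
node = AndNode([(0, 2, 1)])
goal = ("Goal", "Simplify", "e1")
expr = ("Expression", "e1", 0, "+", "x")
first = node.activate("LEFT", (goal,))
second = node.activate("RIGHT", (expr,))
print(first, second)
```

The first activation finds the right memory empty and produces nothing; the second produces the combined token that would be passed on to the TERM node.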
The only remaining node type is the TERM node. Since updating the conflict
set is a language-dependent operation, that detail of the TERM node cannot
be shown. The rest of the processing of the node is as follows.
    TERM: UPDATE_CONFLICT_SET(SELF<25:0>);
          GOTO FAIL;
5. Performance of the Algorithm
Extensive studies have been made of the efficiency of the Rete Match
Algorithm. Both analytical studies (which determined the time and space
complexity of the algorithm) and empirical studies have been made. This section
presents some of the results of the analytical studies. Because of space
constraints, it was not possible to present the empirical results or the proofs of
the analytical results. The proofs and detailed results of some empirical studies
can be found in [5].
TABLE 1. Space and time complexity
    Complexity measure                Best case       Worst case
    Effect of working memory
      size on number of tokens        O(1)            O(W^C)
    Effect of production memory
      size on number of nodes         O(P)            O(P)
    Effect of production memory
      size on number of tokens        O(1)            O(P)
    Effect of working memory
      size on time for one firing     O(1)            O(W^(2C-1))
    Effect of production memory
      size on time for one firing     O(log_2 P)      O(P)
    C is the number of patterns in a production.
    P is the number of productions in production memory.
    W is the number of elements in working memory.
Table 1 summarizes the results of the analytical studies of the algorithm. The
usual notation for asymptotic complexity is used in this table [1]. Writing that a cost
is O(f(x)) indicates that the cost varies as f(x) plus perhaps some smaller terms in
x. The smaller terms are ignored because the f(x) term will dominate when x is
large. Writing that a cost is O(1) indicates that the cost is unaffected by the factor
being considered. It should be noted that all the complexity results in Table 1
are sharp; production systems achieving the bounds are described in [5].
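To give a feeling for how quickly the worst-case bounds in Table 1 grow, the following substitutes hypothetical sizes, W = 100 working memory elements and C = 3 patterns per production. These values are chosen only for illustration and do not come from the studies cited.

```latex
\[
  W^{C} = 100^{3} = 10^{6},
  \qquad
  W^{2C-1} = 100^{5} = 10^{10}.
\]
```

Since the O-notation drops constant factors, these figures indicate growth rates only, not actual token counts or running times.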
6. Conclusions
The Rete Match Algorithm is a method for comparing a set of patterns to a set
of objects in order to determine all the possible matches. It was described in
detail in this article because enough evidence has been accumulated since its
development in 1974 to make it clear that it is an efficient algorithm which has
many possible applications.
The algorithm is efficient even when it processes large sets of patterns and
objects, because it does not iterate over the sets. In this algorithm, the patterns
are compiled into a program to perform the match process. The program does
not have to iterate over the patterns because it contains a tree-structured
sorting network or index for the patterns. It does not have to iterate over the
data because it maintains state information: the program computes the matches
and partial matches for each object when it enters the data memory, and it
stores the information as long as the object remains in the memory.
Although the Rete algorithm was developed for use in production system
interpreters, it can be used for other purposes as well. If there is anything
unusual about the pattern matching of production systems, it is only that the
pattern matching takes place on an unusually large scale. Production systems
contain rather ordinary patterns and data objects, but they contain large
numbers of them, and invocations of the pattern matcher occur very frequently
during execution. If programs of other kinds begin to use pattern matching
more heavily, they could have the same efficiency problems as production
systems, and it could be necessary to use methods like the Rete Match
Algorithm in their interpreters as well. Certainly the algorithm should not be
used for all match problems; its use is indicated only if the following three
conditions are satisfied.
- The patterns must be compilable. It must be possible to examine them and
determine a list of features like the lists in Section 2.2.1.
- The objects must be constant. They cannot contain variables or other
non-constants as patterns can.
- The set of objects must change relatively slowly. Since the algorithm maintains
state between cycles, it is inefficient in situations where most of the data changes
on each cycle.
ACKNOWLEDGMENT
The author would like to thank Allen Newell and Robert Sproull for many useful discussions
concerning this work, and Allen Newell, John McDermott, and Michael Rychener for their valuable
comments on earlier versions of this article.
REFERENCES
1. Aho, A.V., Hopcroft, J.E., and Ullman, J.D., The Design and Analysis of Computer Algorithms
(Addison-Wesley, Reading, MA, 1974).
2. Cohen, B.L., A powerful and efficient structural pattern recognition system, Artificial
Intelligence 9 (1977) 223-255.
3. Forgy, C.L., A network match routine for production systems, Working Paper, 1974.
4. Forgy, C.L., A production system monitor for parallel computers, Department of Computer
Science, Carnegie-Mellon University, 1977.
5. Forgy, C.L., On the efficient implementation of production systems, Ph.D. Thesis,
Carnegie-Mellon University, 1979.
6. Forgy, C.L., OPS5 user's manual, Department of Computer Science, Carnegie-Mellon
University, 1981.
7. Hayes-Roth, F. and Mostow, D.J., An automatically compilable recognition network for
structured patterns, Proc. Fourth Internat. Joint Conference on Artificial Intelligence (1975)
246-251.
8. McCracken, D., A production system version of the Hearsay-II speech understanding system,
Ph.D. Thesis, Carnegie-Mellon University, 1978.
9. McDermott, J., Newell, A., and Moore, J., The efficiency of certain production system
implementations, in: Waterman, D.A. and Hayes-Roth, F. (Eds.), Pattern-Directed Inference
Systems (Academic Press, New York, 1978) 155-176.
10. Rychener, M.D., Production systems as a programming language for Artificial Intelligence
applications, Ph.D. Thesis, Carnegie-Mellon University, 1976.
Received May 1980; revised version received April 1981
