
Algorithm design specification for interpreting segmented image data using schemas and support logic

S K Morton and S J Popham

Information Technology Research Centre and Department of Engineering Mathematics, University of Bristol, Bristol, UK

This paper was presented at the 2nd Alvey Vision Club Conference, held at the University of Bristol, UK, during September 1986.

The paper describes in outline an algorithm for the interpretation of segmentations of certain sorts of image. First it explains the motivations behind the approach and the techniques and tools employed. Second, it presents descriptions of the various knowledge structures involved and of the subroutines within the organization of the algorithm.

Keywords: image segmentations, knowledge-based methods, visual schemas, support logic

There are two main approaches to the role of techniques in image understanding, roughly corresponding to what Marr¹ called 'type 1' and 'type 2' theories. A type 1 theory is an information processing theory with an input and an output which may be implemented as an algorithm (at least in principle) and which may become established and built upon by other researchers; a type 2 theory pertains to the kind of process where many different events react with each other, and 'whose interaction is its own simplest description', in Marr's own words.

In vision work attempts to decompose the image understanding process into type 1 theories have led to useful results as techniques for low-level interpretation; for example, the difference-of-Gaussians theory for edge detection and the construction of the 2½-dimensional sketch from stereoscopic data are both results that are well established. However, the approach taken here, based on the belief that knowledge is central to visual perception, starts from a type 2 theory where a number of sources of knowledge react with the data in complex fashion to produce an interpretation of an image. To contain this complexity it is necessary to begin by restricting the type of images that we aim to interpret, and subsequently, through a cyclical approach, abstract what are initially ad hoc techniques by applying them to a generalized range of suitable images. Thus it is envisaged that certain (type 1) theories and principles may emerge from this programme, and we take an essentially pragmatic approach in an attempt to formulate empirically aspects of the visual process. The current state of this work is set out below.

To minimize effects which are essentially three dimensional, particularly occlusion, the proposed vision system embodied in the algorithm described herein has an area of application restricted to certain types of two-dimensional images. More specifically we envisage it to be applied to middle-distance outdoor quasinatural colour scenes, typically containing areas of sky, trees and fields as well as man-made and natural discrete objects such as cars and cows. This means that we must be able to assume that the information supplied in the form of a segmented image will let us distinguish to a certain extent between those regions corresponding to background and those corresponding to objects.

SUPPORT LOGIC

The approach we take is guided by two central themes which recur in various aspects of artificial intelligence work. First, the imperfect nature of the information conveyed in segmented images implies that we need some methodology for representation of and inference under this inherent uncertainty. In particular, the nature of the vision problem is such that we must integrate evidence from diverse sources to arrive at an interpretation. The methodology we propose to use is that of Baldwin's support logic², which rests on the notions of Zadeh's fuzzy sets³ and on the Dempster-Shafer theory of evidence⁴. The current implementation, known as Slop⁵, uses a PROLOG-like notation (see Clocksin and Mellish⁶ for a description of PROLOG) which enables us to attach support pairs to facts and conditional support pairs to rules. These are pairs of numbers [α, β], with α, β ∈ [0,1], where for a fact p, α represents the degree of belief in p and 1 − β represents the degree of belief in ¬p and, for a rule p :- q, α represents the support for p given q and 1 − β represents the support for ¬p given q. As such, support pairs may be considered as measures of evidential support, or as some sort of probability and conditional probability measures. For example, the support logic fact

guilty(arthur) :[4/12,7/12]

means that the degree of belief that 'arthur' is guilty is 4/12, while the degree of belief that 'arthur' is not guilty is 5/12, which may be interpreted in terms of a jury's voting as four for and five against with three abstentions. Similarly the support logic rules

rain(T) :- pressure(low,T),warm_front(T). :[0.9,1]
rain(T) :- pressure(high,T). :[0.1,0.3]

express the support for and against rain at a (variable) time T, in the first case when pressure is low and there is a warm front and, in the second case, when there is high pressure.

The question now arises as to how these support pairs are established. In the first example above, the voting model was invoked as a possible source for the assigned pair; in the case of conditional rules, however, this is not generally applicable, and we say that they are derived from a subjective statistical assessment. This is obviously a laborious task when initializing a support logic database, but there is scope for an automatic 'tuning' of the supports based on actual examples which would diminish their subjective content.

The calculus of support logic allows us to reason from uncertain facts to uncertain conclusions, integrating evidence from different sources (or 'proof paths') as we do so. Its derivation is documented in Baldwin², but we reproduce the main combination formulas below, referring to the knowledge base

p :[α1,β1]
q :[α2,β2]
r :- p :[α3,β3]
t :[α4,β4]
t :[α5,β5]

Conjunction

p,q :[αC,βC]

where

αC = α1 * α2
βC = β1 * β2

Disjunction

p;q :[αD,βD]

where

αD = α1 + α2 − (α1 * α2)
βD = β1 + β2 − (β1 * β2)

Negation

¬p :[αN,βN]

where

αN = 1 − β1
βN = 1 − α1

IF conditional

r :[αI,βI]

where

αI = α3 * α1
βI = 1 − [(1 − β3) * α1]

Same conclusion

t :[αS,βS]

where

αS = [α4 + α5 − (α4 * α5) − K]/(1 − K)
βS = (β4 * β5)/(1 − K)

and

K = [α4 * (1 − β5)] + [α5 * (1 − β4)]

is the conflict term representing the support for t and ¬t which is redistributed between αS and βS (Dempster-Shafer renormalization⁴).

The formula for the combination of supports arising from different proof paths in 'same conclusion' is crucial to the conflation of the various types of evidence at the crux of the visual interpretation process.
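
The combination rules above transcribe directly into code. The following Python fragment is an editorial sketch, not part of the Slop implementation: it simply treats a support pair as a pair of numbers in [0,1], and the function names and the example scenario at the end are ours.

# Sketch of the support pair calculus. A pair (a, b) gives the support a
# for a proposition and 1 - b against it.

def conjunction(p, q):
    a1, b1 = p
    a2, b2 = q
    return (a1 * a2, b1 * b2)

def disjunction(p, q):
    a1, b1 = p
    a2, b2 = q
    return (a1 + a2 - a1 * a2, b1 + b2 - b1 * b2)

def negation(p):
    a1, b1 = p
    return (1 - b1, 1 - a1)

def if_conditional(rule, body):
    # rule (a3, b3) is the conditional support pair of r :- p,
    # body (a1, b1) is the support pair established for p
    a3, b3 = rule
    a1, _ = body
    return (a3 * a1, 1 - (1 - b3) * a1)

def same_conclusion(s, t):
    # combine two proof paths for one conclusion, redistributing the
    # conflict K by Dempster-Shafer renormalization (assumes K < 1)
    a4, b4 = s
    a5, b5 = t
    k = a4 * (1 - b5) + a5 * (1 - b4)
    return ((a4 + a5 - a4 * a5 - k) / (1 - k), (b4 * b5) / (1 - k))

# Hypothetical use of the two 'rain' rules quoted earlier: suppose low
# pressure with a warm front is established with support (1, 1) and high
# pressure is excluded, so its body has support (0, 0).
path1 = if_conditional((0.9, 1.0), (1.0, 1.0))   # -> (0.9, 1.0)
path2 = if_conditional((0.1, 0.3), (0.0, 0.0))   # body fails -> (0.0, 1.0)
print(same_conclusion(path1, path2))             # -> (0.9, 1.0)
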
SCHEMAS

The second plank of the system is what are generally called knowledge-based methods, in which the interpretation process is controlled according to the structure of the background or world knowledge that is available. This is used to make hypotheses about the image and embodies certain semantic constraints that are to be satisfied amongst entities in the image. Thus it is a top-down approach which would seem to be necessary when the imperfect nature of the low-level data suggests that
a bottom-up (i.e. data-directed) approach would not be wholly effective. Other vision systems using a top-down approach are described in Ambler et al.⁷, Hanson and Riseman⁸, Wesley and Hanson⁹, Kitchen and Rosenfeld¹⁰ and Ohta¹¹. In Wesley and Hanson⁹ a Dempster-Shafer-like approach to evidential reasoning is incorporated. Dellepiane et al.¹² describe a knowledge-based system for the recognition of tomographic images (i.e. two-dimensional slices) of the brain, which can be integrated to produce a three-dimensional interpretation.

The knowledge representation structure we use is adapted from the schema of Sowa's theory of conceptual graphs¹³. Essentially these are knowledge representation structures made up of concept nodes linked via conceptual relation nodes. A concept consists of a type and a referent. A type is a label assigned to entities grouped together because of their sharing certain characteristics; we may have event types, object types, attribute types and others so that, for example, 'eat', 'rabbit' and 'spherical' are types. Further, we group the types in a hierarchy which is defined as a special kind of graph-theoretic structure called a 'lattice' where, for example, 'snake' is a supertype of 'adder' and 'red' is a subtype of 'colour'. A referent is either definite or indefinite: the former represents an individual instance of a type, linguistically expressed as either a proper noun or by the definite article; the latter represents an indefinite instance of a type, expressed by the use of the indefinite article. Thus in the notation we represent the expression 'Frank the hedgehog' as the concept [hedgehog:Frank], and similarly 'a mollusc' as [mollusc:*] or [mollusc] for simplicity.
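
A minimal sketch of these ideas as data may help fix the notation. The Python below is an editorial illustration only: it encodes a small fragment of the type hierarchy as parent links (a full lattice would also record common sub- and supertypes) and a concept as a type with a referent; all names are our own.

# Fragment of a type hierarchy and the concept notation described above.

SUPERTYPE = {            # child -> parent links
    "adder": "snake",
    "snake": "animal",
    "hedgehog": "animal",
    "red": "colour",
}

def is_subtype(t, super_t):
    """True if t is (transitively) a subtype of super_t."""
    while t is not None:
        if t == super_t:
            return True
        t = SUPERTYPE.get(t)
    return False

class Concept:
    def __init__(self, ctype, referent="*"):
        self.ctype = ctype          # type label, e.g. 'hedgehog'
        self.referent = referent    # '*' = indefinite, otherwise an individual

    def __repr__(self):
        return f"[{self.ctype}:{self.referent}]"

frank = Concept("hedgehog", "Frank")   # 'Frank the hedgehog'
a_mollusc = Concept("mollusc")         # 'a mollusc'
print(frank, a_mollusc, is_subtype("adder", "animal"))
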
Conceptual relations define the semantic roles played by the concepts in the graph; they represent the deep cases colouring the relationships between the concepts in a sentence, which may be revealed by morphological or syntactic analysis (word inflexions, parsing) or by semantic analysis to resolve ambiguities. The principal advantages of conceptual graph theory are that it is a meaning representation language with a close affinity to natural language, and that it has a formal foundation leading to the system's logical adequacy (i.e. unequivocalness) of representation.

A schema is a conceptual graph representing the background knowledge associated with a particular concept, and as such is adumbrated by Minsky's frames¹⁴ and Schank and Abelson's scripts¹⁵. The essential similarity of schemas to frames and scripts is that they are a way of storing the knowledge associated with a particular concept as an easily accessible 'chunk', and that they may interact with each other and with other knowledge structures in complex ways; both of these aspects distinguish this approach from a purely logical representation where knowledge is stored as propositions and inference is performed by deduction. Minsky's frames were originally fairly vaguely defined but essentially they consist of a number of terminals containing expectations in the form of 'slots' to be instantiated or to take default values; procedures may be called on the failure of an expectation which may involve the invocation of other frames.

Scripts are similarly vaguely described by Schank and Abelson, but are more particularly intended for the representation of episodic contexts. Thus a typical sequence of events (the script) is retrieved when a particular situation is encountered and expectations and slot-filling questions invoked. The advantage of Sowa's schemas over frames and scripts is that they are founded on the formal system of conceptual graphs which enables us to represent associations in a perspicuous way. However there is not the facility to attach procedures directly to constituents of the schema; the simplicity of the notation is balanced by the complexity of schema manipulation.

In the application of schemas to vision we restrict the notion of a schema to that of a visual schema. This means that we limit the kinds of concepts and conceptual relations that comprise the expectations to those that are reasonably perceptible. For example, we do not as yet incorporate any 'event' subtypes, since these would seem to be only inferable at a stage subsequent to the visual interpretation; similarly with various conceptual relations. The idea of individual referents is irrelevant as well, since schemas only contain indefinite concepts. However, the result of a scene interpretation is to be represented as a perceptual graph, where the referents are all individuals signifying regions in the segmented image, rather than entities in the object world; this will be clarified below.

We also extend the notation so that we can represent the sort of uncertainty associated with the relationships embodied in these structures. Thus we do not need hard and fast constraints; rather we give them relative importances. Examples of visual schemas for the concepts [sky] and [trees] are shown in Figure 1; their interpretation will be discussed below.

schema(sky,*x1) <-
  {[sky:*x1] -
    (*r1,part) - -> [cloud]
    (*r2,part) - -> [blue-sky]
    (*r3,loc) - -> [image-top];
   {}:[0,1],
   {*r1}:[1,1],{*r2}:[1,1],{*r3}:[0.7,1],
   {*,*}:[1,1],{*,*,*}:[1,1]}.

schema(trees,*x1) <-
  {[trees:*x1] -
    (*r1,has_hue) - -> [green]
    (*r2,has_val) - -> [value@%low_val]
    (*r3,adj) - -> [sky];
   {}:[0,1],
   {*r1}:[0.6,1],{*r2}:[0.6,1],{*r3}:[0.3,1],
   {*r1,*r2}:[0.9,1],{*r1,*r3}:[0.75,1],{*r2,*r3}:[0.85,1],
   {*r1,*r2,*r3}:[1,1]}.

Figure 1. Schema examples (the notation {*,*,*}:[x,y] means that all subsets with three elements have the same support pair [x,y])

Thus the basis of the algorithm is a knowledge-based approach incorporating uncertainty. A detailed description of the algorithm is given below, but it is important at this stage to distinguish between the two principal processes at the core of its operation. First there is the collection of evidence. This is guided by the content of the schemas applied to the data. Concepts with schemas of their own cause the activation of these schemas; attribute concepts cause the invocation of low-level procedures to evaluate the image data; and semantic constraints (expressed by conceptual relations) are evaluated or held for later evaluation.

Secondly, there is the conflation of evidence. All the
facts gathered during the collection of evidence become the inputs for a support logic inference process resulting in measures of support for the ascription of particular type labels to regions or groups of regions in the segmented image. Thus the inference process is not straightforward; it consists of a knowledge-driven collection of evidence, followed by an inferential process where this evidence is combined. The reason for this separation is that semantic constraints must be evaluated between the collection and conflation, and that we must take care to remove potential circularities in the inference chain.

However, there are limitations to the efficacy of a purely knowledge-based algorithm, as was reported in a previous paper¹⁶. This looked promising for simple pictures but for more complex examples there seemed to be a need to incorporate syntactic evidence to make the system more robust and reliable. For the present algorithm this takes two forms. First, we use the geometrical and topological properties and relationships of the regions in the segmented image to partition the image into a plan consisting of a number of plan areas. The plan areas are areas of the image where there is a dominant (large) region with subsidiaries or dominated (smaller) regions; it seems that the latter usually correspond to background concepts, e.g. sky, trees etc. This enables us to concentrate on each area and, by assessing the direct semantic evidence of its dominant region, assign a putative label or labels thereto, which leads us to apply a particular schema to the area. This approach is due to Ohta¹¹.

Secondly, it was decided that there are situations where it may be more useful to be able to use a data-driven approach in tandem with the knowledge-based approach. In other words there are certain low-level clues that give strong evidence for the presence of certain objects which should be used to advantage.

Another feature felt to be important is the ability to interpret objects when they appear out of context, and not to be restricted to searching only for objects provided by the current schema. To this end we are investigating the idea of minor schemas which embody the 'less usual' connotations of a particular concept, and which are resorted to when a subarea of an area of the image cannot be identified (using some sort of threshold) as any concept suggested by the schema. An analysis of the relevant area must be carried out to search for the features of such anomalous concepts.

For the identification of discrete objects we are moving towards the application of a matching process between an object description and descriptions of examples of a proposed concept, the descriptions being represented as conceptual graphs. This would enable us to generate support pairs automatically using the idea of a semantic distance, and would obviate the need for a rule to define a concept of which the extension might carry many structural variations.

In the next two sections we describe the data structures employed in the algorithm and the operation of the program. The implementation is at an early stage, and so some aspects have been more thoroughly examined than others; this is reflected in the relevant descriptions. It is also clear that there are certain aspects that will need rethinking. However, the goal at the moment is to get a working program and learn from any shortcomings it may have, and thus to be able to build upon it in the future.

DESCRIPTION OF DATA STRUCTURES

Query

The user might have a particular purpose in mind when an image is interpreted. This may be expressed in the form of a query which may be translated into a conceptual graph by means of a natural language interface similar to Sinlin¹⁷, which performs this task in a restricted query domain. For example, the query 'Where is a car?' can be represented as

[-list-] - -> (obj) - -> [place:?] <- - (loc) <- - [car]

where -list- denotes an actor which will list the answers to the query, i.e. the locations of cars in the image. The concepts in this graph are [place:?] and [car] linked via the conceptual relation (loc) denoting the location relationship. An actor is a special type of concept which is used to denote some kind of outputting procedure, and as such is not part of the formalism of conceptual graphs, but rather defines the type of answer required when we have a question represented as a conceptual graph. The presence of a question mark as the referent of a concept and its connection via the (obj) relation to the actor indicate that this concept is the object of the question.
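
Purely as an illustration of the information such a query graph carries, it can be written down as data and the question object read off from it. The tuple encoding below is our own and is not the conceptual graph notation itself.

# The 'Where is a car?' query graph as data, with a helper that finds the
# concept questioned (the '?' referent attached to the actor via (obj)).

query = {
    "actor": "-list-",
    "concepts": {"c1": ("place", "?"), "c2": ("car", "*")},
    "relations": [("obj", "-list-", "c1"),    # actor -> (obj) -> [place:?]
                  ("loc", "c2", "c1")],       # [car] -> (loc) -> [place:?]
}

def question_object(graph):
    for rel, src, dst in graph["relations"]:
        if rel == "obj" and src == graph["actor"]:
            ctype, referent = graph["concepts"][dst]
            if referent == "?":
                return ctype
    return None

print(question_object(query))   # 'place' - list the locations of cars
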
Schemas for vision

Visual schemas are conceptual graphs containing information about the expected visual associations of a particular concept, the subject of the schema. The conceptual relations occurring in them may be classified into four categories: property relations, which express the relationship between the subject and its expected visual attributes; subpart relations expressing relationships to concepts subsumed by the subject; global constraints expressing relations that should hold between the subject and subjects of other schemas; and local constraints expressing relationships amongst subpart concepts.

In Figure 1, for example, are schemas for the concepts [sky] and [trees]. In the [sky] schema there are three conceptual relations: between [sky] and [cloud] and between [sky] and [blue-sky] we have (part) relations, which express the fact that clouds and blue sky are expected as parts of a sky area in the image; between [sky] and [image-top] we have a special kind of property relation, (loc), which expresses the expected location of a sky area as being at the top of the image.

Since the schema for [sky] has subpart concepts, we say that sky is a decomposable type. The schema for [trees], however, contains no such concepts: trees is a homogeneous type. The concept [trees] is related to two attribute concepts, [green] and [value@%low_val], by two relations, (has_hue) and (has_val) respectively, and to [sky] by the relation (adj). The first relationship expresses the fact that trees are generally of a green hue. The second relationship represents the value or
intensity expected of a trees region, where the notation has been extended by the introduction of the @ symbol to denote the measure of the concept, and the % symbol to denote a fuzzy measure, in this case low_val or 'low value'. (Measures are further discussed in the third main section.) The third relationship is a global constraint expressing the adjacency between 'sky' and 'trees' regions in an image.

As an example of a local constraint we can adduce the schema for a concept called [walling] (not shown) where two subpart concepts of [walling] - [window] and [wall] - are related via an (around) relation, expressing the fact that a window is generally surrounded by wall.

Each conceptual relation has an associated pair of numbers indicating the necessary and possible support provided for the subject by that relation holding. A relation's necessary support is the same as the conditional support in support logic, i.e. it represents the degree of support given to the subject of the schema when that relation holds. A relation's possible support is equal to unity minus the negative support of support logic; since none of the relations in a schema give negative support then all of their possible supports are unity. (Consideration is being given to the usefulness of the incorporation of disconfirmatory relations in the schemas.) It is clear, however, that there are many cases where a conjunction of relations provides more evidential support for a concept than its components taken in isolation. If we represent a schema Γ as

Γ = {r1, . . . , rn},

where ri, i = 1, . . . , n, are conceptual links, i.e. relations with their attached concepts, then we can define mappings

αΓ, βΓ: 2^Γ → [0,1]

where 2^Γ is the power set of Γ. These mappings express necessary and possible supports provided by conjunctions of conceptual links. Thus for a conjunction c of conceptual links the associated support pair is [αΓ(c), βΓ(c)]. Any element of 2^Γ for which αΓ and βΓ are undefined is assigned the default support [0,1] representing 'unknown'; similarly for the empty subset. The examples in Figure 1 show how the supports build up for larger conjunctions.
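
To make the mapping concrete, the [trees] schema of Figure 1 can be transcribed as data. The Python fragment below is an editorial illustration, not the Slop notation; unlisted subsets default to the 'unknown' pair [0,1] as described above.

# The [trees] visual schema: its conceptual links and the support mapping
# over conjunctions (subsets) of links, with values taken from Figure 1.

trees_schema = {
    "subject": "trees",
    "links": {
        "r1": ("has_hue", "green"),
        "r2": ("has_val", "value@%low_val"),
        "r3": ("adj", "sky"),
    },
    "supports": {
        frozenset(): (0.0, 1.0),
        frozenset({"r1"}): (0.6, 1.0),
        frozenset({"r2"}): (0.6, 1.0),
        frozenset({"r3"}): (0.3, 1.0),
        frozenset({"r1", "r2"}): (0.9, 1.0),
        frozenset({"r1", "r3"}): (0.75, 1.0),
        frozenset({"r2", "r3"}): (0.85, 1.0),
        frozenset({"r1", "r2", "r3"}): (1.0, 1.0),
    },
}

def schema_support(schema, conjunction):
    """Support pair the schema assigns to a conjunction of its links."""
    return schema["supports"].get(frozenset(conjunction), (0.0, 1.0))

print(schema_support(trees_schema, {"r1", "r3"}))   # (0.75, 1.0)
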
Clues

Clues are procedures which directly examine low-level data for features, the detection of which gives a very strong indication of the presence of a particular concept. For example, detecting semicircular parts of the profile of a region is a strong clue that we have found the wheel arches of a car.

We have also considered that detection of a clue may also provide us with some idea of the scale of the object, and thus act as a guide to the distance and size of other possible objects in the image.

Prototypes

Prototypes are knowledge structures that we have only recently considered introducing. They are similar to schemas in that they are conceptual graphs but they are for representing descriptions of examples of particular types. Prototypes seem to be particularly useful for discrete objects which are of the same type but are structurally dissimilar so that a rule is not appropriate. They are discussed further in the next main section, and examples of prototypes for the type 'car' are shown in Figure 2.

Figure 2. Prototypes for cars (the legible labels refer to hatchback, saloon, saloon_back_view, saloon_angled_view and bus prototypes of car; the conceptual graphs describing each example are not reproduced here)

Minor schemas

Minor schemas provide information about particular concepts that is not supplied by schemas, since the information consists of the less usual sorts of association. Thus failure to find the expected associations provided by a schema would lead us to fall back on a minor schema to retrieve information such as 'cows might be on the road' or 'cars might be in a field'.

Image data

From previous exploratory programs and from the implementation of the algorithm discussed here we have come to some conclusions as to what sorts of low-level image data would be most useful in our approach. These are, for each region

geometrical data, consisting of centroid and maximum and minimum coordinates and area

colour data, consisting of hue and saturation measures which roughly correspond to 'type' and 'depth' of colour, as well as an intensity value; it has been proposed that these values should be normalized in some way so that we have the values relative to some average value - this would diminish the need for rigidly controlled lighting conditions.

profile data, consisting of a full description of the profile of the perimeter; this could be used directly as evidence, and also indirectly in the calculation of shape and partial-shape properties (especially useful
for dealing with over- or undersegmentation or the effects of shadowing)

textural data, consisting of a label for the kind of texturing, together with an indication of the scale; it is not always clear where texture ends and segmentation begins, however, so it may be useful to have alternative interpretations in border-line cases

a list of adjacent regions and their direction from the region
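
Gathered together, the items above amount to a simple per-region record. The dataclass below is only a suggestion of a convenient layout; the field names and types are our own.

# One possible per-region record for the image data listed above.
from dataclasses import dataclass, field

@dataclass
class Region:
    rid: int
    centroid: tuple            # (x, y)
    bbox: tuple                # (xmin, ymin, xmax, ymax)
    area: float
    hue: float                 # angular colour 'type'
    saturation: float          # colour 'depth'
    intensity: float
    profile: list              # description of the perimeter
    texture: str = ""          # texture label, if any
    texture_scale: float = 0.0
    adjacent: dict = field(default_factory=dict)   # neighbour id -> direction
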
Previously we were using segmentations of some images of vehicles (shown in Reference 18) and a description of the accompanying data appears in Reference 19. It is hoped that a further selection of images covering a wider range of types of image will become available soon.

Plan

A plan consists of a set of plan areas, which are groups of contiguous regions of which one is that plan area's principal region which 'dominates' the others, usually by virtue of relative size or enclosure. A partially labelled plan is a plan with a list of labels attached to each of the plan areas, and a plan image is a plan where each plan area's list of labels is ordered according to the evidence accrued up to a certain point. The formation of the plan, of the partially labelled plan and of the plan image are described in the next main section.

Pending constraints

Pending constraints are semantic constraints that have not been satisfiable at the local level of the schemas and consequently need to be satisfied in the context of the whole image.

Slop database

The Slop database consists of a set of Slop facts, collated from the application of schemas and clues to the image, which is continually being updated and augmented, and of a number of Slop rules derived from the schemas. The set of Slop facts represents a depository for all the information that has been found during the various phases of the interpretation. It contains facts from all different levels, and the final interpretation of the image rests on the integration of all this data.

Perceptions

Perceptions are a type of conceptual graph representing the interpretation of areas of the image; they are the results of lumping together Slop facts, and are essentially instantiated schemas where the concepts have regions or groups of regions as their referents, together with support pairs derived from the assessment of all the collected evidence. Thus they are a special kind of conceptual graph since their concepts' referents are exclusively from the domain of the segmented image rather than from the object domain, and so we should talk of percepts rather than concepts.

Perceptual graph

The perceptual graph is the final interpretation as a conceptual graph, formed from the joining of all the perceptions. Each referent is substituted with a unique label to signify the entity in the world that its concept is to represent; thus percepts become concepts, since they are now deemed to refer to the world. This is the structure representing the knowledge that has been conveyed by the image, and thus is a symbolic description of a perception. We can incorporate this perceived knowledge into an intelligent knowledge base and perform reasoning with it using the various operations defined for conceptual graphs.

DESCRIPTION OF OPERATION OF ALGORITHM

This section should be read in conjunction with Figure 3.

Figure 3. Algorithm control structure (the legible boxes read: (5) Form plan image; (6) Apply schemas; (7) Update facts; Slop facts; Slop rules; (8) Evaluate supports; (9) Instantiate schema; Perceptions; (10) Join perceptions; Perceptual graph)


Find_relevant_knowledge

In. Query, knowledge-base schemas, clues and prototypes.

Out. Relevant schemas, relevant clues, relevant prototypes.

Procedure. Examine the conceptual graph for the query and retrieve

clues and prototypes for anything mentioned in query; these become relevant clues and relevant prototypes respectively

schemas for decomposable background labels (e.g. sky, road, building, field)

schemas for homogeneous background labels (e.g. blue-sky, trees)

The last two of these become relevant schemas. (With the present restriction on the kinds of image we are looking at, all schemas in the current knowledge base are relevant; in future we would want to be able to pick out only schemas relevant to a particular kind of image at this stage.)

Compile_knowledge

In. Relevant schemas, relevant clues.

Out. Slop rules.

Procedure. For each schema we need to form a bundle, i.e. a collection of Slop rules each of which corresponds to a conjunction of conceptual links in the schema. In terms of support logic these rules are mutually dependent, and therefore the support logic calculus for independent sources of evidence (Dempster's renormalization) is inappropriate, and so we need to extend the calculus. Thus a schema

t(x) <- Γ = {r1, . . . , rn}

with the support mappings αΓ, βΓ: 2^Γ → [0,1] compiles into the bundle

t(x) :-
 <- c1 :[αΓ(c1),βΓ(c1)]
 <- c2 :[αΓ(c2),βΓ(c2)]
 . . .
 <- c2^n :[αΓ(c2^n),βΓ(c2^n)]

where ci ∈ 2^Γ, i = 1, . . . , 2^n. The symbol <- precedes each member rule of the bundle. For example the 'trees' schema becomes as in Figure 4. The extension to the calculus is straightforward: in the example above the overall support for t(X), the pair [αt,βt], is considered as an interval equal to the intersection of the support pair intervals arising from each member rule, i.e.

[αt,βt] = ∩(i=1..2^n) [αΓ(ci) * a(ci), 1 − ((1 − βΓ(ci)) * a(ci))]

where a(ci) and β(ci) are the supports of the conjunctions. We need to have restrictions on the support pairs we can assign to the member rules; these will be explained in a forthcoming paper on Slop and bundles.

trees(X) :-
 <- has_hue(X,Y),green(Y) :[0.6,1]
 <- has_val(X,$low_val),value($low_val) :[0.6,1]
 <- adj(X,Y),sky(Y) :[0.3,1]
 <- has_hue(X,Y),green(Y),has_val(X,$low_val),value($low_val) :[0.9,1]
 <- has_hue(X,Y),green(Y),adj(X,Z),sky(Z) :[0.75,1]
 <- has_val(X,$low_val),value($low_val),adj(X,Z),sky(Z) :[0.85,1]
 <- has_hue(X,Y),green(Y),has_val(X,$low_val),value($low_val),adj(X,Z),sky(Z) :[1,1]

Figure 4. Bundle for trees
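
As an illustration of how a bundle such as that of Figure 4 might be evaluated, the sketch below forms, for each member rule, the interval given by the formula above and intersects the intervals. It is an editorial sketch under assumed inputs: the conjunction supports in the example are invented, and are combined multiplicatively as in the conjunction rule of the support logic calculus.

# Evaluating a bundle: intersect the intervals contributed by its member rules.

def member_interval(rule_support, conj_support):
    ar, br = rule_support          # [alpha_G(c), beta_G(c)] of the member rule
    a = conj_support               # support a(c) established for the conjunction
    return (ar * a, 1 - (1 - br) * a)

def bundle_support(members):
    """members: list of (rule support pair, conjunction support) tuples."""
    lo, hi = 0.0, 1.0
    for rule_support, conj_support in members:
        m_lo, m_hi = member_interval(rule_support, conj_support)
        lo, hi = max(lo, m_lo), min(hi, m_hi)
    return (lo, hi)

# Hypothetical evaluation of the 'trees' bundle for a region where 'green'
# holds with support 0.8, the low-value test with 0.7 and adjacency to sky
# with 0.9:
members = [
    ((0.6, 1.0), 0.8),                  # has_hue/green
    ((0.6, 1.0), 0.7),                  # has_val/low_val
    ((0.3, 1.0), 0.9),                  # adj/sky
    ((0.9, 1.0), 0.8 * 0.7),            # green and low_val
    ((0.75, 1.0), 0.8 * 0.9),           # green and adj sky
    ((0.85, 1.0), 0.7 * 0.9),           # low_val and adj sky
    ((1.0, 1.0), 0.8 * 0.7 * 0.9),      # all three
]
print(bundle_support(members))          # (0.54, 1.0)
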
Form_plan

In. Image data.

Out. Plan.

Procedure. Using geometrical data - size, shape, extent etc. of regions - and topological data - adjacencies, enclosures between regions - directly or indirectly supplied by the image data, partition the image into plan areas each with a principal or dominant region and its subsidiary regions. The current version of the algorithm for performing this has several stages. We first need to find dominant regions which are initially those regions which enclose others and those which satisfy some criterion for 'largeness'. Testing for 'surroundedness' is straightforward using the adjacency data; however, we need to set some parameter as a fraction of the total size of the image to determine 'large' regions. Once this initial stage has been completed regions adjacent with any dominant region are made into subsidiary regions, i.e. are associated with that dominant region.

In the second stage we relax the largeness parameter and look for any more dominant regions amongst those previously unassociated. Then we attempt to associate by adjacency any nondominant or unassociated regions with dominant or previously associated regions. In the third stage we relax the parameter again and repeat the association. This process is then continued until no unassociated regions are left. It is possible that at any stage an unassociated region may be adjacent to two or more 'growing' plan areas, and so the conflict must be resolved; this is currently done on the basis of propinquity of colour features.

Evidently the choice of initial parameter and the extent of its subsequent relaxations as well as the topology of the particular segmentation are crucial to the outcome of this process. Consideration is being given to ways of assessing a plan so that some sort of feedback is possible. Figure 5 shows the application of the algorithm to a segmented image with an initial largeness parameter of 1/20.
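
A rough transcription of the staged procedure is sketched below. It is illustrative only: the region representation, the fixed relaxation schedule and the colour-distance tie-break are our own simplifications of what the text describes.

# Staged plan formation: promote dominant regions, attach their neighbours,
# relax the largeness parameter and repeat.

def form_plan(regions, adjacency, image_area, largeness=1/20, relax=2.0, stages=4):
    """regions: {rid: {'area': float, 'colour': (h, s, v), 'encloses': set}}
    adjacency: {rid: set of adjacent rids}. Returns {dominant rid: set of members}."""
    plan, owner = {}, {}

    def colour_distance(a, b):
        return sum(abs(x - y) for x, y in zip(regions[a]['colour'],
                                              regions[b]['colour']))

    for stage in range(stages):
        threshold = largeness * image_area / (relax ** stage)
        # promote dominant regions: enclosing regions (first stage) or
        # regions large enough under the current, progressively relaxed, threshold
        for rid, r in regions.items():
            if rid in owner:
                continue
            if (stage == 0 and r['encloses']) or r['area'] >= threshold:
                plan[rid] = {rid}
                owner[rid] = rid
        # associate unowned regions adjacent to a growing plan area; a region
        # touching several areas is assigned by propinquity of colour features
        changed = True
        while changed:
            changed = False
            for rid in regions:
                if rid in owner:
                    continue
                candidates = {owner[n] for n in adjacency.get(rid, ()) if n in owner}
                if candidates:
                    dom = min(candidates, key=lambda d: colour_distance(rid, d))
                    plan[dom].add(rid)
                    owner[rid] = dom
                    changed = True
    return plan
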


Figure 5. a, segmented image; b, result of application of form_plan to (a) with a largeness parameter of 1/20

Apply_clues

In. Image data, relevant clues, plan.

Out. Partially labelled plan, Slop facts.

Procedure. Apply procedures for relevant clues to all regions of plan areas; if successful, record the corresponding type label separately from the labels of the background schemas that are to be applied to each plan area. We can then apply the schemas or prototypes of the successful types as a priority, and cut out the application of the other schemas if a match has been obtained. For example, if the clue for a car has been found, then in the subsequent stage we apply the prototypes for 'car' to the corresponding plan area, and if a good match is found we do not need to apply the other schemas. Whether successful or not, the result of a clue application is recorded as a Slop fact for use in the subsequent support logic evaluation.

The relatively time-consuming nature of the application of the clues - evaluating complex low-level attributes, such as finding a semicircular part of a profile, for every region - is balanced by the possible saving in time resulting from a successful evaluation. However, the application of the sorts of clues we have mentioned, i.e. for discrete objects, is not always relevant, since their detailed nature may not be discernible from the segmentation (as in Figure 5, for example).

Form_plan_image

In. Image data, relevant schemas, relevant prototypes, partially labelled plan, procedures, lattice.

Out. Plan image, Slop facts.

Procedure. If we have any labels for which the application of clues has been successful then any corresponding prototypes are retrieved and an attempt is made to match with the plan area. A good match means that we have no need to apply property or other rules of any schemas (i.e. we bypass the rest of this and the next subsection for this particular plan area). This is discussed further below in relation to matching prototypes to subparts of plan areas.

For each schema to be applied, we find its property relations by reference to the type lattice and retrieve corresponding PROLOG procedures; these procedures are evaluated for the principal regions of a plan area, the plan labelling is updated and the results are recorded as Slop facts. In this way we can order the set of schemas to be considered before we apply them to the plan areas as a whole. The result is that each plan area is associated with a list of schema names, ordered according to the evaluation of their attribute concepts on the principal region of that plan area; this structure is called the plan image.

The attributes that are relevant for vision are shown in Figure 6 in the sublattice v-attribute. U(saturation) and U(intensity) refer to the universes of discourse of saturation and intensity respectively; these are continuous sets associated with measurable attributes (m-attribute) from which numerical values or linguistic values may be taken and assigned as the measure of the concept. Linguistic values correspond to crisp or fuzzy subsets of the universe of discourse. Procedures for types higher up in the lattice call procedures for their subtypes, and so on until we reach the primitive types where we can refer to the image data. For example, in the schema for trees (Figure 1) there are attribute concepts [green] and [value@%low_val] to be evaluated. The procedure to evaluate 'greenness' takes the hue and saturation from the colour data and is embodied in the PROLOG clauses in Figure 7. (Hue and saturation are in the form of polar coordinates, with hue taking angular values from [0,360], and saturation taking radial values from [0,255].) The evaluation of the value concept consists of calculation of the membership value of the (intensity) value from the colour data with respect to the fuzzy set %low_val, as embodied in the clauses of Figure 8. (The intensity value takes values from [0,250].) Since some of the data is not directly available we currently need to introduce the notion of a human observer so that missing bits of data can be requested from the user.
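
The ordering step can be pictured as follows. This is an editorial sketch: evaluate_attribute stands in for the low-level PROLOG procedures, and the dictionary layout of the plan is an assumption of convenience.

# Order each plan area's candidate labels by the support obtained from
# evaluating their attribute concepts on the principal region.

def form_plan_image(plan, evaluate_attribute):
    """plan: {area_id: {'principal': region, 'subsidiaries': [...], 'labels': [...]}}.
    evaluate_attribute(label, region) -> support pair for that label's
    attribute concepts evaluated on the region."""
    plan_image = {}
    for area_id, area in plan.items():
        scored = [(label, evaluate_attribute(label, area['principal']))
                  for label in area['labels']]
        # order by decreasing necessary support; labels with pair [0,0]
        # are rejected at the next stage (Apply_schemas)
        scored.sort(key=lambda pair: pair[1][0], reverse=True)
        plan_image[area_id] = dict(area, labels=scored)
    return plan_image
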
Apply_schemas

In. Plan image, relevant schemas, Slop facts, Slop rules, relevant prototypes, minor schemas.

Out. Slop facts, pending constraints.

Procedure. We take the plan image which consists of a list of plan image areas; each of these consists of a principal area, a list of subsidiary areas and a list of possible type labels each with an associated support pair. Any type labels with a support pair [0,0] are rejected, i.e. that schema is not applied to that plan image area. The other labels are ordered according to their attached support pair.


Figure 6. Sublattice of visual attribute type labels (the legible node labels include image-position; the shape labels circular, semicircular, trapezoidal, rectangular, sloping-front, sloping-back and block; red; luminant_attribute with saturation - U(saturation) and intensity - U(intensity); and textural-attribute with shiny)

has_hue(green,X,[N,P]) :-
    colour(X,Hue,Sat,Value),
    deg_to_rad(Hue,HueRad),
    colour_compatibility(HueRad,Sat,1.0472,3.14159,N,P).

colour_compatibility(_,0,_,_,0,1) :-
    !.
colour_compatibility(Hue,_,HueP1,_,0,0) :-
    Hue =< HueP1,
    !.
colour_compatibility(Hue,_,_,HueP2,0,0) :-
    Hue >= HueP2,
    !.
colour_compatibility(Hue,Sat,HueP1,HueP2,N,P) :-
    HueHalfRange is (HueP2 - HueP1)/2,
    (Theta is Hue - HueP1;
     Theta is HueP2 - Hue),
    Theta > 0,
    Theta < HueHalfRange,
    P is Theta/HueHalfRange,
    N is (Sat*Theta)/HueHalfRange/255.

Figure 7. Procedure for 'green'

has_val(value@M,APR,[NV,PV]) :-
    colour(APR,_,_,V),
    chi(M,V,NV),
    !,
    NV=PV.

chi($low_val,S,N) :-
    rampdown(60,120,S,N).

rampdown(X1,_,S,1) :-
    S =< X1,
    !.
rampdown(_,X2,S,0) :-
    S >= X2,
    !.
rampdown(X1,X2,S,N) :-
    N is (X2 - S)/(X2 - X1).

Figure 8. Procedure for 'value@%low_val'

The heart of the routine is a predicate called apply_schema which has a conditional branch in it. If we can find property concepts in the schema then we have a homogeneous type and do not consider subparts. Otherwise we must subdivide, as in the formation of the plan, the plan image area into subareas to which we apply the schemas of subpart concepts in the original schema (providing a kind of recursion); moreover, any relations between subpart concepts are treated as local constraints and must be evaluated between corresponding subareas. Any subareas not evaluated to a subpart (of the schema) are determined after invoking minor schema(s). (The invocation of minor schemas has not yet been implemented.) For both homogeneous and inhomogeneous types, global constraints are subsequently invoked. If it is possible to evaluate them, i.e. if areas with the relevant label have been found already, then the corresponding procedure is called; if not, then we store them as pending constraints to be subsequently evaluated.

It may be that the subpart we are looking for refers to a discrete object. It does not seem appropriate to use a schema for this kind of object since we wish to concentrate on the object in isolation and are not interested in its relationships with other parts of the image (except in so far as the context has already guided us as to what we should look for). Rather, we need to compare the description of the object with descriptions of a small set of examples of the invoked concept. This seems to be necessary as it is not generally possible to provide a watertight rule defining certain sorts of concepts in terms of their structure; for example, 'car' comes in all shapes and sizes, and 'vehicle' is even more diverse. Thus we introduce the idea of a concept's prototypical cluster as a set of conceptual graphs called prototypes, each of which represents the description of an example of that concept. Similarly, we may have a set of counterexamples which would enable the elimination of spurious interpretations. Examples of prototypes of a car are shown in Figure 2.
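
The branching just described can be summarized in the following control-flow sketch. It is not the apply_schema predicate itself: the ops dictionary and the schema keys stand in for machinery the paper leaves to the PROLOG implementation, and the fall-back to minor schemas is omitted, as it is in the current implementation.

# Control-flow sketch of applying a schema to a plan image area.

def apply_schema(schema, area, ops, facts, pending):
    # schema: dict with 'property_relations', 'subparts', 'local_constraints'
    # and 'global_constraints' lists; ops: dict of callables standing in for
    # the low-level procedures (evaluate_property, subdivide, match_subpart,
    # evaluate_constraint, constraint_ready).
    if schema['property_relations']:
        # homogeneous type: evaluate the attribute procedures on the area itself
        for attribute in schema['property_relations']:
            facts.append(ops['evaluate_property'](attribute, area))
    else:
        # decomposable type: subdivide the area and recurse on subpart schemas
        assignments = {}
        for sub in ops['subdivide'](area):
            match = ops['match_subpart'](schema['subparts'], sub)
            if match is not None:          # unmatched subareas would fall back
                assignments[sub] = match   # on minor schemas (not implemented)
                apply_schema(match, sub, ops, facts, pending)
        # local constraints hold between corresponding subareas
        for constraint in schema['local_constraints']:
            facts.append(ops['evaluate_constraint'](constraint, assignments))
    # global constraints: evaluate now if the other areas are already known,
    # otherwise hold them over as pending constraints
    for constraint in schema['global_constraints']:
        if ops['constraint_ready'](constraint):
            facts.append(ops['evaluate_constraint'](constraint, facts))
        else:
            pending.append(constraint)
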


The prototype matching process is not invoked unless a clue application (see above) has been successful; if the matching is successful we do not need to apply and evaluate schemas as described above. Work on prototype-object matching techniques and on learning and updating prototypes from examples is proceeding.

Update_facts

In. Pending constraints, Slop facts.

Out. Slop facts.

Procedure. Using Slop facts from apply_schemas, we evaluate global constraints between concepts using the relevant low-level procedures, e.g. the adjacency of areas of sky to areas of trees would be established at this stage by considering the adjacencies of their constituent regions.
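
For instance, establishing the adjacency of a putative 'sky' area to a putative 'trees' area reduces to a check over the region-level adjacency data. The sketch below is an editorial illustration with invented example data.

# Two labelled areas are adjacent if any of their constituent regions are.

def areas_adjacent(area1, area2, adjacency):
    """area1, area2: sets of region ids; adjacency: {rid: set of adjacent rids}."""
    return any(neighbour in area2
               for rid in area1
               for neighbour in adjacency.get(rid, ()))

sky_area, trees_area = {1, 2}, {3, 4}
adjacency = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(areas_adjacent(sky_area, trees_area, adjacency))   # True
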
Evaluate_supports

In. Slop rules, Slop facts.

Out. Slop facts.

Procedure. During the applications of the schemas, low-level procedures have been employed to evaluate evidence, and the results thus obtained have been recorded as Slop facts. The results of evaluating the global constraints have also been thus recorded. The important predicate evaluate_supports now uses the Slop inference mechanism to draw a conclusion about the compatibility of the type with that plan image area; this is where the bundles and their associated calculus are used. To avoid circularities in the inference chain, bundles corresponding to background concepts in the schema are temporarily held 'in abeyance', and are brought out of abeyance afterwards. The results of this evaluation are the final compatibilities of labels with areas in the image.

Instantiate_schemas

In. Slop facts, schemas.

Out. Perceptions (instantiated schemas).

Procedure. Compile Slop facts back into instantiated schemas - instantiate generic referents to groups of regions. The referents of the resulting concepts are areas of the image and it would be more correct to call them percepts. The mechanics of this routine have not yet been elucidated.

Join_perceptions

In. Perceptions.

Out. Perceptual graph.

Procedure. Use Sowa's join operation repeatedly to construct an overall interpretation of the image from instantiated schemas, and substitute identifying labels for regions and groups of regions in the percepts to denote the individual objects in the world they are to represent as concepts.

SUMMARY

In this paper we have taken a specific approach to the problem of vision based on the belief that accumulated knowledge gives us the ability to recognize what we see, and so that this ability must be modelled in some way for a computer to approach competence. Moreover, we have expounded an algorithm which we hope will act as a springboard for future developments in the modelling of the complex interactions between various sorts of knowledge and sensory data that characterize the vision process.

The algorithm itself is not completely implemented at the time of writing, but it is evolving and being adapted as new ideas arise as well as being extended. Thus there are no experimental results as yet except in so far as the plan-forming subalgorithm produces a satisfactory output (cf. Figure 5). There is also an apparent need for greater flexibility in the algorithm so that backtracking and feedback can occur, enabling us to handle the various knowledge structures more efficiently for different sorts of image. Also, the question arises as to how we may generalize any of the techniques incorporated in the algorithm; to find the answer to this we need to attempt the interpretation of a range of images, necessitating the encoding of more and different kinds of knowledge, this possibly entailing the application of different techniques.

The plan, for example, is immutable once formed, and each region belongs to one and only one plan area. The plan formation algorithm is a heuristic algorithm but is generally reasonable for the sort of images we are currently dealing with. However, it seems sensible to take a more flexible approach and form a more nebulous plan with some regions associated with more than one plan area, effectively giving us alternative plans, and to create plan areas that may be split in one or more predetermined ways.

Another problem in this area is the formation of a subplan. At present this is performed using a simplified version of the plan formation algorithm. However, it has been thought that the criteria for dividing an image into meaningful chunks are not necessarily the same as those for dividing those chunks into meaningful subchunks. Specifically, the idea has been mooted that shape is more important at this level, since we are more likely to encounter discrete objects than background areas.

The solution to the 'vision problem' is not going to appear overnight. What we have endeavoured and are endeavouring to do is to implement a working program using established tools from the field of artificial intelligence and, by experimentation and introspection, search for any principles of knowledge-based vision. At the moment occlusion and relative distance seem to be at the heart of the problem, but this picture might be different tomorrow.


REFERENCES

1 Marr, D 'Artificial intelligence - a personal view' Artificial Intelligence Vol 9 No 1 (1977) pp 37-48
2 Baldwin, J F 'Support logic programming' Proc. NATO Advanced Study Institute on Fuzzy Sets (July 1985)
3 Zadeh, L A 'Fuzzy sets' Inform. Control Vol 8 (1965) pp 338-353
4 Shafer, G A mathematical theory of evidence Princeton University Press, Princeton, NJ, USA (1976)
5 Monk, M R M and Baldwin, J F 'Slop user's manual: version 1.1' Information Technology Research Centre Report No ITRC 70, University of Bristol, Bristol, UK (1986)
6 Clocksin, W F and Mellish, C S Programming in PROLOG Springer-Verlag, Heidelberg, FRG (1981)
7 Ambler, A P, Barrow, H G, Brown, C M, Burstall, R M and Popplestone, R J 'A versatile system for computer-controlled assembly' Artificial Intelligence Vol 6 (1975) pp 129-156
8 Hanson, A R and Riseman, E M 'VISIONS: a computer system for interpreting scenes' in Hanson, A R and Riseman, E M (eds) Computer vision systems Academic Press, New York, NY, USA (1978)
9 Wesley, L P and Hanson, A R 'The use of an evidential-based model for representing knowledge and reasoning about images in the VISIONS system' Proc. IEEE Workshop Computer Vision: Representation and Control (1982) pp 14-25
10 Kitchen, L and Rosenfeld, A 'Scene analysis using region-based constraint filtering' Pattern Recogn. Vol 17 No 2 (1984) pp 189-203
11 Ohta, Y Knowledge-based interpretation of outdoor natural color scenes Pitman, London, UK (1985)
12 Dellepiane, S, Serpico, S B and Vernazza, G 'Three-dimensional organ recognition by tomographic image analysis' NATO-ASI '86 Conf. (1986)
13 Sowa, J F Conceptual structures Addison-Wesley, Wokingham, UK (1984)
14 Minsky, M 'A framework for representing knowledge' in Winston, P H (ed.) The psychology of computer vision McGraw-Hill, New York, NY, USA (1975) pp 211-280
15 Schank, R C and Abelson, R P Scripts, plans, goals and understanding Lawrence Erlbaum Associates, Hillsdale, NJ, USA (1977)
16 Morton, S K and Popham, S J 'Knowledge-based interpretation of segmented images using schemas and support logic' Information Technology Research Centre Report No ITRC 75, University of Bristol, Bristol, UK (1986)
17 Morton, S K and Baldwin, J F 'Conceptual graphs and fuzzy qualifiers in natural language interfaces' Information Technology Research Centre Report No ITRC 54, University of Bristol, Bristol, UK (1984)
18 Photographs of surface segmentations of Alvey vehicle data - version AQ-02: images MCCS 2 to 19 Pattern Recognition Group, Marconi Command and Control Systems Ltd, Chobham Road, Frimley, Camberley, Surrey, UK (MCCS reference: T(F)T/278)
19 Description of region attribute and adjacency file formats - version AQ 01 Pattern Recognition Group, Marconi Command and Control Systems Ltd, Chobham Road, Frimley, Camberley, Surrey, UK (MCCS reference: T(F)T/277)
