An Introduction to Hyperdimensional Computing for Robotics
KI - Künstliche Intelligenz (2019) 33:319–330
https://doi.org/10.1007/s13218-019-00623-z
TECHNICAL CONTRIBUTION
Received: 15 December 2018 / Accepted: 11 September 2019 / Published online: 18 September 2019
© Gesellschaft für Informatik e.V. and Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abstract
Hyperdimensional computing combines very high-dimensional vector spaces (e.g. 10,000 dimensional) with a set of carefully designed operators to perform symbolic computations with large numerical vectors. The goal is to exploit their representational power and noise robustness for a broad range of computational tasks. Although there are surprising and impressive results in the literature, the application to practical problems in the area of robotics is so far very limited. In this work, we aim at providing an easy-to-access introduction to the underlying mathematical concepts and describe the existing computational implementations in the form of vector symbolic architectures (VSAs). This is accompanied by references to existing applications of VSAs in the literature. To bridge the gap to practical applications, we describe and experimentally demonstrate the application of VSAs to three different robotic tasks: viewpoint invariant object recognition, place recognition and learning of simple reactive behaviors. The paper closes with a discussion of current limitations and open questions.
over implementations and computing principles of VSAs in Sect. 3, and a short overview of existing applications in Sect. 4, to three (novel) demonstrations of how hyperdimensional computing can address robotic problems in Sect. 5. These demonstrations are intended as showcases to inspire future applications in the field of robotics. Remaining impediments in the form of current limitations and open questions are discussed in Sect. 6.

2 Properties of High-Dimensional Spaces: Curse and Blessing

2.1 High-Dimensional Spaces Have Huge Capacity

The most obvious property is high capacity. For example, when we increase the number of dimensions in a binary vector, the number of possible stored patterns increases exponentially. For n dimensions, the capacity is 2^n. For real valued vector spaces and practical implementations with limited accuracy (i.e. a finite length representation in a computer) the capacity is also exponential in the number of dimensions. Interestingly, even for sparse binary vector spaces, the number of possibly stored patterns grows very fast. Figure 1 illustrates this behavior. For n dimensions and density d (the rate of ones in the vector), the capacity is the binomial coefficient (n choose ⌊d⋅n⌋). Even if there are only 5% non-zero entries, a 1000 dimensional vector can store more patterns than the supposed number of atoms in the universe (presumably about 10^80).

Fig. 1 Capacity of dense and sparse vector spaces quickly becomes very large (d is the ratio of ones). A discussion of properties of sparse representations can, e.g., be found in [2]

2.2 The Curse of Dimensionality

The sampling density of k samples in an n-dimensional space is proportional to k^(1/n) (cf. [11, p. 23]). If we require 100 samples for an accurate representation of a one dimensional problem, the same problem in a 10 dimensional space would require 100^10 samples to achieve the same sample density.

Beyer et al. [4] showed a direct consequence for the nearest neighbor problem (given a set of data points in an n-dimensional metric space, the task is to find the closest data point to some query point). They define a query as unstable if the distance from the query point to most data points is less than (1 + ε) times the distance from the query to the nearest neighbor. Under a broad range of practically relevant conditions, for any fixed ε > 0 and increasing number of dimensions, the probability that a query is unstable converges to 1. In other words, the distance to the nearest neighbor approaches the distance to the farthest data point.

Based on these results on the contrast in nearest neighbor queries in high-dimensional spaces, Aggarwal et al. [1] investigated the influence of the choice of the metric. For example, the often used Euclidean L2 norm is not well suited for high-dimensional spaces; better choices are Lp norms with smaller p (for some applications this includes fractional norms with p < 1). Also angular distances for real vectors and Hamming distance for binary vectors are suitable choices.

2.3 Random Vectors are Very Likely Almost Orthogonal

Random vectors are created by sampling each dimension independently and uniformly from the underlying space. The distribution of angles between two such random vectors contradicts our intuition. In an n-dimensional real valued space, for any given vector, there are n − 1 exactly orthogonal vectors. However, the number of almost orthogonal vectors, whose angular distance to the given random vector is within π/2 ± ε, grows exponentially for any fixed ε > 0.
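The near-orthogonality of random vectors is easy to verify numerically. The following Python sketch (an illustration added for this introduction, not an experiment from the paper) samples random bipolar vectors for increasing dimensionality and prints their pairwise cosine similarities, which concentrate around zero, i.e. around an angle of 90°, as the dimensionality grows.

```python
# Minimal sketch (added illustration): random high-dimensional vectors are
# almost orthogonal, and increasingly so with growing dimensionality.
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 1000, 10000):
    X = rng.choice([-1.0, 1.0], size=(200, n))        # 200 random bipolar vectors
    X /= np.linalg.norm(X, axis=1, keepdims=True)     # normalize to unit length
    cosines = X @ X.T                                 # pairwise cosine similarities
    off = np.abs(cosines[~np.eye(200, dtype=bool)])   # ignore self-similarities
    print(f"n={n:6d}  mean |cos| = {off.mean():.3f}  max |cos| = {off.max():.3f}")
```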
[Fig. 3 panels: "Surface areas", "Ratio of surface areas", "Random sampling probability" and "Random sampling probability (extended)", each plotted over the number of dimensions for the sets of "Similar" and "Almost orthogonal" vectors.]

Fig. 3 Analytical results on n-spheres. Note the logarithmic scale in the second plot. Due to numerical reasons, the dashed extension for #dimensions > 300 in the right plot is not obtained analytically but using sampling
high-dimensional spaces, the transition from 2-D to 3-D space already gives an idea of what is happening. Figure 2 shows the sets of similar and almost orthogonal vectors on unit spheres in 2-D and 3-D space. It can be easily seen that in the higher dimensional space, a random point on
[Figure: probability of a wrong query answer over the number of dimensions, for k = 2 up to k = 10.]

3 How to do Hyperdimensional Computing: Vector Symbolic Architectures (VSA)

The previous Sect. 2 listed properties of high-dimensional vector spaces and demonstrated how a bundle of vectors can be represented by their sum. Formally, the resulting vector represents the unordered set of the bundled vectors. To be of broader value, we need to be able to represent more complex
Table 1
Space 𝕍 | Tensors of real numbers | Real and complex vectors | {0, 1}^n | [−1, 1]^n (or {−1, 1}^n)
Bundle + | Elementwise sum | Elementwise sum | Thresholded elementwise sum | Limited elementwise sum
Bind ⊗ | Tensor product | Circular convolution | Elementwise XOR | Elementwise product
Protect Π | (Not considered) | (Not considered) | Permutations | Permutations
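To make the operator families of Table 1 concrete, the following Python sketch gives one possible reading of the two rightmost columns: a binary spatter-code style VSA on {0, 1}^n and a Multiply–Add–Permute style VSA on [−1, 1]^n. It is an added illustration under these assumptions, not code from the cited architectures.

```python
# Minimal sketch (added illustration of two columns of Table 1, under assumptions).
import numpy as np

rng = np.random.default_rng(1)

# --- binary spatter-code style VSA on {0, 1}^n ---
def bsc_bind(a, b):            # elementwise XOR
    return np.bitwise_xor(a, b)

def bsc_bundle(vectors):       # thresholded elementwise sum (majority vote)
    s = np.sum(vectors, axis=0)
    return (s > len(vectors) / 2).astype(np.uint8)

def bsc_protect(a, shift=1):   # permutation, here a cyclic shift
    return np.roll(a, shift)

# --- Multiply-Add-Permute style VSA on [-1, 1]^n ---
def map_bind(a, b):            # elementwise product
    return a * b

def map_bundle(vectors):       # elementwise sum, limited to the range [-1, 1]
    return np.clip(np.sum(vectors, axis=0), -1.0, 1.0)

def map_protect(a, shift=1):   # permutation, here a cyclic shift
    return np.roll(a, shift)

# quick check of the similarity behaviour with random binary vectors
n = 10000
a, b, c = (rng.integers(0, 2, n, dtype=np.uint8) for _ in range(3))
print(np.mean(bsc_bind(a, b) != a))          # ~0.5: binding is dissimilar to its inputs
print(np.mean(bsc_bundle([a, b, c]) != a))   # ~0.25: bundling stays similar to its inputs
```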
This example will be explained in the following subsections. Querying a record is a simple example of high-dimensional computing. Before we proceed with more sophisticated demonstrations in Sects. 4 and 5, we will characterize the operators in more detail and explain how the properties of high-dimensional spaces are exploited in the query-record example. However, there is no exact definition available of the required properties of the VSA operators, the exact set of operators or the vector space 𝕍. The following is a summary of properties from the literature, i.e. the VSAs from Table 1.

3.1 Binding ⊗

Binding ⊗ : 𝕍 × 𝕍 → 𝕍 combines two input vectors into a single output vector that is not similar to the input vectors but allows to (approximately) recover any of the input vectors given the output vector and the other input vector. E.g., we can bind the filler AliceV to the role nameV by N = AliceV ⊗ nameV and later recover the filler AliceV given N and the role vector nameV. To recover this vector, we need to unbind one vector from another. For VSAs where vectors are also (approximately) self inverse, unbinding and binding are the same operation (e.g. [8, 15]). Self-inverse means:

∀X ∈ 𝕍 : X ⊗ X = 𝟏

where 𝟏 is the neutral element of binding in the space 𝕍. We will use this property in the following examples.

An intuitive example of such a binding operator is the special case of Gayler's VSA with 𝕍 = {−1, 1}^n (instead of [−1, 1]^n) and binding and unbinding as elementwise multiplication. The self-invertibility is due to the limitation to ±1, since −1 ⋅ −1 = 1 ⋅ 1 = 1 and 1 is the neutral element of multiplication. With such a VSA, recovering the name in the above role-filler example works as follows:

N ⊗ nameV = (AliceV ⊗ nameV) ⊗ nameV
          = AliceV ⊗ (nameV ⊗ nameV)
          = AliceV ⊗ 𝟏 = AliceV

This example also requires the binding operator to be associative. Further, to allow changing the order of the vectors, binding is typically also commutative. Table 1 lists several available binding implementations. Section 2 illustrated that the distribution of similar and dissimilar vectors is an important property of high-dimensional vector spaces. Thus, the effect of VSA operations on these similarities is also important. E.g., binding should be similarity preserving: ∀A, B, X ∈ 𝕍 : dist(A, B) = dist(A ⊗ X, B ⊗ X), i.e. the distance of two vectors remains constant when binding both vectors to the same third vector. Moreover, as the first sentence of this section already said, the result vector has to be dissimilar to the two inputs. This is important for the combination with the bundle operator explained in the following section.

3.2 Bundling +

The goal of the bundling operation + : 𝕍 × 𝕍 → 𝕍 is to combine two input vectors such that the output vector is similar to both inputs. This is also called superposition of vectors. Typically, the bundling operator is some kind of elementwise sum of the vector elements (see Table 1). E.g., the Multiply–Add–Permute VSA of Gayler [8] uses elementwise sum on the same vector space [−1, 1]^n as the experiments from Fig. 5 (the sum is limited to the range of the vector space elements [−1, 1]). In these experiments, we already showed that the elementwise sum of vectors is similar to each of the vectors; this was a direct consequence of the almost orthogonality of random vectors.

According to Kanerva [17] the bundle and bind operations should "form an algebraic field or approximate a field". In particular, bundling should be associative and commutative and binding should distribute over bundling. Let us illustrate this with a closer look at the example of Alice's record. For brevity we use X, Y, Z for the role vectors "name", "year_of_birth" and "high_score" and A, B, C for the vector representations of their values "Alice", "1980" and "1000". The record vector is formed by H = (X ⊗ A) + (Y ⊗ B) + (Z ⊗ C). What happens when querying for the name by binding with its vector X?

X ⊗ H = X ⊗ ((X ⊗ A) + (Y ⊗ B) + (Z ⊗ C))
      = (X ⊗ X ⊗ A) + (X ⊗ Y ⊗ B) + (X ⊗ Z ⊗ C)
      = A + noise
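A minimal Python sketch of this role–filler record (an added illustration using the MAP operators restricted to {−1, 1}^n, not the authors' implementation) runs through the whole chain: binding roles to fillers, bundling the pairs into the record H, unbinding with the role X, and cleaning up the result with a nearest neighbor search over the item memory.

```python
# Minimal sketch (added illustration, not the authors' code): Alice's record with
# the MAP operators restricted to {-1, 1}^n, where binding is its own inverse.
import numpy as np

n = 10000
rng = np.random.default_rng(2)

def rand_vec():
    return rng.choice([-1.0, 1.0], size=n)

def bind(u, v):                # elementwise product, self-inverse on {-1, 1}^n
    return u * v

def cos(u, v):                 # cosine similarity as (inverse) distance measure
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# item memory holding the elementary role and filler vectors
items = {name: rand_vec() for name in
         ("name", "year_of_birth", "high_score", "Alice", "1980", "1000")}
X, Y, Z = items["name"], items["year_of_birth"], items["high_score"]
A, B, C = items["Alice"], items["1980"], items["1000"]

# bundled record H = (X (x) A) + (Y (x) B) + (Z (x) C)
H = bind(X, A) + bind(Y, B) + bind(Z, C)

# querying the name: X (x) H = A + noise
noisy_A = bind(X, H)

# clean-up memory: nearest neighbor over all stored item vectors
print(max(items, key=lambda k: cos(items[k], noisy_A)))      # -> Alice

# binding is similarity preserving: dist(A, B) = dist(A (x) X, B (x) X)
print(np.isclose(cos(A, B), cos(bind(A, X), bind(B, X))))    # -> True
```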
The noise term includes the terms (X ⊗ Y ⊗ B) and (X ⊗ Z ⊗ C). Both are non-similar to each of their elements (a property of binding). Thus the only known vector that is similar to X ⊗ H is A. Again this exploits the property of high-dimensional vectors to be almost surely non-similar (i.e. almost orthogonal) to random vectors. The database experiments from Sect. 2.4 already illustrated how a noise-free version of vector A can be recovered: given a database with all elementary vectors, returning the nearest neighbor to A + noise very likely results in A (to not return a vector which is similar to noise, we need the property of binding to be non-similar to its inputs). In VSAs this database is typically called clean-up or item memory [16]. It can be as simple as our look-up table or, e.g., an attractor network [17]. Section 5.2 will evaluate properties of such a clean-up memory in combination with real-world data.

There can be trade-offs between the performance of the bundling and binding operators. For example, in the VSA of Gayler, the bundling operator works well for 𝕍 = [−1, 1]^n; however, the self-inverse property of binding holds exactly only for the special case of 𝕍 = {−1, 1}^n. The clean-up memory can also be used to restore exact values in the non-exact inversion case.

3.3 Permutation (or Protect) Π

Gayler [8] discussed the benefit of using an additional operator in order to protect vectors. Think of a situation with two bound role-filler pairs: A ⊗ X and B ⊗ Y. When binding these two pairs to (A ⊗ X) ⊗ (B ⊗ Y), it becomes necessary to prevent mixing roles and fillers: since binding is associative and commutative, this is equivalent to (A ⊗ Y) ⊗ (B ⊗ X). The permutation operator Π protects a term from associative and distributive rules. In the above example this is (A ⊗ X) ⊗ Π(B ⊗ Y). It is typically implemented as a permutation of vector dimensions. Its output is dissimilar to the input and, by application of the reverse permutation, it is also invertible. For details please refer to [8].

4 Applications from the Literature

VSAs have been applied to various problems like text classification [20], fault detection [19], analogy mapping [30], and reinforcement learning [18]. Kanerva [17] discusses the general computational power of VSAs and concludes one could create a "High dimensional computing-Lisp". While this is still open, work in this direction includes synthesis of finite state automata [27] and hyperdimensional stack machines [38]. Danihelka et al. [5] (Deepmind) used a VSA to model long short-term memory. In the medical domain, Widdows and Cohen [37] used Predication-based Semantic Indexing which exploits a VSA to represent traditional subject-predicate-object relationships (e.g. "aspirin TREATS headache") for fast approximate inference on the relationships of diseases, symptoms and treatments. Natural language processing is considered a challenging task. Jackendoff [13] concretized this statement into four theoretical challenges that a system that aims at processing language at a human level has to solve. According to Gayler [9], VSAs can solve these challenges. Hyperdimensional computing was also used to encode n-gram statistics to recognize the language of a text [14]. There is evidence that distributed high-dimensional representations are widely used for representation in the human brain [2]. This is extensively used in brain-inspired cognitive systems like Spaun [6] and in hierarchical temporal memory (HTM) [12], a computational model of working principles of the human neocortex. The latter was also applied for mobile robot place recognition [24].

5 Application to Robotic Tasks

This section showcases three examples of how hyperdimensional computing can be used for real robotic tasks. We do not claim that the presented approaches are better than existing solutions to the considered tasks; however, they demonstrate the versatility of hyperdimensional computing, its capability to work with real world data, and advocate its practical value. Before we start with the applications, we will describe how we bridge the gap between real world sensors and vector computations.

5.1 Encoding Real World Data

Section 2.4 used synthetic data to demonstrate the noise robustness of hyperdimensional computations and its application to bundling. The random vectors in this synthetic data fulfill the requirements to achieve pairwise almost orthogonal vectors by design. What if we want to work with real world data that does not provide thousands of independent random dimensions? For simple data structures and the particular case of sparse binary vectors, Purdy [29] discusses different encodings. Very recently, Kleyko et al. [20] discussed trade-offs in binary hyperdimensional encodings of images. A comprehensive discussion of encoding approaches for real world sensor data is beyond the scope of this paper. However, we want to shortly describe our approach to encode the real world image data in our experiments.

Any high-dimensional image feature vector can potentially be used. Based on their recent success, we decided to use image descriptors from early layers of deep convolutional neural networks in a similar fashion as they are used for place recognition [25, 33]. To get a descriptor for an image, it is fed to an off-the-shelf readily trained CNN
(we use AlexNet [21]) and instead of using the final output (e.g. the 1000 dimensional soft-max class output), the intermediate output of an earlier layer is used [we use the 13 × 13 × 384 = 64,896 dimensional output of the third convolutional layer (conv3)]. To reduce computational effort and to get a distributed representation, we use a locality sensitive hashing (LSH) approach and project the normalized conv3 descriptor with a random matrix R to a lower dimensional space. Each row in R is the normal of a 64,896 dimensional hyperplane (obtained by sampling R from a standard normal distribution followed by normalization of rows to length one). Since these products of the normalized conv3 descriptor and each row (hyperplane normal) reflect the cosine of the angle between the vectors, they are in range [−1, 1] and can be directly used in the Multiply–Add–Permute architecture [8] (see Table 1). We use 8192 rows in R.

Fig. 6 Example views of one of the 1000 ALOI objects from 0°, 90° and 180° viewing angle

[Fig. 7 panels: cosine distance (left) and recognition accuracy (right) over the angular distance of the query to the known vectors, for the variants Individual, Bundle, Static B4 and Static B8.]

Fig. 7 Object recognition performance on ALOI dataset (color figure online)

5.2 Bundling Views for Object Recognition

Robotic task For the first robotic use-case, we demonstrate the application of hyperdimensional computing to recognize objects from multiple viewpoints. This is important for mobile robot localization by recognizing known landmarks, recognizing objects for manipulation, and other robotics tasks.

Motivation We use this task to transfer the results on synthetic data from Sect. 2.4 on bundling of high-dimensional vectors to real world data. Bundling allows combining multiple vectors into one. This can be straightforwardly used to combine two or more known views. The motivation is threefold: (1) if we combine all known views into one representation, the comparison of a query vector to all known representations is a single vector comparison. (2) There might be a better interpolation between the known views. (3) This allows us to straightforwardly update the representation of an object, particularly iteratively in an online-filter fashion.

Experimental setup We practically demonstrate this approach using the Amsterdam Library of Object Images (ALOI) dataset [10], in particular the collection of 72,000 images of 1000 objects seen from 72 different horizontal viewing angles (5° steps). Figure 6 shows example images. In our experiments, given are a database of k ∈ {1 … 1000} known images I_x^k and I_y^k at viewing angles x and y, as well as a query image I_z^q at viewing angle z = (x+y)/2 (the viewing angle in between) and image index q. The task is to associate q with the correct image index k.

VSA approach We bundle the image descriptors I_x^k and I_y^k, i.e. create I_x^k + I_y^k for each k (there will be one vector for each object in the database).

Results When comparing a query image I_z^q to the database, motivation (1) is achieved by design: instead of comparing I_z^q to I_x^k and I_y^k individually, we can now compare against the single bundle vector and reduce the number of required comparisons by a factor of two.

The results in Fig. 7 demonstrate the better interpolation capabilities from motivation (2): the bundled representation (red curve) has a smaller cosine distance to the object image under a novel viewing angle than the individual images (blue curve). This also results in a better object recognition accuracy (right part). See footnote 4 for details.

To evaluate towards continuously integrating more views [motivation (3)], the yellow and the purple curves in Fig. 7 show query results when bundling a (static) set of multiple views from the four angles {0, 90, 180, 270} and the eight angles {0, 45, 90, …, 315}. Although the distance values

⁴ Details: the red curve in the left plot evaluates vector similarities (the query image index q is known and we compare the similarity of I_x^k + I_y^k and I_z^{q=k}), the red curve in the right plot evaluates the accuracy of a nearest neighbor query (the query image index q is not known to the system and it returns the index k of the nearest neighbor to I_z^q of all I_x^k + I_y^k, k ∈ {1 … 1000}). x is fixed at viewing angle 0°. y varies from 0° to 350°. The horizontal axis is the mean angular distance from z to x and y. As a reading example: in the left plot, the red curve evaluated at 90° means that for x = 0°, y = 180°, z = 90° (e.g. the images from Fig. 6), the average cosine distance of the bundle (I_0^k + I_180^k) and I_90^k is about 0.17, and the right plot tells us that for about 53% of the objects the query image was most similar to the correct bundle. For comparison without bundling, the blue curves in Fig. 7 show the results when comparing the query image to the individual images I_x^k and I_y^k (instead of their bundle). For the distance evaluation in the left plot, we use the closest of the two individual results for each query. For the query results in the right plot, all views I_x^k and I_y^k are stored in the database and a single query is made (the number of database entries and thus comparisons has now doubled compared to the bundling approach). The VSA approach not only reduces the number of comparisons, it also performs slightly better than using individual comparisons in both plots.
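The processing chain of Sects. 5.1 and 5.2 can be outlined in a few lines of Python. The sketch below is an added illustration under simplifying assumptions, not the code of the reported experiments: the conv3 descriptors are replaced by random surrogate vectors and all sizes are reduced so that it runs quickly (the experiments use 64,896-dimensional AlexNet conv3 descriptors, 8192 rows in R and 1000 ALOI objects).

```python
# Minimal sketch (added illustration, not the experiment code): LSH-style
# projection of an image descriptor to [-1, 1]^D as in Sect. 5.1, and bundling of
# two known views per object as in Sect. 5.2.
import numpy as np

rng = np.random.default_rng(3)
d_in, D, n_obj = 4096, 1024, 100          # reduced sizes so the sketch runs quickly

# projection matrix: each row is a unit-length hyperplane normal (Sect. 5.1)
R = rng.standard_normal((D, d_in))
R /= np.linalg.norm(R, axis=1, keepdims=True)

def encode(descriptor):
    """Project the normalized descriptor; each entry is a cosine, hence in [-1, 1]."""
    return R @ (descriptor / np.linalg.norm(descriptor))

# surrogate 'conv3' descriptors for two known views x and y of each object
views_x = rng.standard_normal((n_obj, d_in))
views_y = rng.standard_normal((n_obj, d_in))
I_x = np.stack([encode(v) for v in views_x])
I_y = np.stack([encode(v) for v in views_y])

# one bundled database vector per object (limited elementwise sum, MAP style)
database = np.clip(I_x + I_y, -1.0, 1.0)

def query(descriptor):
    """Return the index of the most similar bundle (cosine similarity)."""
    q = encode(descriptor)
    sims = database @ q / (np.linalg.norm(database, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))

# usage: a noisy re-observation of view x of object 42 is matched to bundle 42
print(query(views_x[42] + 0.3 * rng.standard_normal(d_in)))    # -> 42
```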
Fig. 9 (Left) Place recognition results on Nordland dataset. The original SeqSLAM sequence processing approach is well approximated by the vector sequence encoding. Both improve the place recognition performance compared to a direct pairwise comparison (top-right is better). (Right) Schematic overview of data flow for behavior learning

The number of original SeqSLAM operations is m ⋅ n ⋅ d_s. For the VSA implementation it is m ⋅ d_s + n ⋅ d_s + m ⋅ n (the first two terms represent the descriptor bundling and the last term the final pairwise comparison). E.g., for our database and query size of 288 images and d_s = 5, the ratio of the numbers of operations is more than a factor of 10, and it becomes larger if any of these values increases. Unfortunately, while for the original SeqSLAM most of the operations deal with scalar similarity values, for the VSA approach all operations are high-dimensional vector operations. Presumably, a practical runtime improvement can only be achieved with special hardware for high-dimensional vector computations (which then could also exploit the energy saving potential of VSAs).

Extensions The Nordland data is perfectly suited for SeqSLAM and its vector variant since each train journey is a single long sequence with constant speed. To account for a slightly varying speed between the sequences, the original SeqSLAM algorithm evaluates line segments with different slopes and uses the best choice. The proposed VSA implementation can be straightforwardly extended to these varying velocities by superposing the different combinations of image vectors and sequence position vectors. Further interesting directions would be to control the similarity between neighboring sequence position vectors or to use other VSAs' ways of encoding sequence information, e.g. permutations [16].

5.4 Learning and Recall of Reactive Behavior

Robotic Task The task is to learn simple reactive behaviors from demonstration. "Simple" means that we can represent them as a set of sensor-action (condition-result) pairs. Given a successful demonstration of a navigation run (e.g. from a human) by pairs of sensor input and actuator output, the system learns a representation that encodes this reactive behavior and can reproduce it during new runs in the environment.

Motivation The goal of this final example is to showcase a more complex VSA-based system.⁵ In contrast to the previous applications, this one also involves action selection by the robot. The goal is to encode the whole robot program (a set of reactive behavior rules) in a single vector. When executing (and combining) such VSA-based behaviors, the advantages of vectors (i.e. the representational power and robustness to noise) are preserved. A particular beauty of this approach is that it can learn encodings for behaviors that have exactly the same form (a single vector) no matter how complex the sensor input or the behaviors are.

Experimental setup We use the simulation task described in [22]. Figure 9 illustrates the simple robot used: it has a differential drive (i.e. a left and a right motor), a left and a right distance sensor, and a central light sensor. The robot starts at a random location in a labyrinth and the task is to wander around while avoiding obstacles, until the robot finds a light source. Then the robot should stay under this light. This is a simple task that can be coded using a simple set of rules (e.g. see [22]).

VSA approach The listing in Algorithm 1 describes the learning procedure. Inputs are pairs of sensor measures and corresponding actuator commands. The idea is to (1) encode the sensor and actuator values individually, (2) combine all sensor value encodings in a condition vector and all actuator encodings in a result vector, (3) combine the condition with the result vector to a rule vector, and finally (4) combine all rule vectors to a single vector that contains the whole "program". Algorithm 2 is used in the execution phase to find the best actuator commands for the current sensor input.

⁵ This work was previously presented at an IROS workshop, see [26] for details.
Algorithm 1: Learning
Data: k training samples [S, A]_{1:k} of sensor and actuator values, a VSA, an encoder, an empty program progHV and an empty vector knownCondHV of known conditions
Result: progHV – a vector representation of the behaviour
   // get vector representations for each sensor and actuator
 1 [sensor, actuator] = VSA.assignRandomVectors()
   // for each training sample [S, A]
 2 foreach pair [S = (s_1, ..., s_n), A = (a_1, ..., a_m)] do
     // encode values, bind to device and bundle condition/result
 3   conditionHV := +_{i=1}^{n} (sensor_i ⊗ encode(s_i))
 4   resultHV := +_{i=1}^{m} (actuator_i ⊗ encode(a_i))
 5   if isDissimilar(knownCondHV, Π(conditionHV)) then
       // protect the condition and append (bundle) to the program
 6     progHV := progHV + (Π(conditionHV) ⊗ resultHV)
       // also append (bundle) the condition to the set of known conditions
 7     knownCondHV := knownCondHV + Π(conditionHV)
       // insert the result and the actuator encodings into the clean-up memory
 8     VSA.addToCUM(resultHV)
 9     foreach actuator_i do
10       VSA.addToCUM(actuator_i ⊗ encode(a_i))
11     end
12   end
13 end

Algorithm 2: Query
Data: progHV – the output of the learning procedure Alg. 1, the VSA and encoder/decoder used in Alg. 1, the query sensor inputs S
Result: output actuator commands A
   // encode values, bind to device and bundle condition
 1 conditionHV := +_{i=1}^{n} (sensor_i ⊗ encode(s_i))

Since many of these rules are bundled to create the complete program (4), the condition vector has to be protected (think of it as using brackets that also prevent distribution in an equation) to prevent mixing up sensor conditions from different training pairs. To allow later recall from noisy vectors, each actuator-value pair [e.g. actuator_1 ⊗ encode(a_1)] and the result bundle have to be stored in the clean-up memory.

In this example, during query, the task is to obtain the left and right motor commands given the current sensor input and the program vector. To be able to get the correct commands, also the encoder/decoder and the clean-up memory are required (5). The given sensor information is combined into a condition vector as before (6). Binding this vector to the program vector retrieves the most similar rule vector from training (7). The clean-up memory can be used to obtain a noise-free version. By binding this result with an actuator role vector (e.g. actuator_1), a noisy version of the corresponding command is obtained (8). Using the clean-up memory and the decoder, this can be translated into a motor command and used to control the robot (9).

Results We implemented this system and were able to successfully learn behaviors that solve the described simulation task from [22] using human demonstration runs. For more details, please refer to [26].

This demonstrates that VSAs can also be used to implement more complex programs, including action selection. It is possible to encode a complete robot program in a single vector. However, the complexity of the program is limited by the capacity of this vector. More work is required to investigate the practical potential of this example.
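The following compact Python rendering of Algorithms 1 and 2 is an added illustration, not the implementation from [26]. It assumes a MAP-style VSA on {−1, 1}^N, a placeholder encoder that assigns one random vector per discrete value, and a simplified clean-up memory that maps each stored result bundle directly back to its actuator values (the paper instead decodes each actuator value separately via the clean-up memory and a decoder).

```python
# Minimal sketch (added illustration, not the authors' implementation from [26]):
# Algorithms 1 and 2 with a MAP-style VSA on {-1, 1}^N. The scalar encoder and
# the clean-up memory are simplified placeholders.
import numpy as np

N = 10000
rng = np.random.default_rng(4)
codebook = {}                      # placeholder encoder: one random vector per value

def rand_vec():
    return rng.choice([-1.0, 1.0], size=N)

def bind(u, v):                    # elementwise product, self-inverse on {-1, 1}^N
    return u * v

def protect(v):                    # permutation operator, here a cyclic shift by one
    return np.roll(v, 1)

def cos(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0

def encode(value):
    if value not in codebook:
        codebook[value] = rand_vec()
    return codebook[value]

def learn(samples, n_sensors, n_actuators, thresh=0.5):
    """Algorithm 1: bundle protected(condition) (x) result rules into progHV."""
    sensor = [rand_vec() for _ in range(n_sensors)]
    actuator = [rand_vec() for _ in range(n_actuators)]
    progHV, knownCondHV, cum = np.zeros(N), np.zeros(N), []
    for S, A in samples:
        conditionHV = sum(bind(sensor[i], encode(s)) for i, s in enumerate(S))
        resultHV = sum(bind(actuator[i], encode(a)) for i, a in enumerate(A))
        if cos(knownCondHV, protect(conditionHV)) < thresh:     # isDissimilar
            progHV = progHV + bind(protect(conditionHV), resultHV)
            knownCondHV = knownCondHV + protect(conditionHV)
            # simplified clean-up memory: store the result bundle together with
            # the actuator values it decodes to
            cum.append((resultHV, tuple(A)))
    return progHV, sensor, cum

def query(progHV, sensor, cum, S):
    """Algorithm 2: recover the actuator commands for sensor input S."""
    conditionHV = sum(bind(sensor[i], encode(s)) for i, s in enumerate(S))
    # bind the protected condition to the program: ~ matching resultHV + noise
    noisy_rule = bind(progHV, protect(conditionHV))
    # clean-up: return the actuator values of the most similar stored result bundle
    return max(cum, key=lambda entry: cos(entry[0], noisy_rule))[1]

# usage: two condition-result pairs (sensor values -> left/right motor commands)
samples = [((0, 0, 1), (0.5, 0.5)), ((1, 0, 0), (0.1, 0.9))]
progHV, sensor, cum = learn(samples, n_sensors=3, n_actuators=2)
print(query(progHV, sensor, cum, (0, 0, 1)))    # -> (0.5, 0.5)
```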
6 Current Limitations and Open Questions

There are trade-offs like the one between the binding and bundling operators in the Multiply–Add–Permute architecture explained in Sect. 3: the first works better using {−1, 1}^n as vector space, the other when using the interval [−1, 1]^n. This is not an unsolvable problem, neither in theory (e.g. by using a clean-up memory) nor in practice (we also used this VSA in the robotics experiments in Sect. 5). Although some theoretical insights on properties of VSAs are available (e.g. on the bundle capacity [7]), better insights into such trade-offs and limitations would support the practical application.

A particularly important and challenging task is the encoding of real world data into vectors. Our examples in Sect. 2 and most applications from Sect. 4 use random vectors, which are very likely pairwise almost orthogonal. However, for the shown robotic experiments in Sects. 5.2 and 5.3, we used encodings obtained from images using a CNN and LSH (Sect. 5.1). The resulting vectors span only a subspace of the vector space. Thus, presumably, the VSA mechanisms work only approximately; nevertheless, they provide reasonable results. Insights into the requirements on properties of the encoding/decoding could have a huge influence on practical application.

The fact that in hyperdimensional computing most things work only approximately requires a different engineer's mindset. In the foreseeable future, complex machines like robots will very likely contain a lot of engineering work; easier access for non-mathematicians to what works, why, and when in these systems would presumably be a very helpful contribution. A very interesting direction would also be the connection to the probabilistic methods that are widely used in this field.

Besides access to theoretical findings for applications of hyperdimensional computing, a structured way of practically designing systems using VSAs is missing. Currently, almost every problem that is solved using hyperdimensional computing is a somehow isolated application. Although the same principles are used on every occasion, a structured approach to solving problems, e.g. by means of design patterns, would be very desirable. Also related is the very fundamental question which parts of the system have to be designed manually and which parts can be learned. Currently, many results are due to elaborate design rather than learning. However, the high-dimensional representations presumably provide easy access to connectionist learning approaches, potentially an elegant bridge between (deep) artificial neural networks and (vector) symbolic processing.

References

1. Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional space. In: Van den Bussche J, Vianu V (eds) Database theory—ICDT 2001. Springer, Berlin, Heidelberg, pp 420–434
2. Ahmad S, Hawkins J (2015) Properties of sparse distributed representations and their application to hierarchical temporal memory. CoRR arXiv:1503.07469
3. Bellman RE (1961) Adaptive control processes: a guided tour. MIT Press, Cambridge
4. Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is nearest neighbor meaningful? In: Beeri C, Buneman P (eds) Database theory—ICDT'99. Springer, Berlin, Heidelberg, pp 217–235
5. Danihelka I, Wayne G, Uria B, Kalchbrenner N, Graves A (2016) Associative long short-term memory. In: Balcan MF, Weinberger KQ (eds) Proceedings of the 33rd international conference on machine learning, proceedings of machine learning research, vol 48. PMLR, New York, pp 1986–1994. http://proceedings.mlr.press/v48/danihelka16.html
6. Eliasmith C, Stewart TC, Choo X, Bekolay T, DeWolf T, Tang Y, Rasmussen D (2012) A large-scale model of the functioning brain. Science 338(6111):1202–1205. https://doi.org/10.1126/science.1225266
7. Frady EP, Kleyko D, Sommer FT (2018) A theory of sequence indexing and working memory in recurrent neural networks. Neural Comput 30(6):1449–1513. https://doi.org/10.1162/neco_a_01084
8. Gayler RW (1998) Multiplicative binding, representation operators, and analogy. In: Advances in analogy research: integration of theory and data from the cognitive, computational, and neural sciences. Bulgaria
9. Gayler RW (2003) Vector symbolic architectures answer Jackendoff's challenges for cognitive neuroscience. In: Proc. of ICCS/ASCS international conference on cognitive science, Sydney, Australia, pp 133–138
10. Geusebroek JM, Burghouts GJ, Smeulders AWM (2005) The Amsterdam library of object images. Int J Comput Vis 61(1):103–112
11. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer. http://www-stat.stanford.edu/~tibs/ElemStatLearn/
12. Hawkins J, Ahmad S (2016) Why neurons have thousands of synapses, a theory of sequence memory in neocortex. Front Neural Circ 10:23. https://doi.org/10.3389/fncir.2016.00023
13. Jackendoff R (2002) Foundations of language (brain, meaning, grammar, evolution). Oxford University Press, Oxford
14. Joshi A, Halseth JT, Kanerva P (2017) Language geometry using random indexing. In: de Barros JA, Coecke B, Pothos E (eds) Quantum interaction. Springer International Publishing, Cham, pp 265–274
15. Kanerva P (1997) Fully distributed representation. In: Proc. of real world computing symposium, Tokyo, Japan, pp 358–365
16. Kanerva P (2009) Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors. Cognit Comput 1(2):139–159
17. Kanerva P (2014) Computing with 10,000-bit words. In: 2014 52nd annual Allerton conference on communication, control, and computing (Allerton), pp 304–310. https://doi.org/10.1109/ALLERTON.2014.7028470
18. Kleyko D, Osipov E, Gayler RW, Khan AI, Dyer AG (2015) Imitation of honey bees' concept learning processes using Vector Symbolic Architectures. Biol Inspired Cognit Arch 14:57–72. https://doi.org/10.1016/j.bica.2015.09.002
19. Kleyko D, Osipov E, Papakonstantinou N, Vyatkin V, Mousavi A (2015) Fault detection in the hyperspace: towards intelligent automation systems. In: 2015 IEEE 13th international conference on industrial informatics (INDIN), pp 1219–1224. https://doi.org/10.1109/INDIN.2015.7281909
20. Kleyko D, Rahimi A, Rachkovskij DA, Osipov E, Rabaey JM (2018) Classification and recall with binary hyperdimensional computing: tradeoffs in choice of density and mapping characteristics. IEEE Trans Neural Netw Learn Syst 29(12):5880–5898. https://doi.org/10.1109/TNNLS.2018.2814400
21. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges C, Bottou L, Weinberger K (eds) Advances in neural information processing systems, vol 25. Curran Associates, Inc., pp 1097–1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
22. Levy SD, Bajracharya S, Gayler RW (2013) Learning behavior hierarchies via high-dimensional sensor projection. In: Proc. of AAAI conference on learning rich representations from low-level sensors, AAAIWS'13–12, pp 25–27
23. Milford M, Wyeth GF (2012) SeqSLAM: visual route-based navigation for sunny summer days and stormy winter nights. In: Proceedings of the IEEE international conference on robotics and automation (ICRA)
24. Neubert P, Ahmad S, Protzel P (2018) A sequence-based neuronal model for mobile robot localization. In: Proc. of KI: advances in artificial intelligence
25. Neubert P, Protzel P (2015) Local region detector + CNN based landmarks for practical place recognition in changing environments. In: Proceedings of the European conference on mobile robotics (ECMR)
26. Neubert P, Schubert S, Protzel P (2016) Learning vector symbolic architectures for reactive robot behaviours. In: Proc. of international conference on intelligent robots and systems (IROS) workshop on machine learning methods for high-level cognitive capabilities in robotics
27. Osipov E, Kleyko D, Legalov A (2017) Associative synthesis of finite state automata model of a controlled object with hyperdimensional computing. In: IECON 2017—43rd annual conference of the IEEE industrial electronics society, pp 3276–3281. https://doi.org/10.1109/IECON.2017.8216554
28. Plate TA (1994) Distributed representations and nested compositional structure. Ph.D. thesis, Toronto, Ont., Canada
29. Purdy S (2016) Encoding data for HTM systems. CoRR arXiv:1602.05925
30. Rachkovskij DA, Slipchenko SV (2012) Similarity-based retrieval with structure-sensitive sparse binary distributed representations. Comput Intell 28(1):106–129. https://doi.org/10.1111/j.1467-8640.2011.00423.x
31. Rahimi A, Datta S, Kleyko D, Frady EP, Olshausen B, Kanerva P, Rabaey JM (2017) High-dimensional computing as a nanoscalable paradigm. IEEE Trans Circ Syst I Regular Pap 64(9):2508–2521. https://doi.org/10.1109/TCSI.2017.2705051
32. Smolensky P (1990) Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif Intell 46(1–2):159–216
33. Sünderhauf N, Dayoub F, Shirazi S, Upcroft B, Milford M (2015) On the performance of ConvNet features for place recognition. CoRR arXiv:1501.04158
34. Sünderhauf N, Neubert P, Protzel P (2013) Are we there yet? Challenging SeqSLAM on a 3000 km journey across all four seasons. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), workshop on long-term autonomy
35. Sünderhauf N, Brock O, Scheirer W, Hadsell R, Fox D, Leitner J, Upcroft B, Abbeel P, Burgard W, Milford M, Corke P (2018) The limits and potentials of deep learning for robotics. Int J Robot Res 37(4–5):405–420. https://doi.org/10.1177/0278364918770733
36. Thrun S, Burgard W, Fox D (2005) Probabilistic robotics (intelligent robotics and autonomous agents). The MIT Press, Cambridge
37. Widdows D, Cohen T (2015) Reasoning with vectors: a continuous model for fast robust inference. Logic J IGPL/Interest Group Pure Appl Logics 2:141–173
38. Yerxa T, Anderson A, Weiss E (2018) The hyperdimensional stack machine. In: Proceedings of Cognitive Computing, Hannover, pp 1–2