Article in Pattern Recognition · June 2014
DOI: 10.1016/j.patcog.2014.01.005


Automatic generation and detection of highly reliable fiducial markers
under occlusion
S. Garrido-Jurado, R. Muñoz-Salinas, F.J. Madrid-Cuevas, M.J. Marín-Jiménez
Department of Computing and Numerical Analysis.
University of Córdoba.
14071 Córdoba (Spain)
{i52gajus,rmsalinas,fjmadrid,mjmarin}@uco.es

Abstract

This paper presents a fiducial marker system especially appropriate for camera pose estimation in applications such as augmented reality, robot localization, etc. Three main contributions are presented. First, we propose an algorithm for generating configurable marker dictionaries (in size and number of bits) following a criterion to maximize the inter-marker distance and the number of bit transitions. In the process, we derive the maximum theoretical inter-marker distance that dictionaries of square binary markers can have. Second, a method for automatically detecting the markers and correcting possible errors is proposed. Third, a solution to the occlusion problem in augmented reality applications is shown. To that aim, multiple markers are combined with an occlusion mask calculated by color segmentation. The experiments conducted show that our proposal obtains dictionaries with higher inter-marker distances and lower false negative rates than state-of-the-art systems, and provides an effective solution to the occlusion problem.

1 Introduction

Camera pose estimation (Fig. 1(a,b)) is a common problem in many applications requiring precise localization in the environment, such as augmented and virtual reality applications and robotics [1, 2, 3, 4]. Obtaining the camera pose from images requires finding the correspondences between known points in the environment and their camera projections. While some approaches seek natural features such as key points or textures [5, 6, 7, 8, 9], fiducial markers are still an attractive approach because they are easy to detect and allow high speed and precision.

Amongst the several fiducial marker systems proposed in the literature, those based on square markers have gained popularity, especially in the augmented reality community [10, 11, 12]. The reason is that they allow extracting the camera pose from their four corners, given that the camera is properly calibrated. In most approaches, markers encode a unique identification in a binary code that may include error detection and correction bits. In general, each author has proposed their own predefined set of markers (dictionary). The problems of setting a predefined dictionary are twofold. First, in some cases, the number of markers required by the application might be higher than the dictionary size. Second, if the number of markers required is smaller, it is preferable to use a smaller dictionary whose inter-marker distance is as high as possible, so as to reduce the inter-marker confusion rate.

Another common problem in augmented reality applications is occlusion: a real object may appear occluding the virtual scene. In this case, the virtual objects are rendered over the real object, which should remain visible (see Fig. 1(c,d)). This is indeed a limitation of the augmented experience, since the user cannot interact freely.

This paper presents a fiducial marker system based on square markers offering solutions to the above-mentioned problems. First, we propose a general method for generating configurable dictionaries (both in size and number of bits). Our algorithm creates dictionaries following a criterion to maximize the inter-marker distance and the number of bit transitions. In the process, we derive the maximum theoretical inter-marker distance that a dictionary of square binary markers can have. Second, a method for automatically detecting markers in images and correcting possible errors, based on our generated dictionaries, is presented. Third, we propose a solution to the occlusion problem based on combining multiple markers and an occlusion mask calculated using color information. While using multiple markers provides robustness against occlusion, color information is used to determine the occluded pixels, avoiding rendering on them.

The rest of the paper is structured as follows. Section 2 presents the most relevant works related to ours. Section 3 explains the proposed method to generate marker dictionaries. Section 4 describes the process proposed for marker detection and error correction. Section 5 presents our solution to the occlusion problem. Finally, Section 6 reports the experimentation carried out, and Section 7 draws some conclusions.

Finally, it must be noted that our work has been implemented in the ArUco library, which is freely available [13].

Figure 1: Example of augmented reality scene. (a) Input image containing a set of fiducial markers. (b) Markers
automatically detected and used for camera pose estimation. (c) Augmented scene without considering user’s occlusion.
(d) Augmented scene considering occlusion.

2 Related work

A fiducial marker system is composed of a set of valid markers and an algorithm which performs their detection, and possibly correction, in images. Several fiducial marker systems have been proposed in the literature, as shown in Figure 2.

Figure 2: Examples of fiducial markers proposed in previous works.

The simplest proposals consist of using points as fiducial markers, such as LEDs, retroreflective spheres or planar dots [14, 15], which can be segmented using basic techniques under controlled conditions. Their identification is usually obtained from the relative position of the markers and often involves a complex process.

Other approaches use planar circular markers where the identification is encoded in circular sectors or concentric rings [16][17]. However, circular markers usually provide just one correspondence point (the center), making it necessary to detect several of them for pose estimation.

Other types of fiducial markers are based on blob detection. CyberCode [18] or VisualCode [19] are derived from 2D barcode technology such as MaxiCode or QR, but can also accurately provide several correspondence points. Other popular fiducial markers are the ReacTIVision amoeba markers [20], which are also based on blob detection and whose design was optimized using genetic algorithms. Some authors have proposed the use of trained classifiers to improve detection in cases of bad illumination and blurring caused by fast camera movement [21].

An alternative to the previous approaches is the square-based fiducial marker systems. Their main advantage is that the presence of four prominent points can be employed to obtain the pose, while the inner region is used for identification (either using a binary code or an arbitrary pattern such as an image). In the arbitrary pattern category, one of the most popular systems is ARToolKit [10], an open-source project which has been extensively used in the last decade, especially in the academic community. ARToolKit markers are composed of a wide black border with an inner image which is stored in a database of valid patterns. Despite its popularity, it has some drawbacks. First, it uses a template matching approach to identify markers, obtaining high false positive and inter-marker confusion rates [22]. Second, the system uses a fixed global threshold to detect squares, making it very sensitive to varying lighting conditions.

Most of the square-based fiducial systems use binary codes. Matrix [23] is one of the first and simplest proposals. It uses a binary code with redundant bits for error detection. The ARTag [11] system is based on the same principles but improves the robustness to lighting and partial occlusion by using an edge-based square detection method instead of a fixed threshold. Additionally, it uses a binary coding scheme that includes checksum bits for error detection and correction. It also recommends using its dictionary markers in a specific order so as to maximize the inter-marker distances. Its main drawback is that the proposed marker dictionary is fixed to 36 bits and the maximum number of erroneous bits that can be corrected is two, independently of the inter-marker distances of the subset of markers used.

ARToolKitPlus [24] improves some of the features of ARToolKit. First, it includes a method to automatically update the global threshold value depending on pixel values from previously detected markers. Second, it employs binary codes including error detection and correction, thus achieving higher robustness than its predecessor. The last known version of ARToolKitPlus employs a binary BCH [25] code for 36-bit markers, which presents a minimum Hamming distance of two. As a consequence, ARToolKitPlus BCH markers can detect a maximum error of one bit and cannot perform error correction. The ARToolKitPlus project was halted and followed by the Studierstube Tracker [12] project, which is not publicly available.

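The detection and correction capacities quoted in this section all follow from the standard coding-theory relation for a code with minimum Hamming distance d: up to d − 1 erroneous bits can be detected, and up to ⌊(d − 1)/2⌋ corrected. A tiny sketch (helper names are ours, for illustration only):

```python
# Standard error-handling bounds for a code whose minimum Hamming
# distance between any two codewords is d. Helper names are ours.

def detectable_errors(d):
    # Any error of fewer than d bits cannot reach another codeword,
    # so it is always detectable.
    return d - 1

def correctable_errors(d):
    # Nearest-codeword decoding is unambiguous within this radius.
    return (d - 1) // 2

# ARToolKitPlus BCH dictionary, d = 2: detects 1 erroneous bit, corrects 0,
# matching the behavior described above.
assert detectable_errors(2) == 1 and correctable_errors(2) == 0
```

Conversely, ARTag's guaranteed correction of two erroneous bits requires a minimum distance of at least five between the codewords involved.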
BinARyID [26] proposes a method to generate binary-coded markers focused on avoiding rotation ambiguities; however, it only achieves a Hamming distance of one between two markers and does not present any error correction process. There are also some closed-source systems which employ square markers, such as the SCR, HOM and IGD [27] marker systems used by the ARVIKA project [28].

This paper proposes a square-based fiducial marker system with binary codes. However, instead of using a predefined set of markers, we propose a method for generating configurable marker dictionaries (with arbitrary size and number of markers), containing only the number of markers required. Our algorithm produces markers using a criterion to maximize the inter-marker distance and the number of bit transitions. Additionally, a method for detecting and correcting errors, based on the dictionary obtained, is proposed. This method allows the correction of a greater number of erroneous bits compared to current state-of-the-art systems.

Our last contribution is related to the occlusion problem in augmented reality applications. When designing an augmented reality application, interactivity is a key aspect to consider, so one may expect users to occlude the markers. ARTag handles the problem in two ways. First, its marker detection method allows small breaks in the square sides. Second, it employs several markers simultaneously; thus, the occlusion of some of them does not affect the global pose estimation. Despite being robust to occlusion, ARTag still has a main drawback: it cannot detect occlusion precisely. As a consequence, if an object moves between the camera and the augmented scene (e.g., the user's hands), the virtual objects will be rendered on the hands, hiding them (see Fig. 1(c,d)).

Proposals to detect the occluded regions usually fall into three main categories: depth-based, model-based, and color-based approaches. Depth-based approaches try to calculate the depth of the image pixels to detect occlusions. However, these approaches require depth sensors, such as stereo, time-of-flight or structured light cameras [29, 30, 31]. When a single camera is used, some authors have adopted model-based approaches [32, 33]. The idea is to provide geometric models of the objects which can occlude the scene, and to detect their pose. This solution is not practical in many applications where the occluding objects are not known in advance, and it imposes very strong performance limitations. Finally, color-based approaches [34] can be employed. The idea is to create a color model of the scene (background) which is then compared to the foreground objects.

In this work, we propose the use of multiple markers to handle occlusion (as in ARTag). However, we also propose the use of a color map for precisely detecting the visible pixels, so that the virtual scene is only rendered on them. In order to improve segmentation, we employ blue and green markers instead of classical black-and-white ones. As we experimentally show, our proposal is an effective method for improving current augmented reality applications, such as in the gaming or film industries, although it is not limited to them.

Figure 3: Examples of markers of different sizes, n, generated with the proposed method. From left to right: n = 5, n = 6 and n = 8.

3 Automatic dictionary generation

The most relevant aspects to consider when designing a marker dictionary are the false positive and negative rates, the inter-marker confusion rate, and the number of valid markers [11]. The first two are often tackled in the literature using error detection and correction bits, which, on the other hand, reduce the number of valid markers. The third one depends only on the distance between the markers employed. If they are too close, a few erroneous bits can lead to another valid marker of the dictionary, and the error might not even be detected.

Another desirable property of markers is a high number of bit transitions, so that they are less likely to be confused with environment objects. For instance, binary codes consisting only of zeros or ones will be printed as completely black or white markers, respectively, which would be easily confused with environment objects.

While previous works impose fixed dictionaries, we propose an automatic method for generating them with the desired number of markers and the desired number of bits. Our problem is then to select m markers, from the space D of all markers with n × n bits, so that they are as far as possible from each other and have as many bit transitions as possible. In general, the problem is to find the dictionary D* that maximizes the desired criterion τ̂(D):

D* = argmax_{D ∈ D} { τ̂(D) }    (1)

Since a complete evaluation of the search space is not feasible even for a small n, a stochastic algorithm that finds suboptimal solutions is proposed.

3.1 Algorithm overview

Our algorithm starts from an empty dictionary D that is incrementally populated with new markers. Our markers are encoded as an (n + 2) × (n + 2) grid (Fig. 3) where the external cells are set to black, creating an easily detectable external border. The remaining n × n cells are employed for coding. Thus, we might define a marker,

m = (w0, w1, ..., w_{n−1}),    (2)

as a tuple composed of n binary words w of length n such that

w = (b0, ..., b_{n−1} | bi ∈ {0, 1}).    (3)

Let us also denote by W the set of all possible words of n bits, whose cardinal is |W| = 2^n.

At each iteration of the algorithm, a marker is selected based on a stochastic process that assigns more probability to markers with a higher number of bit transitions and whose words have not yet been added to D. If the distance between the generated marker and those in D is greater than a minimum value τ, then it is added. Otherwise, the marker is rejected and a new marker is randomly selected. The process stops when the required number of markers is achieved.

Because of the probabilistic nature of the algorithm, the acceptance of new markers could be improbable, or even impossible, in some cases. To guarantee the convergence of the algorithm, the distance threshold is initially set to the maximum possible inter-marker distance that the dictionary can have, τ0. Along the process, the value of τ is reduced after a number of unproductive iterations, ψ. The final value τ̂(D) represents the minimum distance between any two markers in D, and it will be used as the basis for error detection and correction (explained in Sect. 4). The proposed algorithm is summarized in Alg. 1.

Algorithm 1 Dictionary generation process
  D ← Ø  # Reset dictionary
  τ ← τ0  # Initialize target distance, see Sect. 3.4
  ρ ← 0  # Reset unproductive iteration counter
  while D has not reached the desired size do
    Generate a new marker m  # Sect. 3.2
    if distance of m to elements in D is ≥ τ then
      D ← D ∪ m  # Add to dictionary
      ρ ← 0
    else
      ρ ← ρ + 1  # It was unproductive
      # maximum number of unproductive iterations reached?
      if ρ = ψ then
        τ ← τ − 1  # Decrease target distance
        ρ ← 0
      end if
    end if
  end while

3.2 Marker generation

As previously pointed out, markers are selected using a random process led by a probability distribution that assigns a higher probability to those markers with a high number of transitions and whose words are not yet present in D. The proposed process for generating a marker consists of selecting n words from W with replacement. To do so, each word wi ∈ W has a probability of being selected at each iteration that is defined as:

P{w = wi} = T(wi) O(wi, D) / Σ_{wj ∈ W} T(wj) O(wj, D).    (4)

Eq. 4 defines the probability of selecting a word as the combination of two functions. The first one, T(wi) ∈ [0, 1], is related to the number of bit transitions of the word. It is defined as

T(wi) = 1 − ( Σ_{j=0}^{n−2} δ(wi^{j+1}, wi^j) ) / (n − 1),    (5)

where wi^j is the j-th bit of the word wi, and δ is 1 if both elements are equal and 0 otherwise. So, T(wi) tends to 1 as the number of transitions between consecutive bits increases, and to 0 as it decreases. For instance, the words 010110 and 000011 present values of T = 4/5 and T = 1/5, respectively, which are proportional to their number of bit transitions.

On the other hand, the function O(wi, D) accounts for the number of times the word wi appears amongst the markers in D. The idea is to reduce the probability of choosing words that have already been selected many times. It is defined in the interval [0, 1] as

O(wi, D) = 1 − ( Σ_{mi ∈ D} Σ_{wj ∈ mi} δ(wj, wi) ) / (n |D|)   if |D| ≠ 0,
O(wi, D) = 1   otherwise.    (6)

The double sum counts the appearances of wi amongst the markers in D, while the denominator counts the total number of words in D. Thus, O(wi, D) is 1 if wi is not in D, and tends to 0 as it appears a higher number of times. In the first iteration (|D| = 0), the function is defined as 1 so that all words have the same probability of being selected.

3.3 Distance calculation

As previously indicated, a marker is added to the dictionary only if its distance to the markers already in the dictionary is at least τ. The concept of distance between markers must be defined considering that they are printed as binary grids of n × n bits that can be observed under rotation. Then, let us define the distance between two markers as

D(mi, mj) = min_{k ∈ {0,1,2,3}} { H(mi, R^k(mj)) }.    (7)

The function H is the Hamming distance between two markers, which is defined as the sum of Hamming distances between each pair of marker words. The function R^k is an operator that rotates the marker grid k × 90 degrees in the clockwise direction. The function D is then the rotation-invariant Hamming distance between the markers.

Let us also define the distance of a marker to a dictionary:

D(mi, D) = min_{mj ∈ D} { D(mi, mj) },    (8)

as the distance of the marker to the nearest one in the dictionary.

Finally, it is not only important to distinguish markers from each other, but also to correctly identify the marker orientation. Otherwise, pose estimation would fail. So, a valid marker must also guarantee that the minimum distance to its own rotations is above τ. Thus, we define the marker self-distance as

S(mi) = min_{k ∈ {1,2,3}} { H(mi, R^k(mi)) }.    (9)

In summary, we only add a marker to the dictionary if both S(mi) and D(mi, D) are greater than or equal to τ. Otherwise, the marker is rejected and a new one is generated. After a number of unproductive iterations ψ, the value of τ is decreased by one so as to allow new markers to be added.

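The quantities just defined — the transition score T of Eq. 5 and the rotation-invariant distances of Eqs. 7 and 9 — can be made concrete with a short Python re-implementation (function names are ours; this is an illustrative sketch, not the ArUco code):

```python
# Illustrative sketch (our own names): transition score T (Eq. 5) and
# rotation-invariant Hamming distances (Eqs. 7 and 9) for square markers
# represented as tuples of rows of bits.

def T(w):
    """Eq. 5: fraction of transitions between consecutive bits of word w."""
    n = len(w)
    non_transitions = sum(w[j] == w[j + 1] for j in range(n - 1))
    return 1 - non_transitions / (n - 1)

def rot90(m):
    """Rotate a square bit grid 90 degrees clockwise (operator R in Eq. 7)."""
    return tuple(zip(*m[::-1]))

def hamming(a, b):
    """Hamming distance between two markers (summed over all grid cells)."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def distance(mi, mj):
    """Eq. 7: minimum Hamming distance over the four rotations of mj."""
    dists = []
    for _ in range(4):
        dists.append(hamming(mi, mj))
        mj = rot90(mj)
    return min(dists)

def self_distance(m):
    """Eq. 9: minimum distance of a marker to its own three rotations."""
    r = rot90(m)
    dists = []
    for _ in range(3):
        dists.append(hamming(m, r))
        r = rot90(r)
    return min(dists)

# As in the text: 010110 gives T = 4/5 and 000011 gives T = 1/5.
assert abs(T((0, 1, 0, 1, 1, 0)) - 4 / 5) < 1e-9
assert abs(T((0, 0, 0, 0, 1, 1)) - 1 / 5) < 1e-9
```

Note that Eq. 7 alone cannot reject a marker that is too similar to its own rotations; that is exactly why the self-distance of Eq. 9 is checked separately before insertion.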
In the end, the markers of the generated dictionary have a minimum distance between them and to themselves, τ̂, which is the last τ employed. This value can be calculated for any marker dictionary (manually or automatically generated) as:

τ̂(D) = min { min_{mi ∈ D} {S(mi)},  min_{mi ≠ mj ∈ D} {D(mi, mj)} }.    (10)

3.4 Maximum inter-marker distance: τ0

The proposed algorithm requires an initial value for the parameter τ0. If one analyzes the first iteration (when the dictionary is empty), it is clear that the only distance to consider is the self-distance (Eq. 9), since the distance to other markers is not applicable. So, the maximum self-distance for markers of size n × n (let us denote it by Sn*) is the maximum distance that a dictionary of this type of markers can have. This section explains how to determine Sn*, which is equivalent to finding the marker of size n × n with the highest self-distance.

If we analyze the path of the bits when applying 90-degree rotations to a marker, it is clear that any bit (x, y) changes its position to another three locations until it returns to its original position (see Figure 4). The Hamming distance contributed by a marker bit to Eq. 9 is thus only influenced by these other three bits. So, let us define a quartet as the set composed of these positions: {(x, y), (n − y − 1, x), (n − x − 1, n − y − 1), (y, n − x − 1)}.

Figure 4: Examples of quartets for a 2×2 and a 3×3 marker. Each arrow indicates the destination of a bit after a 90-degree clockwise rotation.

In general, a marker of size n × n has a total of C quartets that can be calculated as:

C = ⌊n² / 4⌋,    (11)

where ⌊·⌋ represents the floor function. If n is odd, the central bit of the marker constitutes a quartet by itself, which does not provide extra distance to S.

If a quartet is expressed as a bit string, a 90-degree rotation can be obtained as a circular bit shift operation. For instance, the quartet 1100 becomes (0110 → 0011 → 1001) in successive rotations. In fact, for the purpose of calculating Sn*, these four quartets are equivalent, and we will refer to them as a quartet group Qi. It can be seen from Eq. 9 that the contribution of any quartet is given by the distance of its successive rotations to the original quartet. For instance, quartet 1100 contributes to Eq. 9 with distances (2, 4, 2) as it rotates:

H(1100, 0110) = 2;  H(1100, 0011) = 4;  H(1100, 1001) = 2.

But also, if we start from quartet 0110 and rotate it successively, we obtain the quartets (0011 → 1001 → 1100), which again provide the distances (2, 4, 2):

H(0110, 0011) = 2;  H(0110, 1001) = 4;  H(0110, 1100) = 2.

In fact, there are only 6 quartet groups (shown in Table 1), thus reducing the problem considerably.

Table 1: Quartet groups and quartet Hamming distances for each rotation.

  Group  Quartets                      90 deg  180 deg  270 deg
  Q1     0000                          0       0        0
  Q2     1000 → 0100 → 0010 → 0001     2       2        2
  Q3     1100 → 0110 → 0011 → 1001     2       4        2
  Q4     0101 → 1010                   4       0        4
  Q5     1110 → 0111 → 1011 → 1101     2       2        2
  Q6     1111                          0       0        0

As previously indicated, calculating Sn* is the problem of obtaining the marker with the highest self-distance, and we have turned this problem into assigning quartet groups to the C quartets of a marker. It can be seen that this is in fact a multi-objective optimization, where each quartet group Qi is a possible solution and the objectives to maximize are the distances for each rotation. If the Pareto front is obtained, it can be observed that the groups Q3 and Q4 dominate the rest of the solutions. Thus, the problem is simplified, again, to assigning Q3 and Q4 to the C quartets of a marker.

From a brief analysis, it can be deduced that Sn* is obtained by assigning the groups {Q3, Q3, Q4} (in this order) repeatedly until completing the C quartets. For instance, the simplest marker is a 2 × 2 marker (C = 1), for which Sn* = 2 is obtained by assigning Q3. For a 3 × 3 marker (C = 2), Sn* = 4 is obtained by assigning Q3 twice. For a 4 × 4 marker (C = 4), Sn* = 10 is obtained by assigning the groups {Q3, Q3, Q4, Q3}. This last case is shown in detail in Table 2.

Therefore, for a generic marker with C quartets, the value Sn* follows the rule:

Sn* = 2 ⌊4C / 3⌋.    (12)

Then, we employ the value

τ0 = Sn*    (13)

as the starting point for our algorithm.

4 Marker detection and error correction

This section explains the steps employed to automatically detect the markers in an image (Fig. 5(a)). The process is comprised of several steps aimed at detecting rectangles and extracting the binary code from them.

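The identification and correction stages detailed below reduce to four rotated lookups in the dictionary, followed by a nearest-marker search within radius ⌊(τ̂ − 1)/2⌋ when no exact match exists. A minimal sketch (our own function and variable names, standing in for the actual implementation):

```python
# Sketch of the identification stage (our own names, not the actual
# implementation): try the four rotations of a candidate against the
# dictionary; if none matches, fall back to nearest-marker correction
# within the radius floor((tau_hat - 1) / 2).

def rot90(m):
    """Rotate a square bit grid 90 degrees clockwise."""
    return tuple(zip(*m[::-1]))

def hamming(a, b):
    """Hamming distance between two markers, summed over all grid cells."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def identify(candidate, dictionary, tau_hat):
    """Return (marker_id, rotation) or None. `dictionary` maps grids to ids."""
    m = candidate
    for k in range(4):            # exact lookup, one query per rotation
        if m in dictionary:
            return dictionary[m], k
        m = rot90(m)
    radius = (tau_hat - 1) // 2   # correction: nearest marker within radius
    best = None
    m = candidate
    for k in range(4):            # linear fall-back scan, O(4|D|)
        for grid, marker_id in dictionary.items():
            d = hamming(m, grid)
            if d <= radius and (best is None or d < best[0]):
                best = (d, marker_id, k)
        m = rot90(m)
    return (best[1], best[2]) if best else None

# Toy dictionary of two 2x2 markers.
D = {((1, 1), (0, 0)): 0, ((1, 0), (0, 1)): 1}
assert identify(((0, 1), (0, 1)), D, 1) == (0, 3)  # a rotation of marker 0
assert identify(((1, 1), (1, 0)), D, 3) == (0, 0)  # one corrupted bit corrected
```

A hash map stands in here for the balanced binary tree used in the text; both give sub-linear exact lookup, while the correction fall-back is the linear scan described below.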
Table 2: Quartet assignment for a 4 × 4 marker (C = 4) to obtain Sn*. It can be observed that the sequence {Q3, Q3, Q4} is repeated until filling all the quartets in the marker.

  Quartet  Group  90 degrees  180 degrees  270 degrees
  1        Q3     2           4            2
  2        Q3     2           4            2
  3        Q4     4           0            4
  4        Q3     2           4            2
  Total distances 10          12           10
  Sn*             min(10, 12, 10) = 10

Figure 5: Image process for automatic marker detection. (a) Original image. (b) Result of applying local thresholding. (c) Contour detection. (d) Polygonal approximation and removal of irrelevant contours. (e) Example of marker after perspective transformation. (f) Bit assignment for each cell.

For that purpose, we take as input a gray-scale image. While the image analysis is not a novel contribution, the marker code identification and error correction is a new approach specifically designed for the generated dictionaries of our method. The steps employed by our system are described below.

• Image segmentation: First, the most prominent contours in the gray-scale image are extracted. Our initial approach was to employ the Canny edge detector [35]; however, it is very slow for our real-time purposes. In this work, we have opted for a local adaptive thresholding approach, which has proven to be very robust to different lighting conditions (see Fig. 5(b)).

• Contour extraction and filtering: Afterward, a contour extraction is performed on the thresholded image using the Suzuki and Abe [36] algorithm. It produces the set of image contours, most of which are irrelevant for our purposes (see Figure 5(c)). Then, a polygonal approximation is performed using the Douglas-Peucker [37] algorithm. Since markers are enclosed in rectangular contours, those that are not approximated to 4-vertex polygons are discarded. Finally, we simplify near contours, leaving only the external ones. Figure 5(d) shows the resulting polygons from this process.

• Marker code extraction: The next step consists in analyzing the inner region of these contours to extract the internal code. First, the perspective projection is removed by computing the homography matrix (Fig. 5(e)). The resulting image is thresholded using Otsu's method [38], which provides the optimal image threshold value given that the image distribution is bimodal (which holds true in this case). Then, the binarized image is divided into a regular grid and each element is assigned the value 0 or 1 depending on the values of the majority of pixels in it (see Fig. 5(e,f)). A first rejection test consists in detecting the presence of the black border. If all the bits of the border are zero, then the inner grid is analyzed using the method described below.

• Marker identification and error correction: At this point, it is necessary to determine which of the marker candidates obtained actually belong to the dictionary and which are just part of the environment. Once the code of a marker candidate is extracted, four different identifiers are obtained (one for each possible rotation). If any of them is found in D, we consider the candidate a valid marker. To speed up this process, the dictionary elements are sorted as a balanced binary tree. To that aim, markers are represented by the integer value obtained by concatenating all their bits. It can then be deduced that this process has a logarithmic complexity O(4 log2(|D|)), where the factor 4 indicates that one search is necessary for each rotation of the marker candidate.

If no match is found, the correction method can be applied. Considering that the minimum distance between any two markers in D is τ̂, an error of at most ⌊(τ̂ − 1)/2⌋ bits can be detected and corrected. Therefore, our marker correction method consists in calculating the distance of the erroneous marker candidate to all the markers in D (using Eq. 8). If the distance is equal to or smaller than ⌊(τ̂ − 1)/2⌋, we consider that the nearest marker is the correct one. This process, though, presents a linear complexity of O(4|D|), since each rotation of the candidate has to be compared to the entire dictionary. Nonetheless, it is a highly parallelizable process that can be efficiently implemented in current computers.

Please note that, compared to the dictionaries of ARToolKitPlus (which cannot correct errors) and ARTag (only capable of recovering errors of two bits), our approach can correct errors of ⌊(τ̂ − 1)/2⌋ bits. For instance, for a dictionary generated in the experimental section with 6 × 6 bits and 30 markers, we obtained τ̂ = 12. So, our approach can correct 5 bits
of errors in this dictionary. Additionally, we can gen-
erate markers with more bits which leads to a larger
τ̂ , thus increasing the correction capabilities. Actu-
ally, our detection and correction method is a general
framework that can be used with any dictionary (in-
cluding ARToolKitPlus and ARTag dictionaries). In
fact, if our method is employed with the ARTag dic-
tionary of 30 markers, for instance, we could recover
from errors of 5 bits, instead of the 2 bits they can recover from.

• Corner refinement and pose estimation: Once a marker has been detected, it is possible to estimate its pose with respect to the camera by iteratively minimizing the reprojection error of the corners (using, for instance, the Levenberg-Marquardt algorithm [39, 40]). While many approaches have been proposed for corner detection [41, 42, 43], we have opted for a linear regression of the marker side pixels to calculate their intersections. This approach was also employed in ARTag [11], ARToolKit [10] and ARToolKitPlus [24].

5 Occlusion detection

Detecting a single marker might fail for different reasons, such as poor lighting conditions, fast camera movement, occlusions, etc. A common approach to improve the robustness of a marker system is the use of marker boards. A marker board is a pattern composed of multiple markers whose corner locations are referred to a common reference system. Boards present two main advantages. First, since there is more than one marker, it is less likely to lose all of them at the same time. Second, the more markers are detected, the more corner points are available for computing the camera pose; thus, the pose obtained is less influenced by noise. Figure 1(a) shows the robustness of a marker board against partial occlusion.

Based on the marker board idea, a method to overcome the occlusion problem in augmented reality applications (i.e., virtual objects rendered on real objects, as shown in Fig. 1(c,d)) is proposed. Our approach consists in defining a color map of the board that is employed to compute an occlusion mask by color segmentation.

Although the proposed method is general enough to work with any combination of colors, in our tests we have opted to replace black-and-white markers by others with higher chromatic contrast so as to improve color segmentation. In our case, blue and green have been selected. Additionally, we have opted for using only the hue component of the HSV color model, since we have observed that it provides the highest robustness to lighting changes and shadows.

Figure 6: Occlusion mask example. (a) Hue component of Fig. 1(a) with the detected markers. (b) Occlusion mask: white pixels represent visible regions of the board.

Let us define the color map M as an nc × mc grid, where
the board image pixels p into the map space

pm = Hm p.

Then, the corresponding cell pc is obtained by discretizing the result to its nearest value, pc = [pm]. Let us denote by Ic the set of image board pixels that map onto cell c.

If the grid size of M is relatively small compared to the size of the board in the images, Ic will contain pixels of the two main board colors. It is assumed, then, that the distribution of the colors in each cell can be modeled by a mixture of two Gaussians [44], using the Expectation-Maximization algorithm [45] to obtain its parameters. Therefore, the pdf of the color u in a cell c can be approximated by the expression

P(u, c) = Σ_{k=1,2} πk^c Nk^c(u | µk^c, Σk^c),    (14)

where Nk^c(u | µk^c, Σk^c) is the k-th Gaussian distribution and πk^c is the mixing coefficient, with

Σ_{k=1,2} πk^c = 1.

In an initial step, the map must be created from a view of the board without occlusion. In subsequent frames, color segmentation is done by analyzing whether the probability of a pixel is below a certain threshold θc. However, to avoid the hard partitioning imposed by the discretization, the probability of each pixel is computed as the weighted average of the probabilities obtained for the neighbor cells in the map:

P̄(p) = ( Σ_{c ∈ H(pc)} w(pm, c) P(pu, c) ) / ( Σ_{c ∈ H(pc)} w(pm, c) ),    (15)

where pu is the color of the pixel, H(pc) ⊂ M are the nearest neighbor cells of pc, and

w(pm, c) = (2 − |pm − c|_1)²    (16)

is a weighting factor based on the L1-norm between the mapped value pm and the center of the cell c. The value 2 represents the maximum possible L1 distance between
each cell c represents the color distribution of the pixels of neighbors. As a consequence, the proposed weighting
a board region. If the board pose is properly estimated, value is very fast to compute and provides good results
it is possible to compute the homography Hm that maps in practice.

7
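As an illustration, the per-cell mixture of Eq. (14) and the weighted probability of Eqs. (15)-(16) can be sketched in a few lines for the 1-D hue case. This is a minimal sketch, not the ArUco implementation; the function names, the percentile-based EM initialization and the fixed iteration count are our own assumptions.

```python
import numpy as np

def fit_cell_gmm(hues, iters=50):
    """Fit the 2-component Gaussian mixture of Eq. (14) to the hue
    samples of one color-map cell with a plain EM loop [45]."""
    mu = np.percentile(hues, [25, 75]).astype(float)   # one mode per board color (assumed init)
    var = np.full(2, hues.var() + 1e-6)
    pi = np.full(2, 0.5)                               # mixing coefficients pi_k^c, sum to 1
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        d = hues[None, :] - mu[:, None]
        lik = pi[:, None] * np.exp(-0.5 * d**2 / var[:, None]) \
              / np.sqrt(2 * np.pi * var[:, None])
        resp = lik / np.maximum(lik.sum(0, keepdims=True), 1e-12)
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(1)
        pi = nk / nk.sum()
        mu = (resp * hues).sum(1) / nk
        var = (resp * (hues[None, :] - mu[:, None])**2).sum(1) / nk + 1e-6
    return pi, mu, var

def cell_pdf(u, model):
    """P(u, c) of Eq. (14) for a scalar hue value u."""
    pi, mu, var = model
    return float(np.sum(pi * np.exp(-0.5 * (u - mu)**2 / var)
                        / np.sqrt(2 * np.pi * var)))

def pixel_probability(u, p_m, models):
    """Eq. (15): probability of hue u at map coordinates p_m, as the
    weighted average over the nearest cells, using the weights of Eq. (16)."""
    rows, cols = len(models), len(models[0])
    i0, j0 = int(p_m[0]), int(p_m[1])
    num = den = 0.0
    for i in (i0, i0 + 1):
        for j in (j0, j0 + 1):
            if 0 <= i < rows and 0 <= j < cols:
                w = (2.0 - abs(p_m[0] - i) - abs(p_m[1] - j)) ** 2   # Eq. (16)
                num += w * cell_pdf(u, models[i][j])
                den += w
    return num / den
```

A pixel is then labeled occluded when `pixel_probability` falls below the threshold θ_c; the bilinear-style weighting avoids hard jumps at cell borders.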
Considering that the dimension of the observed board in the image is much bigger than the number of cells in the color map, neighbor pixels in the image are likely to have similar probabilities. Thus, we can speed up the computation by downsampling the image pixels employed for calculating the mask and assigning the same value to their neighbors.

Figure 6 shows the results of the detection and segmentation obtained by our method using as input the hue channel and a downsampling factor of 4. As can be seen, the occluding hand is properly detected by color segmentation.

Finally, it must be considered that the lighting conditions might change, thus making it necessary to update the map. This process can be done with each new frame, or less frequently to avoid increasing the computing time excessively. In order to update the color map, the probability distributions of the map cells are recalculated using only the visible pixels of the board. The process only applies to cells with a minimum number of visible pixels γ_c, i.e., only if |I_c| > γ_c.

    Candidates detection      8.17 ms/image
    Marker identification     0.17 ms/candidate
    Error correction          0.71 ms/candidate
    Total time (|D| = 24)    11.08 ms/image

Table 3: Average processing times for the different steps of our method.
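The block-wise speed-up described above (a downsampling factor of 4 in Fig. 6) can be sketched as follows. This is a minimal sketch under our own assumptions: `occlusion_mask` is a hypothetical helper, `prob_fn` stands for the per-pixel probability P̄ of Eq. (15), and `theta_c` for the threshold θ_c.

```python
import numpy as np

def occlusion_mask(hue, prob_fn, theta_c, step=4):
    """Evaluate the color model once per step x step block and share the
    result with all pixels of the block, instead of testing every pixel."""
    h, w = hue.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, step):
        for x in range(0, w, step):
            # visible (not occluded) where the board-color probability is high
            visible = prob_fn(hue[y, x], (y, x)) >= theta_c
            mask[y:y + step, x:x + step] = visible
    return mask
```

With `step=4`, the expensive probability of Eq. (15) is evaluated on roughly 1/16 of the pixels, which matches the spirit of the 4 ms mask times reported in Sect. 6.1.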

6 Experiments and results


Figure 7: Inter-marker distances of ARTag dictionaries and ours (Eq. 10) for an increasing number of markers. ArUco values correspond to the mean of 30 runs of our algorithm (with and without considering reflection). Higher distances reduce the possibility of inter-marker confusion in case of error.

This section explains the experimentation carried out to test our proposal. First, the processing times required for marker detection and correction are analyzed. Then, the proposed method is compared with the state-of-the-art systems in terms of inter-marker distances, number of bit transitions, robustness against noise and vertex jitter. Finally, an analysis of the proposed occlusion method is made.

As already indicated, this work is available under the BSD license in the ArUco library [13].

6.1 Processing Time

Processing time is a crucial feature in many real-time fiducial applications (such as augmented reality). The marker detection process of Sect. 4 can be divided into two main steps: finding marker candidates and analyzing them to determine if they actually belong to the dictionary.

The detection performance of our method has been tested for a dictionary size of |D| = 24. The processing time for candidate detection, marker identification and error correction was measured for several video sequences. The tests were performed using a single core of a system equipped with an Intel Core 2 Quad 2.40 GHz processor, 2048 MB of RAM and Ubuntu 12.04 as the operating system, with a load average of 0.1. Table 3 summarizes the average results obtained for a total of 6000 images with a resolution of 640 × 480 pixels. The sequences include indoor recordings with several markers and marker boards arranged in the environment.

In addition, we have evaluated the computing time required for generating dictionaries with the proposed method for 6 × 6 markers. The value of τ was reduced after ψ = 5000 unproductive iterations. The computing times for dictionaries of sizes 10, 100 and 1000 elements are approximately 8, 20 and 90 minutes, respectively. Since this is an off-line process done only once, we consider the computing times obtained appropriate for real applications. It must be considered, though, that generating the first elements of the dictionary is more time consuming because of the high inter-distances imposed. As τ decreases, the computation speed increases.

Finally, the times required for creating the color map and the occlusion mask in the sequences reported in Sect. 6.6 are 170 and 4 ms, respectively. In these sequences, the board has an average dimension of 320 × 240 pixels.

6.2 Analysis of Dictionary distances

The inter-marker confusion rate is related to the distances between the markers in the dictionary, τ̂(D) (Eq. 10). The higher the distance between markers, the more difficult it is to confuse them in case of error. The marker dictionary proposed by Fiala in the ARTag [11] system improves the distances of other systems such as ARToolKitPlus [24] or BinARyID [26]. His work recommends using its dictionary (of 6 × 6 markers) in a specific order so as to maximize the distance.

We have compared the dictionaries generated with our method to those obtained by incrementally adding the first 1000 recommended markers of ARTag. For our algorithm, the initial distance employed is τ_0 = 24 (Eq. 13), which has been decremented by one after ψ = 5000 unproductive iterations. Since ARTag considers the possibility of marker reflection (i.e., markers seen in a mirror), we have
Figure 8: Standard deviations of the inter-marker distances obtained by our method in Figure 7 (with and without considering reflection).

Figure 9: Number of bit transitions of ARTag dictionaries and ours for an increasing number of markers. A higher number of transitions reduces the possibility of confusion with environment elements.

also tested our method including the reflection condition. However, we consider this an uncommon case in fiducial marker applications.

Figure 7 shows the values τ̂(D) for the dictionaries as their size increases. The results shown for our method represent the average values of 30 runs of our algorithm. As can be seen, our system outperforms the ARTag dictionaries in the majority of the cases and obtains the same results in the worst ones. Even when considering reflection, our method still outperforms the ARTag results in most cases. The ARToolKitPlus system has not been compared since it does not include a recommended marker order as ARTag does. However, the minimum distance in ARToolKitPlus considering all the BCH markers is 2, which is a low value in comparison to our method or ARTag.

Figure 8 shows the standard deviations for 30 runs of the tests shown in Figure 7. It can be observed that there are two patterns in the deviation results: (i) peaks, which correspond to the slopes in Figure 7, and (ii) intervals without deviation, where the inter-marker distance remains the same in all runs. As can be observed, the higher deviations occur at the transitions of τ̂(D) in Figure 7. It must be noted, though, that in most of the cases the maximum deviation is 0.5. Just in the generation of the first markers, the deviation rises up to 1.4 and 0.7 (with and without considering reflection, respectively).

6.3 Evaluation of the bit transitions

Our marker generation process encourages markers with a high number of bit transitions, thus reducing the possibility of confusion with environment elements. Figure 9 shows the number of bit transitions of the dictionaries generated in the previous section with our method and with ARTag. The number of transitions is obtained as the sum of the transitions for each word in the marker. As in the previous case, our results represent the average values obtained for 30 different marker dictionaries generated with our algorithm. It must be indicated that the maximum standard deviation obtained in all cases was 1.7.

It can be observed that our approach generates markers with more transitions than ARTag. Also, the number of transitions does not decrease drastically as the number of markers selected grows. The mean number of bit transitions for all the BCH markers in ARToolKitPlus is 15.0, which is also below our method.

Figure 10: False negative rates for different levels of additive Gaussian noise.

6.4 Error detection

The false positive and false negative rates are related to the coding scheme and the number of redundant bits employed for error detection and correction. In our approach, however, false positives are not detected by checking redundant bits but by analyzing the distance to the dictionary markers. A comparison between the correction capabilities of ARToolKitPlus, ARTag and our method has been performed by comparing the false negative rates on a set of 100 test images for each system. The images showed markers of each system from different distances and viewpoints. The images were taken in the same positions for
Figure 11: Image examples from the video sequences used to test the proposed fiducial marker system. The first row shows cases of correct marker detection. The second row shows cases where false positives have not been detected.

each of the tested systems. Different levels of additive Gaussian noise have been applied to the images to measure the robustness of the methods. Figure 10 shows the false negative rates obtained as a function of the noise level.

As can be observed, the proposed method is more robust against high amounts of noise than the rest. The ARToolKitPlus false negative rate increases sharply for high levels of noise. ARTag presents a higher sensitivity for low levels of noise; however, it is nearly as robust as our method for high levels. Figure 11 shows some examples of the sequences used to test the proposed system. It must be indicated, though, that no false positives have been detected by any method in the video sequences tested during our experimentation.

Figure 12: Vertex jitter measures for different marker systems.

6.5 Vertex jitter

An important issue in many augmented reality applications is the vertex jitter, which refers to the noise in the localization of the marker corners. Errors in the location of the corners are propagated to the estimation of the camera extrinsic parameters, leading to an unpleasant user experience. This section analyzes the vertex jitter obtained by i) the result of the polygonal approximation (see Sect. 4), ii) our method implemented in the ArUco library, iii) the ARToolKitPlus library and iv) the ARTag library. The first method is the most basic approach (i.e., no corner refinement) and is applied to analyze the impact of the other methods. Then, since the techniques used by ARToolKitPlus, ARTag and our method are based on the same principle (linear regression of the marker side pixels), it is expected that they obtain similar results.

For the experiments, the camera has been placed at a fixed position with respect to a set of markers and several frames have been acquired. Then, the camera has been moved farther away from the markers, thus obtaining several viewpoints at different distances. The standard deviation of the corner locations estimated by each method has been measured in all the frames. The experiment has been repeated both for black-and-white markers and green-and-blue markers. Please note that the hue channel employed for detecting the latter presents less contrast than the black-and-white markers (see Fig. 6(a)). Thus, evaluating the different corner refinement systems is especially relevant in that case.

Figure 12 shows the results obtained as a box plot [46] for both black-and-white markers and green-and-blue markers. The lower and upper ends of the whiskers represent the minimum and maximum distribution values, respectively. The bottom and top of the boxes represent the lower and upper quartiles, while the middle band represents the median.

It can be observed that the jitter level is lower in black-and-white markers than in green-and-blue ones. Nonetheless, it is small enough to provide a satisfactory user experience. As expected, not performing any refinement produces higher deviations. It can also be noted that our method obtains results similar to those obtained by the ARToolKitPlus and ARTag libraries. We consider that the differences obtained between the three methods can be attributed to implementation details.

6.6 Analysis of Occlusion

Along with the marker system described, a method to overcome the occlusion problem in augmented reality applications has been proposed. First, we employ marker boards so as to increase the probability of seeing complete markers in the presence of occlusion. Then, we propose using a color map to calculate an occlusion mask of the board pixels. We have designed two sets of experiments to validate our proposal. Firstly, it has been analyzed how different occlusion levels affect the estimation of the camera pose. While ARTag introduces the idea of multiple markers, no analysis of occlusion is made in their work. Secondly, a qualitative evaluation of the occlusion mask generated has been performed under different lighting conditions. It must be noticed that the estimation of the occlusion mask is not present in any of the previous works (ARTag, ARToolKit or ARToolKitPlus); thus, a comparison with them is not feasible.

For our tests, the parameters

    θ_c = 10^-4, γ_c = 50, n_c = m_c = 5,

have been employed, providing good results in a wide range of sequences.

6.6.1 Occlusion tolerance

In these experiments we aim at analyzing the tolerance of our system to occlusion. To do so, a video sequence is recorded showing a board composed of 24 markers without occlusion, so that all markers are correctly detected. Assuming Gaussian noise, the ground truth camera pose is assumed to be the average over all the frames. Then, we have artificially simulated several degrees of occlusion by randomly removing a percentage of the detected markers in each frame and computing the pose with the remaining ones. Thus, the deviation from the ground truth at each frame is the error introduced by the occlusion. This process has been repeated for three distances from the board to analyze the impact of distance on the occlusion handling.

The 3D rotation error is computed using the inner product of unit quaternions [47],

    φ(q_1, q_2) = 1 − |q_1 · q_2|,

which gives values in the range [0, 1]. The translation error has been obtained using the Euclidean distance.

Figure 13: Rotation error for different degrees of marker board occlusion and for three camera distances.

Figure 14: Translation error for different degrees of marker board occlusion and for three camera distances.

Figures 13-14 show the results obtained for different camera distances to the marker board. It can be observed that, both in rotation and translation, the errors originated by the occlusion are insignificant until the occlusion degree is above 85%. It can also be noted that the error increases as the camera moves farther from the board.

6.6.2 Qualitative evaluation of the occlusion mask

Figure 15 shows some captures from a user session using the green-and-blue marker board. The augmented objects consist of a piece of virtual floor and a virtual character performing some actions around it. It can be observed that the user's hand and other real objects are not occluded by virtual objects, since they have different tonalities than the board and thus can be recognized by our method.

Nonetheless, as any color-based method, it is sensitive to lighting conditions, i.e., too bright or too dark regions make it impossible to detect the markers or to obtain a precise occlusion mask. Fig. 16 shows an example of a scene where a lamp has been placed beside the board. It can be seen that there is a bright spot saturating the lower right region of the board, where markers cannot be detected. Additionally, because of the light saturation, the chromatic information in that region (hue channel) is not reliable, thus producing segmentation errors on the board.

7 Conclusions

This paper has proposed a fiducial marker system specially appropriated for camera localization in applications such as augmented reality or robotics. Instead of employing a predefined set of markers, a general method
to generate configurable dictionaries in size and number of bits has been proposed. The algorithm relies on a probabilistic search maximizing two criteria: the inter-marker distances and the number of bit transitions. Also, the theoretical maximum inter-marker distance that a dictionary with square markers can have has been derived. The paper has also proposed an automatic method to detect the markers and correct possible errors. Instead of using redundant bits for error detection and correction, our approach is based on a search on the generated dictionary. Finally, a method to overcome the occlusion problem in augmented reality applications has been presented: a color map employed to calculate the occlusion mask.

Figure 15: Examples of users' interaction applying the occlusion mask. Note that hands and other real objects are not occluded by the virtual character and the virtual floor texture.

Figure 16: Example of occlusion mask errors due to light saturation. (a) Original input image. (b) Markers detected. (c) Occlusion mask. (d) Augmented scene.

The experiments conducted have shown that the dictionaries generated with our method outperform state-of-the-art systems in terms of inter-marker distance, number of bit transitions and false positive rate. Finally, this work has been made publicly available in the ArUco library [13].

Acknowledgments. We are grateful for the financial support provided by the Science and Technology Ministry of Spain and FEDER (projects TIN2012-32952 and BROCA).

References

[1] R. T. Azuma, A survey of augmented reality, Presence 6 (1997) 355–385.

[2] H. Kato, M. Billinghurst, Marker tracking and HMD calibration for a video-based augmented reality conferencing system, Augmented Reality, International Workshop on 0 (1999) 85–94.

[3] V. Lepetit, P. Fua, Monocular model-based 3d tracking of rigid objects: A survey, in: Foundations and Trends in Computer Graphics and Vision, 2005, pp. 1–89.

[4] B. Williams, M. Cummins, J. Neira, P. Newman, I. Reid, J. Tardós, A comparison of loop closing techniques in monocular slam, Robotics and Autonomous Systems.

[5] W. Daniel, R. Gerhard, M. Alessandro, T. Drummond, S. Dieter, Real-time detection and tracking for augmented reality on mobile phones, IEEE Transactions on Visualization and Computer Graphics 16 (3) (2010) 355–368.

[6] G. Klein, D. Murray, Parallel tracking and mapping for small ar workspaces, in: Proceedings of the 2007 6th IEEE and ACM International Symposium
on Mixed and Augmented Reality, ISMAR '07, IEEE Computer Society, Washington, DC, USA, 2007, pp. 1–10.

[7] K. Mikolajczyk, C. Schmid, Indexing based on scale invariant interest points, in: ICCV, 2001, pp. 525–531.

[8] D. G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the International Conference on Computer Vision - Volume 2, ICCV '99, IEEE Computer Society, Washington, DC, USA, 1999, pp. 1150–.

[9] P. Bhattacharya, M. Gavrilova, A survey of landmark recognition using the bag-of-words framework, in: Intelligent Computer Graphics, Vol. 441 of Studies in Computational Intelligence, Springer Berlin Heidelberg, 2013, pp. 243–263.

[10] H. Kato, M. Billinghurst, Marker tracking and hmd calibration for a video-based augmented reality conferencing system, in: Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality, IWAR '99, IEEE Computer Society, Washington, DC, USA, 1999, pp. 85–.

[11] M. Fiala, Designing highly reliable fiducial markers, IEEE Trans. Pattern Anal. Mach. Intell. 32 (7) (2010) 1317–1324.

[12] D. Schmalstieg, A. Fuhrmann, G. Hesina, Z. Szalavári, L. M. Encarnaçäo, M. Gervautz, W. Purgathofer, The studierstube augmented reality project, Presence: Teleoper. Virtual Environ. 11 (1) (2002) 33–54.

[13] R. Muñoz-Salinas, S. Garrido-Jurado, Aruco library, http://sourceforge.net/projects/aruco/, [Online; accessed 01-December-2013] (2013).

[14] K. Dorfmüller, H. Wirth, Real-time hand and head tracking for virtual environments using infrared beacons, in: Proceedings CAPTECH'98, Springer, 1998, pp. 113–127.

[15] M. Ribo, A. Pinz, A. L. Fuhrmann, A new optical tracking system for virtual and augmented reality applications, in: Proceedings of the IEEE Instrumentation and Measurement Technical Conference, 2001, pp. 1932–1936.

[16] V. A. Knyaz, R. V. Sibiryakov, The development of new coded targets for automated point identification and non-contact surface measurements, in: 3D Surface Measurements, International Archives of Photogrammetry and Remote Sensing, Vol. XXXII, part 5, 1998, pp. 80–85.

[17] L. Naimark, E. Foxlin, Circular data matrix fiducial system and robust image processing for a wearable vision-inertial self-tracker, in: Proceedings of the 1st International Symposium on Mixed and Augmented Reality, ISMAR '02, IEEE Computer Society, Washington, DC, USA, 2002, pp. 27–.

[18] J. Rekimoto, Y. Ayatsuka, Cybercode: designing augmented reality environments with visual tags, in: Proceedings of DARE 2000 on Designing augmented reality environments, DARE '00, ACM, New York, NY, USA, 2000, pp. 1–10.

[19] M. Rohs, B. Gfeller, Using camera-equipped mobile phones for interacting with real-world objects, in: Advances in Pervasive Computing, 2004, pp. 265–271.

[20] M. Kaltenbrunner, R. Bencina, reactivision: a computer-vision framework for table-based tangible interaction, in: Proceedings of the 1st international conference on Tangible and embedded interaction, TEI '07, ACM, New York, NY, USA, 2007, pp. 69–74.

[21] D. Claus, A. Fitzgibbon, Reliable automatic calibration of a marker-based position tracking system, in: Workshop on the Applications of Computer Vision, 2005, pp. 300–305.

[22] M. Fiala, Comparing artag and artoolkit plus fiducial marker systems, in: IEEE International Workshop on Haptic Audio Visual Environments and their Applications, 2005, pp. 147–152.

[23] J. Rekimoto, Matrix: A realtime object identification and registration method for augmented reality, in: Third Asian Pacific Computer and Human Interaction, July 15-17, 1998, Kangawa, Japan, Proceedings, IEEE Computer Society, 1998, pp. 63–69.

[24] D. Wagner, D. Schmalstieg, Artoolkitplus for pose tracking on mobile devices, in: Computer Vision Winter Workshop, 2007, pp. 139–146.

[25] S. Lin, D. Costello, Error Control Coding: Fundamentals and Applications, Prentice Hall, 1983.

[26] D. Flohr, J. Fischer, A Lightweight ID-Based Extension for Marker Tracking Systems, in: Eurographics Symposium on Virtual Environments (EGVE) Short Paper Proceedings, 2007, pp. 59–64.

[27] X. Zhang, S. Fronz, N. Navab, Visual marker detection and decoding in ar systems: A comparative study, in: Proceedings of the 1st International Symposium on Mixed and Augmented Reality, ISMAR '02, IEEE Computer Society, Washington, DC, USA, 2002, pp. 97–.

[28] W. Friedrich, D. Jahn, L. Schmidt, Arvika - augmented reality for development, production and service, in: DLR Projektträger des BMBF für Informationstechnik (Ed.), International Status Conference - Lead Projects Human-Computer Interaction (Saarbrücken 2001), DLR, Berlin, 2001, pp. 79–89.

[29] S. Zollmann, G. Reitmayr, Dense depth maps from sparse models and image coherence for augmented reality, in: 18th ACM symposium on Virtual reality software and technology, 2012, pp. 53–60.
[30] M.-O. Berger, Resolving occlusion in augmented reality: a contour based approach without 3d reconstruction, in: Proceedings of CVPR (IEEE Conference on Computer Vision and Pattern Recognition), Puerto Rico, 1997, pp. 91–96.

[31] J. Schmidt, H. Niemann, S. Vogt, Dense disparity maps in real-time with an application to augmented reality, in: Proceedings of the Sixth IEEE Workshop on Applications of Computer Vision, WACV '02, IEEE Computer Society, Washington, DC, USA, 2002, pp. 225–.

[32] A. Fuhrmann, G. Hesina, F. Faure, M. Gervautz, Occlusion in collaborative augmented environments, Tech. Rep. TR-186-2-98-29, Institute of Computer Graphics and Algorithms, Vienna University of Technology, Favoritenstrasse 9-11/186, A-1040 Vienna, Austria (Dec. 1998).

[33] V. Lepetit, M.-O. Berger, A semi-automatic method for resolving occlusion in augmented reality, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000, pp. 225–230.

[34] R. Radke, Image change detection algorithms: a systematic survey, Image Processing, IEEE Transactions on 14 (3) (2005) 294–307.

[35] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679–698.

[36] S. Suzuki, K. Abe, Topological structural analysis of digitized binary images by border following, Computer Vision, Graphics, and Image Processing 30 (1) (1985) 32–46.

[37] D. H. Douglas, T. K. Peucker, Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, Cartographica: The International Journal for Geographic Information and Geovisualization 10 (2) (1973) 112–122.

[38] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man and Cybernetics 9 (1) (1979) 62–66.

[39] D. W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, SIAM Journal on Applied Mathematics 11 (2) (1963) 431–441.

[40] R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Edition, Cambridge University Press, New York, NY, USA, 2003.

[41] C. Harris, M. Stephens, A combined corner and edge detector, in: Proc. of Fourth Alvey Vision Conference, 1988, pp. 147–151.

[42] W. Förstner, E. Gülch, A fast operator for detection and precise location of distinct points, corners and centres of circular features, 1987.

[43] S. M. Smith, J. M. Brady, Susan - a new approach to low level image processing, International Journal of Computer Vision 23 (1995) 45–78.

[44] A. Sanjeev, R. Kannan, Learning mixtures of arbitrary gaussians, in: Proceedings of the thirty-third annual ACM symposium on Theory of computing, STOC '01, ACM, New York, NY, USA, 2001, pp. 247–257.

[45] A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological) 39 (1) (1977) 1–38.

[46] D. F. Williamson, R. A. Parker, J. S. Kendrick, The box plot: a simple visual method to interpret data, Ann Intern Med 110 (11).

[47] D. Q. Huynh, Metrics for 3d rotations: Comparison and analysis, J. Math. Imaging Vis. 35 (2) (2009) 155–164.
