Three-Dimensional CityGML Building Models in Mobile Augmented Reality: A Smartphone-Based Pose Tracking System
To cite this article: Christoph Blut & Jörg Blankenbach (2021) Three-dimensional CityGML
building models in mobile augmented reality: a smartphone-based pose tracking system,
International Journal of Digital Earth, 14:1, 32-51, DOI: 10.1080/17538947.2020.1733680
1 Introduction
Geo-referenced virtual 3D models increasingly represent a valuable addition to traditional 2D geos-
patial data, since they provide more detailed visualizations and an improved understanding of spatial
contexts (Herbert and Chen 2015). The Geography Markup Language (GML)-based XML encoding
schema City Geography Markup Language (CityGML) is an open standardized semantic infor-
mation model with a modular structure and a level of detail system that satisfies the increasing
need for storing and exchanging virtual 3D city models (Gröger et al. 2012). Specifically, the rep-
resentation of semantic and topological properties distinguishes CityGML from pure graphical 3D
city models and enables thematic and topological queries and analyses. Typical use cases are
large-scale solar potential analyses, shadow analyses, disaster analyses and more, using the CityGML
Application Domain Extension (ADE) mechanism (Gröger et al. 2012). An ADE augments the base data model
with use-case-specific concepts, as, for example, the CityGML Energy ADE for performing energy simulations (Agugiaro et al. 2018) or the CityGML Noise ADE for performing noise simulations (Biljecki,
Kumar, and Nagel 2018). The list of CityGML use cases is thus extensive, as shown by Biljecki
et al. (2015), who name ∼30 use cases and 100 applications for CityGML and virtual 3D models in general in their state-of-the-art review. Unfortunately, most of these are employed only on desktop personal computers.
With the increasing hardware capabilities of low-cost off-the-shelf mobile devices and growing
popularity of mobile augmented reality (AR), we see the potential for the geospatial domain, specifi-
cally with geo-referenced visualizations (e.g. points of interest, navigational paths or city objects,
such as buildings) and analyses with CityGML. A key advantage of AR is the first-person perspective, which allows users to view the information in a much more natural way, without having to mentally transfer virtual objects from a 2D screen into 3D reality. Schall et al. (2011) name
some concrete examples for CityGML in AR:
• Real estate and planning offices could offer their clients the possibility to visualize planned buildings on parcels of land. Clients could inspect the virtual 3D building models freely on-site and easily compare colours, sizes, look and feel or overall integration into the cityscape.
• Physical buildings could be augmented to enable the visualization of hidden building parts, such as cables, pipes or beams.
• Tourist information centres could offer tourists visual-historic city tours by visualizing historic buildings on-site and displaying additional information and facts about these and the location.
One of the challenges of realizing a mobile AR system aimed at these use cases is the 3D registration in real time, i.e. accurately aligning the virtual 3D models with reality so that they are seamlessly integrated into the scene or overlay their physical counterparts as closely as possible. The estimation and synchronization of translation and rotation in real time is generally referred to as pose tracking (Schmalstieg and Höllerer 2016), which is possible in a local or a global reference frame. In a local reference frame, poses are, for example, tracked relative to an arbitrary starting point; in a global reference frame, poses are determined in a global reference system, for example, the World Geodetic System 1984 (WGS84).
We propose that, next to models for visualizations and analyses, CityGML’s geometric-topologi-
cal and thematic data provide the required information for pose tracking in a global reference frame.
This has multiple advantages: on the one hand, the same data can be used for on-device rendering and pose tracking; on the other hand, a common global reference system eases the information exchange between participating decision-makers, for example, in city planning.
In this paper, we introduce a proof-of-concept CityGML-based pose tracking system, comprising
optical and inertial sensor methods, for AR. Following this introduction (Section 1), we discuss related work focussing on similar AR projects and their pose tracking frameworks in Section 2. In
Section 3, we describe our approach to pose tracking with CityGML, specifically focussing on optical
pose estimation with doors. We describe how we extract the required 3D information from CityGML
data and 2D information from camera images and in what manner we combine this information for
the pose estimation. The results of the in-field evaluation with a fully fledged AR system on different
smartphones are presented in Section 4, focussing on the performance of the door detection, the
optical pose estimation and the pose fusion. In Section 5, we provide an outlook on future work
in this area.
2 Background
Generally, the existing pose tracking solutions can be categorized into stationary external infrastruc-
ture-based, device-based or a combination of both. Primarily, the choice of pose tracking type depends
on the application environment, for example in- or outdoors or in small- or large-scale spaces.
In outdoor environments, typically, combined external infrastructure- and device-based methods
are applied. Global navigation satellite systems (GNSS) are a well-established solution for determin-
ing the translation in combination with device-coupled inertial measurement units (IMU), typically
Figure 1. Evaluation of the rotation error of three smartphones, the Google Nexus 5, the Sony Xperia Z2 and the Google Pixel 2 XL,
in indoor and outdoor environments.
pose with the Perspective-n-Point (PnP) algorithm. Reitmayr and Drummond (2006) use their sys-
tem to overlay virtual wireframe models over physical buildings. The drawbacks of these approaches
are that the objects must be captured continuously by the camera to estimate poses, restricting mobi-
lity, and that, typically, prepared wireframe models are required.
In recent years, the increasing capabilities of low-cost off-the-shelf mobile devices have enabled
AR systems to be implemented with smartphones. Multiple software frameworks like Vuforia
(PTC), ARKit (Apple) or ARCore/Sceneform (Google), therefore, have been introduced, to realize
AR on smartphones with readily available 3D real-time rendering and pose tracking. Primarily,
the frameworks employ monocular visual-inertial odometry (VIO), a combination of single-camera
natural feature and IMU tracking (Linowes and Babilinski 2017). The drawbacks of these frame-
works are, on the one hand, that pose estimation is only possible in a local reference frame, allowing only relative pose tracking, i.e. the poses are estimated relative to an arbitrary initial pose, and, on the other hand, that there is no native support for CityGML, i.e. the data must be converted to a supported
format like filmbox (FBX), with the risk of losing information in the process.
We introduce a more flexible custom pose tracking system that (1) can be utilized indoors and
outdoors on off-the-shelf smartphones, (2) is fully self-contained and decoupled from external sys-
tems and (3) incorporates geometric-topologically and semantically rich CityGML data for the visu-
alizations and the pose tracking in a global reference frame in parallel.
To realize real-time pose tracking with readily available data and hardware, we designed the sys-
tem, as shown in Figure 2. The main challenges lie in the automated (1) extraction of 3D CityGML
door corners, (2) extraction of 2D image door corners, and (3) pose estimation using corresponding
2D/3D door corners. These are described in the following subsections.
Figure 3. Due to an inaccurate pose, the virtual CityGML door is shifted and does not align with the physical door.
The 3D corners are now ready to be used for the pose estimation. As shown in Figure 2, the cor-
responding 2D corners are next searched for in the camera image.
in which the corners that humans generally regard as corners, i.e. points where two edges meet at a right angle, were marked manually. In the following, these are referred to as true corners.
The algorithms were run on 480 × 360 px images. On average, the images in the set contained 51 true corners. The parameters of each algorithm were set so that at least the door corners could be detected in the majority of the images.
As Table 1 shows, the Harris corner detector on average finds the most true corners, but on the downside also detects the most false corners, ∼20 times more than the Shi-Tomasi detector or the curvature-based detector. This is crucial for the detection speed of the door detector, since larger corner counts generally result in longer detection runtimes. Comparing the Shi-Tomasi detector to the curvature-based detector, the results are quite similar, with an equal average number of found true corners and only slight differences in the number of detected false corners. The Shi-Tomasi detector on average detects fewer false corners, which speaks for this solution, but the curvature-based detector showed better performance for the specific case of door detection, where the door corners were included for 89% of the evaluation images. Therefore, the curvature-based detector was used for our implementation of the door detection algorithm.
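For illustration, the two standard detectors from this comparison are readily available in OpenCV and could be invoked as in the following minimal sketch; the file name and all parameter values are assumptions for demonstration and not the settings used in our evaluation.

```python
import cv2
import numpy as np

# Minimal sketch: run the Harris and Shi-Tomasi detectors on one evaluation
# image; parameter values are illustrative assumptions.
img = cv2.imread("door_image.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (480, 360))

# Harris: threshold the response map to obtain corner candidates.
harris = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)
harris_corners = np.argwhere(harris > 0.01 * harris.max())  # (row, col) pairs

# Shi-Tomasi ("good features to track").
shi_tomasi = cv2.goodFeaturesToTrack(img, maxCorners=200,
                                     qualityLevel=0.01, minDistance=10)

print(len(harris_corners), 0 if shi_tomasi is None else len(shi_tomasi))
```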
The door detector proceeds as follows: before the search for edges and corners is applied, the image is down-sampled to reduce computational cost, and a Gaussian blur is applied to remove noise. Next, a binary edge map is created using the Canny algorithm, from which the contours are extracted. These contours feed the corner detector based on Chen He and Yung (2008). The following steps consist of grouping the found corner candidates into groups of four according to the geometric model, which defines that a door consists of four corners C1, C2, C3, C4 and four edges E1, E2, E3, E4, and checking these groups against geometric requirements: the relative size, edge orientation and fill-ratio tests described below.
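As an illustration of this pre-processing chain, a minimal OpenCV sketch might look as follows; the kernel and threshold values are assumptions, not the parameters of our implementation.

```python
import cv2

# Sketch of the pre-processing chain described above (values illustrative):
# down-sample, blur, Canny edge map, contour extraction.
img = cv2.imread("door_image.jpg", cv2.IMREAD_GRAYSCALE)
small = cv2.resize(img, (480, 360))                    # down-sample
blurred = cv2.GaussianBlur(small, (5, 5), 1.5)         # remove noise
edges = cv2.Canny(blurred, 50, 150)                    # binary edge map
contours, _ = cv2.findContours(edges, cv2.RETR_LIST,
                               cv2.CHAIN_APPROX_NONE)  # input to the
                                                       # curvature-based corner detector
```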
To rate the width and height of a door, the relative size is calculated from the length of a door edge E_ij and the length of the image diagonal length_imageDiagonal. The ratio is given by

\[
Siz_{ij} = \frac{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}}{length_{\mathrm{imageDiagonal}}} \tag{1}
\]
The orientation of an edge E_ij is calculated as

\[
Ori_{ij} = \tan^{-1}\!\left(\frac{|x_i - x_j|}{|y_i - y_j|}\right) \cdot \frac{180}{\pi} \tag{2}
\]
The values are tested to check whether they fall within predefined thresholds, set for each condition. Candidates that comply with the requirements are further tested by combining them with the edges found by the Canny algorithm. The edges of a door candidate are checked against the edge map using a fill ratio, which defines the proportion of an edge that overlaps the edge map:

\[
FR_{ij} = \frac{OverLap_{ij}}{Length_{ij}} \tag{3}
\]

where OverLap_ij is the length of the overlapping part of the edge and Length_ij the total length of the edge E_ij.
In a last step, the 2D corner points are sorted counter-clockwise, so that they are in the same order as the 3D corners found in the previous step (Section 3.1).
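The geometric tests of Equations (1)–(3) translate directly into code. The following sketch rates one door candidate; the corner coordinates and threshold values are illustrative assumptions, not the values used in our implementation.

```python
import numpy as np

def edge_size_ratio(ci, cj, image_diagonal):
    """Relative edge size Siz_ij, Equation (1): edge length over image diagonal."""
    return np.hypot(ci[0] - cj[0], ci[1] - cj[1]) / image_diagonal

def edge_orientation(ci, cj):
    """Edge orientation Ori_ij in degrees, Equation (2)."""
    return np.degrees(np.arctan2(abs(ci[0] - cj[0]), abs(ci[1] - cj[1])))

def fill_ratio(overlap_length, edge_length):
    """Fill ratio FR_ij, Equation (3): overlap with the Canny edge map over edge length."""
    return overlap_length / edge_length

# Illustrative check of one candidate C1..C4; thresholds are assumptions.
corners = [(120, 40), (118, 300), (230, 305), (232, 42)]   # (x, y) in px
image_diag = np.hypot(480, 360)
vertical_edges = [(0, 1), (3, 2)]                           # C1-C2, C4-C3

ok = all(0.3 < edge_size_ratio(corners[i], corners[j], image_diag) < 0.9 and
         edge_orientation(corners[i], corners[j]) < 15.0    # near-vertical
         for i, j in vertical_edges)
print("door-like candidate:", ok)
```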
\[
s \, m' = K [R \,|\, t] \, M'
\]

where K[R|t] is the camera matrix that projects a 3D point M′ onto a 2D point m′ in the image plane and s is an arbitrary scaling factor. K, the calibration matrix, contains the intrinsic parameters and [R|t] the extrinsic parameters of the camera, where R is the rotation matrix and t the translation vector. Typically, pose tracking algorithms assume that K is known and estimate R and t.
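To make the projection concrete, the following minimal NumPy sketch projects a single 3D point; the calibration and pose values are purely illustrative assumptions.

```python
import numpy as np

# Projection s*m' = K[R|t]M' for one 3D point (values illustrative).
K = np.array([[500.0, 0.0, 240.0],
              [0.0, 500.0, 180.0],
              [0.0,   0.0,   1.0]])          # intrinsics (fx, fy, cx, cy)
R = np.eye(3)                                 # rotation (camera looking along +Z)
t = np.array([[0.0], [0.0], [5.0]])           # translation: point 5 m in front

M = np.array([[0.5], [1.0], [0.0], [1.0]])    # homogeneous 3D point M'
m_h = K @ np.hstack([R, t]) @ M               # homogeneous image point s*m'
m = (m_h[:2] / m_h[2]).ravel()                # divide by s to obtain pixel coordinates
print(m)                                      # -> [290. 280.]
```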
The relationship between 2D and 3D points was already investigated nearly 180 years ago by Grunert (1841). His pose estimation approach is based on the observation that the angles between the viewing rays to the 2D image points are equal to the angles between the rays to the corresponding 3D points. A straightforward method was later introduced by Sutherland (1968) with the direct linear transformation (DLT), which can be used to estimate both the intrinsic and extrinsic camera parameters from point correspondences. The resulting homogeneous system of equations can be solved using the singular value decomposition (SVD). The disadvantages of this solution are that a relatively high number of correspondences is necessary and that the results can be inaccurate (Faugeras 1993; Hartley and Zisserman 2004).
Therefore, more favourable solutions use a pre-determined camera calibration matrix K so that
instead of 11 parameters, only 6 must be estimated. Using K and n = 3 known correspondences gen-
erally is referred to as the perspective-three-point (P3P) problem. Many solutions to P3P have been
introduced over the years, for example, Linnainmaa, Harwood, and Davis (1988), DeMenthon and
Davis (1992), Gao et al. (2003) and Kneip, Scaramuzza, and Siegwart (2011). Typically, with n = 3
known correspondences, four possible solutions are produced so that n ≥ 4 correspondences are
necessary for a unique solution. The n ≥ 4 case is referred to as the PnP problem and, generally, is
preferred, since the pose accuracy often can be increased with a higher number of points. Some
PnP solutions were presented by Lepetit, Moreno-Noguer, and Fua (2009), Li, Xu, and Xie (2012),
Garro, Crosilla, and Fusiello (2012) and Gao and Zhang (2013). Iterative solutions typically optimize
a cost function based on error minimization, for example, using the Gauss–Newton algorithm (Gauß
1809) or the Levenberg–Marquardt algorithm (Levenberg 1944; Marquardt 1963). Examples are the
minimization of the geometric error (e.g. the 2D reprojection error) or of an algebraic error. Disadvantages of iterative solutions are that they require a good initial estimate and that noisy data strongly influence the results, with the risk of converging to local minima. Also, the high computational complexity of many solutions renders them unusable for real-time applications. Non-iterative solutions are therefore an alternative. A popular non-iterative PnP method is EPnP (Lepetit, Moreno-Noguer, and Fua 2009). The main idea is to represent the n points of the 3D object space as a weighted sum of four virtual control points. The coordinates of the control points can then be estimated by expressing them as a weighted sum of eigenvectors, and a small set of quadratic equations is solved to determine the weights. An advantage of this approach is the O(n) complexity.
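For the door-based case with four 2D/3D corner correspondences, a pose can, for example, be estimated with OpenCV's EPnP implementation. The following sketch is illustrative only; the coordinates and calibration values are assumptions, not data from our system.

```python
import cv2
import numpy as np

# Minimal sketch: estimate the camera pose from four 2D/3D door corner
# correspondences with OpenCV's EPnP solver. In the described system the 3D
# corners would come from the CityGML model and the 2D corners from the door
# detector; all values below are illustrative assumptions.
object_points = np.array([[0.0, 0.0, 0.0],     # 3D door corners (metres)
                          [0.0, 2.1, 0.0],
                          [1.0, 2.1, 0.0],
                          [1.0, 0.0, 0.0]], dtype=np.float64)
image_points = np.array([[118.0, 300.0],       # matching 2D corners (pixels),
                         [120.0,  40.0],       # sorted in the same order
                         [232.0,  42.0],
                         [230.0, 305.0]], dtype=np.float64)
K = np.array([[500.0, 0.0, 240.0],
              [0.0, 500.0, 180.0],
              [0.0,   0.0,   1.0]])
dist = np.zeros(5)                              # assume distortion already corrected

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)                      # rotation matrix from the Rodrigues vector
print(ok, R, tvec)
```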
As shown in Figure 2, the received pose is utilized to correct the virtual camera and fused with
accelerometer/gyroscope readings, to enable arbitrary rotations relative to the optical pose. Figure 4
shows an AR view with poses estimated with our pose tracking system. The virtual objects accurately
overlay their physical counterparts.
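Conceptually, the rotation part of this fusion can be sketched as applying the IMU rotation accumulated since the optical fix on top of the optically estimated rotation; the following sketch simplifies frame conventions and uses hypothetical names.

```python
import numpy as np

# Conceptual sketch of the rotation fusion described above: the optical pose
# anchors the virtual camera, and subsequent IMU rotation is applied relative
# to the IMU reading captured at the moment of the optical fix. The exact
# multiplication order depends on the sensor/world frame conventions.
def fused_rotation(R_optical, R_imu_at_fix, R_imu_now):
    """Rotate the optically estimated camera by the relative IMU rotation."""
    R_imu_relative = R_imu_now @ R_imu_at_fix.T   # IMU rotation since the fix
    return R_imu_relative @ R_optical

def rot_z(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

# Example: the IMU reports 10 degrees of yaw since the optical pose was estimated.
R_now = fused_rotation(np.eye(3), rot_z(5.0), rot_z(15.0))
print(np.degrees(np.arctan2(R_now[1, 0], R_now[0, 0])))   # -> ~10.0
```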
4 Evaluation
The introduced proof-of-concept pose tracking system was integrated into an Android-based AR system (Figure 5) and deployed to three smartphones, the Google Nexus 5, the Sony Xperia Z2 and the Google Pixel 2 XL, to evaluate its performance. For each smartphone, the system was calibrated to account for lens distortion effects of the smartphone cameras according to Luhmann et al. (2019).
Figure 5. Our AR system architecture. The entire system is Android-smartphone-based. The pose tracking system is one of three core AR components and binds the physical and virtual worlds. All components utilize third-party libraries, such as OpenCV, jMonkey and SpatiaLite. The system is accessible via the front-end graphical user interface.
Figure 6. Some examples of doors used for evaluating the optical pose estimation.
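While our calibration follows Luhmann et al. (2019), the general procedure can be illustrated with a standard chessboard-based OpenCV calibration; pattern size, square size and file paths in the sketch are assumptions.

```python
import glob
import cv2
import numpy as np

# Illustrative chessboard calibration sketch (not the procedure used here);
# pattern geometry and image paths are assumptions.
pattern = (9, 6)                                   # inner corners of the board
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 0.025  # 25 mm squares

obj_points, img_points, size = [], [], None
for path in glob.glob("calibration/*.jpg"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)
        size = gray.shape[::-1]

# K contains the intrinsics, dist the lens distortion coefficients.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points,
                                                 size, None, None)
print(K, dist)
```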
The pose tracking system was evaluated in terms of accuracy, runtime and stability, with an image
dataset of various doors (Figure 6). Images were captured from distances of 3–9 m and at angles up
to 70° in different lighting conditions, so that the results can be grouped into the six measurement cases M1–M6, which represent our range of typical real-life application scenarios:
In total, 80 images that fit the cases (M1–M6) were captured with each smartphone. Most images
contained a door in a complex scene. We define complex scenes as cluttered environments with arbi-
trary objects such as wall paintings, books or other furniture. Additionally, 20 images with no door,
but some similar objects, such as pictures on walls, were added to the dataset. For M1 and M2, the camera was pointed straight at the door from different distances; for M3 and M4, the camera was placed next to the door, at an angle. For good lighting conditions (M5), images were captured during
the day. Indoors, the windows were fully opened, and artificial lighting was turned on. For bad light-
ing conditions (M6), images were captured on an overcast day with the blinds of the windows shut
and all artificial lighting turned off. Figure 7 shows the used evaluation set-up.
In the following sections, we present the evaluation results of the door detection, the pose esti-
mation and the pose fusion, since these are the key components of our pose tracking system; there-
fore, the evaluation also reflects the practical applicability of our solution.
• The image contains a door and the door was properly detected (true positive – TP) (Figure 8).
• The image contains a door, but the door was not detected (false negative – FN) (Figure 9).
• The image contains no door and no door was detected (true negative – TN).
• The image contains no door, but a door was detected (false positive – FP).
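These four outcomes directly yield the detection rates reported below, as the following short sketch shows; the counts are illustrative only.

```python
# Detection rates derived from the four outcome counts above (counts illustrative).
tp, fn, tn, fp = 67, 13, 16, 4
tp_rate = tp / (tp + fn)    # share of door images in which the door was found
tn_rate = tn / (tn + fp)    # share of door-free images correctly rejected
print(f"TP rate: {tp_rate:.0%}, TN rate: {tn_rate:.0%}")
```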
Figure 10 shows the success rate of detecting a door in images that contain a door (TP) at different resolutions, and Figure 11 shows the success rate of determining that no door is present in images without a door (TN) at different resolutions. As Figure 10 depicts, at a resolution of 240 × 180 px the algorithm has a TP detection rate of only 19%, which increases rapidly to more than four times as much at a resolution of 480 × 360 px. Higher resolutions only slightly improve the TP rate. A similar trend is found for the TN rate (Figure 11). At a resolution of 240 × 180 px, a door is falsely detected in more than 50% of the images without a door. For 480 × 360 px images, this rate is reduced to 20% (80% TN rate) and only improves slightly for higher-resolution images.
Figure 8. Example of a detected door. The red dots are corner points and the green rectangle is the correctly detected door.
The measured detection time is defined as the duration of the entire detection process, including
steps such as downscaling of the image, extracting edges and analysing these. Figure 12 shows the
mean time required to detect a door in images with different resolutions. The total detection time
is lowest for 240 × 180 px with 17 ms for the Google Pixel 2 XL, 56 ms for the Sony Xperia Z2
and 72 ms for the Google Nexus 5 and increases with the size of the images to 1.5 s for the Google
Pixel 2 XL, 3.5 s for the Sony Xperia Z2 and 5.1 s for the Google Nexus 5.
Generally, we found that a resolution of 480 × 360 px represents the best compromise between
detection rate (Figures 10 and 11) and runtime (Figure 12). We, therefore, based our further evalu-
ations, specifically M1–M6, on this resolution.
We found that the TP detection rate of 84% is independent of the cases M1–M6, but the mean
detection time varies. As Figure 13 shows, the time does not directly correlate with the distance,
angle or lighting, but with the background of the door. Backgrounds with many different colours and shapes increase the required detection time, since the algorithm finds more potential corners in such cluttered backgrounds. Therefore, doors that are surrounded by objects require more time compared to doors that are surrounded only by white walls. Typically, the images captured at an angle and from a distance contained more of the surrounding environment, so that the detection process was more complex.
Figure 10. True positive detection rate of the door detection algorithm for images with different resolutions.
Figure 11. True negative detection rate of the door detection algorithm for images with different resolutions.
Figure 12. Time required to detect a door using the same images at different resolutions.
The accuracy of the detection algorithm describes how well the automatically derived corners
match the actual corners of the door. A good accuracy is crucial for accurate pose estimations, as
shown by Händler (2012). For this, the door corners were manually extracted from each image
and the automatically derived corners were compared to them. Figure 14 shows that the automatically extracted door corners of our algorithm on average lie only 1.3 px from the reference corners, which is sufficient for accurate optical pose estimation.
Figure 13. Time required to detect a door in cases M1–M6 with 480 × 360 px images.
Figure 14. Accuracy of the automatically derived door corners with 480 × 360 px images.
stable pose using a sturdy tripod. The initial pose was automatically saved after a delay of 2 s to prevent
any distortions caused by interactions with the touch screen. After a pre-determined amount of time,
the next pose was saved and compared to the initial pose. The resulting rotation errors for each smart-
phone and the different sensor types are shown in Figures 17–19, respectively.
Figure 17 shows that the Rotation Vector generally performs well in a 10 min window, with mean rotation errors below 1°. While the error of the Google Nexus 5 increases the slowest, the Sony Xperia Z2 shows a rather steep linear increase. The curve of the Google Pixel 2 XL shows that the error stabilizes after a certain time. The Game Rotation Vector (Figure 18) shows mean rotation errors under 0.5° in a 10 min window. The Google Pixel 2 XL performs best but shows a linear error increase. In comparison, the errors of the Google Nexus 5 and the Sony Xperia Z2 each stabilize after a certain time. While the average absolute rotation from the Rotation Vector suffers from the inaccurate measurements of the magnetometer, resulting in bearing errors of up to 25° [also see (Schall et al. 2009)], the relative rotation results are significantly better. A comparison of Figures 17 and 18 with Figure 19 shows that the quality of the rotation benefits from the sensor fusion in the Rotation Vector and Game Rotation Vector. While the rotation derived from the gyroscope alone (Figure 19) is comparatively good for the Sony Xperia Z2 and the Google Pixel 2 XL, the Google Nexus 5 suffers significantly from sensor drift.
The impact of movement was evaluated by comparing the initial pose to the pose after the movements. To find the exact physical pose the device was in before being moved, a tripod was used, of which the initial pose was marked on the corresponding surface. The entire tripod with the smartphone was then moved accordingly and later placed exactly on the markings again.
Figure 17. Quality of relative rotation over time using the rotation vector.
Figure 18. Quality of relative rotation over time using the game rotation vector.
The results for
each smartphone and type of sensor are displayed in Figures 20–22, respectively. Figure 20 shows
that the mean rotation error is not influenced by movement when using the Rotation Vector.
Figures 21 and 22 show that the mean rotation error increases with each turn when using the
Game Rotation Vector and gyroscope. In both cases, the Sony Xperia Z2 shows the best and the Goo-
gle Pixel 2 XL the poorest results.
Figure 19. Quality of relative rotation over time using the gyroscope.
Figure 20. Quality of relative rotation using the rotation vector when AR system is rotated.
Figure 21. Quality of relative rotation using the game rotation vector when AR system is rotated.
Figure 22. Quality of relative rotation using the gyroscope when AR system is rotated.
visualizations of model data, but also, for example, extending models with newly captured geo-refer-
enced information. For instance, damaged areas could be documented, building parts annotated or
even missing models added.
The performance especially benefits from newer generation smartphones, such as the Google
Pixel 2 XL. The fusion of the optical pose with local IMU poses allows the user to freely look around.
The low-cost sensors of the evaluation smartphones provide sufficient results, but the pose tracking
could be further improved. Currently, the door detection relies on features found by a corner detector. All detected points are analysed and tested against the selected conditions to find possible door candidates. While the current implementation already provides good results with a detection
speed of 215 ms and a positive detection rate of 80%, the detection rate could still be improved
by increasing the image resolution, which would increase feature detectability, at the cost of detection
speed. To counter this, the number of corner points could be reduced by additional conditions. An
approach could be to exploit the colour of the doors. This information is available in the CityGML
model, so an additional test could be performed to check if the found corner points are part of a
certain coloured CityGML object.
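Such a colour test could, for instance, compare the image neighbourhood of each corner candidate with the door colour stored in the CityGML appearance model, as sketched below; the colour value, tolerance and window size are illustrative assumptions.

```python
import cv2
import numpy as np

# Conceptual sketch of the proposed colour test: keep only corner candidates
# whose local neighbourhood roughly matches the door colour taken from the
# CityGML appearance model. All values are illustrative assumptions.
def matches_citygml_colour(image_bgr, corner_xy, citygml_rgb, tol=40, win=7):
    x, y = corner_xy
    patch = image_bgr[max(y - win, 0):y + win, max(x - win, 0):x + win]
    mean_bgr = patch.reshape(-1, 3).mean(axis=0)
    target_bgr = np.array(citygml_rgb[::-1], dtype=np.float64)   # RGB -> BGR
    return np.linalg.norm(mean_bgr - target_bgr) < tol

image = cv2.imread("door_image.jpg")
door_rgb = (140, 90, 50)                     # e.g. a brown door from the model
corners = [(120, 40), (118, 300), (232, 42), (230, 305)]
filtered = [c for c in corners if matches_citygml_colour(image, c, door_rgb)]
```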
The initial pose derived from the door-based pose estimation on average has a positional accuracy of 17 cm and a rotation accuracy of 2.5°. The subsequent relative IMU pose tracking on average has a rotation error of 1–6°. This is noticeable in the visual augmentations, but not a major issue, since the
virtual objects augment their physical counterparts sufficiently for the AR use cases described in Sec-
tion 1. To further increase the quality of the augmentations and stabilize them, a possibility would be
to combine edge-based pose tracking, as shown by Reitmayr and Drummond (2006), and our door-
based method. Since the edge-based solution requires a good initial pose, our door detection solution
could be utilized for initialization. The disadvantage here would be that the door must always be vis-
ible for the edges to be matched. Therefore, the solution would need to be extended to use other
objects in the vicinity. A challenge hereby would be to find reliable physical objects, which, on the
one hand, have a constant pose over time and, on the other hand, do not have an overly complex
shape to be detected reliably.
Alternatively, VIO, as, for example, in ARCore, could be effective. As described by Google (10
December 2019, https://developers.google.com/ar/discover/concepts), ARCore finds unambiguously
re-identifiable image features and tracks these across subsequent images, deriving a relative camera
motion. The resulting poses are fused with IMU data. Additionally, ARCore uses the sets of detected
features to find and derive planes, such as wall-, ceiling-, floor- or table surfaces. These planes could
be utilized for a geometry matching process with the corresponding geo-referenced CityGML sur-
faces, to obtain poses in a global reference frame. A disadvantage is that the plane detection relies on image features, so flat surfaces must be sufficiently textured; white walls, for example, may therefore not be detectable.
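A geometry matching of this kind could, conceptually, compare the normal and offset of each detected plane with those of nearby geo-referenced CityGML surfaces, as in the following sketch; this is not the ARCore API, and the thresholds and data structures are assumptions.

```python
import numpy as np

# Conceptual sketch: match a detected plane (unit normal and a point on it,
# already transformed to the global frame) against geo-referenced CityGML
# surfaces by comparing normals and offsets. Thresholds are illustrative.
def plane_distance(point, surface_point, surface_normal):
    return abs(np.dot(point - surface_point, surface_normal))

def match_plane(plane_normal, plane_point, citygml_surfaces,
                max_angle_deg=10.0, max_offset_m=0.5):
    best = None
    for surf in citygml_surfaces:              # dicts with 'id', 'normal', 'point'
        cos_angle = abs(np.dot(plane_normal, surf["normal"]))
        angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
        offset = plane_distance(plane_point, surf["point"], surf["normal"])
        if angle < max_angle_deg and offset < max_offset_m:
            if best is None or offset < best[1]:
                best = (surf["id"], offset)
    return best
```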
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
Agugiaro, Giorgio, Joachim Benner, Piergiorgio Cipriano, and Romain Nouvel. 2018. “The Energy Application
Domain Extension for CityGML: Enhancing Interoperability for Urban Energy Simulations.” Open Geospatial
Data, Software and Standards 3 (1). doi: 10.1186/s40965-018-0042-y.
Biljecki, Filip, Kavisha Kumar, and Claus Nagel. 2018. “CityGML Application Domain Extension (ADE): Overview of
Developments.” Open Geospatial Data, Software and Standards 3 (1): 1–17. doi:10.1186/s40965-018-0055-6.
Biljecki, Filip, Jantien Stoter, Hugo Ledoux, Sisi Zlatanova, and Arzu Çöltekin. 2015. “Applications of 3D City Models:
State of the Art Review.” ISPRS International Journal of Geo-Information 4 (4): 2842–2889. doi:10.3390/ijgi4042842.
Blum, Jeffrey R., Daniel G. Greencorn, and Jeremy R. Cooperstock. 2013. “Smartphone Sensor Reliability for
Augmented Reality Applications.” doi:10.1007/978-3-642-40238-8_11.
Blut, Christoph, Timothy Blut, and Jörg Blankenbach. 2017. “CityGML Goes Mobile: Application of Large 3D CityGML
Models on Smartphones.” International Journal of Digital Earth, 1–18. doi:10.1080/17538947.2017.1404150.
Canny, John. 1986. “A Computational Approach to Edge Detection.” IEEE Transactions on Pattern Analysis and
Machine Intelligence 8 (6): 679–698. doi:10.1109/TPAMI.1986.4767851.
Chen He, Xiao, and Nelson Hon Ching Yung. 2008. “Corner Detector Based on Global and Local Curvature
Properties.” Optical Engineering 47 (5). doi:10.1117/1.2931681.
Choi, Changhyun, and Henrik I. Christensen. 2012. “3D Textureless Object Detection and Tracking: An Edge-Based Approach.” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 3877–3884. doi:10.
1109/IROS.2012.6386065.
DeMenthon, Daniel F., and Larry S. Davis. 1992. “Model-Based Object Pose in 25 Lines of Code.” In European
Conference on Computer Vision, 335–343. Springer. doi:10.1007/3-540-55426-2_38.
Faugeras, Oliver. 1993. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press. doi:10.1007/978-3-
642-82429-6_2.
Feiner, Steven, Blair Macintyre, Tobias Höllerer, and Anthony Webster. 1997. “A Touring Machine: Prototyping 3D
Mobile Augmented Reality Systems for Exploring the Urban Environment.” In ISWC ‘97. Cambridge, MA, USA, 1,
74–81.
Gao, Xiao Shan, Xiao Rong Hou, Jianliang Tang, and Hang Fei Cheng. 2003. “Complete Solution Classification for the
Perspective-Three-Point Problem.” IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (8): 930–
943. doi:10.1109/TPAMI.2003.1217599.
Gao, Jingyi, and Yalin Zhang. 2013. “An Improved Iterative Solution to the PnP Problem.” Proceedings - 2013
International Conference on Virtual Reality and Visualization, ICVRV 2013, no. 3, 217–220. doi:10.1109/
ICVRV.2013.41.
Garro, Valeria, Fabio Crosilla, and Andrea Fusiello. 2012. “Solving the PnP Problem with Anisotropic Orthogonal
Procrustes Analysis.” Second International Conference on 3D Imaging, Modeling, Processing, Visualization
Transmission, IEEE, 262–269. doi:10.1109/3DIMPVT.2012.40.
Gauß, Carl Friedrich. 1809. “Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium.”
Hamburgi: Sumtibus Frid. Perthes et I. H. Besser. doi:10.3931/e-rara-522.
Gröger, Gerhard, Thomas Kolbe, Claus Nagel, and Karl-Heinz Häfele. 2012. “OGC City Geography Markup Language
(CityGML) Encoding Standard.” OGC 12-019, 1–344.
Grunert, Johann August. 1841. “Das Pothenotische Problem in Erweiterter Gestalt Über Seine Anwendungen in Der
Geodäsie.” In Archiv Der Mathematik Und Physik, Band 1, 238–248. Greifswald: Verlag von C. A. Koch.
Händler, Verena. 2012. Konzeption Eines Bildbasierten Sensorsystems Zur 3D- Indoorpositionierung Sowie Analyse
Möglicher Anwendungen. Fachrichtung Geodäsie Fachbereich Bauingenieurwesen und Geodäsie Technische
Universität Darmstadt.
Harris, Chris, and Mike Stephens. 1988. “A Combined Corner and Edge Detector.” In Proc. of Fourth Alvey Vision
Conference. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.434.4816&rep=rep1&type=pdf.
Hartley, Richard, and Andrew Zisserman. 2004. Multiple View Geometry in Computer Vision. 2nd ed. Cambridge
University Press. doi:10.1016/S0143-8166(01)00145-2.
Herbert, Grant, and Xuwei Chen. 2015. “A Comparison of Usefulness of 2D and 3D Representations of Urban
Planning.” Cartography and Geographic Information Science. doi:10.1080/15230406.2014.987694.
Hollerer, T., S. Feiner, and J. Pavlik. 1999. “Situated Documentaries: Embedding Multimedia Presentations in the Real
World.” In ISWC ‘99, 99, 79–86. San Francisco, CA: IEEE. doi:10.1109/ISWC.1999.806664.
Hough, Paul V. C. 1962. Method and Means for Recognizing Complex Patterns. US Patent 3,069,654, issued 1962.
Kneip, Laurent, Davide Scaramuzza, and Roland Siegwart. 2011. “A Novel Parametrization of the Perspective-Three-
Point Problem for a Direct Computation of Absolute Camera Position and Orientation.” Proceedings of the IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, 2969–2976. doi:10.1109/CVPR.
2011.5995464.
Lepetit, Vincent, Francesc Moreno-Noguer, and Pascal Fua. 2009. “EPnP: An Accurate O(n) Solution to the PnP
Problem.” International Journal of Computer Vision 81 (2): 155–166. doi:10.1007/s11263-008-0152-6.
Levenberg, Kenneth. 1944. “A Method for the Solution of Certain Non-Linear Problems in Least Squares.” Quarterly of
Applied Mathematics 2 (2): 164–168. doi:10.1090/qam/10666.
Li, Shiqi, Chi Xu, and Ming Xie. 2012. “A Robust O(n) Solution to the Perspective-n-Point Problem.” IEEE
Transactions on Pattern Analysis and Machine Intelligence 34 (7): 1444–1450. doi:10.1109/TPAMI.2012.41.
Lima, João Paulo, Francisco Simões, Lucas Figueiredo, and Judith Kelner. 2010. “Model Based Markerless 3D Tracking
Applied to Augmented Reality.” Journal on 3D Interactive Systems 1: 2–15. ISSN: 2236-3297.
Linnainmaa, Seppo, David Harwood, and Larry S. Davis. 1988. “Pose Determination of a Three-Dimensional Object
Using Triangle Pairs.” IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (5): 634–647. doi:10.
1109/34.6772.
Linowes, Jonathan, and Krystian Babilinski. 2017. Augmented Reality for Developers: Build Practical Augmented Reality
Applications with Unity, ARCore, ARKit, and Vuforia. Birmingham: Packt Publishing Ltd.
Luhmann, Thomas, Stuart Robson, Stephen Kyle, and Jan Boehm. 2019. Close-Range Photogrammetry and 3D
Imaging. doi:10.1515/9783110607253.
Marquardt, Donald W. 1963. “An Algorithm for Least-Squares Estimation of Nonlinear Parameters.” Journal of the
Society for Industrial and Applied Mathematics 11 (2): 431–441. doi: 10.1137/0111030.
Petit, Antoine, Eric Marchand, and Keyvan Kanani. 2013. “A Robust Model-Based Tracker Combining Geometrical and
Color Edge Information.” IEEE International Conference on Intelligent Robots and systems, IEEE, 3719–3724.
doi:10.1109/IROS.2013.6696887.
Piekarski, Wayne, and Bruce Thomas. 2001. “Augmented Reality with Wearable Computers Running Linux.” 2nd
Australian Linux Conference (Sydney), 1–14.
Priyantha, Nissanka B, Anit Chakraborty, and Hari Balakrishnan. 2000. “The Cricket Location-Support System.”
Proceedings of the 6th Annual International Conference on Mobile Computing and Networking, 32–43. doi:10.
1145/345910.345917.
Real Ehrlich, Catia, and Jörg Blankenbach. 2019. “Indoor Localization for Pedestrians with Real-Time Capability Using
Multi-Sensor Smartphones.” Geo-Spatial Information Science 22 (2): 73–88. doi:10.1080/10095020.2019.1613778.
Reitmayr, Gerhard, and Tom W. Drummond. 2006. “Going Out: Robust Model-Based Tracking for Outdoor
Augmented Reality.” Proceedings of the 5th IEEE and ACM International Symposium on Mixed and
Augmented reality, IEEE, 109–118. doi:10.1109/ISMAR.2006.297801.
Santana, José Miguel, Jochen Wendel, Agustín Trujillo, José Pablo Suárez, Alexander Simons, and Andreas Koch. 2017.
“Multimodal Location Based Services – Semantic 3D City Data as Virtual and Augmented Reality.” In Progress in
Location-Based Services 2016, 329–353. Springer. doi:10.1007/978-3-319-47289-8_17.
Schall, Gerhard, Dieter Schmalstieg, and Sebastian Junghanns. 2010. “Vidente – 3D Visualization of Underground
Infrastructure Using Handheld Augmented Reality.” GeoHydroinformatics: Integrating GIS and Water, 207–219.
doi:10.1.1.173.3513.
Schall, Gerhard, Johannes Schöning, Volker Paelke, and Georg Gartner. 2011. “A Survey on Augmented Maps and
Environments: Approaches, Interactions and Applications.” Advances in Web-Based GIS, Mapping Services and
Applications 9: 207–225.
Schall, G., D. Wagner, G. Reitmayr, E. Taichmann, M. Wieser, D. Schmalstieg, and B. Hofmann-Wellenhof. 2009.
“Global Pose Estimation Using Multi-Sensor Fusion for Outdoor Augmented Reality.” 8th IEEE International
Symposium on Mixed and Augmented reality, IEEE, 153–162. http://ieeexplore.ieee.org/xpls/abs_all.jsp?
arnumber=5336489%5Cnpapers2://publication/uuid/0F6141AD-F7D5-476E-93F0-425FC456B02F.
Schmalstieg, Dieter, and Tobias Höllerer. 2016. Augmented Reality – Principles and Practice. Boston, MA: Addison-
Wesley Professional.
Shi, Jianbo, and Carlo Tomasi. 1994. “Good Features to Track.” Proceedings of IEEE Conference on Computer Vision
and Pattern Recognition, 593–600. doi:10.1109/CVPR.1994.323794.
Stylianidis, E., E. Valari, K. Smagas, A. Pagani, J. Henriques, A. Garca, E. Jimeno, et al. 2016. “LARA: A Location-based
and Augmented Reality Assistive System for Underground Utilities’ Networks Through GNSS.” Proceedings of the
2016 International Conference on Virtual Systems and Multimedia, IEEE, 1–9. doi:10.1109/VSMM.2016.7863170.
Sutherland, Ivan E. 1968. “A Head-Mounted Three Dimensional Display.” Proceedings of the December 9–11, 1968,
Fall Joint Computer Conference, Part I on – AFIPS ‘68 (fall, part I), San Francisco, CA, 757–764. doi:10.1145/
1476589.1476686.
Vacchetti, Luca, Vincent Lepetit, and Pascal Fua. 2004. “Combining Edge and Texture Information for Real-Time
Accurate 3D Camera Tracking.” Proceedings of the Third IEEE and ACM International Symposium on Mixed
and Augmented Reality (ISMAR 2004), IEEE, 48–56. doi:10.1109/ISMAR.2004.24.
Ward, Andy, Alan Jones, and Andy Hopper. 1997. “A New Location Technique for the Active Office.” IEEE Personal
Communications 4 (5): 42–47. doi:10.1109/98.626982.
White, Sean, and Steven Feiner. 2009. “SiteLens: Situated Visualization Techniques for Urban Site Visits.” Proceedings
of the SIGCHI Conference on Human factors in Computing Systems, Boston, MA, USA, ACM, 1117–1120. doi:10.
1145/1518701.1518871.
Wuest, Harald, Florent Vial, and Didier Stricker. 2005. “Adaptive Line Tracking with Multiple Hypotheses for
Augmented Reality.” Proceedings – Fourth IEEE and ACM International Symposium on Symposium on Mixed
and Augmented Reality, ISMAR 2005, IEEE Computer Society, 62–69. doi:10.1109/ISMAR.2005.8.
Yang, Xiaodong, and Yingli Tian. 2010. “Robust Door Detection in Unfamiliar Environments by Combining Edge and
Corner Features.” 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition –
Workshops, CVPRW 2010, IEEE, 57–64. doi:10.1109/CVPRW.2010.5543830.