Machine Learning Assisted High-Definition Map Creation
Jialin Jiao
Uber Technologies, Inc.
San Francisco, U.S.A.
[email protected]
Abstract—In recent years, autonomous driving technologies have attracted broad and enormous interest from both academia and industry and are under rapid development. High-Definition (HD) Maps are widely used as an indispensable component of autonomous vehicle systems by researchers and practitioners. HD Maps are digital maps that contain highly precise, fresh and comprehensive geometric information as well as semantics of the road network and surrounding environment. They provide critical inputs to almost all other components of autonomous vehicle systems, including localization, perception, prediction, motion planning, vehicle control etc. Traditionally, it is very laborious and costly to build HD Maps, requiring a significant amount of manual annotation work. In this paper, we first introduce the characteristics and layers of HD Maps; then we provide a formal summary of the workflow of HD Map creation; and most importantly, we present the machine learning techniques being used by the industry to minimize the amount of manual work in the process of HD Map creation.

Keywords-High-Definition Map, HD Map Creation, Autonomous Vehicle System

I. INTRODUCTION

Fully autonomous vehicle systems have never been closer to reality than they are today: thousands of them are being tested daily on roads around the world [53,54]; experts have predicted that between 2020 and 2040, autonomous vehicles will become a normal part of our lives, and we will see more and more autonomous vehicles running within the current traffic systems, self-driving alongside human-driven cars, cyclists and pedestrians [1].

High-Definition (HD) Maps for autonomous vehicles are pre-built digital models of the driving environment with highly precise, fresh and comprehensive geometric information and semantics. As early as the DARPA challenges in the 2000s, HD Maps were already being used for precise localization of autonomous vehicles [2,3,6,60]. However, the usefulness of HD Maps goes beyond precise localization. To build a fully autonomous vehicle system operating in real-world environments (most challengingly, in urban environments [3,6,59]), many components need to work closely together and be optimized holistically. This task is so complicated, with enormous inputs, parameters and uncertainty, that it is very challenging to do all the computation in real time while still meeting the performance and safety requirements; also, some static elements and properties of the driving environment might be difficult to detect by sensors reliably and efficiently at runtime. To assist with those, HD Maps capture other useful prior information besides what is needed for localization and store the results of pre-computation for many other problems autonomous vehicles need to solve, including perception, prediction, motion planning, vehicle control etc. [51,57,59]. One example of such pre-computation is the mapping of the 3D locations of traffic lights, which allows autonomous vehicles to examine only a small region instead of the whole field of view to efficiently detect the state of a traffic light [13]. While there are debates about the possibility of building a fully autonomous vehicle system without using pre-built HD Maps, no existing highly automated driving (HAD) system we know of is running in urban environments without using some kind of HD Map.

Historically, building HD Maps has been a complex and mostly manual or semi-automated process, requiring a wide range and significant amount of software and manual effort [22,51,57,64]. It is especially laborious and costly to extract semantics from data. Automating such manual work is critical to improving the efficiency of the process and the quality of the HD Maps. Heuristics-based (e.g. [25,27-30]) and machine learning based approaches are both used for such automation. This paper focuses on examining the use of machine learning techniques to assist with the creation of HD Maps. Also, while recently some companies are trying to create HD Maps from camera images only [48,57,58], the HD Map creation process discussed here uses both LiDAR point clouds and camera images. This paper is not intended to be an exhaustive survey of every aspect, but we hope it can help attract more interest in applying machine learning techniques in this area.

The paper is organized as follows: section II briefly introduces the basic concepts of HD Maps, section III talks about the general workflow of HD Map creation, sections IV and V discuss the reasons for applying machine learning techniques in HD Map creation and the "human-in-the-loop" aspect of this kind of machine learning, followed by a review of examples of machine learning applications in different steps in section VI; lastly, section VII reviews some recent advances in end-to-end deep learning directly on 3D point clouds.

II. BASICS OF HIGH-DEFINITION MAP

HD Maps for autonomous vehicles are different from the regular digital maps (web based or mobile based) used by us humans and have the following special characteristics.

A. Characteristics of High-Definition Maps

HD Maps for autonomous vehicle systems should have:
TABLE I. COMPARISON OF POINT CLOUDS AND IMAGES

Point Clouds
  Characteristics: 3D, precise, independent of illumination; can be noisy, sparse, lacking texture and color
  Uses in HD Map Creation: 3D location detection directly, geometry extraction, some semantics/attributes extraction

Images
  Characteristics: 2D, high resolution; quality affected by lighting conditions
  Uses in HD Map Creation: 3D location detection through triangulation, 3D reconstruction, semantics/attributes extraction

B. HD Map Generation

This is the back-office work that processes the collected data and generates the HD Maps. Roughly, it can be further broken down into 4 steps (Fig. 1):

1) Sensor Fusion and Pose Estimation

Knowing the accurate pose (location and orientation) of a data collection vehicle is key to generating HD Maps. If the poses of the vehicles are inaccurate, it is impossible to produce precise maps. Once we have the accurate poses of the vehicle, and given that we know where the sensors are mounted and their relative angles to the vehicle frame, we can easily infer the accurate pose of the point cloud and image frames.

Although accurate pose cannot be acquired directly at runtime due to the limitations of GPS, IMU, wheel odometry etc. [59] (unless performing online localization against a pre-built HD Map, but we do not have an HD Map yet), accurate pose can be estimated by offline optimization that fuses the logs of different sensors using graph-based SLAM (Simultaneous Localization and Mapping) [6,9].
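To make the offline optimization concrete, below is a minimal sketch of a pose-graph solve on a toy one-dimensional trajectory, with odometry and loop-closure constraints stacked into a weighted linear least-squares problem. This is only an illustration of the idea with made-up values; production systems solve the full nonlinear 6-DOF version with dedicated graph-SLAM solvers (see [9,37]).

```python
import numpy as np

# Toy 1D pose graph: poses x0..x3, odometry constraints x_j - x_i = z,
# plus one loop-closure constraint. Each constraint contributes a row to
# a linear system A x = b, weighted by its confidence (information).
edges = [(0, 1, 1.0, 10.0),   # (i, j, measured x_j - x_i, weight)
         (1, 2, 1.1, 10.0),
         (2, 3, 0.9, 10.0),
         (0, 3, 2.8, 30.0)]   # loop closure, more confident

n = 4
A, b = [], []
for i, j, z, w in edges:
    row = np.zeros(n)
    row[j], row[i] = 1.0, -1.0
    A.append(np.sqrt(w) * row)
    b.append(np.sqrt(w) * z)

# Prior anchoring the first pose at 0 to remove gauge freedom.
prior = np.zeros(n); prior[0] = 1.0
A.append(100.0 * prior); b.append(0.0)

x, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
print("optimized poses:", np.round(x, 3))
```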
2) Map Data Fusion and Data Processing

Once we have accurate poses, we can then perform map data (LiDAR point clouds and camera images) fusion [50]. Note that for HD Mapping the resolution and quality of videos are usually not satisfactory, and we do not need as high a frame rate as videos provide, so higher-resolution still images taken a few frames per second (usually below 10 frames/sec) are commonly used [13]. During the data fusion, multiple scans of point clouds are aligned and calibrated to get denser point clouds; and point clouds and camera images are registered to each other so that we can use the point cloud to get the 3D locations of objects directly and the registered images to recognize the semantics: point clouds provide 3D positions but are usually too sparse for sign content, while images do a great job on sign content but do not provide 3D information.

Other data processing work is also carried out, including road plane generation, removal of irrelevant objects (e.g. dynamic objects and objects too far away from the road), and texturing to generate photorealistic orthographic images etc.
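As a concrete illustration of one registration primitive, the sketch below projects LiDAR points into a camera image with a pinhole model; the intrinsics, extrinsics, and points here are all made up for illustration, and a real pipeline would use the calibrated values recovered during sensor fusion.

```python
import numpy as np

def project_points(points_lidar, T_cam_lidar, K):
    """Project Nx3 LiDAR points into pixel coordinates of a pinhole camera.

    T_cam_lidar: 4x4 extrinsic transform (LiDAR frame -> camera frame).
    K: 3x3 camera intrinsic matrix. Returns pixel coords and a mask of
    points that lie in front of the camera.
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T)[:3]      # 3xN, camera frame
    in_front = pts_cam[2] > 0.1                # keep points ahead of the lens
    uvw = K @ pts_cam                          # perspective projection
    uv = (uvw[:2] / uvw[2]).T                  # normalize by depth
    return uv, in_front

# Made-up calibration and a few points, purely for illustration.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
T = np.eye(4)                                  # assume aligned frames here
pts = np.array([[2.0, 0.5, 10.0], [-1.0, 0.2, 5.0]])
uv, valid = project_points(pts, T, K)
print(uv[valid])
```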
3) 3D Object Location Detection

For road elements whose precise geometry and location are important (e.g. lane boundaries, stop lines, curbs, traffic lights, overpasses, railway tracks, guardrails, light poles, speed bumps, even potholes etc.), we need to map their precise 3D locations. LiDAR point clouds contain 3D location information, and 3D object detection on point clouds is performed either using geometry-based methods [27-30] or deep learning on 3D point clouds [10-12]. We can also detect 3D object locations by triangulation from images; one such example can be found in [13].

4) Semantics/Attributes Extraction

The last step, and the one with the most work, is to extract semantics and attributes from data. The work usually includes: lane/road model construction, traffic sign recognition and association with lanes, association of traffic lights with lanes, road marking semantics extraction, detection of various road elements (e.g. light poles) etc.

There is actually other work that needs to be done before a large-scale HD Map can be generated, but the aforementioned steps are the major ones.
C. Quality Control and Validation

Once HD Maps are generated, pre-defined quality metrics must be met, and the HD Maps can be validated by different means, including on-road testing and verification using other survey methods.
D. Update and Maintenance

This stage is the continuous work of keeping the HD Maps updated in a timely manner and fixing issues discovered during their use.
Figure 1. Workflow of HD Map Creation
IV. MACHINE LEARNING ASSISTED HD MAP CREATION
The following aspects of the HD Map creation process have made machine learning techniques a natural choice for improving the efficiency of HD Map creation and the quality of the HD Maps:

A. Significant human labor required

Months or even years of work by hundreds or even thousands of people are needed to build large-scale HD Maps manually or semi-automatically, and much of the manual work is repetitive and tedious; it is not just time consuming and costly, but also prone to error because the tasks require a great deal of attention and focus. Because of this, there is no better choice than implementing automation in software, and machine learning has proven able to do a great job in many such manual tasks.

B. Massive data with high dimensions

Whenever there is a lot of data and the data has high dimensions, the tools we usually turn to for help are machine learning techniques.

C. Shared problems with other autonomous driving tasks

Many of the problems we need to solve for building HD Maps overlap with problems from perception, localization, prediction etc. Since a lot of machine learning work has already been done for those tasks, it is wise to borrow some of those tools and algorithms for HD Map creation.

There are actually other reasons to apply machine learning in HD Map creation, including the argument that machine-learned models are easier to generalize and transfer to solving similar problems in other cities or regions.
V. "HUMAN-IN-THE-LOOP" MACHINE LEARNING

The way machine learning techniques are used in HD Map creation has a very evident characteristic: it is commonly understood as "human-in-the-loop" machine learning [42-44,46]. "Human-in-the-loop" machine learning in HD Map creation is the process of iteratively improving the machine learning models during the process of creating the HD Maps by acting on feedback to the results of the models with the involvement of humans (including operators for annotation and researchers/engineers for training and improving the machine learning models). "Human-in-the-loop" machine learning is especially useful when the tasks require very high accuracy but the performance of the machine learning model isn't there yet. As shown in Fig. 2, in the very beginning there are no machine-learned models; all we have is unlabeled data collected by the MMS (Mobile Mapping System) vehicles. Trained operators have to manually label all the data and generate HD Maps; now that we have labeled data, researchers/engineers can train supervised machine learning models, which are used to classify remaining or newly collected unlabeled data. There is a critical ingredient that we must have to make "human-in-the-loop" machine learning work: confidence estimation of the machine learning model predictions. (While the topic of estimating the confidence of the output of different types of machine learning models is out of the scope of this paper, a few methods for estimating confidence scores of neural network predictions are worth mentioning, including the Bayesian approach using Monte Carlo dropout [14], the entropy-based confidence score with adversarial training [15], and the distance-based confidence score [16].) Once we have the confidence scores of the machine-learned model's output, we can save the high-confidence output into the HD Maps directly and send the low-confidence output to human operators to judge; the manual labels for those low-confidence results then go into the HD Maps and are fed back to re-train the machine learning models in the next iteration. In some sense, "human-in-the-loop" machine learning is a type of active learning [17].

Figure 2. "Human-in-the-loop" Machine Learning in HD Map Generation
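A minimal sketch of this confidence-based routing is shown below; the threshold, the stub model, and the Monte Carlo dropout-style averaging (in the spirit of [14]) are all illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np

def mc_dropout_confidence(predict_fn, x, n_passes=20):
    """Average softmax outputs over stochastic forward passes (dropout on),
    in the spirit of [14]; returns predicted class and a confidence score."""
    probs = np.mean([predict_fn(x) for _ in range(n_passes)], axis=0)
    return int(np.argmax(probs)), float(np.max(probs))

def route(sample, predict_fn, threshold=0.95):
    """Send high-confidence predictions to the map, the rest to operators."""
    label, conf = mc_dropout_confidence(predict_fn, sample)
    if conf >= threshold:
        return ("auto_label_to_map", label)    # written to the HD Map directly
    return ("human_review_queue", label)       # labeled by operators, then fed
                                               # back to re-train the model

# Stub model for illustration: random 3-class softmax with dropout-like noise.
rng = np.random.default_rng(0)
stub = lambda x: rng.dirichlet(np.ones(3))
print(route(None, stub))
```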
VI. EXAMPLES OF MACHINE LEARNING USES IN HD MAP CREATION

In this section we examine the applications of machine learning techniques in some of the HD Map creation steps. Note that the effectiveness of machine learning in assisting HD Map creation lies mainly in semantics/attributes extraction.

A. Pose Estimation

Traditionally, pose estimation is done with techniques such as filter-based SLAM [38,39], graph-based SLAM [37] or visual SLAM [40]. Recently there have been deep learning based approaches that tackle the pose estimation problem using images [18,19]; one example is PoseNet [18], a modified GoogLeNet for estimating 6-DOF (Degrees of Freedom) pose. It replaces the softmax classifiers with affine regressors and outputs a pose vector at the final fully connected layer. Although the performance of PoseNet is not yet satisfactory for making HD Maps, it is after all a first step in the direction of applying deep learning to the pose estimation problem.
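As a sketch of that architectural change (assuming PyTorch, with a ResNet-18 backbone standing in for GoogLeNet; the 7-D output is 3 translation values plus a unit quaternion, and the loss weighting beta follows the form used in [18]):

```python
import torch
import torch.nn as nn
from torchvision import models

class PoseRegressor(nn.Module):
    """CNN backbone with the classifier replaced by a 7-D pose regressor:
    3 values for translation, 4 for an orientation quaternion (cf. [18])."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)   # stand-in for GoogLeNet
        backbone.fc = nn.Linear(backbone.fc.in_features, 7)
        self.net = backbone

    def forward(self, images):
        out = self.net(images)
        t, q = out[:, :3], out[:, 3:]
        return t, q / q.norm(dim=1, keepdim=True)  # normalize the quaternion

def pose_loss(t_pred, q_pred, t_true, q_true, beta=250.0):
    # PoseNet-style loss: position error plus weighted orientation error.
    return (t_pred - t_true).norm(dim=1).mean() + \
           beta * (q_pred - q_true).norm(dim=1).mean()

model = PoseRegressor()
t, q = model(torch.randn(2, 3, 224, 224))
print(t.shape, q.shape)
```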
B. Lane/Road Marking Extraction

Lane/road marking extraction is the extraction of semantics from the markings painted on the road surface, including lane boundaries, lane divider lines, arrows, crosswalk (zebra crossing) markings, speed limit texts, lane type markings etc. Quite some work has been done on extracting lane/road markings from LiDAR point clouds: a lot of it uses heuristics and rule-based approaches, and we will only
discuss those using machine learning in some way. [20] uses a Deep Boltzmann Machine to classify small-size road markings (arrows and rectangles) after extracting them from road surface points using reflectivity thresholds, and then uses PCA to further differentiate crosswalks (rectangles stacked vertically) from dashed lane lines (rectangles lined up horizontally); the authors reported a completeness (i.e. recall) of 93% and a correctness (i.e. precision) of 92%. Baidu uses a CNN to extract markings from reflectivity images generated from point clouds; by using multiple deconvolution layers with un-pooling, the resulting HD Maps can reach impressive resolution (up to 1 cm x 1 cm precision), and the pixel-level recall and precision of their method are 93.80% and 95.49% [21].
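The first stage of such pipelines, pulling marking candidates out of road-surface points by reflectivity thresholding, could look like the sketch below; the threshold and the data are synthetic, and a real system would tune the threshold per sensor and follow this step with clustering and classification.

```python
import numpy as np

def marking_candidates(points, intensity, road_mask, thresh=0.7):
    """Keep road-surface points whose LiDAR reflectivity exceeds a threshold;
    paint markings are strongly retroreflective compared with bare asphalt."""
    keep = road_mask & (intensity > thresh)
    return points[keep]

# Synthetic example: 1000 points with intensities in [0, 1], all on the road.
rng = np.random.default_rng(1)
pts = rng.uniform(-10, 10, size=(1000, 3))
inten = rng.uniform(0, 1, size=1000)
cands = marking_candidates(pts, inten, np.ones(1000, dtype=bool))
print(len(cands), "candidate marking points")
```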
C. Traffic Light Mapping

Traffic light mapping is to map the locations of the traffic lights and associate them with the corresponding lanes so as to speed up traffic light state detection for autonomous vehicles at runtime. Google maps traffic lights using an image-based method. They first get accurate poses, either by offline optimization with SLAM or by online localization against a pre-built HD Map; then they filter out most images that won't contain traffic lights according to their proximity to intersections; then a machine-learned classifier is used to detect the traffic lights in the images, and an iterative process is used to triangulate the 3D location from multiple images of the same traffic light and associate image groups by the same 3D locations until convergence. Results show their method can map 95%~99% of the traffic lights with a location error of less than 15 cm [13].
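The triangulation at the core of this approach reduces to a small least-squares problem; the sketch below (a generic midpoint-style triangulation on made-up rays, not Google's actual implementation) recovers the 3D point closest to a set of camera rays observing the same traffic light.

```python
import numpy as np

def triangulate(origins, directions):
    """Least-squares 3D point closest to a set of rays (one camera center
    plus unit bearing vector per detection of the same traffic light)."""
    A = np.zeros((3, 3)); b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P; b += P @ o
    return np.linalg.solve(A, b)

# Two rays that (nearly) intersect at [0, 0, 10].
origins = np.array([[-1.0, 0, 0], [1.0, 0, 0]])
dirs = np.array([[1.0, 0, 10.0], [-1.0, 0, 10.0]])
print(np.round(triangulate(origins, dirs), 3))
```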
D. Traffic Sign Mapping

Traffic sign mapping is usually done in two ways:

1) An image-based method similar to the traffic light mapping method aforementioned;

2) A method based on fused image-point cloud data: basically, the idea is to first detect the locations of the traffic signs using 3D point clouds and then recognize the semantics of the signs using the registered images (which can be done easily by a CNN with great performance nowadays).

[22] uses an SVM to detect all types of traffic signs from 3D point clouds directly (without reading their content) with a precision of 89%, but their work does not recognize the content of the signs (except stop signs, which they detect by their unique shape).

E. Road Edge/Curb Extraction

Road edge/curb extraction is a task for building the vectorized lane/road model. Traditionally, this task has been done by operators manually outlining the edge/curb with visualization tools. There are also automations based mainly on heuristics on the elevation difference between points on the curbs and the road surface [23,24]; some also use heuristics based on density and slope changes [23] or other heuristics [25]. Some work using deep learning has been published, but the performance is not there yet [26].

F. Light Poles Extraction

Light poles are the kind of road element that can help with localization, especially when there are not many features on the road surface. Historically, much of the work on detecting light poles has been done with geometry-based heuristics or energy functions, e.g. [27]. Works using machine learning also show good performance; usually machine learning models are used to classify pole-like objects after candidates have been identified by geometry-based methods. For example, [28] uses a Gaussian Mixture Model to recognize lighting poles among candidates, with overall performance reaching a true positive rate of 90%; [29] uses linear discriminant analysis and support vector machines to classify different types of pole-like objects, including lighting poles, reaching accuracy over 90%; and [30] uses random forests to classify pole-like objects, with precision and recall for light poles reaching 94.8% and 97.5%.
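A sketch of this two-stage pattern is shown below, with hypothetical geometric features (height, radius, vertical extent ratio) computed per candidate cluster and a random forest classifier as in [30]; the synthetic feature distributions are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features per pole-like candidate cluster (already found by
# geometry-based candidate generation): [height_m, mean_radius_m,
# vertical_extent_ratio]. Labels: 1 = light pole, 0 = other pole-like object.
rng = np.random.default_rng(2)
n = 200
light = np.column_stack([rng.normal(8, 1, n), rng.normal(0.15, 0.05, n),
                         rng.normal(0.9, 0.05, n)])
other = np.column_stack([rng.normal(4, 2, n), rng.normal(0.4, 0.2, n),
                         rng.normal(0.6, 0.2, n)])
X = np.vstack([light, other])
y = np.r_[np.ones(n), np.zeros(n)]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("light-pole probability:", clf.predict_proba([[8.2, 0.12, 0.92]])[0, 1])
```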
G. HD Map Refresh/Update

In order to refresh HD Maps in a timely manner, at least two things need to happen: collecting fresh map data in time and detecting changes from the fresh data. To collect fresh data, it is usually not cost-efficient to constantly send the mapping vehicles out everywhere to re-collect data [52]. While autonomous driving companies could rely on their autonomous vehicles being tested on roads to collect fresh data of high quality, the coverage is still in question; some HD Map suppliers work with automakers to get fresh map data from connected intelligent vehicles equipped with various sensors [63]; and currently many HD Map builders are also taking the path of crowdsourcing [49], where the challenge lies in ensuring the quality of the data meets the needs of HD Maps. Once fresh data is in, road change detection and road event (e.g. road closure) detection must be done. The problem with training a machine learning model to detect road events from images is that training data for road events might be comparatively rare (data imbalance between positive instances and negative instances). To solve this, transfer learning [31] is usually used to leverage pre-trained models. One might also try to use aggregated GPS traces from connected vehicles or mobile devices to detect changes in traffic patterns and infer the possibility of road changes [45,47,52,61,62].
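One way this transfer learning setup might look (a sketch assuming a recent PyTorch/torchvision; the frozen backbone and the class weights are illustrative choices for countering the imbalance, not a prescribed recipe):

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone, freeze its features, and
# train only a small head for the rare "road event vs. normal" task.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)      # new trainable head

# Weight the rare positive class more heavily to counter data imbalance.
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 20.0]))
opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

images = torch.randn(8, 3, 224, 224)               # stand-in batch
labels = torch.randint(0, 2, (8,))
loss = loss_fn(model(images), labels)
loss.backward(); opt.step()
print(float(loss))
```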
VII. DEEP LEARNING ON 3D POINT CLOUD

Although LiDAR scanners are quite costly today, they are still the primary sensor for HD Mapping, mainly because of the many benefits of LiDAR point clouds (see Table I). As deep learning has become dominant in computer vision on images, researchers and practitioners have begun to explore applying deep learning to 3D point clouds directly.

There are many challenges in learning from point clouds directly due to the characteristics of point clouds:
(1) Different from pixels in 2D images, points in point clouds are unordered and unstructured;
(2) Points are sparse;
(3) Point density is highly variable;
(4) Point cloud data is noisy: missing data is common;
(5) Point clouds lack color and texture;
(6) There are misalignments due to vehicle motion etc.

Historically, when doing machine learning on point cloud data, point clouds were first converted to other representations (e.g. voxelization [32-34] or projection into a perspective view [32,35]) before handcrafted feature engineering for specific tasks. There was hardly any work that could train a more generic machine learning model end-to-end directly from point cloud data until the proposals of PointNet, PointNet++, VoxelNet [10-12] etc.

PointNet [10] is a deep network architecture that can be trained end-to-end from point clouds directly. A point cloud is input to the network as an N (number of points) by D (dimension) 2D matrix. PointNet proves to be quite robust to data corruption. The weakness of PointNet is that it cannot learn local structure and is thus hard to generalize to large-scale scenes. It has a classification network and a segmentation network whose performance is on par with or slightly better than previous deep networks that learn from point clouds after they are converted to other forms of representation.
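The core idea, a shared per-point MLP followed by a symmetric max-pooling aggregation that makes the output invariant to point ordering, can be sketched as follows (a stripped-down classification branch assuming PyTorch, omitting PointNet's input and feature transform networks [10]):

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Minimal PointNet-style classifier: per-point shared MLP (Conv1d with
    kernel size 1), order-invariant max pooling, then an MLP classifier."""
    def __init__(self, d_in=3, n_classes=40):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Conv1d(d_in, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, n_classes))

    def forward(self, points):                       # points: (batch, N, D)
        x = self.point_mlp(points.transpose(1, 2))   # (batch, 1024, N)
        x = x.max(dim=2).values                      # symmetric aggregation
        return self.head(x)

model = TinyPointNet()
logits = model(torch.randn(2, 1024, 3))   # 2 clouds of 1024 xyz points
print(logits.shape)                       # torch.Size([2, 40])
```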
PointNet++ [11] improves on PointNet. It is a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. The hierarchical point set feature learning is analogous to the multiple layers of convolution operations of a ConvNet: PointNet++ extracts local features capturing fine geometric structures from small neighborhoods; such local features are further grouped into larger units and processed to produce higher-level features. The 3D shape classification accuracy of PointNet++ reaches 91.9% when tested on the ModelNet40 data set.

VoxelNet [12], introduced by Apple, is another end-to-end trainable deep neural network architecture that works on 3D point clouds directly. It proposes a novel voxel feature encoder to transform point clouds into a descriptive volumetric representation before feeding them into an RPN (region proposal network) to generate detections. VoxelNet was once the leading classifier on the KITTI car detection benchmark [36].

VIII. CONCLUSION

The contribution of this paper is two-fold: first, it provides a first-of-its-kind (to the best of my knowledge) formal summary of the complete workflow of the HD Map creation process used in the industry; second, it presents a detailed review of the machine learning techniques used in assisting the creation of HD Maps by industry practitioners as well as academic researchers. Hopefully this paper will be informative and interesting to audiences who work in HD Map creation and autonomous vehicle systems and to those who care about improving the efficiency of HD Map creation and the quality of the HD Maps.

REFERENCES

[1] Liu, S., Peng, J. and Gaudiot, J.L., 2017. Computer, Drive My Car! Computer, 50(1), pp.8-8.
[2] Buehler, M., Iagnemma, K. and Singh, S. eds., 2007. The 2005 DARPA grand challenge: the great robot race (Vol. 36). Springer Science & Business Media.
[3] Buehler, M., Iagnemma, K. and Singh, S. eds., 2009. The DARPA urban challenge: autonomous vehicles in city traffic (Vol. 56). Springer.
[4] https://automotive.tomtom.com/wordpress/wp-content/uploads/2017/01/Brochure-TomTom-Automotive-1.pdf
[5] Liu, S., Li, L., Tang, J., Wu, S. and Gaudiot, J.L., 2017. Creating Autonomous Vehicle Systems. Synthesis Lectures on Computer Science, 6(1), pp. i-186.
[6] Levinson, J., Montemerlo, M. and Thrun, S., 2007, June. Map-Based Precision Vehicle Localization in Urban Environments. In Robotics: Science and Systems (Vol. 4, p. 1).
[7] Levinson, J., 2011. Automatic laser calibration, mapping, and localization for autonomous vehicles. Ph.D. thesis, Stanford University.
[8] https://medium.com/waymo/building-maps-for-a-self-driving-car-723b4d9cd3f4
[9] Thrun, S. and Montemerlo, M., 2006. The graph SLAM algorithm with applications to large-scale mapping of urban structures. The International Journal of Robotics Research, 25(5-6), pp.403-429.
[10] Qi, C.R., Su, H., Mo, K. and Guibas, L.J., 2017. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2), p.4.
[11] Qi, C.R., Yi, L., Su, H. and Guibas, L.J., 2017. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems (pp. 5105-5114).
[12] Zhou, Y. and Tuzel, O., 2017. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. arXiv preprint arXiv:1711.06396.
[13] Fairfield, N. and Urmson, C., 2011, May. Traffic light mapping and detection. In Robotics and Automation (ICRA), 2011 IEEE International Conference on (pp. 5421-5426). IEEE.
[14] Gal, Y. and Ghahramani, Z., 2016, June. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (pp. 1050-1059).
[15] Lakshminarayanan, B., Pritzel, A. and Blundell, C., 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (pp. 6405-6416).
[16] Mandelbaum, A. and Weinshall, D., 2017. Distance-based Confidence Score for Neural Network Classifiers. arXiv preprint arXiv:1709.09844.
[17] Krishnakumar, A., 2007. Active learning literature survey. Technical Report, University of California, Santa Cruz.
[18] Kendall, A., Grimes, M. and Cipolla, R., 2015, December. PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In Computer Vision (ICCV), 2015 IEEE International Conference on (pp. 2938-2946). IEEE.
[19] Walch, F., Hazirbas, C., Leal-Taixé, L., Sattler, T., Hilsenbeck, S. and Cremers, D., 2016. Image-based localization using LSTMs for structured feature correlation. arXiv preprint arXiv:1611.07890.
[20] Yu, Y., Li, J., Guan, H., Jia, F. and Wang, C., 2015. Learning hierarchical features for automated extraction of road markings from 3-D mobile LiDAR point clouds. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(2), pp.709-726.
[21] He, B., Ai, R., Yan, Y. and Lang, X., 2016, November. Lane marking detection based on Convolution Neural Network from point clouds. In Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on (pp. 2475-2480). IEEE.
[22] Levinson, J., Askeland, J., Becker, J., Dolson, J., Held, D., Kammel, S., Kolter, J.Z., Langer, D., Pink, O., Pratt, V. and Sokolsky, M., 2011, June. Towards fully autonomous driving: Systems and algorithms. In Intelligent Vehicles Symposium (IV), 2011 IEEE (pp. 163-168). IEEE.
[23] Zhou, L. and Vosselman, G., 2012. Mapping curbstones in airborne and mobile laser scanning data. International Journal of Applied Earth Observation and Geoinformation, 18, pp.293-304.
[24] Zhang, W., 2010, June. LIDAR-based road and road-edge detection. In Intelligent Vehicles Symposium (IV), 2010 IEEE (pp. 845-848). IEEE.
[25] Yang, B., Fang, L. and Li, J., 2013. Semi-automated extraction and delineation of 3D roads of street scene from mobile laser scanning point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 79, pp.80-93.
[26] Rachmadi, R.F., Uchimura, K., Koutaki, G. and Ogata, K., 2017, September. Road edge detection on 3D point cloud data using Encoder-Decoder Convolutional Network. In Knowledge Creation and Intelligent Computing (IES-KCIC), 2017 International Electronics Symposium on (pp. 95-100). IEEE.
[27] Yu, Y., Li, J., Guan, H., Wang, C. and Yu, J., 2015. Semiautomated extraction of street light poles from mobile LiDAR point-clouds. IEEE Transactions on Geoscience and Remote Sensing, 53(3), pp.1374-1386.
[28] Zheng, H., Wang, R. and Xu, S., 2017. Recognizing Street Lighting Poles from Mobile LiDAR Data. IEEE Transactions on Geoscience and Remote Sensing, 55(1), pp.407-420.
[29] Ordóñez, C., Cabo, C. and Sanz-Ablanedo, E., 2017. Automatic Detection and Classification of Pole-Like Objects for Urban Cartography Using Mobile Laser Scanning Data. Sensors, 17(7), p.1465.
[30] Fukano, K. and Masuda, H., 2015. Detection and Classification of Pole-like Objects from Mobile Mapping Data. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, 2.
[31] Pan, S.J. and Yang, Q., 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), pp.1345-1359.
[32] Qi, C.R., Su, H., Nießner, M., Dai, A., Yan, M. and Guibas, L., 2016. Volumetric and multi-view CNNs for object classification on 3D data. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE.
[33] Maturana, D. and Scherer, S., 2015, September. VoxNet: A 3D convolutional neural network for real-time object recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems.
[34] Wang, D.Z. and Posner, I., 2015, July. Voting for voting in online point cloud object detection. In Proceedings of Robotics: Science and Systems, Rome, Italy.
[35] Li, B., Zhang, T. and Xia, T., 2016. Vehicle detection from 3D lidar using fully convolutional network. In Robotics: Science and Systems.
[36] http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
[37] Grisetti, G., Kummerle, R., Stachniss, C. and Burgard, W., 2010. A tutorial on graph-based SLAM. IEEE Intelligent Transportation Systems Magazine, 2(4), pp.31-43.
[38] Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., Reid, I. and Leonard, J.J., 2016. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Transactions on Robotics, 32(6), pp.1309-1332.
[39] Montemerlo, M., Thrun, S., Koller, D. and Wegbreit, B., 2002. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In AAAI/IAAI (pp. 593-598).
[40] Ros, G., Sappa, A., Ponsa, D. and Lopez, A.M., 2012, June. Visual SLAM for driverless cars: A brief survey. In Intelligent Vehicles Symposium (IV) Workshops (Vol. 2).
[41] Liu, S., Tang, J., Wang, C., Wang, Q. and Gaudiot, J.L., 2017. A Unified Cloud Platform for Autonomous Driving. Computer, 50(12), pp.42-49.
[42] https://www.figure-eight.com/resources/human-in-the-loop/
[43] https://www.computerworld.com/article/3004013/robotics/why-human-in-the-loop-computing-is-the-future-of-machine-learning.html
[44] Xin, D., Ma, L., Liu, J., Macke, S., Song, S. and Parameswaran, A., 2018. Accelerating Human-in-the-loop Machine Learning: Challenges and Opportunities. arXiv preprint arXiv:1804.05892.
[45] Stanojevic, R., Abbar, S., Thirumuruganathan, S., Morales, G.D.F., Chawla, S., Filali, F. and Aleimat, A., 2018, January. Road Network Fusion for Incremental Map Updates. In LBS 2018: 14th International Conference on Location Based Services (pp. 91-109). Springer, Cham.
[46] https://blog.mapillary.com/update/2017/10/12/human-in-the-loop.html
[47] Chen, C., Lu, C., Huang, Q., Yang, Q., Gunopulos, D. and Guibas, L., 2016, August. City-scale map creation and updating using GPS collections. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1465-1474). ACM.
[48] https://www.mapillary.com/
[49] Dabeer, O., Gowaiker, R., Grzechnik, S.K., Lakshman, M.J., Reitmayr, G., Somasundaram, K., Sukhavasi, R.T. and Wu, X., 2017. An End-to-End System for Crowdsourced 3D Maps for Autonomous Vehicles: The Mapping Component. arXiv preprint arXiv:1703.10193.
[50] De Silva, V., Roche, J. and Kondoz, A., 2017. Fusion of LiDAR and Camera Sensor Data for Environment Sensing in Driverless Vehicles. arXiv preprint arXiv:1710.06230.
[51] Franke, U., Pfeiffer, D., Rabe, C., Knoeppel, C., Enzweiler, M., Stein, F. and Herrtwich, R.G., 2013, December. Making Bertha see. In Computer Vision Workshops (ICCVW), 2013 IEEE International Conference on (pp. 214-221). IEEE.
[52] Massow, K., Kwella, B., Pfeifer, N., Häusler, F., Pontow, J., Radusch, I., Hipp, J., Dölitzscher, F. and Haueis, M., 2016, November. Deriving HD maps for highly automated driving from vehicular probe data. In Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on (pp. 1745-1752). IEEE.
[53] https://en.wikipedia.org/wiki/Autonomous_car#Testing
[54] https://techcrunch.com/2017/11/07/waymo-now-testing-its-self-driving-cars-on-public-roads-with-no-one-at-the-wheel/
[55] https://youtu.be/YXylqtEQ0tk?t=266
[56] https://youtu.be/Uj-rK8V-rik?t=640
[57] Ziegler, J., Bender, P., Schreiber, M., Lategahn, H., Strauss, T., Stiller, C., Dang, T., Franke, U., Appenrodt, N., Keller, C.G. and Kaus, E., 2014. Making Bertha drive—An autonomous journey on a historic route. IEEE Intelligent Transportation Systems Magazine, 6(2), pp.8-20.
[58] https://lvl5.ai/
[59] Schreiber, M., Knöppel, C. and Franke, U., 2013, June. LaneLoc: Lane marking based localization using highly accurate maps. In Intelligent Vehicles Symposium (IV), 2013 IEEE (pp. 449-454). IEEE.
[60] Levinson, J. and Thrun, S., 2010, May. Robust vehicle localization in urban environments using probabilistic maps. In Robotics and Automation (ICRA), 2010 IEEE International Conference on (pp. 4372-4378). IEEE.
[61] Biagioni, J. and Eriksson, J., 2012. Inferring road maps from global positioning system traces: Survey and comparative evaluation. Transportation Research Record: Journal of the Transportation Research Board, (2291), pp.61-71.
[62] Liu, X., Biagioni, J., Eriksson, J., Wang, Y., Forman, G. and Zhu, Y., 2012. Mining large-scale, sparse GPS traces for map inference. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (p. 669). ACM.
[63] https://techcrunch.com/2018/01/08/intels-mobileye-will-have-2-million-cars-on-roads-building-hd-maps-in-2018/
[64] https://www.wired.com/2014/12/google-maps-ground-truth/