
Vehicle Detection from 3D Lidar Using Fully Convolutional Network
Bo Li, Tianlei Zhang and Tian Xia
Baidu Research Institute for Deep Learning
{libo24, zhangtianlei, xiatian}@baidu.com

arXiv:1608.07916v1 [cs.CV] 29 Aug 2016

Abstract: Convolutional network techniques have recently achieved great success in vision-based detection tasks. This paper introduces the recent development of our research on transplanting the fully convolutional network technique to the detection task on 3D range scan data. Specifically, the scenario is set as the vehicle detection task from the range data of a Velodyne 64E lidar. We propose to present the data in a 2D point map and use a single 2D end-to-end fully convolutional network to predict the objectness confidence and the bounding boxes simultaneously. By carefully designing the bounding box encoding, the network is able to predict full 3D bounding boxes even though it only uses 2D convolutions. Experiments on the KITTI dataset show the state-of-the-art performance of the proposed method.

I. INTRODUCTION

Over years of robotics research, 3D lidars have been widely used on different kinds of robotic platforms. Typical 3D lidar data present the environment information as a 3D point cloud organized in a range scan. A large amount of research has been done on exploiting range scan data in robotic tasks including localization, mapping, object detection and scene parsing [16].

In the task of object detection, range scans have a specific advantage over camera images in localizing the detected objects. Since range scans contain the spatial coordinates of the 3D point cloud by nature, it is easier to obtain the pose and shape of the detected objects. On a robotic system including both perception and control modules, e.g. an autonomous vehicle, accurately localizing the obstacle vehicles in 3D coordinates is crucial for the subsequent planning and control stages.

In this paper, we design a fully convolutional network (FCN) to detect and localize objects as 3D boxes from range scan data. FCNs have achieved notable performance in computer vision based detection tasks. This paper transplants the FCN to the detection task on 3D range scans. We restrict our scenario to 3D vehicle detection for an autonomous driving system, using a Velodyne 64E lidar. The approach can be generalized to other object detection tasks on similar lidar devices.

II. RELATED WORKS

A. Object Detection from Range Scans

Traditional object detection algorithms propose candidates in the point cloud and then classify them as objects. A common category of algorithms proposes candidates by segmenting the point cloud into clusters. In some early works, rule-based segmentation is suggested for specific scenes [10, 20, 5]. For example, when processing the point cloud captured by an autonomous vehicle, simply removing the ground plane and clustering the remaining points can generate reasonable segmentation [10, 5]. More delicate segmentation can be obtained by forming graphs on the point cloud [32, 14, 21, 29, 30]. The subsequent object detection is done by classifying each segment and is thus sometimes vulnerable to incorrect segmentation. To avoid this issue, Behley et al. [2] suggest segmenting the scene hierarchically and keeping segments of different scales. Other methods directly exhaust the range scan space to propose candidates, which avoids incorrect segmentation. For example, Johnson and Hebert [13] randomly sample points from the point cloud as correspondences. Wang and Posner [31] scan the whole space with a sliding window to generate proposals.

To classify the candidate data, some early research assumes a known shape model and matches the model to the range scan data [6, 13]. In recent machine learning based detection works, a number of features have been hand-crafted to classify the candidates. Triebel et al. [29], Wang et al. [32] and Teichman et al. [28] use spin images, shape factors and shape distributions. Teichman et al. [28] also encode the object's moving-track information for classification. Papon et al. [21] use FPFH. Other features include normal orientation, distribution histograms, etc. A comparison of features can be found in [1]. Besides hand-crafted features, Deuge et al. [4] and Lai et al. [15] explore learning feature representations of point clouds via sparse coding.

We would also like to mention that object detection on RGBD images [3, 17] is closely related to the topic of object detection on range scans. The depth channel can be interpreted as a range scan and naturally applies to some detection algorithms designed for range scans. On the other hand, numerous studies have exploited both depth and RGB information in object detection tasks. We omit a detailed introduction of the traditional literature on RGBD data here, but the proposed algorithm in this paper can also be generalized to RGBD data.

B. Convolutional Neural Network on Object Detection

The Convolutional Neural Network (CNN) has achieved notable success in the areas of object classification and detection on images. We mention some state-of-the-art CNN based detection frameworks here. R-CNN [8] proposes candidate regions and uses a CNN to verify candidates as valid objects.
Fig. 1. Data visualization generated at different stages of the proposed approach. (a) The input point map, with the d channel visualized. (b) The output confidence map of the objectness branch at o^a_p. Red denotes higher confidence. (c) Bounding box candidates corresponding to all points predicted as positive, i.e. high-confidence points in (b). (d) Remaining bounding boxes after non-max suppression. Red points are the groundtruth points on vehicles for reference.

OverFeat [25], DenseBox [11] and YOLO [23] use end-to-end unified FCN frameworks which predict the objectness confidence and the bounding boxes simultaneously over the whole image. Some research has also focused on applying CNNs to 3D data. For example, on RGBD data, one common approach is to treat the depth maps as image channels and use a 2D CNN for classification or detection [9, 24, 26]. For 3D range scans, some works discretize the point cloud along 3D grids and train a 3D CNN structure for classification [33, 19]. These classifiers can be integrated with region proposal methods like sliding windows [27] for detection tasks. The 3D CNN preserves more 3D spatial information from the data than a 2D CNN, while the 2D CNN is computationally more efficient.

In this paper, our approach projects range scans as 2D maps similar to the depth map of RGBD data. The frameworks of Huang et al. [11] and Sermanet et al. [25] are transplanted to predict the objectness and the 3D object bounding boxes in a unified end-to-end manner.

III. APPROACH

A. Data Preparation

We consider the point cloud captured by the Velodyne 64E lidar. Like other range scan data, points from a Velodyne scan can be roughly projected and discretized into a 2D point map, using the following projection function:

  θ = atan2(y, x)
  φ = arcsin(z / √(x² + y² + z²))
  r = ⌊θ / Δθ⌋
  c = ⌊φ / Δφ⌋        (1)

where p = (x, y, z)^T denotes a 3D point and (r, c) denotes the 2D map position of its projection. θ and φ denote the azimuth and elevation angle when observing the point. Δθ and Δφ are the average horizontal and vertical angle resolution between consecutive beam emitters, respectively. The projected point map is analogous to cylindrical images. We fill the element at (r, c) in the 2D point map with 2-channel data (d, z), where d = √(x² + y²). Note that x and y are coupled as d for rotation invariance around z. An example of the d channel of the 2D point map is shown in Figure 1a. Rarely, some points might be projected into the same 2D position, in which case the point nearer to the observer is kept. Elements in 2D positions into which no 3D points are projected are filled with (d, z) = (0, 0).
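To make the projection of (1) and the (d, z) filling concrete, here is a small Python/NumPy sketch (our own illustration, not code from the paper). The map size, the vertical field of view, the derived angular resolutions and the nearest-point tie-breaking order are assumptions; rows are laid out over elevation and columns over azimuth purely for illustration.

```python
import numpy as np

def project_to_point_map(points, height=64, width=512, v_fov=(-24.9, 2.0)):
    """Build a 2-channel (d, z) point map from an (N, 3) lidar point cloud.

    A sketch of the projection in Eq. (1); map size, field of view and the
    resulting angular resolutions are illustrative assumptions.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(y, x)                                  # azimuth in [-pi, pi]
    phi = np.arcsin(z / np.linalg.norm(points, axis=1))       # elevation
    d = np.sqrt(x ** 2 + y ** 2)                              # horizontal range, d = sqrt(x^2 + y^2)

    phi_min, phi_max = np.radians(v_fov[0]), np.radians(v_fov[1])
    d_theta = 2 * np.pi / width                               # assumed average azimuth resolution
    d_phi = (phi_max - phi_min) / height                      # assumed average elevation resolution

    col = np.clip(np.floor((theta + np.pi) / d_theta), 0, width - 1).astype(int)
    row = np.clip(np.floor((phi - phi_min) / d_phi), 0, height - 1).astype(int)

    point_map = np.zeros((height, width, 2), dtype=np.float32)  # empty cells stay (d, z) = (0, 0)
    order = np.argsort(-d)               # write far points first so nearer points overwrite them
    point_map[row[order], col[order], 0] = d[order]
    point_map[row[order], col[order], 1] = z[order]
    return point_map
```

Cells that receive no point keep (d, z) = (0, 0), and when several points fall into the same cell the nearest one wins, matching the description above.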

Fig. 2. The proposed FCN structure to predict vehicle objectness and bounding box simultaneously. The output feature maps of conv1/deconv5a, conv1/deconv5b and conv2/deconv4 are first concatenated and then ported to their consecutive layers, respectively. deconv6a outputs the objectness map (o^a_p) and deconv6b outputs the bounding box map (o^b_p).

B. Network Architecture

The trunk part of the proposed CNN architecture is similar to Huang et al. [11] and Long et al. [18]. As illustrated in Figure 2, the CNN feature map is down-sampled consecutively in the first 3 convolutional layers and up-sampled consecutively in deconvolutional layers. Then the trunk splits at the 4th layer into an objectness classification branch and a 3D bounding box regression branch. We describe its implementation details as follows:

- The input point map, output objectness map and bounding box map are of the same width and height, to provide point-wise prediction. Each element of the objectness map predicts whether its corresponding point is on a vehicle. If the corresponding point is on a vehicle, its corresponding element in the bounding box map predicts the 3D bounding box of the belonging vehicle. Section III-C explains how the objectness and bounding box are encoded.
- In conv1, the point map is down-sampled by 4 horizontally and 2 vertically. This is because for a point map captured by the Velodyne 64E, we have approximately Δφ ≈ 2Δθ, i.e. points are denser in the horizontal direction. Similarly, the feature map is up-sampled by the same factor of (4, 2) in deconv6a and deconv6b, respectively. The remaining conv/deconv layers all have equal horizontal and vertical resolution and use square strides of (2, 2) when up-sampling or down-sampling.
- The output feature map pairs of conv3/deconv4, conv2/deconv5a, conv2/deconv5b are of the same sizes, respectively. We concatenate these output feature map pairs before passing them to the subsequent layers. This follows the idea of Long et al. [18]. Combining features from lower layers and higher layers improves the prediction of small objects and object edges.

C. Prediction Encoding

We now describe how the output feature maps are defined. The objectness map deconv6a consists of 2 channels corresponding to foreground, i.e. the point is on a vehicle, and background. The 2 channels are normalized by a softmax to denote the confidence.

The encoding of the bounding box map requires some extra conversion. Consider a lidar point p = (x, y, z) on a vehicle. Its observation angles are (θ, φ) by (1). We first denote a rotation matrix R as

  R = R_z(θ) R_y(φ)        (2)

where R_z(θ) and R_y(φ) denote rotations around the z and y axes respectively. If we denote the columns of R as (r_x, r_y, r_z), r_x has the same direction as p and r_y is parallel to the horizontal plane. Figure 3a illustrates an example of how R is formed. A bounding box corner c_p = (x_c, y_c, z_c) is thus transformed as:

  c'_p = R^T (c_p - p)        (3)

Our proposed approach uses c'_p to encode the bounding box corner of the vehicle to which p belongs. The full bounding box is thus encoded by concatenating the 8 corners in a 24d vector as

  b'_p = (c'^T_{p,1}, c'^T_{p,2}, ..., c'^T_{p,8})^T        (4)

Corresponding to this 24d vector, deconv6b outputs a 24-channel feature map accordingly.

The transform (3) is designed for the following two reasons:

- Translation part: Compared to c_p, which distributes over the whole lidar perception range, e.g. [-100m, 100m] × [-100m, 100m] for the Velodyne, the corner offset c_p - p distributes in a much smaller range, e.g. within the size of a vehicle. Experiments show that it is easier for the CNN to learn the latter case.
- Rotation part: R^T ensures the rotation invariance of the corner coordinate encoding. When a vehicle is moving around a circle and one observes it from the center, the appearance of the vehicle does not change in the observed range scan, but the bounding box coordinates vary in the range scan coordinate system. Since we would like to ensure that the same appearance results in the same bounding box prediction encoding, the bounding box coordinates are rotated by R^T to be invariant. Figure 3b illustrates a simple case. Vehicles A and B have the same appearance for an observer at the center, i.e. the right side is observed. Vehicle C has a different appearance, i.e. the rear-right part is observed. With the conversion of (3), the bounding box encodings b'_p of A and B are the same but that of C is different.

Fig. 3. (a) Illustration of (3). For each vehicle point p, we define a specific coordinate system centered at p. The x axis (r_x) of the coordinate system is along the ray from the Velodyne origin to p (dashed line). (b) An example illustration of the rotation invariance when observing a vehicle. Vehicles A and B have the same appearance. See (3) in Section III-C for details.
D. Training Phase

1) Data Augmentation: Similar to the training phase of a CNN for images, data augmentation significantly enhances the network performance. For the case of images, training data are usually augmented by randomly zooming or rotating the original images to synthesize more training samples. For the case of range scans, simply applying these operations results in variable Δθ and Δφ in (1), which violates the geometry property of the lidar device. To synthesize geometrically correct 3D range scans, we randomly generate a 3D transform near identity. Before projecting the point cloud by (1), the random transform is applied to the point cloud. The translation component of the transform results in a zooming effect in the synthesized range scan. The rotation component results in a rotation effect in the range scan.

2) Multi-Task Training: As illustrated in Section III-B, the proposed network consists of one objectness classification branch and one bounding box regression branch. We respectively denote the losses of the two branches in the training phase. As notation, denote o^a_p and o^b_p as the feature map outputs of deconv6a and deconv6b corresponding to point p, respectively. Also denote P as the point cloud and V ⊆ P as all points on all vehicles.

The loss of the objectness classification branch corresponding to a point p is denoted as a softmax loss

  L_obj(p) = -log(p_p)
  p_p = exp(o^a_{p,l_p}) / Σ_{l ∈ {0,1}} exp(o^a_{p,l})        (5)

where l_p ∈ {0, 1} denotes the groundtruth objectness label of p, i.e. 0 as background and 1 as a point on vehicles. o^a_{p,l} denotes the deconv6a feature map output of channel l for point p.

The loss of the bounding box regression branch corresponding to a point p is denoted as an L2-norm loss

  L_box(p) = ||o^b_p - b'_p||²        (6)

where b'_p is the 24d vector denoted in (4). Note that L_box is only computed for points on vehicles. For non-vehicle points, the bounding box loss is omitted.

3) Training strategies: Compared to positive points on vehicles, negative (background) points account for the majority of the point cloud. Thus if we simply pass all objectness losses in (5) in the backward procedure, the network prediction will significantly bias towards negative samples. To avoid this effect, the losses of positive and negative points need to be balanced. A similar balance strategy can be found in Huang et al. [11], where redundant negative losses are randomly discarded. In our training procedure, the balance is done by keeping all negative losses but re-weighting them using

  w1(p) = k|V| / (|P| - |V|)    if p ∈ P \ V
  w1(p) = 1                     if p ∈ V        (7)

which means that the re-weighted negative losses are on average equivalent to the losses of k|V| negative samples. In our case we choose k = 4. Compared to randomly discarding samples, the proposed balance strategy keeps more information about negative samples.

Additionally, near vehicles usually account for a larger portion of points than far vehicles and occluded vehicles. Thus vehicle samples at different distances also need to be balanced. This helps avoid the prediction being biased towards near vehicles and neglecting far vehicles or occluded vehicles. Denote n(p) as the number of points belonging to the same vehicle as p. Since the 3D range scan points are almost uniquely projected onto the point map, n(p) is also the area of the vehicle of p on the point map. Denote n̄ as the average number of points of vehicles in the whole dataset. We re-weight L_obj(p) and L_box(p) by w2 as

  w2(p) = n̄ / n(p)    if p ∈ V
  w2(p) = 1            if p ∈ P \ V        (8)

Using the losses and weights designed above, we accumulate losses over deconv6a and deconv6b for the final training loss

  L = Σ_{p ∈ P} w1(p) w2(p) L_obj(p) + w_box Σ_{p ∈ V} w2(p) L_box(p)        (9)

with w_box used to balance the objectness loss and the bounding box loss.
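The following Python sketch (ours, with assumed tensor shapes and argument names) shows how the re-weighting of (7)-(9) can be assembled from an objectness label map and per-point vehicle sizes. It illustrates the weighting scheme only and is not the authors' Caffe implementation.

```python
import numpy as np

def loss_weights(label_map, vehicle_area_map, k=4, mean_vehicle_area=None):
    """Compute per-point weights w1 (Eq. 7) and w2 (Eq. 8).

    label_map        : (H, W) int array, 1 for vehicle points, 0 for background.
    vehicle_area_map : (H, W) int array, n(p) = point count of the vehicle containing p
                       (ignored where label_map == 0). Shapes/names are assumptions.
    """
    is_vehicle = label_map == 1
    num_pos = is_vehicle.sum()
    num_neg = (~is_vehicle).sum()

    # Eq. (7): keep all negatives but scale them to be equivalent to k*|V| samples.
    w1 = np.ones_like(label_map, dtype=np.float32)
    if num_neg > 0:
        w1[~is_vehicle] = k * num_pos / float(num_neg)

    # Eq. (8): balance vehicles of different apparent sizes (distance / occlusion).
    if mean_vehicle_area is None:
        mean_vehicle_area = vehicle_area_map[is_vehicle].mean() if num_pos > 0 else 1.0
    w2 = np.ones_like(label_map, dtype=np.float32)
    w2[is_vehicle] = mean_vehicle_area / np.maximum(vehicle_area_map[is_vehicle], 1)
    return w1, w2

def total_loss(l_obj, l_box, label_map, w1, w2, w_box=1.0):
    """Accumulate the final training loss of Eq. (9) from per-point loss maps."""
    obj_term = (w1 * w2 * l_obj).sum()
    box_term = w_box * (w2 * l_box * (label_map == 1)).sum()   # box loss only on vehicle points
    return obj_term + box_term
```

In practice n̄ would be computed over the whole training set; the per-frame fallback above is only a convenience for the sketch.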
E. Testing Phase

During the test phase, a range scan is fed to the network to produce the objectness map and the bounding box map. For each point which is predicted as positive in the objectness map, the corresponding output o^b_p of the bounding box map is split into c'_{p,i}, i = 1, ..., 8. Each c'_{p,i} is then converted to a box corner c_{p,i} by the inverse transform of (3). We denote each bounding box candidate as a 24d vector b_p = (c^T_{p,1}, c^T_{p,2}, ..., c^T_{p,8})^T. The set of all bounding box candidates is denoted as B = {b_p | o^a_{p,1} > o^a_{p,0}}. Figure 1c shows the bounding box candidates of all the points predicted as positive.

We next cluster the bounding boxes and prune outliers by a non-max suppression strategy. Each bounding box b_p is scored by counting its neighbor bounding boxes in B within a distance δ, denoted as #{x ∈ B : ||x - b_p|| < δ}. Bounding boxes are picked from high score to low score. After one box is picked, we find all points inside the bounding box and remove their corresponding bounding box candidates from B. Bounding box candidates whose score is lower than 5 are discarded as outliers. Figure 1d shows the picked bounding boxes for Figure 1a.
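A compact Python sketch of this score-and-suppress step follows (our reconstruction, not the paper's code). The candidates are assumed to have already been decoded into world-frame 24-d vectors, e.g. with the hypothetical decode_box helper above; the distance threshold and the axis-aligned point-in-box test are assumptions, while the minimum score of 5 follows the text.

```python
import numpy as np

def nms_by_neighbor_count(candidates, points, min_score=5, dist_thresh=1.0):
    """Greedy suppression of 24-d box candidates, as described in Section III-E.

    candidates : (M, 24) decoded box candidates b_p (one per positive point).
    points     : (M, 3) lidar points that produced the candidates.
    Returns a list of selected (8, 3) corner arrays.
    """
    cand = np.asarray(candidates, dtype=float)
    pts = np.asarray(points, dtype=float)
    # Score every candidate by counting neighbors within dist_thresh in the 24-d space.
    dists = np.linalg.norm(cand[:, None, :] - cand[None, :, :], axis=2)
    scores = (dists < dist_thresh).sum(axis=1)

    alive = np.ones(len(cand), dtype=bool)
    picked = []
    for i in np.argsort(-scores):          # pick from high score to low score
        if not alive[i]:
            continue
        if scores[i] < min_score:
            break                          # remaining candidates are treated as outliers
        box = cand[i].reshape(8, 3)
        picked.append(box)
        # Remove candidates whose source point lies inside the picked box
        # (axis-aligned containment used for simplicity; an assumption).
        lo, hi = box.min(axis=0), box.max(axis=0)
        inside = np.all((pts >= lo) & (pts <= hi), axis=1)
        alive[inside] = False
        alive[i] = False
    return picked
```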

Fig. 4. More examples of the detection results. See Section IV-A for details. (a) Detection result on a congested traffic scene. (b) Detection result on far vehicles.

TABLE I
PERFORMANCE IN AVERAGE PRECISION AND AVERAGE ORIENTATION SIMILARITY FOR THE OFFLINE EVALUATION

                      Easy     Moderate   Hard
Image Space (AP)      74.1%    71.0%      70.0%
Image Space (AOS)     73.9%    70.9%      69.9%
World Space (AP)      77.3%    72.4%      69.4%
World Space (AOS)     77.2%    72.3%      69.4%

Fig. 5. Precision-recall curve in the offline evaluation, measured by the world space criterion. See Section IV-A.

IV. EXPERIMENTS

Our proposed approach is evaluated on the vehicle detection task of the KITTI object detection benchmark [7]. This benchmark originally aims to evaluate object detection of vehicles, pedestrians and cyclists from images. It contains not only image data but also the corresponding Velodyne 64E range scan data. The groundtruth labels include both 2D object bounding boxes on the images and the corresponding 3D bounding boxes, which provides sufficient information to train and test detection algorithms on range scans. The KITTI training dataset contains 7500+ frames of data. We randomly select 6000 frames in our experiments to train the network and use the remaining 1500 frames for detailed offline validation and analysis. The KITTI online evaluation is also used to compare the proposed approach with previous related works.

For simplicity of the experiments, we focus only on the Car category of the data. In the training phase, we first label all 3D points inside any of the groundtruth car 3D bounding boxes as foreground vehicle points. Points from objects of categories like Truck or Van are labeled to be ignored from P since they might confuse the training. The rest of the points are labeled as background. This forms the label l_p in (5). For each foreground point, its belonging bounding box is encoded by (4) to form the label b'_p in (6).

The experiments are based on the Caffe [12] framework. In the KITTI object detection benchmark, images are captured from the front camera and range scans perceive a 360° FoV of the environment. The benchmark groundtruth is only provided for vehicles inside the image. Thus in our experiments we only use the front part of a range scan which overlaps with the FoV of the front camera.

The KITTI benchmark divides object samples into three difficulty levels according to the size and the occlusion of the 2D bounding boxes in the image space. A detection is accepted if its image space 2D bounding box has at least 70% overlap with the groundtruth. Since the proposed approach naturally predicts the 3D bounding boxes of the vehicles, we evaluate the approach in both the image space and the world space in the offline validation. Compared to the image space, the metric in the world space is more crucial in the scenario of autonomous driving, because, for example, many navigation and planning algorithms take the bounding box in world space as input for obstacle avoidance. Section IV-A describes the evaluation in both image space and world space in our offline validation. In Section IV-B, we compare the proposed approach with several previous range scan detection algorithms via the KITTI online evaluation system.
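As an illustration of the label-preparation step described above, the sketch below (our own, with a hypothetical oriented-box representation of the groundtruth rather than the actual KITTI file format) marks points inside car boxes as foreground, points inside Truck/Van boxes as ignored, and everything else as background.

```python
import numpy as np

FOREGROUND, BACKGROUND, IGNORED = 1, 0, -1

def points_in_box(points, center, size, yaw):
    """Boolean mask of points inside an upright oriented 3D box (center, (l, w, h), yaw)."""
    c, s = np.cos(-yaw), np.sin(-yaw)
    local = points - center
    x = c * local[:, 0] - s * local[:, 1]     # rotate into the box frame
    y = s * local[:, 0] + c * local[:, 1]
    z = local[:, 2]
    l, w, h = size
    return (np.abs(x) <= l / 2) & (np.abs(y) <= w / 2) & (np.abs(z) <= h / 2)

def label_points(points, boxes):
    """Assign per-point labels l_p from a list of groundtruth boxes.

    boxes: iterable of dicts {"center", "size", "yaw", "category"}; this dict layout
    is an assumption for illustration only.
    """
    labels = np.full(len(points), BACKGROUND, dtype=np.int8)
    for box in boxes:
        mask = points_in_box(points, box["center"], box["size"], box["yaw"])
        if box["category"] == "Car":
            labels[mask] = FOREGROUND
        elif box["category"] in ("Truck", "Van"):
            labels[mask & (labels != FOREGROUND)] = IGNORED   # do not overwrite car points
        # other categories fall through and stay background
    return labels
```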
A. Performance Analysis on Offline Evaluation

We analyze the detection performance on our custom offline evaluation data selected from the KITTI training dataset, whose groundtruth labels are publicly accessible. To obtain an equivalent 2D bounding box for the original KITTI criterion in the image space, we project the 3D bounding box into the image space and take the minimum 2D bounding rectangle as the 2D bounding box. For the world space evaluation, we project the detected and the groundtruth 3D bounding boxes onto the ground plane and compute their overlap. The world space criterion also requires at least 70% overlap to accept a detection. The performance of the approach is measured by the Average Precision (AP) and the Average Orientation Similarity (AOS) [7]. The AOS is designed to jointly measure the precision of detection and orientation estimation.

Table I lists the performance evaluation. Note that the world space criterion results in slightly better performance than the image space criterion. This is because the user-labeled 2D bounding box tends to be tighter than the 2D projection of the 3D bounding box in the image space, especially for vehicles observed from their diagonal directions. This size difference diminishes the overlap between the detection and the groundtruth in the image space.

Like most detection approaches, there is a noticeable drop in performance from the easy evaluation to the moderate and hard evaluations. The minimal pixel height for easy samples is 40px, which approximately corresponds to vehicles within 28m. The minimal height for moderate and hard samples is 25px, corresponding to a minimal distance of 47m. As shown in Figure 4 and Figure 1, some vehicles farther than 40m are scanned by very few points and are even difficult for a human to recognize. This results in the performance drop for the moderate and hard evaluations.

Figure 5 shows the precision-recall curve of the world space criterion as an example. Precision-recall curves of the other criteria are similar and omitted here. Figure 4a shows the detection result on a congested traffic scene with more than 10 vehicles in front of the lidar. Figure 4b shows the detection results on cars farther than 50m. Note that our algorithm predicts the complete bounding box even for vehicles which are only partly visible. This significantly differs from previous proposal-based methods and can contribute to more stable object tracking and path planning results. For the easy evaluation, the algorithm detects almost all vehicles, even occluded ones. This is also illustrated in Figure 5, where the maximum recall rate is higher than 95%. The approach produces false-positive detections in some occluded scenes, as illustrated in Figure 4a for example.

B. Related Work Comparison on the Online Evaluation

There have been several previous works on range scan based detection evaluated on the KITTI platform. Readers might find that the performance of these works ranks much lower than that of the state-of-the-art vision-based approaches. We explain this by two reasons. First, the image data have much higher resolution, which significantly enhances the detection performance for far and occluded objects. Second, the image space based criterion does not reflect the advantage of range scan methods in localizing objects in the full 3D world space. A related explanation can also be found in Wang and Posner [31]. Thus in this experiment, we only compare the proposed approach with the range scan methods of Wang and Posner [31], Behley et al. [2] and Plotkin [22]. These three methods all use traditional features for classification. Wang and Posner [31] perform a sliding window based strategy to generate candidates, while Behley et al. [2] and Plotkin [22] segment the point cloud to generate detection candidates.

TABLE II
PERFORMANCE COMPARISON IN AVERAGE PRECISION AND AVERAGE ORIENTATION SIMILARITY FOR THE ONLINE EVALUATION

                               Easy     Moderate   Hard
Image Space (AP)    Proposed   60.3%    47.5%      42.7%
                    Vote3D     56.8%    48.0%      42.6%
                    CSoR       34.8%    26.1%      22.7%
                    mBoW       36.0%    23.8%      18.4%
Image Space (AOS)   Proposed   59.1%    45.9%      41.1%
                    CSoR       34.0%    25.4%      22.0%

Table II shows the performance of the methods in AP and AOS reported on the KITTI online evaluation. The detection AP of our approach outperforms the other methods in the easy task, which well illustrates the advantage of a CNN in representing rich features on near vehicles. In the moderate and hard detection tasks, our approach performs with similar AP to Wang and Posner [31], because vehicles in these tasks consist of too few points for the CNN to embed complicated features. For the joint detection and orientation estimation evaluation, only our approach and CSoR support orientation estimation, and our approach significantly wins the comparison in AOS.

V. CONCLUSIONS

Although attempts have been made in a few previous studies to apply deep learning techniques to sensor data other than images, there is still a gap between these state-of-the-art computer vision techniques and robotic perception research. To the best of our knowledge, the proposed approach is the first to introduce FCN detection techniques into perception on range scan data, which results in a neat and end-to-end detection framework. In this paper we only evaluate the approach on 3D range scans from the Velodyne 64E, but the approach can also be applied to 3D range scans from similar devices. By accumulating more training data and designing a deeper network, the detection performance can be further improved.

VI. ACKNOWLEDGEMENT

The authors would like to acknowledge the help from Ji Liang, Lichao Huang, Degang Yang, Haoqi Fan and Yifeng Pan in the research of deep learning. Thanks also go to Ji Tao, Kai Ni and Yuanqing Lin for their support.
REFERENCES

[1] Jens Behley, Volker Steinhage, and Armin B Cremers. Performance of Histogram Descriptors for the Classification of 3D Laser Range Data in Urban Environments. 2012 IEEE International Conference on Robotics and Automation, pages 4391-4398, 2012.
[2] Jens Behley, Volker Steinhage, and Armin B. Cremers. Laser-based segment classification using a mixture of bag-of-words. IEEE International Conference on Intelligent Robots and Systems, (1):4195-4200, 2013.
[3] Xiaozhi Chen, Kaustav Kundu, Yukun Zhu, Andrew G Berneshawi, Huimin Ma, Sanja Fidler, and Raquel Urtasun. 3D object proposals for accurate object class detection. Advances in Neural Information Processing Systems, pages 424-432, 2015.
[4] Mark De Deuge, F Robotics, and Alastair Quadros. Unsupervised Feature Learning for Classification of Outdoor 3D Scans. Araa.Asn.Au, pages 2-4, 2013.
[5] B. Douillard, J. Underwood, N. Kuntz, V. Vlaskine, A. Quadros, P. Morton, and A. Frenkel. On the segmentation of 3D lidar point clouds. Proceedings - IEEE International Conference on Robotics and Automation, pages 2798-2805, 2011.
[6] O.D. Faugeras and M. Hebert. The Representation, Recognition, and Locating of 3-D Objects. The International Journal of Robotics Research, 5(3):27-52, 1986.
[7] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3354-3361, 2012.
[8] Ross Girshick, Jeff Donahue, Trevor Darrell, U C Berkeley, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014, pages 2-9, 2014.
[9] S Gupta, R Girshick, P Arbelaez, and J Malik. Learning Rich Features from RGB-D Images for Object Detection and Segmentation. arXiv preprint arXiv:1407.5736, pages 1-16, 2014.
[10] Michael Himmelsbach, Felix V Hundelshausen, and Hans-Joachim Wünsche. Fast segmentation of 3D point clouds for ground vehicles. Intelligent Vehicles Symposium (IV), 2010 IEEE, pages 560-565, 2010.
[11] Lichao Huang, Yi Yang, Yafeng Deng, and Yinan Yu. DenseBox: Unifying Landmark Localization with End to End Object Detection. pages 1-13, 2015.
[12] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. ACM Multimedia, 2:4, 2014.
[13] Andrew E Johnson and Martial Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 21(5):433-449, 1999.
[14] Klaas Klasing, Dirk Wollherr, and Martin Buss. A clustering method for efficient segmentation of 3D laser data. Conference on Robotics and Automation, ICRA 2008. IEEE International, pages 4043-4048, 2008.
[15] Kevin Lai, Liefeng Bo, and Dieter Fox. Unsupervised Feature Learning for 3D Scene Labeling. IEEE International Conference on Robotics and Automation (ICRA 2014), pages 3050-3057, 2014.
[16] J. Levinson and S. Thrun. Robust vehicle localization in urban environments using probabilistic maps. Robotics and Automation (ICRA), 2010 IEEE International Conference on, 2010.
[17] Dahua Lin, Sanja Fidler, and Raquel Urtasun. Holistic scene understanding for 3D object detection with RGBD cameras. Proceedings of the IEEE International Conference on Computer Vision, pages 1417-1424, 2013.
[18] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. arXiv preprint arXiv:1411.4038, 2014.
[19] Daniel Maturana and Sebastian Scherer. VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition. pages 922-928, 2015.
[20] Frank Moosmann, Oliver Pink, and Christoph Stiller. Segmentation of 3D lidar data in non-flat urban environments using a local convexity criterion. IEEE Intelligent Vehicles Symposium, Proceedings, pages 215-220, 2009.
[21] Jeremie Papon, Alexey Abramov, Markus Schoeler, and Florentin Wörgötter. Voxel cloud connectivity segmentation - Supervoxels for point clouds. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2027-2034, 2013.
[22] Leonard Plotkin. PyDriver: Entwicklung eines Frameworks für räumliche Detektion und Klassifikation von Objekten in Fahrzeugumgebung. Bachelor's thesis (Studienarbeit), Karlsruhe Institute of Technology, Germany, March 2015.
[23] Joseph Redmon, Ross Girshick, and Ali Farhadi. You Only Look Once: Unified, Real-Time Object Detection. arXiv, 2015.
[24] Max Schwarz, Hannes Schulz, and Sven Behnke. RGB-D Object Recognition and Pose Estimation based on Pre-trained Convolutional Neural Network Features. IEEE International Conference on Robotics and Automation (ICRA), (May), 2015.
[25] Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. arXiv preprint arXiv:1312.6229, pages 1-15, 2013.
[26] Richard Socher, Brody Huval, Bharath Bath, Christopher D Manning, and Andrew Y Ng. Convolutional-recursive deep learning for 3D object classification. Advances in Neural Information Processing Systems, pages 665-673, 2012.
[27] Shuran Song and Jianxiong Xiao. Sliding shapes for 3D object detection in depth images. pages 634-651, 2014.
[28] Alex Teichman, Jesse Levinson, and Sebastian Thrun. Towards 3D object recognition via classification of arbitrary object tracks. Proceedings - IEEE International Conference on Robotics and Automation, pages 4034-4041, 2011.
[29] Rudolph Triebel, Jiwon Shin, and Roland Siegwart. Segmentation and Unsupervised Part-based Discovery of Repetitive Objects. Robotics: Science and Systems, 2006.
[30] Rudolph Triebel, Richard Schmidt, Óscar Martínez Mozos, and Wolfram Burgard. Instance-based AMN classification for improved object recognition in 2D and 3D laser range data. Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2225-2230, 2007.
[31] Dominic Zeng Wang and Ingmar Posner. Voting for voting in online point cloud object detection. Proceedings of Robotics: Science and Systems, Rome, Italy, 2015.
[32] Dominic Zeng Wang, Ingmar Posner, and Paul Newman. What could move? Finding cars, pedestrians and bicyclists in 3D laser data. Proceedings - IEEE International Conference on Robotics and Automation, pages 4038-4044, 2012.
[33] Zhirong Wu and Shuran Song. 3D ShapeNets: A Deep Representation for Volumetric Shapes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pages 1-9, 2015.
