Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras

1) This paper proposes a method to detect vehicles in 3D from a single image from an uncalibrated traffic camera. It does this by estimating the homography between the road plane and image plane, and detecting rotated bounding boxes in bird's eye view images generated through inverse perspective mapping. 2) Two strategies are introduced to improve detection accuracy in the warped images: regressing "tailed r-boxes" and using a dual-view network architecture. 3) A data synthesis method is also proposed to generate training data visually similar to images from uncalibrated traffic cameras. Experiments show the method can generalize to new camera setups despite not seeing those cameras during training.

Uploaded by

trí nguyễn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

97 views8 pages

Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras

Uploaded by

trí nguyễn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras

through Homography
Minghan Zhu1 , Songan Zhang1 , Yuanxin Zhong1 , Pingping Lu1 ,
Huei Peng1 and John Lenneman2

Abstract— This paper proposes a method to extract the

position and pose of vehicles in the 3D world from a single
traffic camera. Most previous monocular 3D vehicle detection
algorithms focused on cameras on vehicles from the perspec-
arXiv:2103.15293v1 [cs.CV] 29 Mar 2021

tive of a driver, and assumed known intrinsic and extrinsic

calibration. On the contrary, this paper focuses on the same
task using uncalibrated monocular traffic cameras. We observe
that the homography between the road plane and the image
plane is essential to 3D vehicle detection and the data synthesis
for this task, and the homography can be estimated without
the camera intrinsics and extrinsics. We conduct 3D vehicle
detection by estimating the rotated bounding boxes (r-boxes)
in the bird’s eye view (BEV) images generated from inverse
perspective mapping. We propose a new regression target called
tailed r-box and a dual-view network architecture which boosts
the detection accuracy on warped BEV images. Experiments
show that the proposed method can generalize to new camera
and environment setups despite not seeing imaged from them
during training.

I. INTRODUCTION
Traffic cameras are widely deployed today to monitor Fig. 1: The 3D vehicle detection problem is transformed to a 2D detection
traffic conditions especially around intersections. Camera problem in warped bird’s eye view (BEV) images. The orange lines attached
vision algorithms are developed to automate various tasks to each orange boxes are tails, defined in Sec. III-C.1 and Fig. 4, which are
regressed by the network to better handle distortions in BEV images.
including vehicle and pedestrian detection, tracking [1], and
re-identification [2]. However, most of them work in the 2D
image space. In this paper, we consider the task of monocular
3D detection, which is to detect the targets and to estimate available to users. Second, 3D annotations of images from
their positions and poses in the 3D world space from a single these traffic cameras are lacking, while there are some with
image. It could enable us to better understand the behaviors 2D annotations [7]–[9]. Some previous work tried to solve
of the targets from a traffic camera. the 3D object detection problem, but they posed some strong
Monocular camera 3D object detection is a non-trivial task assumptions such as known intrinsic/extrinsic calibration
since images lack depth information. A general strategy is to [10] or fixed orientation of the objects [11]. We extend the 3D
leverage the prior of the sizes of the objects of interest and detection to a more general setup without these assumptions.
the consistency between 3D and 2D detections established We leverage the homography between the road plane
through extrinsic and intrinsic parameters. To help improve and image plane as the only connection between the 3D
the performance, a series of datasets are published with 3D world and the 2D images. The homography can be estimated
object annotation associated with images. [3]–[5] are datasets conveniently using satellite images from public map service.
in driving scenarios, and [6] contains object-centric video As opposed to 3D bounding box detection which requires full
clips in general daily scenarios. calibration, we formulate the 3D object detection problem
However, these research efforts mostly cannot be directly as the detection of rotated bounding boxes in images from
applied to traffic cameras for two reasons. First, the intrin- bird’s eye view (BEV) generated using the homography,
sic/extrinsic calibration information of many cameras are not see Fig. 1. The homography also enables us to synthesize
images from the perspective of a traffic camera even if it
*This work was supported by the Collaborative Safety Research Center is not calibrated, which in return benefits the training of the
at the Toyota Motor North America Research & Development.
1 M. Zhu, S. Zhang, Y. Zhong, P. Lu, and H. Peng are with the University detection network. To address the problem of shape distortion
of Michigan, Ann Arbor, MI 48109, USA. {minghanz, songanz, introduced by the inverse perspective mapping (IPM), we
zyxin, pingpinl, hpeng}@umich.edu designed an innovative regression target called tailed r-box
2 J. Lenneman is with the Collaborative Safety Research Center at the
Toyota Motor North America Research & Development, Ann Arbor, MI as an extension to the conventional rotated bounding boxes,
48105, USA. [email protected] and introduced a dual-view network architecture.
The main contributions of this paper include: calibration, 3D bounding boxes can be constructed from 2D
1) We propose a method to estimate the pose and position bounding boxes or segmentation following the direction of
of vehicles in the 3D world using images from a vanishing points. There are two limitations of this approach.
monocular uncalibrated traffic camera. First, the calibration requires a lot of parallel landmarks
2) We propose two strategies to improve the accuracy of and/or traffic flow in one or two dominant directions, which
object detection using IPM images: (a) tailed r-box may not be the case in real traffic, e.g., roundabouts. Second,
regression, (b) dual-view network architecture. the construction of 3D bounding boxes assumes that all
3) We propose a data synthesis method to generate data vehicles are largely aligned in the direction of the vanishing
that are visually similar to images from an uncalibrated lines, which is not always true, including at curved lanes,
traffic camera. intersections with turning vehicles, and roundabouts.
4) Our work is open-sourced and software is available for [10] avoided the limitations mentioned above, by cal-
download at https://github.com/minghanz/ ibrating the 2D landmarks in images to 3D landmarks in
trafcam_3d. Lidar scans, obtaining full calibration of the camera. It is
The remainder of this paper is organized as follows. The apparently non-trivial to obtain Lidar scans for already-
literature review is given in Sec. II. The proposed method installed traffic camera, which limits the practicality to apply
for 3D detection is introduced in Sec. III. The dataset used this approach. The authors synthesized images of vehicles
for training and the data synthesis method are introduced in from CAD models on random background images as the
Sec. IV. The experimental setup and results are presented in training data. We adopt a similar approach, but we render
Sec. V. Section VI concludes the paper and discusses future the vehicles on the scene of the traffic cameras directly,
research ideas. which reduces the domain gap, while not requiring the
intrinsic/extrinsic calibration of the cameras.
II. R ELATED W ORK
A. Monocular 3D vehicle detection C. Rotated bounding box detection
A lot of work has been done in monocular 3D vehicle Rotated bounding box detection is useful in aerial image
detection. Our primary application is vehicle detection. Al- processing, where objects generally are not aligned to a
though the problem is theoretically ill-posed, most vehicles dominant direction as in our daily images, and are mostly
have similar shapes and sizes, allowing the network to axis-aligned to the gravity direction. Some representative
leverage such priors jointly with the 3D-2D consistency work include [19], [20]. Although the regression target is
determined by the camera intrinsics. For example, [12] very similar in this paper, the challenges are very different in
employed CAD models of vehicles as priors. [13] estimated that we deal with BEV images warped from the perspective
the depth from the consistency of the 2D bounding boxes of traffic cameras through IPM, which introduces severe
and the estimated 3D box dimensions. The object depth can distortion compared with original traffic camera images, and
also be estimated using a monocular depth network module have more occlusions compared with native bird’s eye view
[14]. Some work proposed to change to a space to deal with images (e.g., aerial images).
3D detection better. For example, [15] back-projected 2D
D. Perception networks with perspective transform
images to 3D space using estimated depth and detect 3D
bounding boxes in the 3D space directly. [16] transformed Several previous work also employed a similar idea of
original images to the bird’s eye view (BEV) where vehicles using perspective transform to conduct perception for BEV
can be localized with 2D coordinate, which is similar to images. [16] conducted object detection in warped BEV
our work, but they did not address the challenges caused by images, but it did not address the distortion effect in IPM,
distortion in perspective transform. In this paper, we identify as mentioned at the end of Sec. II-A. [21], [22] addressed
these challenges and propose new solutions. the distortion effect, but the discussions there are in the
context of segmentation. [21], [23] studied lane detection
B. Calibration and 3D vehicle detection for traffic cameras and segmentation, respectively. They employed perspective
Some previous work aimed at solving the 3D detection transform inside the network, which approach we adopt. The
problem from traffic cameras, but with different setup and difference is that they mainly used it to transform results to
assumptions. The detection approach is closely coupled with BEV, while in this work the warping is to fuse features from
the underlying calibration method, as the latter determines the original view images and the BEV images.
how to establish the 3D-2D relation. Therefore we review the
detection and the calibration methods together. A common III. P ROPOSED M ETHOD
type of calibration methods is based on vanishing point Our strategy is to transform the 3D vehicle detection
detection. Methods are proposed to detect vanishing points problem to the 2D rotated bounding box detection problem
from the major direction of vehicle movement and edge- in the bird’s eye view. We are mainly concerned about the
shaped landmarks in the scene [17], [18], from which the planar position of vehicles, and the vertical coordinate in the
rotational part of the extrinsic matrix can be solved. The height direction is of little importance since we assume a
intrinsic matrix (mainly the focal length) is estimated from vehicle is always on the (flat) ground. Under the moderate
the average size of vehicles and that in images. With the assumption of flat-Earth, and that the nonlinear distortion
Fig. 2: Overview of the 3D vehicle detection framework.

effect in the camera is negligible, the pixel coordinates in homography can be estimated if the corresponding points in
the bird’s eye view images are simply a scaling of the real the two planes are known. We find the corresponding points
world planar coordinates. by annotating the same set of landmarks in the traffic camera
There are several other merits to work on bird’s eye view image and in the map (e.g. Google Maps). Using the satellite
images. First, the rotated bounding boxes of the objects images on the map, we can retrieve the real world coordinate
at different distances have a consistent scale in the bird’s of the landmarks given a chosen local frame. With the set
eye view images, making it easier to detect remote objects. of corresponding points {(pworld
i , pori
i )}, the homography
world
Second, the rotated bounding boxes in the bird’s eye view Hori can be solved by Direct Linear Transformation
bev
do not overlap with one another, as oppose to 2D bounding (DLT). Given Hori , the original traffic camera images can
boxes in the original view. Nevertheless, working on bird’s be warped to BEV images.
eye view images also requires us to address the challenges
of distortion and occlusion, as mentioned above, and they B. Rotated bounding box detection in warped bird’s eye view
will be discussed in more details below in this section. (BEV) images
An overview of the proposed method is shown in Fig. 2. It
has three parts: homography calibration, vehicle detection in The rotated bounding box detection network in this pa-
warped BEV images, and data synthesis. The data synthesis per is developed based on YOLOv3 [24], by extending it
part is for network training and not directly related to the to support rotation prediction. We will abbreviate "rotated
detection methodology, therefore introduced later in Sec. IV. bounding box" as "r-box" in the following. We choose
In this section, the first two parts are introduced. YOLOv3, which is a one-stage detector, over two-stage
detectors (e.g. [25]) for the following two reasons. First, two-
A. Calibration of homography stage detectors have advantage in detecting small objects and
A planar homography is a mapping between two planes overlapping objects in crowed scenes, while in bird’s eye
which preserves collinearity, represented by a 3*3 matrix. view images the size of objects does not vary too much, and
We model the homography between the original image the r-boxes are not overlapping. Second, one-stage detectors
and the bird’s eye view image as a composition of two are faster. More recent network architectures like [26] should
homographies: also work.
bev bev world The network is extended to predict rotations by introduc-
Hori = Hworld Hori (1)
ing the yaw (r) dimension in both anchors and predictions.
where spa = Hba pb , denoting that Hba maps coordinates The anchors are now of the form (l, w, r), where r ∈ [0, π],
in frame b to coordinated in frame a up to a scale factor implying that we are not distinguishing the front end and
s, and p = [x, y, 1]T is the homogeneous coordinate of a rear end of vehicles in the network. Although the dimension
point in a plane. bev denotes the BEV image plane. world of the anchors increased by one, we do not increase the total
denotes the road plane in the real world. ori denotes the number of anchors, due to the fact that object size does not
bev vary too much in our bird’s eye view images. There are 9
original image plane. Hworld can be freely defined by users
as long as it is a similarity transform, preserving the angles anchors per YOLO prediction layers, and there are in total 3
between the real-world road plane and the bird’s eye view YOLO layers in the network, the same as in YOLOv3. The
world rotation angles of 9 anchors in a YOLO prediction layer are
image plane. Calibration is needed for Hori , denoting the
homography between the original image plane and road plane evenly distributed over the [0, π] interval.
in the real world. If the intrinsic and extrinsic parameters of a The network predicts the rotational angle offsets to the
camera is known or can be calibrated using existing methods, anchors. Denote the angle of an anchor as r0 , only anchors
the homography can be obtained following Eq. 5. Under with |r0 − rgt |< π/4 can be considered as positive, and for
circumstances where the full calibration is unavailable, the a positive anchor the rotation angle is predicted following
Fig. 3: R-boxes shown in BEV and original camera view. The red r-box
is completely occluded by surrounding vehicles, posing challenges for the Fig. 4: Definition of a tailed r-box and examples in synthetic training data.
network detection. Notice that there are still visible pixels of the vehicle (a) Definition of tailed r-box. The tail is defined as the line connecting
corresponding to the red r-box, therefore detecting it is possible, but needs the center of the bottom face and that of the top face of the 3D bounding
specific solutions (see Sec. III-C.1). box. (b) Tailed r-boxes in original view. (c) Tailed r-boxes in BEV. A tail
indicates the stretched pixels of a vehicle in BEV.

Eq. 2.
π By enforcing the network to predict the tail offset, the net-
rpred =(σ(x) − 0.5) + r0 (2)
2 work is guided to learn that the stretched pixels far from the
where x is the output of a convolution layer, and σ(·) is the r-box are also part of the objects. Especially when the bottom
sigmoid function. It follows that |rpred − r0 |< π/4. part of a vehicle is occluded, the network could still detect it
The loss function for angle prediction is in Eq. 3. Note from the visible pixels at the top, drastically improving the
that the angular residual rres = rpred − rgt ∈ (−π/2, π/2) recall rate (see Sec. V). In comparison, directly regressing
falls in a converging basin of the sin2 (·) function. the projection of the 3D bounding boxes in BEV can achieve
similar effect in guiding the network to leverage all pixels of
Lrotation = sin2 (rres ) = sin2 (rpred − rgt ) (3) a vehicle, but the projected location of the four top points is
harder to determine in BEV, and creates unnecessary burden
C. Special designs for detection in warped BEV images for the network.
With the above setup, the network is able to fulfill the 2) Dual-view network architecture: The distortion in IPM
proposed task, but the distortion introduced in the inverse makes remote objects larger than they really are, posing
perspective mapping poses some challenges to the network, difficulty for learning. To alleviate the problem caused by
which harm the performance. First, in bird’s eye view large receptive field requirements, we propose to use a dual-
images, a large portion of the pixels of vehicles are outside view network structure.
of the r-boxes. What makes it worse, when the vehicles are In the dual-view network, there are two feature extractors
crowded, the r-box area could be completely occluded and with identical structures and non-shared parameters, taking
the visible pixels of the vehicle are disjoint from the r-box BEV images and corresponding original view images as
(see Fig. 3), which makes it difficult for the network to infer. input respectively. The feature maps of original images are
Secondly, the IPM "stretches" the remote pixels, extending then transformed to BEV through IPM and concatenated with
the remote vehicles to a long shape. It requires the network to the feature maps of the BEV images. The IPM of feature
have large receptive field for each pixel to handle very large maps is similar to the IPM of raw images, with different
objects. Our proposed designs solve these two problems. homography matrices. The homography between the feature
1) Tailed r-box regression: We propose a new regression maps of original view and BEV can be calculated using
target called tailed r-box to address the problem that r-boxes Eq. 4.
could be disjoint from the visible pixels of objects. It is bev_f bev_f bev ori
Hori_f = Hbev Hori Hori_f (4)
constructed from the 3D bounding boxes in the original
bev_f ori
view. The tail is defined as the line connecting the center of where Hbev and Hori_f denotes the homography between
the bottom rectangle to that of the top rectangle of the 3D the input image coordinates and the feature map coordinates,
bounding box. After warping to BEV, the tail extends from which are mainly determined by the pooling layers and
the r-box center through the stretched body of the vehicle, convolution layers with strides. The network structure is
as shown in Fig. 4. Note that while the definition of tails shown in Fig. 5.
is in the original view images, the learning and inference With the dual-view architecture, pixels of a vehicle are
of tails can be done in the BEV images. In BEV images, spatially closer in the original view images than in the BEV
predicting tailed r-boxes corresponds to augmenting the images, making it easier to propagate information among the
prediction vector with two elements: utail , vtail , representing pixels. Then the intermediate feature warping stretches the
the offset from the center of r-box to the end of tail in BEV. information with IPM, propagating the consensus of nearby
Anchors are not parameterized with tails. pixels of an object in the original view to pixels of further
Fig. 6: Synthesizing images with real background captured by traffic
cameras and rendered vehicles using CAD models. The intrinsic/extrinsic
parameters corresponding to the background images are unknown, but we
can still render visually realistic images by sampling camera parameters that
keep the homography H invariant.

Fig. 5: Dual-view network architecture. Both the original view and BEV floating in the air, despite that the camera parameters of the
images are taken as input. The original view feature maps are warped
to BEV and concatenated with BEV feature maps. The warping (IPM)
background images are unknown?
stretches the vehicles to be very long in the BEV images, posing difficulty Our observation is that the plausibility mainly depends
to detection due to limited receptive field. The dual-view structure enables on the homography. In other words, if we can maintain the
feature learning before warping, where the object shapes are regular and the
knowledge propagation is easier.
same homography from the road plane to the image plane
in both foreground and background images, the composite
images will look like the vehicles are on the ground, as
seen in Fig. 6. The relation between homography and camera
distances in BEV. In the experiments we show that the dual-
intrinsic/extrinsics is shown in Eq. 5.
view architecture improves the detection performance.
sK[r1 r2 t] = H (5)
IV. DATA SYNTHESIS
where K is the intrinsic matrix, T = [r1 r2 r3 t] is the
The lack of training data for 3D vehicle detection in traffic extrinsic matrix of the camera, ri is the ith column of the T
camera images pose difficulty in learning a high-performance matrix, and s is a scaling factor. Given H, there are infinite
detector. In this work, we adopt two approaches to synthesize number of combinations of K and T such that the equality
training data. holds. One of them corresponds to the actual K and T of the
1) CARLA-synthetic: Our first approach is to generate traffic camera of the background images. However, we do not
synthetic data using a simulation platform CARLA [27]. attempt to find the actual K and T . Instead, we randomly
CARLA is capable of producing photo-realistic images from sample (K, T ) tuples to render the foreground images, as
cameras with user-specified parameters. It is able to simulate long as the equality holds. In practice, we assume K =
different lighting conditions and weather conditions. It also [f 0 cx ; 0 f cy ; 0 0 1] (pixels are square), and each sample
supports camera post-processing effects, e.g., bloom and lens on K will determine a (K, T ) tuple.
flares. We selected several positions in the pre-built maps and Notice that while this strategy lays synthetic vehicles on
collected images from the perspective of traffic cameras. the ground, the perspective between the foreground and the
2) Blender-synthetic: The second approach is to synthe- background may be inconsistent, but it is not essential to
size images by composing real traffic scene background the task of r-box detection, and experiments show that the
images with rendered vehicle foreground from CAD models. network generalizes well to real data.
The background images are pictures of empty road taken
by traffic cameras. The rendering and composition is by V. E XPERIMENTS
using the 3D graphic software Blender. While the CARLA The overall setup of the experiment is training on synthetic
images presents large varieties which benefits generalization, data generated following Sec. IV, and testing on real data.
the discrepancy between synthesized images and real images The training dataset contains 40k synthetic images consists
is still easily perceivable from human eyes. Composing real of two parts: CARLA-synthetic and Blender-synthetic. See
background images with synthesized vehicle foregrounds Fig. 7 for some examples. The CARLA-synthetic set contains
could be a step forward minimizing the domain gap. The 15k images collected from 5 locations in 2 maps pre-built
key challenge is: how to set up the camera in foreground by CARLA, covering 2 four-way intersections, 1 five-way
rendering, such that when compositing the foreground and intersection, 1 three-way intersection, and 1 roundabout. The
the background images together, the output looks like the weather and lighting conditions are dynamically changed
foreground vehicles are laying on the ground, instead of during the data collection, improving the robustness for
TABLE II: Ablation study, evaluated on the Ko-PER dataset. IoU is
defined for r-boxes. d is the distance between centers of predicted and
ground truth r-boxes. l is the length of ground-truth rbox. d ≤ 0.5l only
evaluates the position prediction.
Network settings Average precision (AP, %)
IoU ≥ 0.5 d ≤ 0.5l
r-box (similar to [16]) 65.67 71.96
dual-view 75.78 83.25
tailed r-box 78.27 85.55
tailed r-box + dual-view (ours) 82.44 91.20

dataset cannot be used for training a detection network, but

can be used for evaluation using the protocol provided by the
dataset. To be consistent with the literature, we evaluate on
split C of the dataset, which contains 9 hours of videos. The
result is shown in Table I. The previous work in the table
are variants of methods based on vanishing point estimation
and constructing 3D bounding boxes from 2D bounding box
Fig. 7: Examples of synthetic training data. (a) CARLA-synthetic images detection or segmentation.
are captured under various weather and lighting conditions at several The key property that we examine through the evaluation
intersections. (b) CAD models of vehicles are rendered on real background on the BrnoCompSpeed dataset is the generalizability to
images. Lighting effect, surface reflection, and shadow are also captured in
the rendering to make the images more realistic. new camera configurations and environment layout, because
the test set involves 9 cameras at three locations, none of
which has been used for training. Therefore the environments
TABLE I: Quantitative evaluation on BrnoCompSpeed dataset split C. The
numbers of previous work are cited from [11]. Mean precision and mean and camera setups are completely new to the network when
recall are calculated using the dataset evaluation protocol. tested. The result shows that our network generalizes well to
Method Mean recall (%) Mean precision (%) these new scenes, and the proposed r-box detection in BEV
DubskaAuto [31] 90.08 73.48 drastically improves the accuracy of 3D vehicle detection in
SochorAuto [17] 83.34 90.72 terms of both recall and precision over previous methods.
Transform3D-1 [32] 89.32 87.67
Transform3D-2 [11] 86.32 88.32 2) Ko-PER dataset: The Ko-PER dataset has a video of
Ours 97.25 92.70 about 400 seconds corresponding to 9666 images annotated
with 3D bounding boxes, captured at one intersection from
two cameras with different poses. The major challenges in
adversarial environments. The Blender-synthetic set has 25k this dataset are the black-white images and heavy occlusions
images rendered with background images from 3 traffic of queued vehicles at the intersection. We did not find re-
cameras at 2 intersections, and the CAD models for vehicle ported results on this dataset in previous literature. Actually,
rendering are from ShapeNet [28]. the 3D detection methods based on vanishing points do not
The test dataset for quantitative evaluation contains two work for turning vehicles at intersections, which occur in
datasets of real videos from traffic cameras: Ko-PER [29] and Ko-PER dataset.
BrnoCompSpeed [30]. More details of these two datasets are Therefore we use this dataset for ablation study. The result
in Sec. V-A. For qualitative evaluation, we show examples is in Table II. We report two sets of result, evaluated using the
of detection on the test dataset and also on traffic cameras IoU with ground truth r-boxes and using the offset of center
without ground truth annotations. predictions to the ground truth r-boxes. The latter criteria is
Our network is developed based on the SPP (Spatial used to indicate the accuracy in predicting vehicle positions
Pyramid Pooling) variant of the YOLOv3 network. The in the real world. Tails are not considered in the evaluation.
network is trained for 20 epochs with batch size 6 on a From the table, we see that both tailed r-box targets and dual-
single NVIDIA 2080Ti GPU, using ADAM optimizer and view architecture boost the detection accuracy significantly,
cosine learning rate decay, starting from learning rate 0.01. compared with vanilla r-box detection on BEV (which is
similar to the solution in [16]). Predicting tailed r-boxes
A. Quantitative result guides the network to learn long-distance connections in
1) BrnoCompSpeed dataset: We firstly show the quan- distorted BEV, which allows the network to better handle
titative evaluation on the BrnoCompSpeed dataset. The occluded targets. The introduction of dual-view networks
BrnoCompSpeed dataset contains eighteen 1-hour traffic further improves the accuracy.
videos of 50 FPS taken at 6 locations with 3 cameras per
location. The videos do not have 3D bounding box anno- B. Qualitative results
tations. Instead, the speeds and timestamps when a vehicle Examples of detection on real images are shown in Fig.
passes several predefined lines are recorded. Therefore, the 8. All detections are by the same network, without separate
Fig. 8: Examples of detection on real images in variant environment conditions. Each example contains a pair of original-view and BEV images. The
vertices of detected r-boxes in BEVs are projected on the original images. (e) and (f) are samples of the quantitative test set, while (a) to (d) are from
traffic cameras without ground truth annotation. Tail predictions are not drawn for better visualization.

retraining for different cameras. It is shown that the network R EFERENCES

generalizes to different road layout and camera poses, and it
is robust to different lighting and weather conditions. Notice [1] A. Fedorov, K. Nikolskaia, S. Ivanov, V. Shepelev, and A. Minbaleev,
“Traffic flow estimation with data from a video surveillance camera,”
that the roundabout scene and the BrnoCompSpeed dataset Journal of Big Data, vol. 6, no. 1, pp. 1–15, 2019.
are not used as background in synthesizing the training data, [2] S. D. Khan and H. Ullah, “A survey of advances in vision-based
therefore is totally unseen during learning, yet the network vehicle re-identification,” Computer Vision and Image Understanding,
vol. 182, pp. 50–63, 2019.
still achieve accurate detection. [3] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Kr-
ishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal
VI. C ONCLUSION dataset for autonomous driving,” arXiv preprint arXiv:1903.11027,
2019.
We developed a method to solve the problem of 3D [4] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui,
vehicle detection from images captured by traffic cameras, J. Guo, Y. Zhou, Y. Chai, B. Caine, et al., “Scalability in perception
without requiring the intrinsic and extrinsic parameters of for autonomous driving: Waymo open dataset,” in Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition,
the cameras. The homography to connect images to the 2020, pp. 2446–2454.
real world is conveniently done using public map services [5] Q.-H. Pham, P. Sevestre, R. S. Pahwa, H. Zhan, C. H. Pang, Y. Chen,
and only requires mild flat-Earth assumption. Based on the A. Mustafa, V. Chandrasekhar, and J. Lin, “A* 3d dataset: Towards
autonomous driving in challenging environments,” in 2020 IEEE
homography, the 3D vehicle detection problem transforms International Conference on Robotics and Automation (ICRA). IEEE,
to rotated bounding box detection using BEV images. We 2020, pp. 2267–2273.
further proposed a new regression target called tailed r- [6] A. Ahmadyan, L. Zhang, J. Wei, A. Ablavatski, and M. Grundmann,
“Objectron: A large scale dataset of object-centric videos in the wild
box and a dual-view network architecture to address the with pose annotations,” arXiv preprint arXiv:2012.09988, 2020.
distortion and occlusion problems which are common in [7] C. Snyder and M. Do, “Streets: A novel camera network dataset
warped BEV images. Experiments show that both the tailed for traffic flow.” Sandia National Lab.(SNL-NM), Albuquerque, NM
(United States), Tech. Rep., 2019.
r-box regression and the dual-view structure improved the [8] A. Shah, J. B. Lamare, T. N. Anh, and A. Hauptmann, “Cadp: A novel
accuracy significantly. We also synthesized a large dataset dataset for cctv traffic camera based accident analysis,” arXiv preprint
via two approaches, the network trained on which generalizes arXiv:1809.05782, 2018, First three authors share the first authorship.
well on the real test data. The large data size also may have [9] S. Zhang, G. Wu, J. P. Costeira, and J. M. Moura, “Understanding
traffic density from large-scale web camera data,” in Proceedings of
played a role in our significantly higher detection accuracy. the IEEE Conference on Computer Vision and Pattern Recognition,
Our work provides a practical and generalizable solution 2017, pp. 5898–5907.
to deploy 3D vehicle detection on already widely available [10] S. Zhang, C. Wang, Z. He, Q. Li, X. Lin, X. Li, J. Zhang, C. Yang, and
J. Li, “Vehicle global 6-dof pose estimation under traffic surveillance
traffic cameras. Many with unknown intrinsic/extrinsic cali- camera,” ISPRS Journal of Photogrammetry and Remote Sensing, vol.
bration. Some interesting future direction includes removing 159, pp. 114–128, 2020.
the assumptions of planar road surface and negligible cam- [11] V. Kocur and M. Ftáčnik, “Detection of 3d bounding boxes of vehicles
using perspective transformation for accurate speed measurement,”
era nonlinear distortion, by incorporating a more advanced Machine Vision and Applications, vol. 31, no. 7, pp. 1–15, 2020.
calibration procedure. Employing self-training techniques to [12] F. Chabot, M. Chaouch, J. Rabarisoa, C. Teuliere, and T. Chateau,
use unlabeled real-world data may also reduce the domain “Deep manta: A coarse-to-fine many-task network for joint 2d and
3d vehicle analysis from monocular image,” in Proceedings of the
gap and achieve better precision. IEEE conference on computer vision and pattern recognition, 2017,
pp. 2040–2049.
ACKNOWLEDGMENT [13] Z. Qin, J. Wang, and Y. Lu, “Monogrnet: A geometric reasoning
network for monocular 3d object localization,” in Proceedings of the
This article solely reflects the opinions and conclusions of AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp.
its authors and not CSRC or any other Toyota entity. 8851–8858.
[14] T. He and S. Soatto, “Mono3d++: Monocular 3d vehicle detection [23] N. Garnett, R. Cohen, T. Pe’er, R. Lahav, and D. Levi, “3d-
with two-scale 3d hypotheses and task priors,” in Proceedings of the lanenet: end-to-end 3d multiple lane detection,” in Proceedings of the
AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. IEEE/CVF International Conference on Computer Vision, 2019, pp.
8409–8416. 2921–2930.
[15] Y. Wang, W.-L. Chao, D. Garg, B. Hariharan, M. Campbell, and K. Q. [24] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,”
Weinberger, “Pseudo-lidar from visual depth estimation: Bridging the arXiv preprint arXiv:1804.02767, 2018.
gap in 3d object detection for autonomous driving,” in Proceedings of [25] Z. Cai and N. Vasconcelos, “Cascade r-cnn: Delving into high quality
the IEEE/CVF Conference on Computer Vision and Pattern Recogni- object detection,” in Proceedings of the IEEE conference on computer
tion, 2019, pp. 8445–8453. vision and pattern recognition, 2018, pp. 6154–6162.
[16] Y. Kim and D. Kum, “Deep learning based vehicle position and [26] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “Yolov4: Op-
orientation estimation via inverse perspective mapping image,” in 2019 timal speed and accuracy of object detection,” arXiv preprint
IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 317–323. arXiv:2004.10934, 2020.
[17] J. Sochor, R. Juránek, and A. Herout, “Traffic surveillance camera [27] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun,
calibration by 3d model bounding box alignment for accurate vehicle “CARLA: An open urban driving simulator,” in Proceedings of the
speed measurement,” Computer Vision and Image Understanding, vol. 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
161, pp. 87–98, 2017. [28] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li,
[18] X. You and Y. Zheng, “An accurate and practical calibration method S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu,
for roadside camera using two vanishing points,” Neurocomputing, vol. “ShapeNet: An Information-Rich 3D Model Repository,” Stanford
204, pp. 222–230, 2016. University — Princeton University — Toyota Technological Institute
[19] B. Zhong and K. Ao, “Single-stage rotation-decoupled detector for at Chicago, Tech. Rep. arXiv:1512.03012 [cs.GR], 2015.
oriented object,” Remote Sensing, vol. 12, no. 19, p. 3262, 2020. [29] E. Strigel, D. Meissner, F. Seeliger, B. Wilking, and K. Dietmayer,
[20] X. Yang, J. Yang, J. Yan, Y. Zhang, T. Zhang, Z. Guo, X. Sun, and “The ko-per intersection laserscanner and video dataset,” 2014 17th
K. Fu, “Scrdet: Towards more robust detection for small, cluttered IEEE International Conference on Intelligent Transportation Systems,
and rotated objects,” in Proceedings of the IEEE/CVF International ITSC 2014, pp. 1900–1901, 11 2014.
Conference on Computer Vision, 2019, pp. 8232–8241. [30] J. Sochor, R. Juránek, J. Špaňhel, L. Maršík, A. Širokỳ, A. Herout, and
[21] L. Reiher, B. Lampe, and L. Eckstein, “A sim2real deep learning P. Zemčík, “Comprehensive data set for automatic single camera visual
approach for the transformation of images from multiple vehicle- speed measurement,” IEEE Transactions on Intelligent Transportation
mounted cameras to a semantically segmented image in bird’s eye Systems, vol. 20, no. 5, pp. 1633–1643, 2018.
view,” in 2020 IEEE 23rd International Conference on Intelligent [31] M. Dubská, A. Herout, and J. Sochor, “Automatic camera calibration
Transportation Systems (ITSC). IEEE, 2020, pp. 1–7. for traffic understanding.” in BMVC, vol. 4, no. 6, 2014, p. 8.
[22] A. Loukkal, Y. Grandvalet, T. Drummond, and Y. Li, “Driving among [32] V. Kocur, “Perspective transformation for accurate detection of 3d
flatmobiles: Bird-eye-view occupancy grids from a monocular camera bounding boxes of vehicles in traffic surveillance,” in Proceedings
for holistic trajectory planning,” in Proceedings of the IEEE/CVF of the 24th Computer Vision Winter Workshop, 02 2019, pp. 33–41.
Winter Conference on Applications of Computer Vision, 2021, pp. 51–
60.

Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking For Autonomous Driving
No ratings yet
Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking For Autonomous Driving
10 pages
3D Vehicle Detection Using A Laser Scanner and A Video Camera
No ratings yet
3D Vehicle Detection Using A Laser Scanner and A Video Camera
8 pages
Symmetry 14 02657
No ratings yet
Symmetry 14 02657
14 pages
BEVFormer: Learning Bird's-Eye-View Representation From Multi-Camera Images Via Spatiotemporal Transformers
No ratings yet
BEVFormer: Learning Bird's-Eye-View Representation From Multi-Camera Images Via Spatiotemporal Transformers
20 pages
IJCV2023 - Multi-Modal 3D Object Detection in Autonomous Driving A Survey
No ratings yet
IJCV2023 - Multi-Modal 3D Object Detection in Autonomous Driving A Survey
31 pages
Deep VP Submit Mail
No ratings yet
Deep VP Submit Mail
13 pages
Joint Monocular 3D Vehicle Detection and Tracking
No ratings yet
Joint Monocular 3D Vehicle Detection and Tracking
18 pages
Multi-Modal 3D Object Detection in Autonomous Driving (A Survey)
No ratings yet
Multi-Modal 3D Object Detection in Autonomous Driving (A Survey)
30 pages
Center Former
No ratings yet
Center Former
17 pages
Hinterstoisser Iccv11
No ratings yet
Hinterstoisser Iccv11
8 pages
Putting Objects in Perspective
No ratings yet
Putting Objects in Perspective
8 pages
1 MICROTEACH ON Continuing Education
88% (8)
1 MICROTEACH ON Continuing Education
6 pages
U D: T U D P A C C: NI Rive Owards Niversal Riving Erception Cross Amera Onfigurations
No ratings yet
U D: T U D P A C C: NI Rive Owards Niversal Riving Erception Cross Amera Onfigurations
14 pages
Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook
No ratings yet
Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook
32 pages
Hu Joint Monocular 3D Vehicle Detection and Tracking ICCV 2019 Paper
No ratings yet
Hu Joint Monocular 3D Vehicle Detection and Tracking ICCV 2019 Paper
10 pages
A Cheap System For Vehicle Speed Detection
No ratings yet
A Cheap System For Vehicle Speed Detection
8 pages
Xiong CAPE Camera View Position Embedding For Multi-View 3D Object Detection CVPR 2023 Paper
No ratings yet
Xiong CAPE Camera View Position Embedding For Multi-View 3D Object Detection CVPR 2023 Paper
10 pages
Stereo CenterNet Based 3D Object Detection For Autonomous Driving
No ratings yet
Stereo CenterNet Based 3D Object Detection For Autonomous Driving
11 pages
A Survey of 3D Object Detection: Wei Liang Pengfei Xu Ling Guo Heng Bai Yang Zhou Feng Chen
No ratings yet
A Survey of 3D Object Detection: Wei Liang Pengfei Xu Ling Guo Heng Bai Yang Zhou Feng Chen
25 pages
Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss
No ratings yet
Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss
10 pages
Sensors 23 03385
No ratings yet
Sensors 23 03385
20 pages
HD Map
No ratings yet
HD Map
10 pages
Object Detection and Matching in A Mixed Network of Fixed and Mobile Cameras
No ratings yet
Object Detection and Matching in A Mixed Network of Fixed and Mobile Cameras
8 pages
M 2 3DLaneNet: Multi-Modal 3D Lane Detection
No ratings yet
M 2 3DLaneNet: Multi-Modal 3D Lane Detection
9 pages
Integrating Visual and Range Data For Robotic Object Detection
No ratings yet
Integrating Visual and Range Data For Robotic Object Detection
12 pages
Unified Monocular 3D Object Detection
No ratings yet
Unified Monocular 3D Object Detection
10 pages
3D Bounding Box Estimation For Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results
No ratings yet
3D Bounding Box Estimation For Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results
11 pages
SMOKE: Single-Stage Monocular 3D Object Detection Via Keypoint Estimation
No ratings yet
SMOKE: Single-Stage Monocular 3D Object Detection Via Keypoint Estimation
10 pages
Patch Refinement - Localized 3D Object Detection
No ratings yet
Patch Refinement - Localized 3D Object Detection
10 pages
Voxel-Based 3D Detection and Reconstruction of Multiple Objects From A Single Image
No ratings yet
Voxel-Based 3D Detection and Reconstruction of Multiple Objects From A Single Image
14 pages
The Pedestrian Next To The Lamppost - Adaptive Object Graphs For Better Instantaneous Mapping
No ratings yet
The Pedestrian Next To The Lamppost - Adaptive Object Graphs For Better Instantaneous Mapping
10 pages
Improving Distant 3D Object Detection Using 2D Box Supervision
No ratings yet
Improving Distant 3D Object Detection Using 2D Box Supervision
11 pages
2005 04078
No ratings yet
2005 04078
7 pages
LATR: 3D Lane Detection From Monocular Images With Transformer
No ratings yet
LATR: 3D Lane Detection From Monocular Images With Transformer
12 pages
3D Object Detection For Autonomous Driving: A Survey: Qian, Lai and Li
No ratings yet
3D Object Detection For Autonomous Driving: A Survey: Qian, Lai and Li
19 pages
CAMOT Camera Angle-Aware Multi-Object Tracking
No ratings yet
CAMOT Camera Angle-Aware Multi-Object Tracking
10 pages
Autoshape: Real-Time Shape-Aware Monocular 3D Object Detection
No ratings yet
Autoshape: Real-Time Shape-Aware Monocular 3D Object Detection
11 pages
BirdNet A 3D Object Detection Framework
No ratings yet
BirdNet A 3D Object Detection Framework
8 pages
Deep Continuous Fusion
No ratings yet
Deep Continuous Fusion
16 pages
NeurIPS 2023 Rangeperception Taming Lidar Range View For Efficient and Accurate 3d Object Detection Paper Conference
No ratings yet
NeurIPS 2023 Rangeperception Taming Lidar Range View For Efficient and Accurate 3d Object Detection Paper Conference
13 pages
Qin MonoGround Detecting Monocular 3D Objects From The Ground CVPR 2022 Paper
No ratings yet
Qin MonoGround Detecting Monocular 3D Objects From The Ground CVPR 2022 Paper
10 pages
Monoloco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
No ratings yet
Monoloco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
11 pages
Li GS3D An Efficient 3D Object Detection Framework For Autonomous Driving CVPR 2019 Paper
No ratings yet
Li GS3D An Efficient 3D Object Detection Framework For Autonomous Driving CVPR 2019 Paper
10 pages
Eict 2023
No ratings yet
Eict 2023
8 pages
Distance Determination For An Automobile Environment Using Inverse Perspective Mapping in OpenCV
No ratings yet
Distance Determination For An Automobile Environment Using Inverse Perspective Mapping in OpenCV
6 pages
MonA04 4
No ratings yet
MonA04 4
8 pages
Shift RCNN
No ratings yet
Shift RCNN
2 pages
1 en Bok 978-3-030-54407-2
No ratings yet
1 en Bok 978-3-030-54407-2
12 pages
(2024-AEJ) Z-YOLOv8s-based Approach For Road Object Recognition in Complex Traffic Scenarios
No ratings yet
(2024-AEJ) Z-YOLOv8s-based Approach For Road Object Recognition in Complex Traffic Scenarios
14 pages
YOLO Multi-Camera Object Detection and Distance Estimation
No ratings yet
YOLO Multi-Camera Object Detection and Distance Estimation
6 pages
Aiav Unit 2 Notes
No ratings yet
Aiav Unit 2 Notes
8 pages
Focalformer3D: Focusing On Hard Instance For 3D Object Detection
No ratings yet
Focalformer3D: Focusing On Hard Instance For 3D Object Detection
12 pages
Enhancing Point Features
No ratings yet
Enhancing Point Features
21 pages
FocalPose Focal Length and Object Pose Estimation Via Render and Compare
No ratings yet
FocalPose Focal Length and Object Pose Estimation Via Render and Compare
18 pages
Sensors
No ratings yet
Sensors
21 pages
BEVFormer Learning Birds-Eye-View Representation From LiDAR-Camera Via Spatiotemporal Transformers
No ratings yet
BEVFormer Learning Birds-Eye-View Representation From LiDAR-Camera Via Spatiotemporal Transformers
17 pages
Cross-Field Road Markings Detection Based On Inverse Perspective Mapping
No ratings yet
Cross-Field Road Markings Detection Based On Inverse Perspective Mapping
21 pages
Monocular Pose and Shape Reconstruction of Vehicles in UAV Imagery Using A Multi-Task CNN
No ratings yet
Monocular Pose and Shape Reconstruction of Vehicles in UAV Imagery Using A Multi-Task CNN
18 pages
Rukhovich ImVoxelNet Image To Voxels Projection For Monocular and Multi-View General-Purpose WACV 2022 Paper
No ratings yet
Rukhovich ImVoxelNet Image To Voxels Projection For Monocular and Multi-View General-Purpose WACV 2022 Paper
10 pages
C Series Product Guide PDF
No ratings yet
C Series Product Guide PDF
112 pages
m3010 Series PDF
100% (2)
m3010 Series PDF
133 pages
TNPSC Group 2 Mains Preparation Book List For Latest Updated Syllabus - TNPSC Group 4, VAO, Group 2, Group 1, Notificati 1
No ratings yet
TNPSC Group 2 Mains Preparation Book List For Latest Updated Syllabus - TNPSC Group 4, VAO, Group 2, Group 1, Notificati 1
5 pages
Standard American Accent Worksheets
No ratings yet
Standard American Accent Worksheets
10 pages
Bar Exam Tips
No ratings yet
Bar Exam Tips
2 pages
Session 2: Personal Professional Development: Pre-Test
No ratings yet
Session 2: Personal Professional Development: Pre-Test
9 pages
School Directory: Santy C. Balaoro
No ratings yet
School Directory: Santy C. Balaoro
3 pages
COM 20250108 Aria - Pub.provider@
No ratings yet
COM 20250108 Aria - Pub.provider@
40 pages
1SDH001316R1002 Ekip Touch
No ratings yet
1SDH001316R1002 Ekip Touch
40 pages
Series Circuits
No ratings yet
Series Circuits
4 pages
2 - SITXHRM003 Lead and Manage People Student Assessment Guide
No ratings yet
2 - SITXHRM003 Lead and Manage People Student Assessment Guide
78 pages
2021 Investment Case For After School Programmes
No ratings yet
2021 Investment Case For After School Programmes
27 pages
1711954353
No ratings yet
1711954353
58 pages
Fashion Business Research - Assignment
No ratings yet
Fashion Business Research - Assignment
28 pages
Traffic-Net - 3D Traffic Monitoring Using A Single Camera
100% (1)
Traffic-Net - 3D Traffic Monitoring Using A Single Camera
21 pages
USB Devices As VMFS Datastore in Vsphere ESXi 70 Virtennet
No ratings yet
USB Devices As VMFS Datastore in Vsphere ESXi 70 Virtennet
14 pages
Hong Kong History
No ratings yet
Hong Kong History
5 pages
AGAPE
No ratings yet
AGAPE
13 pages
Invisisil Op2131sd Uv Cure Optical Bonding Silicone Tds
No ratings yet
Invisisil Op2131sd Uv Cure Optical Bonding Silicone Tds
5 pages
Classifiedrecords
No ratings yet
Classifiedrecords
18 pages
Compilation - Stamp Duty - Lease Deed
No ratings yet
Compilation - Stamp Duty - Lease Deed
7 pages
XML Tutorial For Beginners
No ratings yet
XML Tutorial For Beginners
28 pages
Rational of The Body Shop - Project 2 (Branded Interactions)
No ratings yet
Rational of The Body Shop - Project 2 (Branded Interactions)
5 pages
Treating The Wise Baby
No ratings yet
Treating The Wise Baby
6 pages
01 Guide To Drafting Your Critical Role Letters
No ratings yet
01 Guide To Drafting Your Critical Role Letters
3 pages
Evaluate Vygotsky's Theory of Cognitive Development (8 Marks)
No ratings yet
Evaluate Vygotsky's Theory of Cognitive Development (8 Marks)
1 page
Leonel-Tumaniog-Course-Syllabus-In-Pe 3-Individual and Dual Sports
No ratings yet
Leonel-Tumaniog-Course-Syllabus-In-Pe 3-Individual and Dual Sports
10 pages
MOT20: A Benchmark For Multi Object Tracking in Crowded Scenes
No ratings yet
MOT20: A Benchmark For Multi Object Tracking in Crowded Scenes
7 pages
Real-Time Collision Warning System Based On Computer Vision Using Mono Camera
No ratings yet
Real-Time Collision Warning System Based On Computer Vision Using Mono Camera
6 pages
Wireless Television Notice Board
No ratings yet
Wireless Television Notice Board
10 pages
CornerNet Detecting Objects As Paired Keypoints
No ratings yet
CornerNet Detecting Objects As Paired Keypoints
14 pages
Deep Layer Aggregation
No ratings yet
Deep Layer Aggregation
10 pages
Heat Exchanger Formulas
No ratings yet
Heat Exchanger Formulas
2 pages
Vision Meets Robotics: The KITTI Dataset: Andreas Geiger, Philip Lenz, Christoph Stiller and Raquel Urtasun
No ratings yet
Vision Meets Robotics: The KITTI Dataset: Andreas Geiger, Philip Lenz, Christoph Stiller and Raquel Urtasun
6 pages
SCFR1 JHS Currhead Consolidation
No ratings yet
SCFR1 JHS Currhead Consolidation
2 pages
Computer Stereo Vision: Exploring Depth Perception in Computer Vision
From Everand
Computer Stereo Vision: Exploring Depth Perception in Computer Vision
Fouad Sabry
No ratings yet
Multi View Three Dimensional Reconstruction: Advanced Techniques for Spatial Perception in Computer Vision
From Everand
Multi View Three Dimensional Reconstruction: Advanced Techniques for Spatial Perception in Computer Vision
Fouad Sabry
No ratings yet

Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras

Uploaded by

Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras

Uploaded by

Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras

Abstract— This paper proposes a method to extract the

tive of a driver, and assumed known intrinsic and extrinsic

dataset cannot be used for training a detection network, but

retraining for different cameras. It is shown that the network R EFERENCES

You might also like