


Presented at SUNw: Scene Understanding Workshop, CVPR 2016

Vanishing point detection with convolutional neural networks

Ali Borji
Center for Research in Computer Vision, University of Central Florida
[email protected]

arXiv:1609.00967v1 [cs.CV] 4 Sep 2016

1. Introduction

In graphical perspective, a vanishing point (VP) is a 2D point (in the image plane) at which the projections of parallel lines in the 3D world (lines not parallel to the image plane) intersect. In other words, the vanishing point is the spot toward which receding parallel lines appear to converge. In principle, there can be more than one vanishing point in an image. Vanishing points are commonly seen in fields, railroads, streets, tunnels, forests, and buildings, and in objects such as ladders (viewed from the bottom up). The VP is an important visual cue in several applications (e.g., camera calibration, 3D reconstruction, autonomous driving).

Inspired by the finding that the vanishing point (road tangent) guides a driver's gaze [1, 2], in our previous work we showed that the vanishing point attracts gaze during free viewing of natural scenes as well as in visual search [3]. We have also introduced improved saliency models that use vanishing point detectors [4]. Here, we aim to predict vanishing points in naturalistic environments by training convolutional neural networks in an end-to-end manner.

Traditionally, geometrical and structural features such as lines and corners (e.g., via the Hough transform [5]) have been used to detect vanishing points in images. Here, we follow a data-driven learning approach by training two popular convolutional neural networks, Alexnet and VGG, to: 1) predict whether a vanishing point exists in a scene, and 2) if so, localize its location on an n × n grid map.

2. Experiments & Results

2.1. Data collection

To train deep neural networks, a large amount of data is often needed. We resorted to YouTube and downloaded videos including road trips across America (e.g., from sedan, bus, or truck dash cams), personal adventures (e.g., using shifters or motorbikes), and game-playing sessions (e.g., Formula One, NASCAR). These videos were captured under a variety of weather and ground conditions (e.g., freeway, race track, in city, inter city, snowy, rainy, sunny, mountainous, forest, vegetation). Eventually, we had 37,497 frames (resized to 300 × 300 pixels). We annotated the vanishing points (1 per frame) in all videos (one annotator; the author). The grid cell containing the vanishing point has label 1 on an n × n grid map (n = 10, 20, or 30). Sample frames from the 29 YouTube videos are shown in Figure 1.

We also collected images without VPs to train a binary classifier for VP existence prediction. A total of 32,419 images were sampled from these datasets: the MIT saliency benchmark [6], the CAT2000 dataset [7], Caltech 256 [8], the 15-category dataset [9] (excluding the street and highway categories), MS COCO [10], and ImageNet [11].

2.2. Vanishing point existence prediction

After training the networks for 20 epochs over 63,916 images (34,497 with a VP and 29,419 without), we obtained 98.9% VP existence prediction accuracy with the Alexnet network and 99.73% with the VGG network over the test set (6,000 images; 3,000 with a VP).

2.3. Vanishing point localization

The Alexnet and VGG networks were trained to map a scene to the VP location, which is one of p classes (p = 100, 400, or 900; linearized n × n grids). Thus, there are p neurons in the output layer. We used 33,000 frames for training and the remaining 4,497 frames for testing. Network training was stopped after 40 epochs.

Results are shown in Figure 2. Using the VGG network, we achieved the lowest top-5 error rates: 5.1% over the 10 × 10 grid, 15.9% over the 20 × 20 grid, and 25.3% over the 30 × 30 grid. This means that the probability of hitting within 15 pixels of the true VP location in 5 guesses is about 85% (over a 20 × 20 grid on a 300 × 300 image). Results are nearly the same for both networks. Figure 2 (right) shows some success and failure cases of the Alexnet for VP localization.

Since the vanishing point usually occurs near the image center (see Figure 1, bottom-right), we devised two baseline predictors to further evaluate our method. The first is the most frequent grid location in the training data ([x y]), denoted 'Top-1 center'; the second is the five most frequent locations ([x y], [x-1 y], [x y-1], [x+1 y], [x y+1]; all set to one, the rest to zero), denoted 'Top-5 center'. These models perform well above chance (16.5% accuracy for Top-1 center vs. 0.25% chance over a 20 ×
Figure 1. Left: Two sample frames of each of 29 videos downloaded from YouTube. Top-right: Sample images without vanishing point
used to train the vanishing point existence prediction network. Bottom-right: Average vanishing point location. Left panel shows all visited
locations and the right panel shows the VP histogram.
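The grid labeling described in Sec. 2.1 (frames resized to 300 × 300 pixels; the cell of an n × n grid containing the VP gets label 1, linearized into p = n² classes) can be sketched as below. This is an illustrative sketch; the function names are ours, not from the paper:

```python
def vp_to_class(x, y, n=20, img_size=300):
    """Map a vanishing-point pixel coordinate to one of p = n*n grid classes."""
    cell = img_size / n                      # 15 px per cell on a 20 x 20 grid
    col = min(int(x // cell), n - 1)         # clamp points on the right/bottom edge
    row = min(int(y // cell), n - 1)
    return row * n + col                     # linearized (row-major) grid index

def class_to_center(label, n=20, img_size=300):
    """Pixel center of a grid cell, for mapping a prediction back to the image."""
    cell = img_size / n
    row, col = divmod(label, n)
    return ((col + 0.5) * cell, (row + 0.5) * cell)
```

On the 20 × 20 grid each cell spans 15 × 15 pixels, which is why a correct cell prediction corresponds to landing within roughly 15 pixels of the ground-truth VP.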

[Figure 2, left: bar plot of top-1 and top-5 error rates for Alexnet, VGG, and the center baselines over grid sizes 10 × 10, 20 × 20, and 30 × 30.]
Figure 2. Left: Error rates of deep models for VP detection. Top-right: Sample images where our model is able to accurately locate the VP
in five tries. Red circle is the top-1 prediction and blue ones are the next top-4. Bottom-right: Failure examples of our model.

20 grid) but are well below the deep learning performance (deep-learning Top-1 accuracy is about 57%).

We also compared our model with two vanishing point detection algorithms from the literature: the method of Košecká and Zhang [12] and the classic Hough transform [5]. These two algorithms score 15.6% and 35%, respectively, in detecting the vanishing point on a 20 × 20 map (Top-1 accuracy), which is much lower than our results using CNNs.

To assess the generalization power of our approach in detecting vanishing points in arbitrary natural scenes, we experimented with pictures of buildings, tunnels, sketches, and fields, shown in Figure 3. Although our model (VGG) was not explicitly trained on these images, it successfully finds VPs in some of them. It fails on some other unseen examples (e.g., sketches). Augmenting our dataset with more images of these kinds could help overcome this shortcoming. Another way to improve performance would be data augmentation (i.e., adding jittered, cropped, noisy, and blurry versions of the input images).

3. Discussion

We proposed a method for vanishing point detection based on convolutional neural networks that does well on road scenes but is not very effective on arbitrary images. To improve accuracy, we will consider collecting a larger image dataset with a variety of scenes containing vanishing points, as well as more recent deep learning architectures. Extending this approach to videos is another interesting future direction. Our dataset is freely available at: http://crcv.ucf.edu/people/faculty/Borji/code.php

Acknowledgments: We wish to thank NVIDIA for the generous donation of the GPU used in this study.
Figure 3. Performance of our vanishing point detector on arbitrary images containing vanishing points. The largest red circle is the first detection; the other four detections are shown in blue.
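The evaluation pieces described above, the top-5 criterion of Figure 2 and the 'Top-1/Top-5 center' baselines of Sec. 2.3, can be sketched as follows. The paper specifies the metric and the baselines but not this code, so the helper names and the use of raw class scores are our assumptions:

```python
import numpy as np

def top5_error(scores, labels):
    """Top-5 error for p-way VP grid classification.

    scores: (N, p) array of per-cell scores (e.g., softmax outputs).
    labels: (N,) array of true linearized cell indices.
    """
    top5 = np.argsort(scores, axis=1)[:, -5:]        # five highest-scoring cells
    hits = np.any(top5 == np.asarray(labels)[:, None], axis=1)
    return 1.0 - hits.mean()

def center_baselines(train_labels, n=20):
    """'Top-1 center' / 'Top-5 center' baselines: the most frequent training
    cell and its four axis-aligned neighbours on the n x n grid."""
    counts = np.bincount(np.asarray(train_labels), minlength=n * n)
    top1 = int(np.argmax(counts))
    row, col = divmod(top1, n)
    neigh = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    top5 = [top1] + [r * n + c for r, c in neigh if 0 <= r < n and 0 <= c < n]
    return top1, top5
```

A fixed set of predicted cells from `center_baselines` can then be scored against test labels the same way, by counting how often the true cell falls in the predicted set.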

References
[1] M. Land, D.N. Lee, Where do we look when we steer, Nature, 1994.
[2] A. Borji, L. Itti, State of the art in visual attention modeling,
IEEE Trans. PAMI, 2013.
[3] A. Borji, M. Feng, Vanishing point attracts gaze in free-
viewing and visual search tasks, arXiv:1512.01722, 2015.
[4] M. Feng, A. Borji, H. Lu, Fixation prediction with a combined model of bottom-up saliency and vanishing point, WACV, 2015.
[5] P.V.C. Hough, Method and means for recognizing complex
patterns, 1962.
[6] Z. Bylinskii, T. Judd, A. Borji, L. Itti, F. Durand, A. Oliva, A. Torralba, MIT Saliency Benchmark, http://saliency.mit.edu/
[7] A. Borji, L. Itti, CAT2000: A large scale fixation dataset for boosting saliency research, arXiv:1505.03581, 2015.
[8] G. Griffin, A. Holub, P. Perona, Caltech-256 object category
dataset, 2007.
[9] A. Torralba, A. Oliva, M.S. Castelhano, J. Henderson, Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search, Psychological Review, 2006.
[10] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C.L. Zitnick, Microsoft COCO: Common objects in context, ECCV, 2014.
[11] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, CVPR, 2009.
[12] J. Košecká, W. Zhang, Video compass, ECCV, 2002.

