
IEEE International Conference on Computer, Communication, and Signal Processing (ICCCSP-2017)

Scene Understanding – A Survey


S. Aarthi
Department of Computer Science and Engineering
CEG, Anna University
Chennai, India
[email protected]

S. Chitrakala
Department of Computer Science and Engineering
CEG, Anna University
Chennai, India
[email protected]

Abstract— In recent times, scene understanding has held a prominent position in computer vision due to its real-time perceiving, analyzing and elaborating of interpretations of dynamic scenes, which leads to new discoveries. A scene is a view of a real-world environment with multiple objects and surfaces arranged in a meaningful way. Objects are compact and are acted upon, whereas scenes are extended in space and acted within. Visual information can be given through many features, such as colors, luminance and contours; in the form of shapes, parts and textures; or through semantic context. The goal of scene understanding is to make machines see like humans, with a complete understanding of visual scenes. Scene understanding is influenced by cognitive vision and involves major areas such as computer vision, cognitive engineering and software engineering. Due to its enormous growth, many outstanding groups such as Boston University, the Stanford Vision Lab, the Scene Grammar Lab, AirLab, and the Laboratory for Machine Vision and Pattern Recognition have been perseveringly working on improvements in this area. This paper presents an extensive survey of scene understanding with its various strategies and methods.

Key words— Scene understanding, contextual scene, semantic scene, image identification

I. INTRODUCTION

The generic computer vision pipeline for understanding a scene is first detecting the scene, then localizing, recognizing, and finally understanding it. Apart from detecting visual features such as edges and corners, it also requires new, robust computer vision functionalities to learn, adapt, weigh alternative solutions and develop new strategies for analysis and interpretation. A scene understanding system must be able to adapt to variations in the current environment, anticipate and predict features, and communicate with humans and other systems.

Visual information can be given through many features:
• Colors, luminance and contours
• Shapes, parts and textures
• Semantic context

The main axes of scene understanding are:
1. Perception
2. Maintaining coherency
3. Event recognition
4. Evaluation, control and learning
5. Communication, visualization and knowledge acquisition

[Figure 1: Scene Understanding Strategies — strategies divide into Hierarchical and Non-Hierarchical, realized through Top-Down, Bottom-Up and Combined methods.]

II. CHALLENGES OF SCENE UNDERSTANDING

The challenge lies in detecting meaningful features or characteristics for the physical objects of interest and maintaining the coherency of these objects. All the spatial and temporal relationships between objects must be explored efficiently based on the situations, their actions and behaviors. To better understand a scene, performance must be evaluated against ground-truth values; to perform well, dynamic environmental changes should be handled with higher-level reasoning, and the exact perception of the scene has to be communicated to all users. The scene understanding strategies specified in Figure 1 play a vital role here. The factors that affect the efficiency of scene understanding include:
• Type and position of images in the scene
• Scene motion
• Illumination changes
• Static and dynamic occlusions
• Type, speed and pose of objects
• Camera synchronization and hand-over
• Event complexity
• Handling dynamic scenes
• Dynamic scene reconstruction
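Since performance must be evaluated against ground truth, a standard measure is per-class intersection-over-union (IoU) between a predicted and a ground-truth label map. The following is a minimal sketch; the function name and the toy arrays are illustrative, not taken from any surveyed system:

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray, label: int) -> float:
    """Intersection-over-union of one class label between two label maps."""
    p = (pred == label)
    t = (truth == label)
    union = np.logical_or(p, t).sum()
    if union == 0:                      # label absent from both maps
        return float("nan")
    return np.logical_and(p, t).sum() / union

# Toy 4x4 label maps: 0 = background, 1 = object of interest
truth = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
pred  = np.array([[0, 1, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
print(iou(pred, truth, 1))  # 4 overlapping cells / 5 in union = 0.8
```

Averaging this score over all classes gives the mean IoU commonly reported on segmentation benchmarks such as PASCAL.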

978-1-5090-3716-2/17/$31.00 ©2017 IEEE



III. THE EVOLUTION OF SCENE UNDERSTANDING

A. 2D image understanding

Understanding of the images in a scene may be obtained from text in the images: when text can be located and recognized properly, it provides a better visualization. 2D images may be analyzed context-based or semantically.

1) Context Based

A scene comprises a large number of components, which requires a proper understanding of context. Identifying an individual object requires the perspective of the actual context in which the object is placed, and identifying the interactions between objects helps to recognize them successfully. Contextual information can be gathered from nearby objects and their presence, interaction and location.

The framework proposed by Wangchao Le and Shaofa Li [1] comprises three stages: detection of stroke-like fonts on multi-scale input, a color-shift distribution model trained on a pool of text images, and a contextual probability function for integration. The system works well with few restrictions on the datasets but depends on many rules and thresholds applied to the dataset. To overcome this, a tree-based model [2] by Myung Jin Choi et al. is given to analyze object categories. The model captures contextual information, assigns global features to the images and analyzes their dependencies under one tree structure. A probabilistic model also helps to enhance the semantics among the features and may be extended to discover relationships between objects and scenes.

Tian Lan et al. [3] proposed a framework that models person-person synergy and group-person communication; this work captures contextual information by observing group activities. Mojtaba Seyedhosseini et al. [4] proposed a multi-class multi-scale framework that uses a supervised setting along with contextual information from collections of objects to enhance discriminative models.

2) Semantic Based

Automatic scene understanding has become an open challenge and a main goal of computer vision. Debadeepta Dey et al. [5] give an approach that produces ordered outputs by semantic segmentation: the underlying problem is considered directly and enforced with a set of learned models, and with this continuous learning process multiple outputs can be derived efficiently. Cesar Cadena et al. [6] proposed a semantic system for understanding a scene by detecting and segmenting the objects present in it. In the segmentation mechanism, the semantic space of the objects is considered deeply and updated to enhance performance compared with holistic models.

Object identification [7], scene recognition [8], contextual reasoning [9] and pose estimation [10] are analyzed to detect scenes effectively. Roozbeh Mottaghi et al. [11] proposed a work that gives importance to the individual tasks from which the entire scene can be depicted: semantic segmentation, object detection and scene recognition are combined to achieve better accuracy and performance.

Many works have been reported on locating images [12][13] and giving the positions of visible objects [14][15]. The semantic interpretation of images is identified to understand them [16][17], and the attributes of the visible objects [18][19] are identified to establish the semantic meaning and the relationships between them. C. Lawrence Zitnick et al. [20] proposed a system that measures the mutual information between visual and semantic classes to identify the most meaningful semantics. Semantically similar scenes with the same written descriptions are used to define the classes; the part of speech of each word is related to the visual features to predict the semantic features, and the relationships between a semantic feature, its saliency and its memorability are compared to provide complementary information. Objects that are memorable and salient can have less semantic importance. Thus a better understanding of the images can be achieved.

IV. DATASETS

Various standard datasets are available for analyzing scenes, including 80 Million Tiny Images, the CBCL StreetScenes database, LabelMe, games for data collection, the PASCAL challenges for object detection, Caltech-256, the mammals database, the Oxford Buildings set, the hoofed-animals dataset, and Leibe's datasets. The Cityscapes and PASCAL datasets are shown in Figure 2.
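The mutual-information criterion used by Zitnick et al. [20] to score how informative a visual feature is about a semantic class can be illustrated with a small sketch. The co-occurrence observations below are invented toy data for illustration only, not drawn from [20]:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X;Y) in bits from a list of (visual_feature, semantic_class) observations."""
    n = len(pairs)
    pxy = Counter(pairs)                 # joint counts
    px = Counter(x for x, _ in pairs)    # marginal counts of features
    py = Counter(y for _, y in pairs)    # marginal counts of classes
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p(x,y) / (p(x) p(y)) written with counts to avoid extra divisions
        mi += p_joint * math.log2(c * n / (px[x] * py[y]))
    return mi

# Toy observations: a "sky" feature co-occurs mostly with outdoor scenes,
# a "lamp" feature mostly with indoor scenes
obs = [("sky", "outdoor")] * 4 + [("sky", "indoor")] * 1 + \
      [("lamp", "indoor")] * 4 + [("lamp", "outdoor")] * 1
print(round(mutual_information(obs), 3))  # ≈ 0.278 bits
```

A feature whose joint distribution with the classes factorizes (no predictive value) scores zero; higher scores mark the semantically most meaningful features.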


V. CONCLUSION
Scene understanding is still an open problem in which all the images must be analyzed and identified correctly. The goal of such a system is to identify the objects in a scene, annotate them aptly and form a full description. The system should capture all the events and operations that depict the entire scene. This literature survey concludes that a system should be able to adapt itself to the environment, analyze the context and interpret the scenario so that the environment related to the scene can be predicted. Yet the surveyed works have not paved a route to handling real-time scenes to their fullest reach.

| S.no | Reference | Type of Scene | Context/Semantic Based | Real Time/Static Scene | Methodology | Advantages | Limitations |
|------|-----------|---------------|------------------------|------------------------|-------------|------------|-------------|
| 1 | [1] | 2D | Context | Static Scene | Filtering & labeling | No restriction on background text | Works only under certain thresholds |
| 2 | [2] | 2D | Context | Real Scene | SVM | More reliable in predicting multiple objects | Cannot handle a full hierarchical tree structure |
| 3 | [3] | 2D | Context | Real Scene | Adaptive structure algorithm | Captures groups of people, their activity and interactions | Fails to capture multiple scenes and complex structures |
| 4 | [4] | 2D | Context | Static Scene | Multi-class multi-scale segmentation | Useful for segmenting imbalanced datasets | Complex features increase the computational complexity |
| 5 | [5] | 2D | Semantic | Static Scene | SVM and CRF set | Multiple learned models based on predictions | Loss of label at each stage is not captured |
| 6 | [6] | 2D | Semantic | Real Scene | SVM | Object detector modules improve the performance | Only certain 2D shapes are accepted |
| 7 | [11] | 2D | Semantic | Static Scene | CRF | Human-machine models give a better comparison, improving performance | Works only on limited datasets |
| 8 | [20] | 2D | Semantic | Static Scene | Gaussian distribution | A new dataset with abstract images and descriptions is created | Only synthetic data is created |
| 9 | [21] | 3D | Semantic | Real Scene | 2D segmentation and sampling | Handles complex 3D models | Fails to handle occluded objects |
| 10 | [22] | 3D | Semantic | Real Scene | 3D segmentation | Grasps 3D streaming data | Computationally expensive |

TABLE 1: COMPARISON OF VARIOUS SCENE UNDERSTANDING METHODS
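The context-based methods compared above share a common intuition: a detection becomes more plausible when objects that typically co-occur with it are also present in the scene. A hypothetical sketch of such contextual re-scoring follows; the co-occurrence values, class names and blend weight are all illustrative assumptions, not taken from any surveyed method:

```python
# Hypothetical symmetric co-occurrence prior between object classes
# (illustrative values only, not learned from data)
CONTEXT = {
    ("car", "road"): 0.9,
    ("car", "sofa"): 0.05,
    ("monitor", "keyboard"): 0.8,
}

def context_score(label, score, others):
    """Blend a raw detector score with co-occurrence support from other detections."""
    support = [CONTEXT.get((label, o), CONTEXT.get((o, label), 0.5))
               for o in others]               # 0.5 = neutral when pair is unknown
    prior = sum(support) / len(support) if support else 0.5
    return 0.7 * score + 0.3 * prior          # fixed blend weight, illustrative

# A weak "car" detection is boosted when "road" is also detected,
# and suppressed in an implausible indoor context
print(round(context_score("car", 0.4, ["road"]), 3))  # 0.7*0.4 + 0.3*0.9  = 0.55
print(round(context_score("car", 0.4, ["sofa"]), 3))  # 0.7*0.4 + 0.3*0.05 = 0.295
```

The tree-based model of [2] replaces this flat average with dependencies organized in a tree, but the re-scoring effect on individual detections is of the same flavor.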


REFERENCES

[1] Wangchao Le and Shaofa Li, "Modeling Scene Text Features with Parametric Filter Banks and Contextual Color-Shift Distribution Model," South China University of Technology, Wushan, Canton 510641.
[2] Myung Jin Choi, Antonio Torralba, and Alan S. Willsky, "A Tree-Based Context Model for Object Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 2, February 2012.
[3] Tian Lan, Yang Wang, Weilong Yang, Stephen N. Robinovitch, and Greg Mori, "Discriminative Latent Models for Recognizing Contextual Group Activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 8, August 2012.
[4] Mojtaba Seyedhosseini and Tolga Tasdizen, "Multi-Class Multi-Scale Series Contextual Model for Image Segmentation," IEEE Transactions on Image Processing, vol. 22, no. 11, November 2013.
[5] Debadeepta Dey, Varun Ramakrishna, Martial Hebert, and J. Andrew Bagnell, "Predicting Multiple Structured Visual Interpretations," in Proc. IEEE International Conference on Computer Vision, 2015.
[6] Cesar Cadena, Anthony Dick, and Ian D. Reid, "A Fast, Modular Detection," in Proc. IEEE International Conference on Robotics and Automation (ICRA), Seattle, Washington, May 26-30, 2015.
[7] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627-1645, Sep. 2010.
[8] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba, "SUN database: Large-scale scene recognition from abbey to zoo," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2010, pp. 3485-3492.
[9] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie, "Objects in context," in Proc. IEEE Int. Conf. Comput. Vis., 2007, pp. 1-8.
[10] Y. Yang and D. Ramanan, "Articulated pose estimation using flexible mixtures of parts," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 1385-1392.
[11] Roozbeh Mottaghi, Sanja Fidler, Alan Yuille, Raquel Urtasun, and Devi Parikh, "Human-Machine CRFs for Identifying Bottlenecks in Scene Understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, January 2016.
[12] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254-1259, Nov. 1998.
[13] C. Privitera and L. Stark, "Algorithms for defining visual regions-of-interest: Comparison with eye fixations," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 9, pp. 970-982, Sep. 2000.
[14] L. Elazary and L. Itti, "Interesting objects are visually salient," J. Vis., vol. 8, no. 3, pp. 1-15, 2008.
[15] S. Hwang and K. Grauman, "Learning the relative importance of objects from tagged images for retrieval and cross-modal search," Int. J. Comput. Vis., vol. 100, pp. 134-153, 2011.
[16] A. Farhadi, M. Hejrati, M. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth, "Every picture tells a story: Generating sentences from images," in Proc. 11th Eur. Conf. Comput. Vis., 2010, pp. 15-29.
[17] V. Ordonez, G. Kulkarni, and T. Berg, "Im2Text: Describing images using 1 million captioned photographs," in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 1143-1151.
[18] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, "Describing objects by their attributes," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 1778-1785.
[19] T. Berg, A. Berg, and J. Shih, "Automatic attribute discovery and characterization from noisy web data," in Proc. 11th Eur. Conf. Comput. Vis., 2010, pp. 663-676.
[20] C. Lawrence Zitnick, Ramakrishna Vedantam, and Devi Parikh, "Adopting Abstract Images for Semantic Scene Understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 4, April 2016.
[21] Luca Del Pero, Joshua Bowdish, Bonnie Kermgard, Emily Hartley, and Kobus Barnard, "Understanding Bayesian Rooms Using Composite 3D Object Models," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[22] Hanzhang Hu, Daniel Munoz, J. Andrew Bagnell, and Martial Hebert, "Efficient 3-D Scene Analysis from Streaming Data," in Proc. IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 6-10, 2013.
[23] T. Tutenel, R. M. Smelik, R. Bidarra, and K. J. de Kraker, "Using Semantics to Improve the Design of Game Worlds," in Artificial Intelligence and Interactive Digital Entertainment, 2009.
[24] P. Merrell, E. Schkufza, Z. Li, M. Agrawala, and V. Koltun, "Interactive Furniture Layout Using Interior Design Guidelines," in SIGGRAPH, 2011.
[25] M. Fisher and P. Hanrahan, "Context-based search for 3D models," in SIGGRAPH Asia, 2010.
[26] Mesfin A. Dema and Hamed Sari-Sarraf, "3D Scene Generation by Learning from Examples," in Proc. IEEE International Symposium on Multimedia, 2012.
