Video Analysis Application Centric View
Abstract
This paper presents an end-user application centric view of surveillance video analysis and describes a flexible, extensible and modular approach to video content extraction. Various detection and extraction components, including tracking of moving objects, detection of text and faces, and face-based soft biometric classification of gender, age and ethnicity, are described within the general framework of the real-time and post-event analysis applications Panoptes and VideoRecall. Some end-user applications that are built on this framework are discussed.

1. Introduction

Since their early deployment in the mid-1960s [25], video surveillance systems have held the promise of increased security and safety, and they have facilitated forensic evidence collection. The potential of video comes with its challenges: the large and ever increasing volume of collected data needs to be analyzed automatically to understand the video content, detect objects and activities of interest, and provide relevant information to end users in a timely manner.

Over the last two decades, video surveillance systems, from cameras to compression and storage technologies, have undergone fundamental changes. Analog cameras have been replaced with digital and IP network cameras; tape based storage systems with no efficient way to find content have been replaced with Digital and Network Video Recorders that use video content analysis for efficient storage, indexing and retrieval. With the availability of increasingly capable computing platforms, the video content analysis and extraction algorithms from the large body of research and development in vision-based video surveillance have found their way into several surveillance applications. The surveillance systems currently deployed at public places collect a large amount of video data, some of which is analyzed in real-time. In addition to traditional surveillance video, in recent years a new source of video data has surpassed any kind of volume expectations, as individuals record video during public and private events with their cameras and smart phones and post it on websites such as YouTube for public consumption. These unconstrained video clips (i.e., "other people's video", the term coined by [2]) may provide additional content and information that is not available in typical surveillance camera views; hence they have become part of any surveillance and investigation task. For example, surveillance video cameras around a stadium typically have views of entry/exit halls, snack courts, etc., which may provide partial video coverage of some excited fans being destructive with no means to get a good description of the individuals; but an inspection of the other fan videos posted on public sites soon after the event may make available crucial information that would otherwise be missed. Unconstrained video is thus gaining importance as a surveillance source, and automated video analysis systems have to utilize this emerging data source.

Detection and classification of objects in video, tracking of moving objects over multiple camera fields of view, understanding the scene context and people's activities, and extracting some level of identification information on individuals have all been active areas of research [16]. Despite all the advances in the field and the availability of multiple real-time surveillance applications, the acceptance of these automated technologies by security and safety professionals has been slower than several initial market estimations. This is partly due to the fact that most automated systems are designed to provide a solution to only some of the areas mentioned above, which effectively is only a part of the solution the end-users require. End users are looking for tools that help to answer the questions "What, Where, When, Who, Why and How (w5h)" using video and other available data. The answers to the w5h questions describe an event with all its details.
What happened? Where did it take place? When did it happen? Who was involved? Why did it happen? And how did it happen? In most cases the answers to most of those questions can be found in video data at multiple levels of detail, and the automated video surveillance applications of tomorrow need to be designed to answer as many of these questions as possible by analyzing both real-time and post-event video.

"What" describes the event, such as "entry from the exit door", which will be detected from a surveillance camera video by tracking a person entering an area through a restricted-direction doorway. "Where" is the location or locale where the event took place, such as "Building-A lobby" or "Terminal C, Logan Airport, Boston"; this information is available for surveillance cameras and may appear in the user-created tags of web videos. "When" is the time of the event, which again is available for surveillance cameras; for web videos a date might be available in the video tags, or it may simply be the time the video was posted. "Who" refers to the actors in the event, such as the person or people who entered the wrong way. Depending on the video view and resolution, the analyses may indicate a single person versus a group of people; if a good quality face image is available it can be matched against a watch list, and gender/age/ethnicity may be detectable even from lower quality face images using soft biometric classifiers. "Why" describes the intent of an event, which is not directly observable from video but may be revealed to the end user by the unfolding of subsequent events. "How" describes the way in which the activity happened and may be explained by event detection, such as one person idling while another exits: the activities preceding the wrong-way entry indicate that the person waited around to catch the door open while someone exited.

Our approach to addressing all w5h questions is to exploit video content and metadata at multiple layers with a flexible, expandable and easily re-configurable video content extraction architecture. The layered architecture allows various content extraction tasks to be configured easily to answer as many of these questions as possible.

After this introduction, we describe our general video content extraction framework and its components in Section 2. We introduce our face-based soft video biometry classification in Section 3. In Section 4 we present a few applications based on this framework. In Section 5 we briefly describe the technology evaluations in which parts of this framework were assessed, followed by the Concluding Remarks in Section 6.

2. Flexible video analytics framework

In this section we introduce our real-time and post-event content extraction architecture and explain the underlying algorithms for the major components.

2.1. Modular Video Analysis Architecture

Figure 1 depicts the intuVision video content extraction framework. At the heart of our framework lie multiple detectors that operate on the video to detect and classify moving objects (for fixed camera event detection), or to perform frame based detection of faces, text, people and vehicles (for analysis of moving platform video or post-event analysis). Once the objects are detected and tracked (faces of tracked people can be detected in real-time), the Panoptes video event detection module looks for pre-specified or anomalous events. The Soft Biometry module then classifies faces by gender/ethnicity and age. All extracted content information is fed into a database to support downstream analysis tasks, queries and retrieval. While Panoptes analyzes the video content in real-time, VideoRecall is intended for forensic analyses, identifying items of interest and providing a reporting utility to organize the findings and results. Panoptes and VideoRecall form the backbone of a generic video surveillance application and allow for quick development of more specific custom surveillance applications (as exemplified in Section 4). These components are discussed in further detail in the following sections.

Figure 1: intuVision Video Analysis architecture
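As an illustrative sketch only, the detect-track-classify-store flow above could be wired together as in the following Python fragment; the Module interface, the VideoPipeline class and the database schema are hypothetical stand-ins, not intuVision's actual implementation.

# Illustrative sketch of the modular detect/track/classify/store flow.
# Module, VideoPipeline and the table schema are hypothetical stand-ins.
import sqlite3

class Module:
    """One pluggable analysis stage (detector, tracker, classifier, ...)."""
    def process(self, frame_no, frame, records):
        """Inspect the frame and earlier records; append new record tuples."""
        raise NotImplementedError

class VideoPipeline:
    """Chains stages and persists every extracted item for later queries."""
    def __init__(self, modules, db_path="content.db"):
        self.modules = modules
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS content "
                        "(frame INTEGER, kind TEXT, bbox TEXT, label TEXT)")

    def run(self, frames):
        for frame_no, frame in enumerate(frames):
            records = []                    # (frame, kind, bbox, label) tuples
            for module in self.modules:     # stages run in order per frame
                module.process(frame_no, frame, records)
            self.db.executemany("INSERT INTO content VALUES (?, ?, ?, ?)",
                                records)
        self.db.commit()

Keeping every stage behind the same process() interface is what makes such a pipeline re-configurable: detectors can be added or dropped per task, and both the real-time and the forensic paths can write to the same content table.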
2.2. Real-Time Analysis with Panoptes

Our approach to real-time video analysis is based on exploiting space- and object-based analysis in separate, parallel but highly interactive processing layers that can be included in processing as required. This methodology has its roots in human cognitive vision [36]. Object-based visual attention, visual tracking and indexing have also been among the active fields of study in cognitive vision. Models of human visual attention use a saliency map to guide the attention allocation [1]. Studies of the human visual cognition system point to a highly parallel but closely interacting functional and neural architecture. Humans easily focus their attention on video scenes exhibiting several objects and activities at once and keep track of object states and relationships. This indicates an allocation of attention to different processes that deal with the various dynamics of the scene, with correspondences and communication between these processes [21].

At the first level, the peripheral tracker is responsible for providing and maintaining a quick glance of the moving objects in a scene; it functions as an abstracted positional tracker that quickly establishes the rough spatial area and the trajectories of the moving objects. The peripheral tracker primarily maintains motion-based information for the objects. This coarse tracking information is all that is needed for some higher level processes, such as detecting instances of objects entering a specific zone or a high number of objects in the scene. Other detection tasks, however, require more detailed analysis, such as classification of objects, or knowledge of the scene background to detect and track objects in more challenging environments. The tracking information from the peripheral tracker is made available to other layers to enable maintaining the scene background and context models, the stationary object model, and the detailed object-based analysis for classification of tracked objects. These layers, described below, facilitate comprehensive analysis of the objects and the scene at multiple levels of detail as needed.

Peripheral Tracking Layer performs a coarse, spatially-based detection and tracking of moving objects using a computationally light algorithm based on frame recency, producing rough connected regions that represent the moving objects. The detected moving object regions are tracked from frame to frame by building correspondences between objects based on their motion, position and condensed color information.
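A rough sketch of such a frame-recency tracker is given below; it differences each frame against a running average, extracts connected regions with OpenCV, and links them to existing tracks by nearest centroid. The thresholds and the position-only matching rule are simplifying assumptions (the layer described above also uses motion and condensed color cues), not the actual Panoptes algorithm.

# Coarse motion-region tracking: frame recency + connected components +
# nearest-centroid correspondence. All parameters are illustrative.
import cv2
import numpy as np

def track(frames, diff_thresh=25, max_jump=50.0, alpha=0.05, min_area=100):
    tracks, next_id = {}, 0          # track id -> last known centroid
    avg = None                       # running average = frame recency model
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if avg is None:
            avg = gray.copy()
            continue
        moving = (np.abs(gray - avg) > diff_thresh).astype(np.uint8)
        avg = (1 - alpha) * avg + alpha * gray       # adapt the model
        n, _, stats, centroids = cv2.connectedComponentsWithStats(moving)
        for i in range(1, n):                        # label 0 is background
            if stats[i, cv2.CC_STAT_AREA] < min_area:
                continue                             # drop noise blobs
            c = centroids[i]
            # correspond with the nearest existing track, else start one
            best = min(tracks, default=None,
                       key=lambda t: np.linalg.norm(tracks[t] - c))
            if best is not None and np.linalg.norm(tracks[best] - c) < max_jump:
                tracks[best] = c
            else:
                tracks[next_id] = c
                next_id += 1
        yield dict(tracks)           # rough positions for the other layers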
Scene Description Layer maintains models for general scene features to facilitate detailed analysis in other layers. Currently the Scene Description Layer includes an adaptive background model [26], a dynamic texture background model [20], a background edge map [11] and a scene activity model [28]. The Scene Description Layer, or some of its components, may not be necessary for all detection tasks; rather, it is designed to support tasks such as stationary object or anomalous activity detection.
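For a feel of how one such component plugs in, the fragment below stands up a generic adaptive background model with OpenCV's stock Gaussian-mixture subtractor; this is a convenient substitute for illustration, not the specific models of [26] or [20].

# Generic adaptive background model via OpenCV's Gaussian-mixture
# subtractor, used here as an illustrative stand-in for the layer's models.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500,         # number of frames the model adapts over
    varThreshold=16,     # distance threshold for declaring foreground
    detectShadows=True,  # shadows get the intermediate mask value 127
)

def foreground_mask(frame):
    """Return a clean binary foreground mask; the model updates online."""
    mask = subtractor.apply(frame)
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadows
    return cv2.medianBlur(mask, 5)                              # despeckle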
Scene Activity Model: Every scene has a normal activity and flow associated with it: vehicles usually follow paths dictated by traffic rules and lights, while people walk along convenient paths, interact with interest points such as a snack stand, or park their cars and walk towards a building. Learning the normal scene activity for each object type facilitates the detection of anomalous actions of people and vehicles. Similar to [28], we model the scene activity as a probability density function (pdf). Kernel Density Estimation computes the model, and a Markov Chain Monte-Carlo framework generates the most likely paths through the scene. The model is trained with a set of normally observed tracks that consist of observed transition vectors defined by the destination location, the transition time, and the bounding box width and height of the object of interest. The pdf of the transition vectors at each pixel location in the background scene is modeled as a mixture of Gaussians, and a Gaussian Mixture Model (GMM) is learned for each pdf at a given location using a modified EM algorithm. This framework circumvents the need to explicitly cluster the user tracks. For a given new track, the transition vectors are calculated and the probability of the track being normal is estimated using the trained GMMs at every pixel location.
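The following sketch captures the train-and-score logic under simplifying assumptions: scikit-learn's standard EM stands in for the modified EM mentioned above, and locations are coarsened to grid cells rather than individual pixels to keep the example small.

# Scene activity model sketch: per-location GMMs over transition vectors
# (destination x, destination y, transition time, box width, box height).
# scikit-learn's standard EM replaces the modified EM of the text, and
# pixel locations are coarsened to grid cells to keep the example small.
import numpy as np
from sklearn.mixture import GaussianMixture

CELL = 16  # grid cell size in pixels (assumed)

def cell_of(x, y):
    return int(x) // CELL, int(y) // CELL

def transition_vectors(track):
    """track: time-ordered (x, y, t, w, h) observations of one object."""
    vecs = {}
    for (x0, y0, t0, _, _), (x1, y1, t1, w, h) in zip(track, track[1:]):
        vecs.setdefault(cell_of(x0, y0), []).append([x1, y1, t1 - t0, w, h])
    return vecs

def train(normal_tracks, components=3):
    samples = {}
    for track in normal_tracks:                 # pool vectors by location
        for cell, vs in transition_vectors(track).items():
            samples.setdefault(cell, []).extend(vs)
    return {cell: GaussianMixture(components).fit(np.array(vs))
            for cell, vs in samples.items() if len(vs) > components}

def normality(models, track):
    """Mean log-likelihood of a new track under the trained local GMMs."""
    scores = [models[cell].score(np.array(vs))
              for cell, vs in transition_vectors(track).items()
              if cell in models]
    return float(np.mean(scores)) if scores else float("-inf")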
Object Layer provides detailed object-based analysis for classification. We use the Support Vector Machine (SVM) method for object classification. SVM is a supervised learning method that simultaneously minimizes the classification error and maximizes the geometric margin. Hence SVMs perform better than many other classifiers and have been used in a wide range of computer vision applications for object classification, object recognition [23,24] and action recognition [29,11]. We use the object's size, oriented edges [4], silhouette and motion based features for SVM object classification. Oriented edges have been shown to be robust to changes in illumination, scale and view angle [6].

In most surveillance applications, robustly classifying objects without a large set of sample data to train the classifier is of particular interest. SVMs are well suited for such applications as they can be trained with few samples and generalize well to novel scenes. Figure 2 illustrates the Panoptes object classification training interface, where a new object type is being trained for a person carrying a long object.
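As an illustration of this few-sample setup, the sketch below trains an SVM on a reduced feature vector (an oriented-edge histogram plus blob size); the exact silhouette and motion features used in the Object Layer are not reproduced here.

# Few-sample SVM object classification on a reduced feature vector:
# an 8-bin oriented-edge histogram plus log blob size. The silhouette
# and motion features of the actual Object Layer are omitted here.
import cv2
import numpy as np
from sklearn.svm import SVC

def features(gray_patch):
    gx = cv2.Sobel(gray_patch, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray_patch, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    # orientation histogram weighted by gradient magnitude
    hist, _ = np.histogram(ang, bins=8, range=(0, 2 * np.pi), weights=mag)
    hist = hist / (hist.sum() + 1e-6)
    size = gray_patch.shape[0] * gray_patch.shape[1]
    return np.append(hist, np.log1p(size))

def train_classifier(patches, labels):
    """patches: a handful of user-marked object crops; labels: class names."""
    X = np.array([features(p) for p in patches])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # tolerates small samples
    clf.fit(X, labels)
    return clf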
In aerial or other moving camera views (such as panning and zooming views of fixed cameras), the scene background is not consistent over multiple frames. To detect or track objects in such scenes, using trained detectors based on appearance features in each single frame works well. For these types of environments we employ the Haar transform based feature detection framework suggested by [19, 33] for the detection of people and vehicles. Haar features are used to train a cascade of classifiers using the Adaboost framework. The trained classifier is then applied to each frame to detect the objects of interest.
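The per-frame usage pattern for such a cascade is shown below with OpenCV's CascadeClassifier; the cascade file name is a placeholder for a model trained offline with AdaBoost, and the detection parameters are illustrative.

# Per-frame detection with a trained Haar cascade, suitable when the
# background is unstable. The XML file name is a placeholder for a
# cascade trained offline with AdaBoost; parameters are illustrative.
import cv2

cascade = cv2.CascadeClassifier("person_cascade.xml")  # hypothetical model

def detect(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)       # normalize illumination
    # returns (x, y, w, h) boxes, one per detected person
    return cascade.detectMultiScale(gray, scaleFactor=1.1,
                                    minNeighbors=4, minSize=(24, 48))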
Real-time Event Detection: Understanding activities and events from video sequences has been studied widely, and approaches ranging from expert-system-like rule-based methods to probabilistic models have been employed [9,13,14,15]. Our event detection algorithms use various methods appropriate for different event detection tasks [11]. We employ a rule-based approach for pre-defined events, trained SVM models for person activities, and the scene activity model for detecting anomalous behaviors.
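To illustrate the rule-based branch only: a pre-defined event such as the wrong-way entry from Section 1 reduces to a simple predicate over a track, as in the hypothetical sketch below (the zone, the direction convention and the track format are assumptions made for illustration).

# Toy rule for one pre-defined event: entry into a zone through the
# restricted direction. Zone, direction convention and track format
# are assumptions made for illustration.
RESTRICTED_ZONE = (100, 100, 300, 400)       # x0, y0, x1, y1 in pixels

def in_zone(p, z=RESTRICTED_ZONE):
    return z[0] <= p[0] <= z[2] and z[1] <= p[1] <= z[3]

def wrong_way_entry(track):
    """track: time-ordered (x, y) centroids; fires on an outside-to-inside
    transition while moving upward (assumed to be the exit-only direction)."""
    for prev, cur in zip(track, track[1:]):
        if not in_zone(prev) and in_zone(cur) and cur[1] < prev[1]:
            return True
    return False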