Adaptive Sparse Representations For Video Anomaly Detection
Abstract—Video anomaly detection can be used in the transportation domain to identify unusual patterns such as traffic violations, accidents, unsafe driver behavior, street crime, and other suspicious activities. A common class of approaches relies on object tracking and trajectory analysis. Very recently, sparse reconstruction techniques have been employed in video anomaly detection. The fundamental underlying assumption of these methods is that any new feature representation of a normal/anomalous event can be approximately modeled as a (sparse) linear combination of prelabeled feature representations (of previously observed events) in a training dictionary. Sparsity can be a powerful prior on model coefficients, but challenges remain in the detection of anomalies involving multiple objects and in the ability of the linear sparsity model to effectively allow for class separation. The proposed research addresses both these issues. First, we develop a new joint sparsity model for anomaly detection that enables the detection of joint anomalies involving multiple objects. This extension is highly nontrivial since it leads to a new simultaneous sparsity problem that we solve using a greedy pursuit technique. Second, we introduce nonlinearity into (that is, kernelize) the linear sparsity model to enable superior class separability and hence anomaly detection. We extensively test on several real-world video datasets involving both single and multiple object anomalies. Results show marked improvements in detection of anomalies in both supervised and unsupervised scenarios when using the proposed sparsity models.

Index Terms—Anomaly detection, joint sparsity model, kernel function, outlier rejection.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 24, NO. 4, APRIL 2014

Manuscript received December 1, 2012; revised April 24, 2013 and June 12, 2013; accepted August 9, 2013. Date of publication August 29, 2013; date of current version April 2, 2014. This work was supported by a grant from the Xerox Research Center, Webster, NY, USA. This paper was recommended by Associate Editor J. Zhang.
X. Mo and V. Monga are with the Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802 USA (e-mail: [email protected]; [email protected]).
R. Bala and Z. Fan are with the Xerox Research Center, Webster, NY 14580 USA (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2013.2280061
1051-8215 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information.

I. Introduction

WITH AN increasing demand for security and safety, video-based surveillance systems are being increasingly used in urban traffic locations. Vast amounts of video footage are collected and analyzed for traffic violations, accidents, crime, terrorism, vandalism, and other suspicious activities. Since manual analysis of such large volumes of data is prohibitively costly, there is a desire to develop effective algorithms that can aid in the automatic or semiautomatic interpretation and analysis of video data for surveillance and law enforcement. An active area of research within this domain is video anomaly detection, which refers to the problem of finding patterns in data that do not conform to expected behavior and may warrant special attention or action.

Two precursors to anomaly detection are an effective encoding of events, and a systematic means of modeling normal events and normal event classes. The purpose of event encoding is to extract features from the video that are most useful in differentiating among different events. Xiang et al. [1] use a 7-D vector to represent each moving blob. An expectation maximization (EM) algorithm is then used to cluster these 7-D vectors into a predefined number of clusters. Events that cannot be clustered into any of these predefined clusters are regarded as anomalies. A ratio histogram approach is proposed by Chuang et al. [2] to represent object features, and suspicious events such as abandoned luggage are detected using a finite state machine. Saligrama et al. [3] propose a motion label representation to encode events, with decisions facilitated via a two-state Markov chain model. Wang et al. [4] propose using hierarchical Bayesian models, in which the video data is divided into so-called documents and events are subsequently encoded as quantized features, or words, within these documents. Simon et al. [5] encode events via spatio-temporal volumes and employ decision trees to identify events. Vaswani et al. [6] model the shape activities of objects by a hidden Markov model (HMM) and define anomalies as a change in the shape activity model. Instead of extracting moving features, Malinici et al. [7] extract features of the scene as a whole and build an infinite HMM on these features to identify anomalies. An excellent review of video anomaly detection techniques can be found in [3].

Recent relevant work and challenges: Very recently, sparse reconstruction techniques [8], [9] have been employed in video anomaly detection. The fundamental underlying assumption of these methods is that any new feature representation of a normal/anomalous event can be approximately modeled as a (sparse) linear combination of prelabeled feature representations (of previously observed events) in a training dictionary. Li et al. use object trajectories while Zhao et al. use spatio-temporal volumes. Their work was motivated by the sparsity-based face recognition approach of Wright et al. [10], which claimed that sparse representations could exhibit robustness to significant amounts of noise and face occlusion. We note that the assertions of Wright et al. have
been challenged in recent work [11]–[13] that claims similar noise robustness without the use of the ℓ1-norm techniques that Wright et al. advocate and that [8] and [9] employ. Both [8] and [9] show promise for the use of sparsity in video anomaly detection but exhibit two important limitations. First, they only address anomalies involving single objects. While such events arguably account for a large proportion of anomalies, there are important scenarios wherein the anomaly arises from an interaction among multiple objects. Consider, for example, two vehicles following nominal individual trajectories but approaching within a dangerously close vicinity of each other. The second limitation, associated particularly with [8], is that the anomalous events must be characterized a priori into their own classes, that is, supervised anomaly detection in which representative training for anomalous events is available. In real-world applications, it is often not possible to gather a sufficiently large number of training samples representing anomalous events.

Contributions: We propose a novel and general trajectory-based joint sparse reconstruction framework for video anomaly detection. Trajectories have long been popular in video analysis and anomaly detection [14]–[17]. A common characteristic of trajectory-based approaches [18]–[21] is the derivation of nominal classes of object trajectories in a training phase, and the comparison of new test trajectories against the nominal classes in an evaluation phase. A statistically significant deviation from all classes indicates an anomaly.

We must emphasize that our choice of trajectories (as opposed to spatio-temporal volumes, for example in [9]) as the event encoder is motivated by two principal reasons: 1) interactions between multiple objects are quite naturally captured in trajectory representations, for example, vehicles approaching within a dangerously close vicinity of each other can be caught, and 2) recent advances in object tracking ensure that trajectory extraction is both fast and reliable [22]. Nevertheless, in theory, any event representation can be used with our model.

Why sparsity: Our goal is to build a linear model in which joint trajectory representations of multiple objects are written as linear combinations of corresponding joint trajectories in a training dictionary. Sparsity is a powerful prior in this model for multiple reasons: 1) as in [8] and [9], when a new collection of multiobject trajectories manifests, it is expected to invoke only a few columns of the training dictionary that combine to create it; 2) even more crucially, object-wise correspondence is important in the linear combination for this model to be physically meaningful, leading to a (nonstandard) block-diagonal sparse structure on coefficients detailed later in Section III-B; and 3) finally, we observe that the sparse structure conveys information about normal/anomalous event classes: in the absence of training data for anomalous events, we can develop and use outlier rejection measures on the sparse coefficient matrix that can help with multiobject anomaly detection in unsupervised settings, which is a very challenging problem.

Our contributions over [8] and [9] are as follows.
1) We focus on multiobject anomaly detection and extend the approaches in [8] and [9] toward a joint sparsity model in which a matrix (instead of a vector) of sparse coefficients results. This extension is highly nontrivial because the structure of this matrix of sparse coefficients is not naturally row sparse. The model is meaningful in the multiobject scenario only when there is object-wise correspondence in the linear combinations. To incorporate this very challenging constraint, we therefore develop and solve a new simultaneous sparsity problem with the help of a new greedy pursuit algorithm.
2) Because linear models are not always adequate, we propose a kernelization of our joint sparsity model. If the data set does not obey linear models, kernel methods that are popular in learning can be applied to project the data into a high-dimensional nonlinear feature space in which the data becomes more linearly separable [23]–[25]. Kernel orthogonal and basis pursuit algorithms [26], [27] and their applications [28] have been of much recent interest. We develop a new kernelization of our joint sparsity model for video anomaly detection. This involves the development of a numerical algorithm that does not significantly increase complexity over the regular linear joint sparsity model.
3) Finally, a suitable outlier rejection measure is developed for the multiple-object case that obviates the need to build anomalous event classes and enables unsupervised anomaly detection with high accuracy (note that labeled training for normal events is still assumed available).

We evaluate our algorithms by testing on several real transportation datasets for both single-object and multiple-object anomalies. Additionally, both supervised and unsupervised scenarios are used in testing. In the supervised case, training dictionaries contain both example normal and anomalous trajectories. In the unsupervised case, only normal event trajectories are available for training and the aforementioned outlier rejection measure is used for anomaly detection. The test datasets include the well-known CAVIAR and AVSS datasets [29], [30] and two transportation datasets provided by Xerox Corporation. To benchmark our findings, we compare our results against widely cited trajectory-based techniques. These include the one-class SVM-based method by Piciarelli et al. [18] and the multiple-object tracking and anomaly detection approach in Han et al. [31]. For single-object anomaly detection, we also compare against the recent proposal of Li et al. [8], which was the first approach to suggest sparsity for video anomaly detection. Our experimental results include confusion matrices and ROC curves that are obtained across a variety of real-world scenarios, which reveal that our proposed sparsity models outperform the alternatives.

The rest of the paper is organized as follows. Section II briefly reviews sparsity-based video anomaly detection as first proposed by Li et al. Section III motivates and details our central contribution, which is a joint sparsity model. This involves setting up a new simultaneous sparsity optimization problem that effectively captures representation of events as described by multiple trajectories. Section IV then presents kernelization of this joint sparsity model and proposes a new algorithm for solving the optimization problem that results from the joint kernel sparsity model (JKSM). Section V
MO et al.: ADAPTIVE SPARSE REPRESENTATIONS FOR VIDEO ANOMALY DETECTION 633
events. Wang et al. [4] present an unsupervised framework using hierarchical Bayesian models to model individual events and interactions between them.

The sparsity-based approach reviewed in Section II is powerful, but it does not capture the interactions needed to detect anomalies involving two or more objects. We describe next a new joint sparsity model for video anomaly detection that incorporates multiple object trajectories and their interactions. Hence, even if the trajectories may be considered normal individually, collective anomalies could occur and can be successfully detected in our proposed framework.

B. Joint Sparsity Model for Anomaly Detection

We are interested in detection of anomalies involving $P \geq 1$ objects. Their corresponding $P$ trajectories can be represented as a matrix $Y = [y_1\ y_2\ \ldots\ y_P] \in \mathbb{R}^{n \times P}$, where $y_i$ corresponds to the $i$th trajectory. The training dictionary can be defined as $A = [A_1\ A_2\ \ldots\ A_P] \in \mathbb{R}^{n \times PKT}$, where each dictionary $A_i = [A_{i,1}\ A_{i,2}\ \ldots\ A_{i,K}] \in \mathbb{R}^{n \times KT}$, $i = 1, 2, \ldots, P$, is formed by the concatenation of the sub-dictionaries from all classes belonging to the $i$th trajectory. The crucial aspect of this formulation is that the training trajectories for any class $j$, that is, $A_{i,j}$, $i = 1, 2, \ldots, P$, are observed jointly from example videos. This generalizes the set-up of [8] and [9].

The $P$ test trajectories can now be represented as a linear combination of training samples as
\[
Y \approx AS = [A_{1,1}\ A_{1,2}\ \ldots\ A_{1,K}\ \ldots\ A_{P,1}\ A_{P,2}\ \ldots\ A_{P,K}]\,[\alpha_1\ \ldots\ \alpha_P] \tag{4}
\]
where the coefficient vectors $\alpha_i$ lie in $\mathbb{R}^{PKT}$ and $S = [\alpha_1\ \ldots\ \alpha_i\ \ldots\ \alpha_P]$.

It is important to note that the $i$th object trajectory of any observed set of test trajectories should only lie in the span of training trajectories corresponding to the $i$th object. Therefore, the columns of $S$ should have the following structure:
\[
\alpha_1 = \begin{bmatrix} \alpha_{1,1} \\ \alpha_{1,2} \\ \vdots \\ \alpha_{1,K} \\ 0 \\ \vdots \\ 0 \end{bmatrix},\ \ldots,\quad
\alpha_i = \begin{bmatrix} 0 \\ \vdots \\ \alpha_{i,1} \\ \alpha_{i,2} \\ \vdots \\ \alpha_{i,K} \\ \vdots \\ 0 \end{bmatrix},\ \ldots,\quad
\alpha_P = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \alpha_{P,1} \\ \alpha_{P,2} \\ \vdots \\ \alpha_{P,K} \end{bmatrix} \tag{5}
\]
where each of the subvectors $\{\alpha_{i,j}\}_{j=1}^{K}$, $i = 1, 2, \ldots, P$, lies in $\mathbb{R}^{T}$, while $0$ denotes a vector of all zeros in $\mathbb{R}^{KT}$. As a result, $S$ exhibits a block-diagonal structure.

From [8], we know that a single object's trajectory can be represented by a sparse linear combination of all the training samples. For the multiple trajectories scenario, we assume that training samples with nonzero weights (in the sparse linear combination) exhibit one-to-one correspondence across different trajectories. In other words, if the $i$th trajectory training sample from the $j$th class is chosen for the $i$th test trajectory, then, with very high probability, the training samples for the other $P - 1$ trajectories are also chosen from the $j$th class, albeit with possibly different weights.

We take a simple scenario that has only two objects and two training classes (a normal and an anomalous class) as an example to explain the structure of (4). In this situation, $P = 2$, $K = 2$, and (4) becomes
\[
Y \approx AS = [A_{1,1}\ A_{1,2}\ A_{2,1}\ A_{2,2}]
\begin{bmatrix} \alpha_{1,1} & 0 \\ \alpha_{1,2} & 0 \\ 0 & \alpha_{2,1} \\ 0 & \alpha_{2,2} \end{bmatrix}. \tag{6}
\]
The test trajectory sample is thought of as a collective event. Therefore, all trajectories of the sample should be classified into one class. If the first trajectory is classified into the $j$th class, the second trajectory should also be classified into the $j$th class, which means $\alpha_{1,j}$ and $\alpha_{2,j}$ should be activated simultaneously. This characteristic, that some coefficients should be activated jointly, captures the interaction between objects.

Moreover, we only care about the nonzero elements in the matrix $S$. Define a new matrix $S'$
\[
S' = \begin{bmatrix} \alpha_{1,1} & \alpha_{2,1} \\ \alpha_{1,2} & \alpha_{2,2} \end{bmatrix}. \tag{7}
\]
In the structure of $S'$, joint coefficients are moved into the same row. The joint information can be captured by enforcing certain entire rows of $S'$ to be activated simultaneously.

In general, when there are $K$ classes and $P$ objects, the structure of $S'$ is
\[
S' = \begin{bmatrix}
\alpha_{1,1} & \ldots & \alpha_{i,1} & \ldots & \alpha_{P,1} \\
\alpha_{1,2} & \ldots & \alpha_{i,2} & \ldots & \alpha_{P,2} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
\alpha_{1,K} & \ldots & \alpha_{i,K} & \ldots & \alpha_{P,K}
\end{bmatrix} \in \mathbb{R}^{KT \times P}. \tag{8}
\]
The question that remains to be addressed is the particular way of transforming $S$ into $S'$. Such a transformation is realized by defining matrices $H \in \mathbb{R}^{PKT \times P}$ and $J \in \mathbb{R}^{KT \times PKT}$ as
\[
H = \begin{bmatrix}
\mathbf{1} & \mathbf{0} & \ldots & \mathbf{0} \\
\mathbf{0} & \mathbf{1} & \ldots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \ldots & \mathbf{1}
\end{bmatrix}, \qquad
J = [\, I_{KT}\ \ I_{KT}\ \ \ldots\ \ I_{KT} \,]. \tag{9}
\]
The vectors $\mathbf{1}$ and $\mathbf{0}$ are in $\mathbb{R}^{KT}$ and contain all ones and zeros, respectively, and $I_{KT}$ is the $KT$-dimensional identity matrix. With these definitions,
\[
H \circ S = \begin{bmatrix}
\alpha_{1,1} & \ldots & 0 & \ldots & 0 \\
\vdots & & \vdots & & \vdots \\
\alpha_{1,K} & \ldots & 0 & \ldots & 0 \\
\vdots & \ddots & \vdots & & \vdots \\
0 & \ldots & \alpha_{i,1} & \ldots & 0 \\
\vdots & & \vdots & & \vdots \\
0 & \ldots & \alpha_{i,K} & \ldots & 0 \\
\vdots & & \vdots & \ddots & \vdots \\
0 & \ldots & 0 & \ldots & \alpha_{P,1} \\
\vdots & & \vdots & & \vdots \\
0 & \ldots & 0 & \ldots & \alpha_{P,K}
\end{bmatrix}. \tag{10}
\]
Then, we have
\[
J(H \circ S) = S' \tag{11}
\]
where $\circ$ indicates the matrix Hadamard (entry-wise) product.

Therefore, we can now solve for the sparse coefficients via the following optimization problem:
\[
\begin{aligned}
&\text{minimize} && \| J(H \circ S) \|_{\mathrm{row},0} \\
&\text{subject to} && \| Y - AS \|_F \leq \epsilon
\end{aligned} \tag{12}
\]
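As an illustrative aside, the rearrangement in (9)–(11) can be checked numerically on the two-object, two-class example of (6). The sketch below is a minimal NumPy illustration and not the authors' implementation: the helper names `build_H_and_J` and `row_l0`, the choice $T = 1$, and the numerical coefficient values are assumptions made only for this example. It shows that two coefficients that occupy different rows of the block-diagonal matrix $S$ land in a single row after applying $J(H \circ S)$, so the row-wise count in the objective of (12) rewards object-wise correspondence.

```python
import numpy as np

def build_H_and_J(P, K, T):
    """Hypothetical helper: build the mask H and the folding matrix J of (9)."""
    KT = K * T
    # H is block diagonal: column i holds an all-ones KT-vector in block i.
    H = np.kron(np.eye(P), np.ones((KT, 1)))   # shape (P*KT, P)
    # J places P identity matrices of size KT side by side.
    J = np.tile(np.eye(KT), (1, P))            # shape (KT, P*KT)
    return H, J

def row_l0(M, tol=1e-12):
    """Number of rows of M with at least one entry larger than tol in magnitude."""
    return int(np.sum(np.abs(M).max(axis=1) > tol))

# Two objects, two classes, one training sample per class (T = 1 is an assumption).
P, K, T = 2, 2, 1
H, J = build_H_and_J(P, K, T)

# Block-diagonal S as in (6); both objects activate class 2 (illustrative values).
S = np.array([[0.0, 0.0],
              [0.7, 0.0],
              [0.0, 0.0],
              [0.0, 0.4]])

# Rearranged coefficient matrix of (11): joint coefficients share a row.
S_prime = J @ (H * S)
```

Here `S` has two nonzero rows, whereas `S_prime` has only one, which is the structure the row-sparsity objective in (12) is designed to favor.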
Frobenius norm.) It is worth reemphasizing that the matrix $S'$ has a special structure: elements in the same row should be activated simultaneously, which is captured by using the matrix $\|\cdot\|_{\mathrm{row},0}$ norm. Enforcing the sparsity of $S'$ (equivalently $S$) using other traditional matrix norms, for example, entry-wise $\ell_p$ norms, can be detrimental to performance [33], [34], particularly under the quadratic reconstruction constraint, because they could lead to solutions that depart significantly from row sparsity.

The well-known row sparsity problem
\[
\begin{aligned}
&\text{minimize} && \| S \|_{\mathrm{row},0} \\
&\text{subject to} && \| Y - AS \|_F \leq \epsilon
\end{aligned} \tag{13}
\]
is nonconvex but can be solved using greedy pursuit algorithms widely used in the literature. Simultaneous orthogonal matching pursuit (SOMP) [34], [35], enumerated in Algorithm 1, is among the most popular algorithms used. In SOMP, the support of the solution is sequentially updated (i.e., the atoms in the dictionary $A$ are sequentially selected). At each iteration, the atom that simultaneously yields the best approximation to all of the residual vectors is selected:
\[
\lambda_k = \arg\max_i \| R_{k-1}^T a_i \|_2. \tag{14}
\]

Algorithm 1 SOMP
1: initialization: residual $R_0 = Y$, index set $\Lambda_0 = \emptyset$, iteration counter $k = 1$
2: while stopping criterion has not been met
   1) Find the index of the atom that best approximates all residuals: $\lambda_k = \arg\max_i \| R_{k-1}^T a_i \|_2$
   2) Update the index set $\Lambda_k = \Lambda_{k-1} \cup \{\lambda_k\}$
   3) Compute $G_k = (A_{\Lambda_k}^T A_{\Lambda_k})^{-1} A_{\Lambda_k}^T Y$, where $A_{\Lambda_k}$ consists of the $k$ atoms in $A$ indexed in $\Lambda_k$
   4) Determine the residual $R_k = Y - A_{\Lambda_k} G_k$
   5) $k \leftarrow k + 1$
3: end while
Output: Index set $\Lambda = \Lambda_{k-1}$, the sparse representation $S$ whose nonzero rows, indexed by $\Lambda$, are the rows of the matrix $(A_{\Lambda}^T A_{\Lambda})^{-1} A_{\Lambda}^T Y$

Our proposed joint sparsity model for representing multiple object trajectories involves solving (12), which looks quite similar to (13), but the Hadamard operator from $S$ to $S'$ makes the problem much more involved. We can observe from Algorithm 1 that the original SOMP algorithm effectively gives the $k_0$ distinct atoms from dictionary $A$ that best approximate the data matrix $Y$ after $k_0$ iterations; we apply the same general formulation even when the Hadamard operator is present. At every iteration $k$, SOMP measures the residual for each atom in $A$ and creates an orthogonal projection with the highest correlation.

This idea can be extended to our proposed joint sparsity setting. If the atom selected for the $j$th trajectory comes from the $i$th training sample, the atoms for the other $P - 1$ trajectories should also be chosen from the $i$th training sample. Then (14) in SOMP can be modified as
\[
\lambda_k = \arg\max_i \sum_j \| R_{j,k-1}^T a_{j,i} \|_2 \tag{15}
\]
where $R_{j,k-1}$ refers to the residual of the $j$th trajectory in iteration $k - 1$ and $a_{j,i}$ represents the $i$th training sample of the $j$th trajectory. After employing this special rule of choice for atom selection, each row of the parameter matrix $S'$ will be activated simultaneously or inactivated simultaneously; thus, the row sparsity requirements will inherently hold. The implementation details of this algorithm can be found in a technical report [36].

Supervised Anomaly Detection as Event Classification
Having obtained the sparse coefficient matrix $S$, we compute class-specific residual errors and identify the class of the test event $Y$ as that which gives the minimum residual
\[
\mathrm{identity}(Y) = \arg\min_i \| Y - A\,\delta_i(S) \|_F \tag{16}
\]
where $\delta_i(S)$ is the matrix whose only nonzero entries are the same as those in $S$ associated with class $i$ (in all $P$ trajectories). When sufficient representation (for example, training trajectories) for anomalous events is available, then anomalous classes are simply one or more of the $K$ classes in this joint sparsity-based classification framework.

Unsupervised Anomaly Detection via Outlier Rejection
If training for anomalies is missing or statistically insignificant, we cannot use (16) to identify anomalies. We are inspired by the outlier rejection measure in [10]
\[
\mathrm{SCI}(\alpha) = \frac{K \cdot \max_i \| \rho_i(\alpha) \|_1 / \| \alpha \|_1 - 1}{K - 1}, \qquad
\mathrm{SCI}(\alpha) < \tau_1 \ \rightarrow\ y : \text{outlier} \tag{17}
\]
where $\rho_i(\alpha)$ is the new vector whose only nonzero entries are the entries in $\alpha$ that are associated with class $i$. We model anomalies as outliers, given training from expected normal event classes that form the dictionary $A$. Equation (17) can be used to detect single object anomalies. We extend it to the multiple object case
\[
\mathrm{JSCI}(S') = \frac{K \cdot \max_i \| \delta_i(S') \|_{\mathrm{row},0} / \| S' \|_{\mathrm{row},0} - 1}{K - 1} \tag{18}
\]
where JSCI is the joint-SCI. Note that $0 \leq \mathrm{JSCI}(S') \leq 1$: if $\mathrm{JSCI}(S')$ is close to 1, the activated rows are concentrated in a single class and the event is normal, and if $\mathrm{JSCI}(S')$ is close to 0, the event is anomalous. We choose a threshold $\tau_2 \in (0, 1)$; if $\mathrm{JSCI}(S') < \tau_2$, a multiple object anomaly is identified. A nominal choice of $\tau_2 = 0.5$ can be made, but this can further be optimized experimentally based on the underlying video dataset by observing the range of the measure $\mathrm{JSCI}(S')$ for normal events.

Fig. 2 shows an example of sparse coefficients under normal vs. anomalous events. To the left of the figure are sparse coefficients of an anomalous event. The activated coefficients
Fig. 2. Example illustration of an anomalous event versus normal event. To the left of the figure are sparse coefficients of an anomalous event. The activated coefficients are scattered all over the normal classes. To the right of the figure are sparse coefficients of a normal event. The activated coefficients are clustered in normal class two.

Algorithm 2 KSOMP
Input: Dictionary $A = [a_1\ a_2\ \ldots\ a_{PKT}]$, data matrix $Y = [y_1\ y_2\ \ldots\ y_P]$, kernel function $\kappa$, the stopping criterion:
\[
\frac{\| Y_\phi - A_\phi S_\phi^{k} \|_F}{\| Y_\phi - A_\phi S_\phi^{k-1} \|_F} > 1 - \mu
\]
1: initialization: compute the kernel matrices $\Theta_A \in \mathbb{R}^{PKT \times PKT}$, whose $(i, j)$-th entry is $\kappa(a_i, a_j)$, and $\Theta_{A,Y} \in \mathbb{R}^{PKT \times P}$, whose $(i, j)$-th entry is $\kappa(a_i, y_j)$. Set index set $\Lambda_0 = \arg\max_i \| (\Theta_{A,Y})_{i,:} \|_2$ and iteration counter $t = 1$.
2: while stopping criterion has not been met
   1) Compute the correlation matrix
\[
C = \Theta_{A,Y} - (\Theta_A)_{:,\Lambda_{t-1}} \big( (\Theta_A)_{\Lambda_{t-1},\Lambda_{t-1}} + \lambda I \big)^{-1} (\Theta_{A,Y})_{\Lambda_{t-1},:} \tag{21}
\]
   2) Select the new index as $\lambda_t = \arg\max_i \| C_{i,:} \|_2$
   3) Update the index set $\Lambda_t = \Lambda_{t-1} \cup \{\lambda_t\}$
   4) $t \leftarrow t + 1$
3: end while
Output: Index set $\Lambda = \Lambda_{t-1}$, the sparse representation $S_\phi$ whose nonzero rows, indexed by $\Lambda$, are $\big( (\Theta_A)_{\Lambda,\Lambda} + \lambda I \big)^{-1} (\Theta_{A,Y})_{\Lambda,:}$
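The steps of Algorithm 2 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `rbf_kernel` and `ksomp` and the variable names `theta_A` and `theta_AY` (for the two kernel matrices of the initialization step) are hypothetical, an RBF kernel is assumed, a fixed atom budget `n_atoms` stands in for the residual-ratio stopping criterion, and already-selected atoms are masked so that they cannot be picked twice. The joint selection rule for multiple objects is not included.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """kappa(x, z) = exp(-gamma * ||x - z||^2) for every column pair of X and Z."""
    sq = (X ** 2).sum(axis=0)[:, None] + (Z ** 2).sum(axis=0)[None, :] - 2.0 * X.T @ Z
    return np.exp(-gamma * np.maximum(sq, 0.0))

def ksomp(A, Y, gamma=1.0, n_atoms=2, lam=1e-6):
    """Sketch of kernel SOMP; stops after n_atoms atoms (an assumed stopping rule)."""
    theta_A = rbf_kernel(A, A, gamma)    # kernel matrix between atoms, (N, N)
    theta_AY = rbf_kernel(A, Y, gamma)   # kernel matrix between atoms and data, (N, P)

    # Initialization: the atom whose kernel correlations with all columns of Y are largest.
    support = [int(np.argmax(np.linalg.norm(theta_AY, axis=1)))]

    while len(support) < n_atoms:
        # Regularized Gram matrix of the selected atoms (lam * I stabilizes the inversion).
        G = theta_A[np.ix_(support, support)] + lam * np.eye(len(support))
        coef = np.linalg.solve(G, theta_AY[support, :])
        # Correlation of every atom with the current feature-space residuals, as in (21).
        C = theta_AY - theta_A[:, support] @ coef
        scores = np.linalg.norm(C, axis=1)
        scores[support] = -np.inf        # assumption: never reselect an atom
        support.append(int(np.argmax(scores)))

    # Output: coefficients on the selected rows, zeros elsewhere.
    G = theta_A[np.ix_(support, support)] + lam * np.eye(len(support))
    S_phi = np.zeros((A.shape[1], Y.shape[1]))
    S_phi[support, :] = np.linalg.solve(G, theta_AY[support, :])
    return support, S_phi

# Toy usage: six well-separated atoms; the two test columns coincide with atoms 1 and 4.
A_toy = 2.0 * np.eye(6)
Y_toy = A_toy[:, [1, 4]]
support_toy, S_toy = ksomp(A_toy, Y_toy, gamma=1.0, n_atoms=2)
```

In this toy setting the atoms are nearly orthogonal in the feature space, so the selected support is expected to be exactly the two atoms that generated the test columns. Only kernel evaluations are used; the feature map itself is never formed.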
The effectiveness of the proposed joint sparsity model largely depends on the structure of the trajectory data. If the data is not linearly separable enough, the trajectory-based sparsity model may not enable sufficiently accurate reconstruction to be reliable from a classification standpoint. Kernel-based algorithms [37] implicitly exploit the higher-order nonlinear structure of the data that is not captured by the linear models. Kernel methods can be applied to transform the data into a feature space via a transformation $\phi(\cdot)$ such that the resulting transformed trajectory vectors become more separable [23]–[25] and comply with the linear sparsity model. The kernel function $\kappa : \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$ is usually defined as the inner product $\kappa(x, z) = \langle \phi(x), \phi(z) \rangle$.

The benefits of the kernel function are illustrated in Fig. 3. For the left figure, it is impossible to use a linear classifier to classify these two types of data. However, after using the kernel function, the data is projected into a higher-dimensional space where it is more linearly separable.

Kernel Sparsity Model for Object Trajectory
The transformed training trajectory vectors are written as $a_i \rightarrow \phi(a_i)$, where $a_i$ is the $i$th column of $A$. Let $\phi(y)$ denote

The problem in (20) can be approximately solved by kernel orthogonal/basis matching pursuit algorithms [26]–[28], [39]. Note that in the above problem formulation, we are solving for the sparse vector $\alpha$ directly in the feature space using the implicit feature vectors, but not evaluating the kernel functions at the training points.

The well-known row sparsity problem in (13) can be extended to the (kernelized) feature space as
\[
\begin{aligned}
&\text{minimize} && \| S_\phi \|_{\mathrm{row},0} \\
&\text{subject to} && \| Y_\phi - A_\phi S_\phi \|_F \leq \epsilon.
\end{aligned} \tag{22}
\]
We propose the kernel SOMP algorithm (i.e., KSOMP) in order to solve (22); see Algorithm 2. Note that a regularization term $\lambda I$ is added in Step 1 of Algorithm 2 in order to enable a stable inversion. Like SOMP (Algorithm 1), the goal of KSOMP is to pick out nonzero rows that minimize $\| S_\phi \|_{\mathrm{row},0}$, but in the transformed kernel space.

The kernelized version of our proposed joint sparsity model in (12) is given as
\[
\begin{aligned}
&\text{minimize} && \| J(H \circ S_\phi) \|_{\mathrm{row},0} \\
&\text{subject to} && \| Y_\phi - A_\phi S_\phi \|_F \leq \epsilon
\end{aligned} \tag{23}
\]
which is very similar to (22) but for the presence of the matrices $J$, $H$ and the Hadamard operator. Recall the modification to SOMP employed in Section III; a similar trick will allow us to adapt kernelized SOMP to yield a solution for our problem in (23). In particular, instead of using the selection rule
\[
\lambda_t = \arg\max_i \| C_{i,:} \|_2 \tag{24}
\]
in Step 2 of KSOMP, we can jointly select atoms of trajectories from the same training sample
\[
\lambda_t = \arg\max_i \sum_j \| C^{j}_{i,:} \|_2 \tag{25}
\]
where $C^{j}_{i,:}$ refers to the correlation matrix of the $j$th trajectory.

Note that in the transformed space, the residual becomes
\[
\begin{aligned}
\| \phi(y) - A_\phi \alpha' \|_2
&= \Big[ \sum_{i=1}^{n} \big( \phi(y)_i - (A_\phi \alpha')_i \big)^2 \Big]^{1/2} \\
&= \Big[ \sum_{i=1}^{n} \Big( \phi(y)_i - \sum_{j=1}^{KT} \alpha'_j (A_\phi)_{i,j} \Big)^2 \Big]^{1/2} \\
&= \Big[ \sum_{i=1}^{n} \Big( \phi(y)_i^2 - 2 \phi(y)_i \sum_{j=1}^{KT} \alpha'_j (A_\phi)_{i,j} + \Big( \sum_{j=1}^{KT} \alpha'_j (A_\phi)_{i,j} \Big)^2 \Big) \Big]^{1/2} \\
&= \Big[ \sum_{i=1}^{n} \phi(y)_i^2 - 2 \sum_{j=1}^{KT} \alpha'_j \sum_{i=1}^{n} \phi(y)_i (A_\phi)_{i,j} + \sum_{i=1}^{n} \Big( \sum_{j=1}^{KT} \alpha'_j (A_\phi)_{i,j} \Big)^2 \Big]^{1/2} \\
&= \big[ \kappa(y, y) - 2 \alpha'^T \theta_{A,y} + \alpha'^T \Theta_A \alpha' \big]^{1/2}.
\end{aligned}
\]
Let $\hat{S}_\phi$ be the optimum solution of (23). The residual for the kernelized joint sparsity model corresponding to the $i$th class is then given by
\[
\| Y_\phi - A_\phi \delta_i(\hat{S}_\phi) \|_F = \Big( \sum_{j=1}^{P} \big( \| \phi(y_j) - A_\phi (\delta_i(\hat{S}_\phi))_{:,j} \|_2 \big)^2 \Big)^{1/2} \tag{26}
\]
where $\phi(y_j)$ is the $j$th column of $Y_\phi$ and $(\delta_i(\hat{S}_\phi))_{:,j}$ is the $j$th column of $\delta_i(\hat{S}_\phi)$. More precisely, in terms of the kernel function, this is given by
\[
r_i(Y_\phi) = \Big[ \sum_{j=1}^{P} \Big( \kappa(y_j, y_j) - 2 (\delta_i(\hat{S}_\phi))_{:,j}^T (\Theta_{A,Y})_{\Lambda_i, j} + (\delta_i(\hat{S}_\phi))_{:,j}^T (\Theta_A)_{\Lambda_i, \Lambda_i} (\delta_i(\hat{S}_\phi))_{:,j} \Big) \Big]^{1/2}
\]
where $\Lambda_i$ is the index set associated with the $i$th training class.

Supervised Anomaly Detection as Event Classification
Similar to (16), the class of $Y_\phi$ is determined by
\[
\mathrm{identity}(Y_\phi) = \arg\min_i \| Y_\phi - A_\phi \delta_i(\hat{S}_\phi) \|_F \tag{27}
\]
where $\delta_i(\hat{S}_\phi)$ is the matrix whose only nonzero entries are the same as those in $\hat{S}_\phi$ associated with class $i$.

Unsupervised Anomaly Detection via Outlier Rejection
As in Section III, we can define a transformed coefficient matrix $\hat{S}'_\phi = J(H \circ \hat{S}_\phi)$. The outlier rejection measure in (18) can then be extended here as
\[
\mathrm{JSCI}(\hat{S}'_\phi) = \frac{K \cdot \max_i \| \delta_i(\hat{S}'_\phi) \|_{\mathrm{row},0} / \| \hat{S}'_\phi \|_{\mathrm{row},0} - 1}{K - 1}. \tag{28}
\]

Kernel Parameters Optimization
We focus on picking parameters for the RBF kernel $\kappa(x, z) = e^{-\gamma \| x - z \|^2}$, which will be used in all our experiments.

For different choices of the RBF parameter $\gamma$, multiple training dictionaries $A_\phi(\gamma)$ are generated; that is, $A_\phi(\gamma)$ is a function of $\gamma$. Inspired by cross-validation, we split the training data $A_\phi(\gamma)$ into two subsets $B_\phi(\gamma)$ and $C_\phi(\gamma)$, such that both $B_\phi(\gamma)$ and $C_\phi(\gamma)$ have representation from the $K$ classes. Now if the dictionary in the sparsity model is chosen to be equal to $B_\phi(\gamma)$ and a (transformed) test trajectory is selected from $C_\phi(\gamma)$, then ideally we expect perfect classification into one of the $K$ classes. Therefore, a good kernel is one that will enable close to ideal classification of test samples from $C_\phi(\gamma)$, which means that only a small number of entries of $\hat{S}_\phi(\gamma)$ are activated (nonzero), and only for one particular class. Here the outlier rejection measure is
\[
\mathrm{JSCI}(\hat{S}'_\phi(\gamma)) = \frac{K \cdot \max_i \| \delta_i(\hat{S}'_\phi(\gamma)) \|_1 / \| \hat{S}'_\phi(\gamma) \|_1 - 1}{K - 1} \tag{29}
\]
where $\delta_i(\hat{S}'_\phi(\gamma))$ is the vector whose only nonzero entries are the same as those in $\hat{S}'_\phi(\gamma)$ associated with class $i$. $\mathrm{JSCI}(\hat{S}'_\phi(\gamma))$ will be very close to 1 if the classification is accurate. Therefore, the best parameter $\gamma$ can be chosen by solving the following kernel parameter optimization problem:
\[
\arg\max_\gamma\ \mathrm{JSCI}(\hat{S}'_\phi(\gamma)). \tag{30}
\]

Computational Complexity
Suppose that the dimension of a trajectory is $n$: $y \in \mathbb{R}^n$. For the SOMP Algorithm 1, the complexity is $O(nKPT)$ [40] in our set-up, where $K$, $P$, and $T$ were defined in Section III and refer to the number of classes, objects, and training samples per class, respectively. Changing the selection rules in our modified version of SOMP will not increase the complexity much. Therefore, the computational complexity of our joint sparsity model is also approximately $O(nKPT)$. The kernelized joint sparsity model has to compute the kernel matrix $\Theta_A$ and the correlation matrix $C$. The computational complexity will therefore increase to $O(nKP^2T)$. Note that the number of objects $P$ is usually a small number. Therefore, the kernel trick does not significantly increase complexity over sparsity-based anomaly detection while providing significant performance improvements, as will be seen in Section V.

V. Experimental Validation

A. Trajectory Extraction

Trajectory extraction is accomplished using well-known techniques. First, background subtraction is accomplished via the use of a Gaussian mixture model (GMM) [41]. In order to eliminate the effect of noise, blob analysis is then used to identify the location of the moving vehicle. We calculate the number of connected foreground pixels and deem the connected segment to be a vehicle if this exceeds a threshold. As seen in Fig. 4(a), the car is successfully detected by this technique. Next, we calculate and track the centroid of the blob over time in order to obtain the object trajectory. Fig. 4(b)
Fig. 5. Illustration of one class SVM method by Piciarelli et al. [18] on 2-D
data. The classification hyperplane intersect the hypersphere, thus defining
Fig. 4. Trajectory extraction. (a) Background subtraction using Gaussian an hyperspherical cap containing the majority of the data, while outliers lie
mixture models and blob analysis to identify objects. (b) Blob centroid outside the cap. (From [18], with permission.)
calculation and trajectory derivation by collecting the blob centroid.
shows an example of the extracted trajectory, which is represented mathematically as a coordinate pair $[x(t), y(t)]$. Li et al. [8] use a least-squares cubic spline curves approximation (LCSCA) representation of trajectories. We first approximate a raw trajectory using a basic B-spline function [42] with 50 knots (50 x-coordinates and 50 y-coordinates) and these knots are extracted to represent the trajectory.

B. Brief Review of Competing Approaches
Here we elaborate on three widely cited techniques from the literature that will be used as benchmarks for our approach.
Anomaly Detection via SVM Trajectory Clustering
Piciarelli et al. [18] propose a technique that uses one class SVMs for anomaly detection by utilizing trajectory information as features.
In their method, trajectories are represented using eight pairs of x and y coordinates, thus leading to feature vectors composed of 16 elements. Then, these trajectories are clustered with a one-class SVM. During the training phase, a classification hyperplane is learned. Fig. 5 shows an illustration of their method on 2-D data. Based on the classification hyperplane, an outlier detection technique can be used to detect anomalies. If the angle $\theta_X$ between a test trajectory $X$ and the center $C$ is greater than a threshold, this test trajectory $X$ is regarded as an anomaly.
$$\theta_X > \theta_{th} \rightarrow X : \text{outlier}. \quad (31)$$
Online Anomaly Detection using Sparsity Model
Zhao et al. [9] propose an online sparsity-based method for video anomaly detection. They use sparse linear combinations of spatio-temporal volumes under an $\ell_1$ sparsity model. But instead of using a fixed training dictionary and only solving for the sparse coefficients, they employ a principled convex optimization formulation that allows both a sparse reconstruction code and an online dictionary to be jointly inferred and updated
$$(\alpha_1^*, \ldots, \alpha_m^*, A^*) = \arg\min_{\alpha_1, \ldots, \alpha_m, A} \sum_{i=1}^{m} J(y_i, \alpha_i, A) \quad (32)$$
where $J(y_i, \alpha_i, A)$ is the objective function that measures how normal an event is. It includes reconstruction error, sparsity regularization via the $\ell_1$ norm, and smoothness terms. Subsequent to the optimization, the dictionary $A$ is augmented using newly observed events.
For anomaly detection, they define a threshold $\hat{\epsilon}$ that controls the sensitivity of the algorithm to anomalous events. A test spatio-temporal volume $y'$ will be detected as an anomalous event if the following criterion is satisfied:
$$J(y', \alpha', A^*) > \hat{\epsilon}. \quad (33)$$
Our implementation of Zhao's method is based on their pseudocode in Algorithm 1 on page 4 of their paper [9].
Multiobject Trajectory Tracking and Anomaly Detection
Han et al. [31] propose a multiple object tracking algorithm and a corresponding rule-based anomaly detection approach. In their tracking algorithm, each object is identified by index $i$ and its state at time $t$ is represented by
$$x_t^i = (p_t^i, v_t^i, a_t^i, s_t^i) \quad (34)$$
where $p_t^i$ is the image location; $v_t^i$ represents the 2-D velocity; and $a_t^i$ and $s_t^i$ denote the appearance and scale of object $i$ at time $t$, respectively. $p_t^i$ and $v_t^i$ use continuous image coordinates. Then, they use an HMM as the probabilistic model to maximize the joint probability between the state sequence and the observation sequence.
For anomaly detection, they first collect the information from tracking results including the number of objects, their motion history and interaction, and the timing of their behaviors. Then they interpret events based on the basic information about who (how many), when, where, and what. Based on this interpretation, anomalies can be defined by rules. For example, in a traffic intersection scenario, there are always zero to eight cars around. If 15 cars arrive at this intersection simultaneously, it can be regarded as an anomalous event.

C. Video Datasets and Intuitive Illustration
As discussed above, if an anomaly is generated by a single object, we call it a single-object anomaly. Fig. 6(a) and (b)
MO et al.: ADAPTIVE SPARSE REPRESENTATIONS FOR VIDEO ANOMALY DETECTION 639
Fig. 6. Example frames of single-object anomalies. (a) Man suddenly falls on floor from the CAVIAR dataset. (b) Driver backs his car in front of stop sign
from the Xerox stop sign dataset.
show consecutive frames from two examples of single-object anomalies. In Fig. 6(a), a man suddenly falls on the floor [Fig. 6(a2)] when walking across the lobby. In Fig. 6(b), instead of turning left or right in front of the stop sign, the driver suddenly backs his car; see Fig. 6(b3)–(b4). By setting the number of objects $P = 1$, our joint sparsity model reduces to the model by Li et al. [8] and can be used to detect anomalous trajectories of individual (single) objects.
On the other hand, if an anomaly happens via the interaction of multiple objects, we call it a multiple-object anomaly. The video clip frames from three examples of multiple-object anomalies are shown in Fig. 7(a)–(c). In Fig. 7(a), a pedestrian crossing the street loses his hat and retraces his footsteps to pick it up from the road. At this time, a vehicle comes in very close proximity to the pedestrian and comes to a sudden halt; see Fig. 7(a2). In Fig. 7(b), the second vehicle (marked by a red rectangle) comes to a complete stop when waiting for the vehicle in front of it [Fig. 7(b1)], but does not actually stop at the stop sign; see Fig. 7(b2)–(b3). In Fig. 7(c), a car fails to yield to an oncoming car while turning left; see Fig. 7(c2)–(c3). The examples in Fig. 7(a) and (b) are in fact from a real-world transportation database (which cannot be made public for proprietary reasons), which we refer to as the Xerox stop sign database. An example video clip is, however, made available at: http://youtu.be/M6 PJigg5CY and the example in Fig. 7(c) comes from another proprietary transportation database that we refer to as the Xerox intersection database. A representative video clip is made available at: http://youtu.be/ZGKtkVtWEFU.
Our proposed algorithm for multiobject trajectory-based anomaly detection is called JKSM, or equivalently kernel sparsity model (KSM) in the case of single-object anomaly detection. We test the KSM and JKSM algorithms on several challenging video datasets. For single-object anomalies, we test on the CAVIAR [29] dataset in Fig. 6(a) and the Xerox stop sign video database represented in Fig. 6(b). For multiple-object anomalies, we test on the AVSS [30] dataset in Fig. 7(a), the multiobject example from the Xerox stop sign dataset in Fig. 7(b), and the Xerox intersection dataset in Fig. 7(c). We also compare our experimental results against four well-recognized techniques in trajectory-based anomalous event detection: 1) the recent sparsity-based technique of Li et al. [8], 2) the approach of Piciarelli et al. [18] using one class SVMs, 3) the sparsity model with online dictionary learning of Zhao et al. [9], and 4) the multiple-object tracking and rule-based anomaly detection technique of Han et al. [31].
Benefits of sparsity under object occlusion or missing trajectory information:
Because of the limitation of the camera's visual angle, occlusions often occur in video data. Fig. 8(a) and (b) show two examples of occlusions. In Fig. 8(a), a car is occluded by another car (we call it a moving occlusion); see Fig. 8(a3). In Fig. 8(b), the car is occluded by the stop sign (static occlusion); see Fig. 8(b2).
Tracking algorithms continue to improve in their handling of occlusions [43]–[45]. Because this remains a fundamentally difficult problem, occlusions still occur in practice and lead to missing or corrupted trajectory data. Because our optimization problems in (12) and (23) are well-conditioned, perturbation theory arguments apply and lend our method robustness under limited trajectory occlusion. In particular, consider
$$\hat{S}_o = \arg\min \|J(H \circ S_o)\|_{\mathrm{row},0} \quad \text{subject to} \quad \|Y_o - A S_o\|_F \le \epsilon \quad (35)$$
where $Y_o$ is a set of occluded test trajectories and $S_o$ denotes the corresponding sparse coefficients. If $\|Y_o - Y\|_2 < \eta$ then $\|\hat{S}_o - \hat{S}\|_2 < \zeta$, i.e., a small perturbation to the problem should cause only a small perturbation to the solution [46]. This means that if the occluded trajectory set $Y_o$ is not very different from the original trajectory set $Y$, then the optimized sparse coefficients under trajectory occlusion will be close to $\hat{S}$, i.e., the solution in the absence of occlusion. Because anomaly
Fig. 7. Example frames of multiple-object anomalies. (a) Vehicle almost hits a pedestrian, from the AVSS dataset. (b) Car (marked by the red rectangle)
violates the stop sign rule from the Xerox stop sign dataset. (c) Car fails to yield to oncoming car while turning left from the Xerox intersection dataset.
detection rests on the structure of the sparse coefficient matrix, this intuitively provides occlusion robustness.
In addition, we perform a simple experiment to illustrate this. We work with the Xerox stop sign dataset and 95 trajectories (including 76 training trajectories without occlusion and 19 occluded trajectories) obtained from 39 video clips. Our training dictionary consists of nine normal trajectory classes (containing eight trajectories each) and one anomalous trajectory class (containing four trajectories). An independent set of 13 normal but occluded trajectories, and six anomalous but occluded trajectories, are used to test our approach. Fig. 9 shows an example of occluded trajectories from Xerox stop sign data. Note that occluded trajectory locations are replaced by a constant value (e.g., zero) for the duration that object tracking is lost. This is a common characteristic of frame-based tracking approaches [41], [47], [48].
Because training for anomalous events is well-represented in this database, (3) is used to classify the test trajectories. The RBF function is chosen as the kernel for the KSM algorithm. The confusion matrices of four methods (KSM and the techniques by Piciarelli et al., Li et al., and Zhao et al.) are reported in Table I. Note that Li et al. and Zhao et al., which are also sparsity-based anomaly detection techniques, as well as the proposed KSM method, do better than Piciarelli et al. owing to the robustness of sparse coefficients under occlusion. Further, KSM is mildly better than Li et al. and Zhao et al. for this example owing to the nonlinearity in the sparsity model as introduced by the use of the kernel. A more thorough evaluation of the three methods on real-world databases is reported next.

D. Experimental Results
1) Detection Rates for Single-Object Anomaly Detection: For the CAVIAR dataset, we test on 27 video clips from which 170 trajectories are extracted. Our training dictionary consists of ten normal trajectory classes and three anomalous trajectory classes, each containing ten different training trajectories. A total of 21 normal trajectories and 19 anomalous trajectories are used as independent test data.¹ Because training for anomalous events is well-represented in this database, (3) is used to classify the test trajectories as normal or anomalous. Our proposed KSM method employs the RBF function as kernel. The RBF function is defined as $\kappa(x, z) = e^{-\gamma \|x - z\|^2}$, for $\gamma > 0$. The confusion matrices of KSM are compared with the approach in [18], the sparsity model by Li et al. in [8], and Zhao et al. [9] in Table II. First, we note the benefits of sparsity-based anomaly detection in the form of improved detection rates of Li et al. and Zhao et al. over Piciarelli's approach. Second, we conjecture that Zhao et al. performs mildly better than
¹ All training and test trajectories, both normal and anomalous, are hand labeled by video analysis experts. The training and test sets picked for computing detection accuracy are completely non-overlapping.
Fig. 8. Occlusion examples. (a) Moving occlusion: a car is occluded by another car. (b) Static occlusion: a car is occluded by the stop sign.
TABLE I
Confusion matrices of proposed and state-of-the-art trajectory-based methods on Xerox stop sign occluded data: supervised, single-object anomaly detection. KSM detects anomalies using (3)
TABLE II
Confusion matrices of proposed and state-of-the-art trajectory-based methods on CAVIAR data: supervised, single-object anomaly detection. KSM detects anomalies using (3)
TABLE III
Confusion matrices of proposed and state-of-the-art trajectory-based methods on the Xerox stop sign data: supervised, single-object anomaly detection. KSM detects anomalies using (3)
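For concreteness, the RBF kernel employed by KSM, $\kappa(x, z) = e^{-\gamma \|x - z\|^2}$ with $\gamma > 0$, can be evaluated as in the following minimal sketch. The helper names `rbf_kernel` and `gram_matrix` are illustrative assumptions, and trajectories are assumed to be fixed-length feature vectors; how the resulting kernel matrix enters the sparse solver is not shown here.

```python
import math


def rbf_kernel(x, z, gamma):
    """kappa(x, z) = exp(-gamma * ||x - z||^2) for gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(atoms, gamma):
    """Kernel matrix K[i][j] = kappa(a_i, a_j) over dictionary atoms
    (hypothetical helper; each atom is one training trajectory vector)."""
    return [[rbf_kernel(a, b, gamma) for b in atoms] for a in atoms]
```

The kernel equals 1 for identical trajectories and decays toward 0 as they move apart, with $\gamma$ controlling how quickly similarity falls off.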
Li et al. because they can optimize and update dictionaries. Third, KSM serves to further improve detection rates by virtue of the use of a nonlinear kernel that enhances the accuracy of the underlying sparsity reconstruction, which is assumed to be linear by Li et al. and Zhao et al.
For the Xerox stop sign dataset, 118 trajectories from 39 video clips are extracted. The training dictionary comprises nine normal trajectory classes (containing eight trajectories each) and one anomalous trajectory class (containing four trajectories). Again, we have training for anomalies so we use (3) to classify a given test trajectory. An independent set of 34 normal trajectories and eight anomalous trajectories are used to test our approach. Table III shows the confusion matrices of four approaches. Again, the benefits of classification using a sparsity model versus an SVM-based classifier are readily apparent. As with the CAVIAR data, a comparison of Li et al., Zhao et al., and KSM demonstrates a significant advantage brought about by employing the nonlinear kernel.
2) ROC Curves: In order to test the robustness of the proposed method, we remove all the training for anomalous events from the dictionary and use all 40 test trajectories on the CAVIAR dataset and all 42 test trajectories on the Xerox
Fig. 11. ROC curves for unsupervised single-object anomaly detection (Xerox stop sign video dataset). KSM detects anomalies using (17), where $\tau_1$ varies from 0.1 to 0.9.
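A threshold sweep of the kind used to trace the ROC curves in Fig. 11 can be sketched as follows. The sketch assumes each test trajectory has already been assigned a scalar outlier-rejection score and that a trajectory is flagged as anomalous when its score falls below the threshold; that comparison direction, the function name `roc_points`, and the input format are assumptions made for illustration only.

```python
def roc_points(scores, is_anomaly, thresholds):
    """Return one (false_positive_rate, true_positive_rate) pair per threshold.

    Assumption: a sample is flagged anomalous when score < threshold.
    """
    positives = sum(is_anomaly)
    negatives = len(is_anomaly) - positives
    points = []
    for tau in thresholds:
        flagged = [s < tau for s in scores]
        tp = sum(f and a for f, a in zip(flagged, is_anomaly))
        fp = sum(f and not a for f, a in zip(flagged, is_anomaly))
        points.append((fp / negatives, tp / positives))
    return points


# Thresholds 0.1, 0.2, ..., 0.9, matching the range quoted for Fig. 11.
thresholds = [k / 10 for k in range(1, 10)]
```

Each threshold yields one operating point; plotting the true positive rate against the false positive rate over the sweep gives the curve.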
TABLE IV
Confusion matrices of proposed and state-of-the-art trajectory-based methods on the Xerox stop sign data: unsupervised multiple-object anomaly detection. The proposed JKSM detects anomalies using (27) and a threshold value of 0.5
TABLE V
Detection rates of proposed and state-of-the-art methods on AVSS data: unsupervised multiple-object anomaly detection. The proposed JKSM detects anomalies using (27) and a threshold value of 0.5
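The outlier rejection measure JSCI of (28) can be computed from a coefficient matrix as in the minimal sketch below. It assumes the transformed coefficient matrix is already available as a list of rows and that each row carries the class label of its dictionary atom; the function names and this input format are illustrative assumptions, and the transformation $J(H \circ \cdot)$ itself is not implemented.

```python
def row_l0(matrix):
    """Row-l0 norm: the number of rows with at least one nonzero entry."""
    return sum(1 for row in matrix if any(v != 0 for v in row))


def jsci(coeffs, row_class, num_classes):
    """Evaluate (K * max_i ||delta_i(S)||_row,0 / ||S||_row,0 - 1) / (K - 1).

    coeffs      -- transformed coefficient matrix, one list per row
    row_class   -- class index in 0..num_classes-1 for each row (assumed known)
    num_classes -- K, which must be at least 2
    """
    total = row_l0(coeffs)
    if total == 0:
        raise ValueError("coefficient matrix has no nonzero rows")
    per_class = [
        row_l0([row for row, c in zip(coeffs, row_class) if c == i])
        for i in range(num_classes)
    ]
    return (num_classes * max(per_class) / total - 1) / (num_classes - 1)
```

The measure equals 1 when all nonzero rows belong to one class and 0 when they are spread evenly across classes, so comparing it with a threshold such as the 0.5 quoted in Tables IV and V separates concentrated from diffuse representations. Treating low values as anomalous is an assumption of this sketch.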
TABLE VI
Confusion matrices of proposed and state-of-the-art trajectory-based methods on the Xerox intersection dataset: supervised, multiple-object anomaly detection. The proposed JKSM detects anomalies using (26)
which anomalies are precharacterized into classes, the anomaly detection reduces to a sparsity-based classification problem. In the more realistic unsupervised scenario in which anomalies cannot be sufficiently precharacterized, the anomaly detection is accomplished via a multiobject outlier rejection measure. The merits of a principled joint sparsity model for multiobject anomaly detection are strongly corroborated via experiments on challenging real-world databases. Additionally, we propose a kernelization of the joint sparsity model so as to further improve anomaly detection in which linear sparse reconstruction models do not hold directly. While we use fixed, expert designed training dictionaries that can be updated periodically, online dictionary updates/learning, for example, in [9], is a useful extension of our work and can be pursued in future research.

References
[1] T. Xiang and S. Gong, “Video behavior profiling for anomaly detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 5, pp. 893–908, May 2008.
[2] C.-H. Chuang, J.-W. Hsieh, L.-W. Tsai, S.-Y. Chen, and K.-C. Fan, “Carried object detection using ratio histogram and its application to suspicious event analysis,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 6, pp. 911–916, Jun. 2009.
[3] V. Saligrama, J. Konrad, and P. Jodoin, “Video anomaly identification,” IEEE Signal Process. Mag., vol. 27, no. 5, pp. 18–33, Sep. 2010.
[4] X. Wang, X. Ma, and W. Grimson, “Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 3, pp. 539–555, Mar. 2009.
[5] C. Simon, J. Meessen, and C. De Vleeschouwer, “Visual event recognition using decision trees,” Multimedia Tools Appl., vol. 50, no. 1, pp. 95–121, Oct. 2010.
[6] N. Vaswani, A. Roy-Chowdhury, and R. Chellappa, “Shape activity: A continuous-state HMM for moving/deforming shapes with application to abnormal activity detection,” IEEE Trans. Image Process., vol. 14, no. 10, pp. 1603–1616, Oct. 2005.
[7] I. Pruteanu-Malinici and L. Carin, “Infinite hidden Markov models for unusual-event detection in video,” IEEE Trans. Image Process., vol. 17, no. 5, pp. 811–822, May 2008.
[8] C. Li, Z. Han, Q. Ye, and J. Jiao, “Abnormal behavior detection via sparse reconstruction analysis of trajectory,” in Proc. IEEE Int. Conf. Image Graph., Aug. 2011, pp. 807–810.
[9] B. Zhao, L. Fei-Fei, and E. Xing, “Online detection of unusual events in videos via dynamic sparse coding,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., Jun. 2011, pp. 3313–3320.
[10] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
[11] D. Zhang, M. Yang, and X. Feng, “Sparse representation or collaborative representation: Which helps face recognition?” in Proc. IEEE Conf. Comput. Vision, 2011, pp. 471–478.
[12] R. Rigamonti, M. Brown, and V. Lepetit, “Are sparse representations really relevant for image classification?” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2011, pp. 1545–1552.
[13] Q. Shi, A. Eriksson, A. van den Hengel, and C. Shen, “Is face recognition really a compressive sensing problem?” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2011, pp. 553–560.
[14] N. Johnson and D. Hogg, “Learning the distribution of object trajectories for event recognition,” Image Vision Comput., vol. 14, pp. 583–592, 1996.
[15] P. Kumar, S. Ranganath, H. Weimin, and K. Sengupta, “Framework for real-time behavior interpretation from traffic video,” IEEE Trans. Intell. Transp. Syst., vol. 6, no. 1, pp. 43–53, Mar. 2005.
[16] D. Makris and T. Ellis, “Learning semantic scene models from observing activity in visual surveillance,” IEEE Trans. Syst., Man, Cybern., vol. 35, no. 3, pp. 397–408, Jun. 2005.
[17] W. Hu, X. Xiao, Z. Fu, D. Xie, T. Tan, and S. Maybank, “A system for learning statistical motion patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, pp. 1450–1464, Sep. 2006.
[18] C. Piciarelli, C. Micheloni, and G. Foresti, “Trajectory-based anomalous event detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 11, pp. 1544–1554, Nov. 2008.
[19] W. Hu, T. Tan, L. Wang, and S. Maybank, “A survey on visual surveillance of object motion and behaviors,” IEEE Trans. Syst., Man, Cybern., vol. 34, no. 3, pp. 334–352, Aug. 2004.
[20] I. Junejo, O. Javed, and M. Shah, “Multifeature path modeling for video surveillance,” in Proc. IEEE Conf. Pattern Recognit., vol. 2, Aug. 2004, pp. 716–719.
[21] Z. Fu, W. Hu, and T. Tan, “Similarity based vehicle trajectory clustering and anomaly detection,” in Proc. IEEE Conf. Image Process., vol. 2, Sep. 2005, pp. 602–605.
[22] A. Yilmaz, O. Javed, and M. Shah, “Object tracking: A survey,” ACM Comput. Surveys, vol. 38, no. 4, article no. 13, Dec. 2006.
[23] C. J. Burges, “A tutorial on support vector machines for pattern recognition,” Data Min. Knowl. Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[24] A. J. Smola and B. Schölkopf, “On a kernel-based method for pattern recognition, regression, approximation, and operator inversion,” Algorithmica, vol. 22, nos. 1–2, pp. 211–231, 1998.
[25] K. Grauman and T. Darrell, “The pyramid match kernel: Discriminative classification with sets of image features,” in Proc. IEEE Conf. Comput. Vision, vol. 2, Oct. 2005, pp. 1458–1465.
[26] P. Vincent and Y. Bengio, “Kernel matching pursuit,” Mach. Learn., vol. 48, nos. 1–2, pp. 165–187, 2002.
[27] V. Guigue, A. Rakotomamonjy, and S. Canu, “Kernel basis pursuit,” in Proc. Eur. Conf. Mach. Learn., 2005, pp. 146–157.
[28] S. Gao, I. W.-H. Tsang, and L.-T. Chia, “Kernel sparse representation for image classification and face recognition,” in Proc. Eur. Conf. Comput. Vision: Part IV, 2010, pp. 1–14.
[29] “Caviar datasets.” [Online]. Available: http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/
[30] “Avss2007 datasets.” [Online]. Available: ftp://motinas.elec.qmul.ac.uk/pub/iLids/
[31] M. Han, W. Xu, H. Tao, and Y. Gong, “An algorithm for multiple object trajectory tracking,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., Jun. 2004, pp. 864–871.
[32] R. Baraniuk, “Compressive sensing,” IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118–121, Jul. 2007.
[33] J. Chen and X. Huo, “Theoretical results on sparse representations of multiple-measurement vectors,” IEEE Trans. Signal Process., vol. 54, no. 12, pp. 4634–4643, Dec. 2006.
[34] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, “Algorithms for simultaneous sparse approximation. Part 1: Greedy pursuit,” Signal Process., vol. 86, no. 3, pp. 572–588, 2006.
[35] J. Tropp and A. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[36] C. Jeon, V. Monga, and U. Srinivas, “A greedy pursuit approach to classification using multitask multivariate sparse representations,” Penn. State Univ., PA, USA, Tech. Rep., 2012.
[37] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA, USA: MIT Press, 2001.
[38] “Kernel machines.” [Online]. Available: http://en.wikipedia.org/wiki/File:Kernel Machine.png
[39] X.-T. Yuan and S. Yan, “Visual classification with multitask joint sparse representation,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., Jun. 2010, pp. 3493–3500.
[40] J. Tropp and S. Wright, “Computational methods for sparse solution of linear inverse problems,” Proc. IEEE, vol. 98, no. 6, pp. 948–958, Jun. 2010.
[41] C. Stauffer and W. Grimson, “Learning patterns of activity using real-time tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 747–757, Aug. 2000.
[42] G. Knott, Interpolating Cubic Splines. Birkhäuser, 2000.
[43] D. Koller, J. Weber, and J. Malik, “Robust multiple car tracking with occlusion reasoning,” in Proc. Eur. Conf. Comput. Vision, vol. 800, 1994, pp. 189–196.
[44] L. Torresani, D. Yang, E. Alexander, and C. Bregler, “Tracking and modeling non-rigid objects with rank constraints,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., vol. 1, 2001, pp. 493–500.
[45] T. Yang, Q. Pan, J. Li, and S. Li, “Real-time multiple objects tracking with occlusion handling in dynamic scenes,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., vol. 1, Jun. 2005, pp. 970–975.
[46] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[47] M. Piccardi, “Background subtraction techniques: A review,” in Proc. IEEE Conf. Syst., Man, Cybern., vol. 4, Oct. 2004, pp. 3099–3104.
[48] A. Adam, E. Rivlin, and I. Shimshoni, “Robust fragments-based tracking using the integral histogram,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., vol. 1, Jun. 2006, pp. 798–805.

Xuan Mo (S’09) received the master’s degree in automation from Tsinghua University, Beijing, China, in 2010. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, Pennsylvania State University, University Park, PA, USA, and his current work focuses on video anomaly detection for transportation application.
His research interests include computational color and imaging, signal processing, and computer vision.

Raja Bala (M’10) received the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, USA.
He is a Principal Scientist with the Xerox Research Center, Xerox Innovation Group, Webster, NY, USA. He holds more than 150 patents and has a number of publications in the field of color and imaging. His research interests include color management, novel image rendering techniques, image processing for augmented reality applications, and image and video analytics.
Dr. Bala is a fellow of the Society of Imaging Science and Technology.