Article history: Received 16 August 2012; received in revised form 25 February 2014; accepted 26 February 2014.

Keywords: Autonomous traffic surveillance; Camera calibration; Vehicle tracking; Hungarian algorithm

Abstract

We describe a real-time highway surveillance system (RHSS), which operates autonomously to collect statistics (speed and volume) and generate incident alerts (e.g., stopped vehicles). The system is designed to optimize long-term real-time performance accuracy. It also provides convenient integration into an existing surveillance infrastructure with different levels of service. Innovations include a novel 3-D Hungarian algorithm utilized for object tracking and a practical, hands-off mechanism for camera calibration. Speed is estimated from trajectories after mapping/alignment with respect to dominant paths learned via an evolutionary dynamics model. The system, RHSS, is intensively evaluated under different scenarios such as rain, low-contrast, and high-contrast lighting. Performance is presented in comparison to a current commercial product. The contribution is the innovation of new technologies that enable hands-off calibration (i.e., automatic detection of vanishing points) and improved accuracy (i.e., illumination balancing, tracking via a new 3-D Hungarian algorithm, and re-initialization of background detection on-the-fly). Results indicate the capability and applicability of the proposed system in real-time and real-world settings.

© 2014 Elsevier Ltd. All rights reserved.
1. Introduction
Intelligent traffic systems (ITS) have been expanding with the incorporation of multiple technologies into vehicle, roadway,
highway, tunnel, and bridge surveillance. These include image processing, pattern recognition, electronics, and communication
technologies, which have been employed for monitoring traffic conditions, reducing congestion, enhancing mobility, and
increasing safety. Vision-based technology is a state-of-the-art approach offering easy maintenance, real-time
visualization, and high flexibility compared with other existing technologies. This makes it one of the most popular
techniques in ITS for traffic control. The most recent widely read description of vision-based highway surveillance is that
of Coifman et al. (1998). The last one and a half decades have witnessed improvements in cameras, communications, and
video analytics. This paper presents an update to the latter.
Modern commercial vision-based surveillance systems have improved considerably since the tripwire systems described
by Coifman et al. (1998). Many commercial systems may be tailored to monitor either freeways, arterials, bridges, or tunnels
(Econolite’s Autoscope Solo Terra, Traficon’s VIP/T, Citilog’s XCam-I, Kapsch Trafficom AG’s VR-2). There are about two dozen
products by approximately a dozen vendors that have a freeway option. In contrast to the earlier survey cited (Coifman et al.,
1998), today’s systems all incorporate vehicle tracking. Incidents reportable by the systems include wrong-way driver
(Traficon’s VIP/D), stopped vehicle (Econolite’s Autoscope Solo Terra), and roadway debris (Iteris’s Abacus). Nearly all sys-
tems collect data on speed and volume (Iteris’s Versicam, Traficon’s Traficam Collect-R among others). A few collect data
on occupancy (Econolite’s RackVision Terra, Iteris’s Vantage Edge 2, PEEK’s VideoTrak 900) and yet fewer collect data on
vehicle class (Iteris’s Abacus) or gap time (Traficon’s VIP/T).
Many researchers have devoted efforts to investigating different solutions for extracting important traffic information via
image and video analysis. Important traffic information includes speed (Mao et al., 2009; Cathey and Dailey, 2005), volume
(Pan et al., 2010) and incident detection (Zou et al., 2009). However, for any traffic surveillance system to be capable of such
high-level functionality it must be able to detect objects and track them correctly. Additionally, the camera must be cali-
brated such that the real-world distances can be calculated and utilized for speed estimation.
Normally, for vehicle detection, most methods (Gupte et al., 2002; Beymer et al., 1997; Hsu et al., 2004) assume that the
camera is static even if it is a pan-tilt-zoom camera. Given this assumption, foreground vehicles can be detected by image
differencing between the current frame and the estimated background. In order to maintain low computational complexity, we have
followed this direction. Moreover, we applied a gamma-correction-based mechanism to suppress sudden illumination
changes.
After vehicles are detected, a vehicle tracking mechanism is needed. As mentioned in a tracking survey (Yilmaz et al.,
2006), objects are normally represented as points, primitive geometric shapes, object silhouettes and contours, articulated
shape models, and skeletal models. Objects in a highway traffic scenario are mostly rigid and a point is an efficient model for
representation. The traffic surveillance system proposed in Coifman et al. (1998) detects corner feature points to represent
vehicles in order to deal with occlusion. However, corner features may not be easily detected when image quality is low. For
instance, image quality may be reduced by a strategy of lowering resolution to save transmission bandwidth, or by environmental variations. The
latter, often heavy rain or snow, can affect the accuracy and robustness of vehicle analysis. Due to the concerns
mentioned previously, we utilized the center of mass to represent objects. Here a 3-D Hungarian tracker is designed
to associate these points (centers of mass) over time. This is effective in dealing with streaming video whose frame rate has been
lowered to reduce transmission bandwidth.
Shadow removal is also an issue; shadows prove to be a major source of error in detection and classification. In particular, the
shadows of large trucks prevent smaller, adjacent vehicles from being detected successfully. Most traffic surveillance sys-
tems either detect and remove shadows or assume the analyzed sequence includes no shadows. We adapted an approach pro-
posed in Xiao et al. (2007) to remove shadows.
Camera calibration is the process of estimating camera parameters so that pixel points in camera coordinates can be
mapped into real-world coordinates. In real-time traffic applications the process is more difficult than usual. Perspective
effects cause vehicle geometric features such as length, width, and height not to be constant. In other words, different posi-
tions at which cameras are installed give different perception angles for each lane. Researchers (Kanhere and Birchfield,
2010) have evaluated a number of both manual and automatic camera calibration techniques and from them we chose
the two-vanishing-points-and-known-width (VVW) approach. Practical reasons are: (1) through trajectory analysis one van-
ishing point can be easily estimated; (2) it requires no manual measurement as would methods starting from camera height
or length of a line segment on the pavement; (3) width of a lane is easily accessible because every U.S. state strictly follows
highway lane width standards (http://www.fhwa.dot.gov/policyinformation/statistics/2008/hm39.cfm) and highway lanes can be easily found through trajectory analysis; and (4) performance-wise,
VVW is both accurate and stable. The camera perspective is preferred to be one of the typical views shown in Fig. 1(a) and (c).
Basically the height of the camera should be sufficient so that the camera captures a high-angle view such that the vertical
occlusion is minimized. Moreover, the horizontal distance from the camera to the road surface, Dx, should be reasonable so that
horizontal occlusion is also minimized. In Fig. 2, both poor and reasonable camera perspectives from different viewing angles
are illustrated.
The rest of this paper is organized as follows. In the next section, an overview of the entire system is given. Then, initial-
ization (i.e., learning the road structure and camera calibration) is discussed in Section 3. Section 4 describes the operational
phase including vehicle tracking, data collection, and event detection. Then, Section 5 reports experimental results over 24
lane hours under various lighting and weather conditions. Finally, the conclusion appears in Section 6.
Fig. 2. (a) Camera perspective with low angle due to camera height; (b) camera perspective with view distant from roadway due to a large Dx; and (c) the ideal camera perspective.
2. RHSS overview
RHSS is based upon assumptions that assure performance as expected. (1) In general, most roads, particularly freeways,
conform to a uniform standard; a typical structure is illustrated in Fig. 1. As shown in Fig. 1(b) and (d), most highways are
bidirectional (d1: oncoming traffic, d2: departing traffic), and the road segment between the camera and the dashed line is
both straight and horizontal. RHSS performs as expected if the observed highway segment is typical in this sense.
(2) An acceptable camera perspective is necessary. RHSS is able to detect and track vehicles, then automatically calibrate the
camera with real-time collected trajectories. Thus it can operate autonomously to collect traffic statistics (speed and volume)
and generate alerts (stopped vehicle detection) when incidents occur.
Fig. 3 depicts the workflow for RHSS. The inputs to the system are streaming highway traffic video and prior knowledge
about the road structure. The prior knowledge includes lane width, the number of lanes for each traffic direction, and the angle
of the leading edge of an exemplary car. (Selection of an exemplary car is discussed in Section 3.3.) The system starts with
the initialization phase and transitions to the operational phase. During the initialization phase, the background must be
estimated quickly. With a known background, the system begins to detect and track vehicles; the resulting trajectories are col-
lected for further analysis toward road structure understanding, such as autonomous camera calibration and detection zone gen-
eration. During the operational phase, with the camera now calibrated and detection zones defined, the system collects
traffic data and detects events.
During the initialization phase, free-flow traffic is preferred for achieving an accurate background estimation. Estimation
assumes that the background is the most prevalent value at each pixel location. Nominal daytime conditions are preferred.
The most important step is to estimate perpendicular vanishing points based on detected car edges. Extreme weather con-
ditions hinder accurate calibration. Therefore the system should be initialized under conditions of free-flowing traffic and
normal weather during daytime. Fortunately during the operational phase the system functions with fewer restrictions
but performance degrades as visibility deteriorates. In general, video analytic techniques are sensitive to image visibility;
accordingly the better the visibility the better RHSS will perform.
3. Initialization phase
In this section we begin by describing the assumptions of the camera position and lighting. The next step is extracting
knowledge of the road structure. The calibration of the camera is discussed as well as how camera calibration is used to
improve performance.
We assume that the system operates only when the traffic pan-tilt-zoom (PTZ) camera is stabilized at a fixed position.
When the camera is static, moving vehicles can be detected through background subtraction. As described before, when
the system starts the background will be estimated quickly. A sequence of pixel intensity values is acquired at each location
from consecutive frames. The value at each locus that appears most often is assumed to be its background value. A temporal
median filter (Zhang et al., 2007) is applied. The lighting condition changes slowly throughout the day, however; thus, it is
necessary to reestablish the background periodically, such as every hour (T_bgr = 60 min). It should be mentioned that during
the initialization phase, background estimation is performed within 30 s.
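To make this step concrete, a minimal sketch of the temporal-median background estimate follows, assuming grayscale frames supplied as NumPy arrays; the function name and interface are illustrative rather than the system's actual code.

```python
import numpy as np

def estimate_background(frames):
    """Temporal-median background: the most prevalent intensity at each
    pixel location over a short frame buffer is taken as the background.
    `frames` is a list of equally sized 2-D uint8 arrays."""
    stack = np.stack(frames, axis=0)                  # shape (T, H, W)
    return np.median(stack, axis=0).astype(np.uint8)

# Usage (per the text): run on a ~30 s buffer at startup, and
# re-run periodically, e.g., every T_bgr = 60 min.
```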
Given the estimated background, the foreground is detected (i.e., the vehicles) by subtracting the background from each
frame. Using the difference image, we apply thresholding with a parameter n. In other words, we locate the pixels for which
the difference is larger than n. Because the background is estimated over a short time period, any sudden change of illumi-
nation affects the result. Automatic gain control (AGC) is basically a form of amplification. Illumination adjustment should
not be confused with visibility improvement. The latter is associated with discerning objects at greater distances (Babari
et al., 2012). When the lighting condition deteriorates the camera will begin to boost the signal to compensate. With the
exception of the passing of clouds, the built-in AGC is the most frequent cause for sudden and unexpected illumination
changes. Images that are not properly corrected can appear either bleached out or too dark. Thus, in order to deal with sud-
den illumination changes, gamma correction is applied to balance the illumination from frame to frame. In this way, the
brightness of the background image is kept similar to that of each incoming frame, and the foreground is detected much
more robustly.
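A minimal sketch of this differencing step is shown below, assuming 8-bit grayscale frames. The mean-brightness gamma estimate is one simple choice for the correction, not necessarily the paper's exact rule, and the threshold value is a placeholder.

```python
import numpy as np

def match_brightness(frame, background):
    """Gamma-correct `frame` so its mean brightness matches the stored
    background, suppressing sudden AGC-driven illumination swings.
    (Illustrative gamma estimate, not the paper's exact formula.)"""
    eps = 1e-6
    f_mean = frame.mean() / 255.0 + eps
    b_mean = background.mean() / 255.0 + eps
    gamma = np.log(b_mean) / np.log(f_mean)   # maps frame mean onto background mean
    return (255.0 * (frame / 255.0) ** gamma).astype(np.uint8)

def detect_foreground(frame, background, n=30):
    """Pixels whose corrected difference exceeds the threshold n are foreground."""
    corrected = match_brightness(frame, background)
    diff = np.abs(corrected.astype(np.int16) - background.astype(np.int16))
    return diff > n
```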
There is another illumination issue. The sun causes objects/vehicles to cast shadows at certain times of the day. As the sun
climbs/descends, the lengths of shadows change. The existence of shadow affects the description of the foreground objects.
Thus, we adapted the real-time shadow removal technique proposed in Xiao et al. (2007) directly on the detected
foreground.
While others have developed these components – median filtering for background detection, gamma correction for
energy equalization, and shadow removal – their synthesis here is new. The combination, each being computationally inex-
pensive, enables frequent reinitialization which, in turn, maintains a better background estimation, and that improves over-
all accuracy of detection and tracking.
During the initialization phase, objects/vehicles are tracked as shown in Fig. 4(b). Raw trajectories are noisy and a filtering
scheme is applied to keep only valid trajectories. There are two assumptions here: (1) In general most roads, particularly
freeways, conform to a uniform standard as shown in Fig. 1 and (2) vehicles within a certain distance of the traffic camera
are easier to track and regions of interest (ROIs) should be predefined for specifying an effective tracking region as in Fig. 4(b)
(dashed green rectangle). Therefore only trajectories of vehicles traveling directly across the ROI of a scene without changing
lanes or other unexpected movement are considered valid for discovering road structure.
A trajectory (a sequence of points) is fitted to a straight line. The validity (v) of a trajectory is calculated as v = r·l, where r is
the ratio of the length of a given trajectory to the length of the fitted trajectory segment within the specified ROI. As illustrated in
Fig. 4(b), r is the ratio of a to b. The variable l is a measure of the linearity of the trajectory based on fitting errors. Valid
trajectories (those with higher v values) are collected for analysis. In Fig. 4(c), valid trajectories collected in a sample initialization
phase are marked in blue. The aim of trajectory analysis is to model the traffic scene in order to obtain the road structure
(spatial knowledge) and to learn motion patterns (spatio-temporal knowledge) of objects/vehicles.
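The validity score can be sketched as follows, assuming a trajectory given as an (N, 2) array of image points; the exact normalizations for r and l below are illustrative choices, since the paper does not spell them out.

```python
import numpy as np

def trajectory_validity(points, roi_length):
    """Validity v = r * l for a trajectory (an (N, 2) array of (u, v) points).
    `roi_length` stands in for the fitted-segment length within the ROI
    (the paper's quantity b); both normalizations are illustrative."""
    pts = np.asarray(points, dtype=float)
    # Fit a straight line u -> v; residuals measure linearity (assumes the
    # trajectory is not vertical in image coordinates).
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], deg=1)
    residuals = pts[:, 1] - np.polyval(coeffs, pts[:, 0])
    l = 1.0 / (1.0 + np.mean(residuals ** 2))   # 1.0 for a perfect line
    # Ratio of trajectory extent to the ROI segment length (paper's a / b).
    length = np.hypot(*(pts[-1] - pts[0]))
    r = min(length / roi_length, 1.0)
    return r * l
```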
The first task is to estimate the parallel vanishing point. In order to learn road structure we adapted the unsupervised
dominant-set based clustering technique (Pavan and Pelillo, 2007) to hierarchically cluster trajectories collected in the ini-
tialization phase into K dominant sets. Practically, K is known a priori. Fig. 4(c) illustrates a set of trajectories that are clus-
tered into four dominant sets. Each cluster represents one highway lane and from each dominant set of trajectories we derive
one representative trajectory in green as shown in Fig. 4(c) by median filtering – eliminating most of the noise. Also we
derive one detection zone from each dominant set of trajectories as in Fig. 4(d).
Camera calibration is a process of estimating camera parameters such as f (focal length), θ_p (pan angle), θ_t (tilt angle), and
h (height of camera). As a general model, we adopted the two-vanishing-points based camera calibration (Schoepflin and Dailey, 2003), which also requires a lane width value.
Fig. 4. Learning the road structure: (a) background estimated during the initialization phase; (b) trajectory for one vehicle being tracked; (c) dominant paths learned from clustering collected trajectories; and (d) detection zones generated automatically.

The introduction of mathematical programming to locate the first vanishing point displaces ad hoc methods (Beymer et al., 1997). Locating the second vanishing point without manual input enables hands-off system initialization.
Real world parallelism appears to converge to a point in an image – the vanishing point in the image plane. We consider
trajectories of vehicles to be parallel to one another in the real world with the assumption that most vehicles maintain lane
discipline. This enables the establishment of the first vanishing point, V_par, the "parallel vanishing point." Unavoidably the tra-
jectories, even after filtering, are somewhat noisy. Thus, we adapted the Levenberg–Marquardt (Schoepflin and Dailey, 2003)
optimization algorithm for estimation of V_par.
To describe the optimization problem, assume there are n dominant paths represented by n trajectories having slopes
(k_1, k_2, ..., k_n). Choose an arbitrary horizontal line relative to image coordinates, v = v_h. This line intersects the trajectories
at ((u_1, v_h), (u_2, v_h), ..., (u_n, v_h)). The objective function chooses (u_0, v_0) to minimize the aggregate slope divergence of
the n dominant paths:

\[ (u_0, v_0) = \arg\min_{u_0, v_0} \sum_{i=1}^{n} \left| k_i - \frac{v_h - v_0}{u_i - u_0} \right| \tag{1} \]

We have found that the greater the distance between v_0 and v_h, the better the subsequent performance (e.g., speed calculation). As
shown in Fig. 4(c), the parallel vanishing point V_par = (u_0, v_0) is circled in black.
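A sketch of the Eq. (1) estimation follows. The paper uses Levenberg–Marquardt; for brevity this illustration uses SciPy's generic Nelder–Mead minimizer, which handles the non-smooth absolute-value objective directly.

```python
import numpy as np
from scipy.optimize import minimize

def parallel_vanishing_point(slopes, intersections, v_h, init=(0.0, 0.0)):
    """Estimate V_par = (u0, v0) by minimizing Eq. (1): the summed absolute
    difference between each dominant path's slope k_i and the slope of the
    line from (u_i, v_h) to (u0, v0). Assumes u0 stays away from the u_i."""
    k = np.asarray(slopes, dtype=float)
    u = np.asarray(intersections, dtype=float)

    def objective(p):
        u0, v0 = p
        return np.sum(np.abs(k - (v_h - v0) / (u - u0)))

    result = minimize(objective, init, method="Nelder-Mead")
    return result.x  # (u0, v0)
```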
Similar to the parallel vanishing point, the perpendicular vanishing point, V_per, is a point of convergence. It must be con-
structed from lines perpendicular to the highway lanes. Based on the assumption that the road is not inclined, i.e., lies on the plane
z = 0 in real-world coordinates, the general VVW method requires two vanishing points constructed from two pairs of parallel
lines. Let V_par = (u_0, v_0) and V_per = (u_1, v_0). Note that given z = 0, the two vanishing points share the vertical image coordinate
value, v_0. With only a single degree of freedom, we can estimate u_1 with a single line perpendicular to the direction of traffic.
However, given the noisy nature of line extraction, we select several candidates and reduce them to a single estimate.

With the assumption that leading edges of vehicles are approximately perpendicular to the highway lane, we collect a set
of several perpendicular lines. This is accomplished by detecting the leading edges of vehicles using Canny edge detection
followed by an application of the Hough transform. In the image plane, the intersection of each perpendicular line, ℓ_i, and the
line v = v_0 gives one candidate V_per^(i) = (u_1^(i), v_0). The perpendicular vanishing point candidates intersect the line v = v_0
with noise. While one could apply the Levenberg–Marquardt optimization algorithm once more, we chose a simpler approach.
We collect the u-coordinates of all the perpendicular point candidates {u_1^(1), u_1^(2), ..., u_1^(N)} (N is the number of candidate per-
pendicular lines). Values greater than 2 standard deviations from the mean are discarded and the remaining ones are aver-
aged for the final V_per estimate.
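This candidate-fusion rule is simple enough to state directly in code; the function below is an illustrative sketch.

```python
import numpy as np

def perpendicular_vanishing_point(u1_candidates, v0):
    """Fuse candidate u-coordinates of V_per: discard values more than
    2 standard deviations from the mean, then average the survivors."""
    u = np.asarray(u1_candidates, dtype=float)
    mu, sigma = u.mean(), u.std()
    kept = u[np.abs(u - mu) <= 2.0 * sigma]
    return (kept.mean(), v0)
```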
In addition to V_par = (u_0, v_0) and V_per = (u_1, v_0), a real-world distance, D, between any two known parallel lines in the image
plane is needed for estimating the following camera parameters: f (focal length), θ_p (pan angle), θ_t (tilt angle), and h (height
of camera). Every U.S. state strictly follows highway lane width standards, thus we are able to utilize this information as a
known value (together with V_par and V_per) prior to calibrating the camera. This is expressed in Eqs. (2)-(5):

\[ f = \sqrt{-(v_0^2 + u_0 u_1)} \tag{2} \]

\[ \theta_t = \arctan\!\left(\frac{-v_0}{f}\right) \tag{3} \]

\[ \theta_p = \arctan\!\left(\frac{-u_0 \cos\theta_t}{f}\right) \tag{4} \]

\[ h = \frac{f \sin\theta_t}{a \cos\theta_p}, \qquad \text{where } a = \frac{w}{D} \tag{5} \]

In Eq. (5), a is a scale factor relating distance in image coordinates to distance in real-world coordinates; the scale factor
a is calculated as the ratio of the lane width w in pixels to D, the real-world distance between two neighboring lanes (3.6 m by
default).
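The calibration chain of Eqs. (2)-(5) translates into a few lines of code; the sketch below assumes the sign conventions of the equations as reconstructed above, so the returned height may need a sign adjustment under a different image-coordinate convention.

```python
import numpy as np

def calibrate(u0, v0, u1, w_pixels, D=3.6):
    """Camera parameters from Eqs. (2)-(5), given V_par = (u0, v0),
    V_per = (u1, v0), lane width in pixels (w_pixels) and meters (D).
    Requires v0**2 + u0*u1 < 0, i.e., vanishing points on opposite
    sides of the principal point."""
    f = np.sqrt(-(v0 * v0 + u0 * u1))                 # Eq. (2)
    theta_t = np.arctan(-v0 / f)                      # Eq. (3), tilt
    theta_p = np.arctan(-u0 * np.cos(theta_t) / f)    # Eq. (4), pan
    a = w_pixels / D                                  # scale factor of Eq. (5)
    h = f * np.sin(theta_t) / (a * np.cos(theta_p))   # Eq. (5), camera height
    return f, theta_t, theta_p, h                     # h may need abs()
```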
4. Operational phase
During the operational phase, the camera now calibrated and detection zones ready, the system collects traffic data
(speed and volume) and detects events (stopped vehicles). This entails accurate tracking of vehicles.
The Hungarian algorithm (Kuhn, 1955) is a classical method for the "assignment problem," such as the 2-D matching of
persons to tasks. We generalized the classical 2-D Hungarian algorithm to an n-D Hungarian algorithm and applied the 3-D
case to solve the tracking problem. Its development displaces the dominant use of the more computationally intensive
Kalman filtering, and the matching of point triples increases accuracy.
To formulate the problem: with the foreground successfully detected, we calculate the center of mass to serve as a rep-
resentative of each detected object/vehicle. So given, we formulate the problem of object tracking as a motion correspon-
dence problem (Veenman et al., 2007) or, more generally, an assignment problem (Kuhn, 1955). Approaches should be able
to deal with entering and exiting (initiation and termination of tracking), false points (due to errors in detection), and missing
points (due to occlusions). Given a sequence of n frames denoted by F_{t_1}, F_{t_2}, ..., F_{t_n}, each frame F_{t_k} has a set of points (centers
of mass). The aim is to establish one-to-one object assignments across n consecutive frames (n ≥ 2) such that proximal uni-
formity and smooth motion constraints are best preserved.
The larger the number of selected frames n, the greater the computational burden. Motion constancy can only be reflected
over a minimum of three frames, thus we choose n to be 3. As illustrated in Fig. 5(a), the motion of two objects is captured in
three consecutive frames F_{t_{k-1}}, F_{t_k}, and F_{t_{k+1}}. In the figure, both objects are moving in the directions indicated by the arrows.
Tracking of these two moving objects is interpreted as establishing frame-to-frame matchings. Three objects, one from each
of the three consecutive frames, must be put in correspondence. As in Fig. 5(a), tracking of the two objects is illustrated as
matchings in solid and dashed lines, respectively.
Initiation and termination of tracking are incorporated; detection errors and occlusions are handled by logical reasoning.
Objects will not be matched unless proximal uniformity is satisfied and the motion is regular. Tracking is designed to tolerate
points absent due to occlusion: a Kalman filter predicts the location of a missed object over a limited number of
frames (n_missed). If a search at the predicted location fails to find the missed object (and no occluding vehicle is identified),
the object is assumed to have exited the scene.
Vehicle tracking across n frames can be visualized as a perfect matching within an n-partite graph. The vertices of each
partite set instantiate the vehicles detected in one frame. The edges of an assignment indicate subtrajectories for vehicles. A per-
fect assignment is a selection of edges such that no edge shares endpoints with another and every vertex is an endpoint for
at least one edge. The algorithm is illustrated in Fig. 5(a) for the case of three vehicles tracked, with the assumption that the
same vehicles are in the scene throughout F_{t-1}, F_t, and F_{t+1}. A cost must be computed for each pair of edges that comprise a
subtrajectory. The cost is calculated in terms of motion changes (direction and speed) and displacement. The cost of estab-
lishing one association over three vertices, one from each partite set, is determined by Eqs. (6)-(9). The total cost of the match-
ing (i.e., across all assignments) is given in Eq. (10).
Let h[t] be the set of vehicles identified in F_t, and let h_j[t] be the jth vehicle in h[t]. In most cases, |h[t-1]| = |h[t]| = |h[t+1]|. If this is
not the case, pad the smaller sets with virtual vehicles having infinite displacement from real vehicles. We now describe the
assignment cost matrix pictured in Fig. 5(b), for which the axes X, Y, and Z are indexed by the vehicle sets h[t-1], h[t], and h[t+1],
respectively.
Let Δ_u(i, j) be the displacement in the u direction between h_i[t-1] and h_j[t] (i.e., Δ_u(i, j) = u_j[t] - u_i[t-1]); Δ_v(i, j) is defined
likewise. Thus Δ_u(j, k) and Δ_v(j, k) are the corresponding displacements between F_t and F_{t+1}, expressed in image coordinates. The Euclidean distances
between h_i[t-1] and h_j[t], and between h_j[t] and h_k[t+1], are

\[ D(i, j) = \sqrt{\Delta_u(i, j)^2 + \Delta_v(i, j)^2}, \qquad D(j, k) = \sqrt{\Delta_u(j, k)^2 + \Delta_v(j, k)^2} \]

The cost of one assignment h_i[t-1] → h_j[t] → h_k[t+1] is

\[ C_{ijk} = \beta_1 \nabla_{\text{direction}}(i, j, k) + \beta_2 \nabla_{\text{speed}}(i, j, k) + \beta_3 \nabla_{\text{displacement}}(i, j, k) \tag{6} \]

where each β_i is a nonnegative weight and \(\sum_{i=1}^{3} \beta_i = 1\). The other components are defined in similar form to that of Punzo
et al. (2011), without the error biases.

A single match, h_i[t-1] → h_j[t] → h_k[t+1], is compliant with respect to direction if one straight line crosses the assignment's
(u, v)-positions in all three frames. The nearer Eq. (7) is to 0 (zero), the more compliant the assignment:

\[ \nabla_{\text{direction}}(i, j, k) = 1 - \frac{\Delta_u(i, j)\,\Delta_u(j, k) + \Delta_v(i, j)\,\Delta_v(j, k)}{D(i, j)\,D(j, k)} \tag{7} \]

A single match is compliant with respect to speed if the Euclidean distances (based on the inter-frame vehicle assignments)
are approximately the same between the positions in consecutive frames. The value of Eq. (8) approaches 0 (zero) as this
condition is satisfied:

\[ \nabla_{\text{speed}}(i, j, k) = 1 - \frac{2\sqrt{D(i, j)\,D(j, k)}}{D(i, j) + D(j, k)} \tag{8} \]

A single match is compliant with respect to displacement if the total distance traveled is "reasonable." Reasonable, of course,
is a heuristic concept; constants, automatically chosen, are used to normalize the measure:

\[ \nabla_{\text{displacement}}(i, j, k) = \frac{D(i, j) + D(j, k)}{d_{t-1,t} + d_{t,t+1}} \tag{9} \]

where d_{t-1,t} and d_{t,t+1} are the constants.
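Assembling the cost cube of Fig. 5(b) from three frames of detected centers of mass might look as follows; the equal weights and the normalizing constants d_prev/d_next are placeholder values, not the system's automatically chosen ones.

```python
import numpy as np

def assignment_cost_cube(P_prev, P_curr, P_next, betas=(1/3, 1/3, 1/3),
                         d_prev=50.0, d_next=50.0):
    """Build C[i, j, k] per Eqs. (6)-(9) from centers of mass detected in
    frames t-1, t, t+1 (each an (m, 2) array; sets assumed padded to equal
    size, and motion assumed nonzero between frames)."""
    b1, b2, b3 = betas
    P_prev, P_curr, P_next = map(np.asarray, (P_prev, P_curr, P_next))
    m = len(P_curr)
    C = np.zeros((m, m, m))
    for i in range(m):
        for j in range(m):
            dij = P_curr[j] - P_prev[i]          # (Delta_u(i,j), Delta_v(i,j))
            Dij = np.linalg.norm(dij)
            for k in range(m):
                djk = P_next[k] - P_curr[j]
                Djk = np.linalg.norm(djk)
                direction = 1.0 - np.dot(dij, djk) / (Dij * Djk)       # Eq. (7)
                speed = 1.0 - 2.0 * np.sqrt(Dij * Djk) / (Dij + Djk)   # Eq. (8)
                displacement = (Dij + Djk) / (d_prev + d_next)         # Eq. (9)
                C[i, j, k] = b1 * direction + b2 * speed + b3 * displacement
    return C
```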
For a bipartite graph having m vertices per partite set, there exist m! perfect matchings. For an n-partite graph, there are
(m!)^{n-1} perfect matchings. Thus, the number of perfect matchings for each operational instance within RHSS is (|h[t]|!)^2. Let
c = {c_1, c_2, ..., c_{(|h[t]|!)^2}}. In turn, c_q = {i_1, i_2, ..., i_{|h[t]|}}, for which each i_ℓ consists of a single assignment h_i[t-1] → h_j[t] → h_k[t+1], and the set
c_q is a perfect (tripartite) matching. Viewed in the context of the cost matrix of Fig. 5(b), each i_ℓ is one C_{ijk} component. Each
c_q is a set of C_{ijk} components such that no pair shares an axis (X, Y, or Z) value. The solution to the tracking problem may be
expressed as:

\[ \arg\min_{c_q} \sum_{(i,j,k) \in c_q} C_{ijk}, \qquad q = 1, 2, \ldots, (|h_{[t]}|!)^2 \tag{10} \]

In the description that follows, an X-plane of the C matrix is the set of components having the same value for the index i; a
Y- or Z-plane is defined analogously with respect to the indices j and k. Note that the indices i, j, and k define one i_ℓ. In
matrix terms, a set c_q = {i_1, i_2, ..., i_{|h[t]|}} for which no pair has the same i, j, or k value is termed a traversal.
Table 1. Characteristics of test videos (among them: 20 May 2011, 12:37-15:47 PM; the videos begin with apparent short shadows, then long weak shadows, and end with a night scene).
Start: For each k ∈ {1, 2, ..., |h[t+1]|}: C_{ijk} ← C_{ijk} - C_k^{min}, where i ∈ {1, 2, ..., |h[t-1]|}, j ∈ {1, 2, ..., |h[t]|}, and C_k^{min} = min_{i,j} C_{ijk}
(i.e., for each Z-plane, subtract the minimum component from each component).
Step 1: Choose the minimum number of X-, Y-, and Z-planes such that all zeros in C are contained therein; a component
not contained within a chosen plane is said to be uncovered.
Step 2: If |h[t]| planes are required to cover all zeros, go to Step 5 (termination).
Step 3: Let U_min be the value of the minimum uncovered component; subtract U_min from each uncovered component of C.
Step 4: Add U_min to each component of C covered by more than one plane; go to Step 1.
Step 5: Let Z be the set of components of C having the value zero (0); construct the traversal c = {i_1, i_2, ..., i_{|h[t]|}} from the
index sets in Z.
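For reference, since per-frame vehicle counts in one camera view are small, Eq. (10) can also be solved by directly enumerating all (m!)^2 traversals. The brute-force sketch below is not the plane-covering procedure above, but it computes the same optimum and can serve as a ground-truth check.

```python
from itertools import permutations
import numpy as np

def best_traversal(C):
    """Brute-force Eq. (10): enumerate all (m!)^2 tripartite matchings of
    an (m, m, m) cost cube and return the minimum-cost traversal."""
    m = C.shape[0]
    best_cost, best = np.inf, None
    for pj in permutations(range(m)):        # bijection: partite t-1 -> t
        for pk in permutations(range(m)):    # bijection: partite t-1 -> t+1
            cost = sum(C[i, pj[i], pk[i]] for i in range(m))
            if cost < best_cost:
                best_cost = cost
                best = [(i, pj[i], pk[i]) for i in range(m)]
    return best, best_cost
```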
4.2. Speed
Any point in the image plane can be mapped/transformed to the plane z ¼ 0 in real-world coordinates. However, the
accuracy of speed estimation is not only dependent on camera calibration but also the noise from the trajectory. It is
assumed that most points over the trajectory lie noisily along the dominant path. Thus, to minimize the noise, every point
over the trajectory is aligned with the dominant path utilizing the perpendicular vanishing point, V per .
After point alignment, the real-world distance between any two points in the image is easily calculated. Assume that it takes Δt
seconds for one vehicle to travel in the image plane from start point (u_s, v_s) to end point (u_e, v_e), and that these two points are
mapped to (x_s, y_s, 0) and (x_e, y_e, 0) in real-world coordinates using the parameters derived in Eqs. (2)-(5). The average speed, v
(in miles per hour), of the vehicle can be calculated using Eqs. (11)-(16).
Table 2
Parameters for test videos.
Video D1 (lanes) D2 (lanes) Lane width (m) Top Bottom Left Right Perpendicular angle (degrees)
9 May 2011 2 2 3.6 0.4 0.8 0.3 1.0 10
17 May 2011 2 2 3.6 0.4 0.8 0.15 1.0 10
20 May 2011 2 2 3.6 0.35 1 0 0.6 4
Vendor 2 2 3.6 0.4 0.75 0.15 1.0 5
\[ x_s = \frac{h\,u_s \sec\theta_t}{v_s + f\tan\theta_t} \tag{11} \]

\[ y_s = \frac{h\,(f - v_s \tan\theta_t)}{v_s + f\tan\theta_t} \tag{12} \]

\[ x_e = \frac{h\,u_e \sec\theta_t}{v_e + f\tan\theta_t} \tag{13} \]

\[ y_e = \frac{h\,(f - v_e \tan\theta_t)}{v_e + f\tan\theta_t} \tag{14} \]

\[ \Delta d = \sqrt{(x_s - x_e)^2 + (y_s - y_e)^2} \tag{15} \]

\[ v = \frac{\Delta d \times 0.621 \times 3.6}{\Delta t} \tag{16} \]

To explain the constants in Eq. (16), keep in mind that the units of v are miles/hour, Δd is in meters, and Δt is in seconds. There
are 0.621 miles/km and 3600 s/hour, so multiplying Δd/Δt by 3.6 converts m/s to km/hour and multiplying by 0.621 converts km/hour to miles/hour. In most parts of the world, speed is measured in km/hour units; in that case, the
constant 0.621 would be dropped from Eq. (16).
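Eqs. (11)-(16) translate directly into code; a sketch follows, taking the calibration outputs f, θ_t, and h as inputs.

```python
import numpy as np

def average_speed_mph(u_s, v_s, u_e, v_e, dt, f, theta_t, h):
    """Average speed per Eqs. (11)-(16): map start and end image points
    onto the road plane z = 0, take the Euclidean distance in meters,
    and convert m/s to miles per hour."""
    def to_road_plane(u, v):
        denom = v + f * np.tan(theta_t)
        x = h * u / (np.cos(theta_t) * denom)       # Eqs. (11)/(13); sec = 1/cos
        y = h * (f - v * np.tan(theta_t)) / denom   # Eqs. (12)/(14)
        return x, y

    xs, ys = to_road_plane(u_s, v_s)
    xe, ye = to_road_plane(u_e, v_e)
    dd = np.hypot(xs - xe, ys - ye)                 # Eq. (15), meters
    return dd * 0.621 * 3.6 / dt                    # Eq. (16), mph
```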
Fig. 6. Speed estimation comparison (a) for May 09, 2011; (b) for May 17, 2011; and (c) for May 20, 2011 with light rain.
4.3. Volume
As noted earlier, detection zones are automatically generated utilizing the results of trajectory clustering. Detection
zones are necessary for accurately counting the objects/vehicles. A detection zone has two states – occupied (O) and unoc-
cupied (U). Transition from state O to state U indicates that a moving object/vehicle has just passed the detection zone. The
number of O ! U transitions captured is equal to the number of objects/vehicles that have passed.
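A detection zone reduces to a tiny state machine over the per-frame foreground mask; the sketch below assumes a boolean mask for the zone and counts O → U transitions as described.

```python
class DetectionZone:
    """Counts vehicles as occupied (O) -> unoccupied (U) transitions."""

    def __init__(self, zone_mask):
        self.mask = zone_mask       # boolean image mask of the zone
        self.occupied = False       # state U initially
        self.count = 0

    def update(self, foreground):
        """`foreground` is the boolean foreground image for one frame."""
        now_occupied = bool((foreground & self.mask).any())
        if self.occupied and not now_occupied:   # O -> U: a vehicle passed
            self.count += 1
        self.occupied = now_occupied
```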
We store the trajectories of moving objects/vehicles that have not yet exited the scene and delete them after they exit.
Normally, an object exits the scene quickly. Objects/vehicles that have remained at the same place in the scene for an
extended period are associated with a stopped-car tracker. The stopped-car tracker keeps a record of how long the vehicle
has been stopped and of its location. It is also designed to tolerate missed detections of the stopped vehicle over time; in other
words, it has the ability to recognize the stopped vehicle when it reappears at the same location. If a vehicle remains
stationary for more than T_s seconds, an alarm is triggered and sent to the traffic management center for response. An oper-
ator can set any reasonable value for T_s according to need. (T_s is set to 10 s by default.)
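The stopped-car tracker can be sketched as a timer keyed to one location. The missed-detection tolerance (max_gap) and its value below are assumptions, since the text only states that intermittent misses are tolerated.

```python
class StoppedCarTracker:
    """Alarms when a vehicle stays at one location longer than T_s seconds;
    tolerates intermittent missed detections up to `max_gap` seconds."""

    def __init__(self, location, t_start, T_s=10.0, max_gap=2.0):
        self.location = location
        self.t_start = t_start
        self.t_last_seen = t_start
        self.T_s, self.max_gap = T_s, max_gap
        self.alarmed = False

    def update(self, t_now, seen):
        """`seen` is True when a detection reappears at self.location."""
        if seen:
            self.t_last_seen = t_now
        if t_now - self.t_last_seen > self.max_gap:
            return "exited"      # vehicle gone; drop this track
        if not self.alarmed and t_now - self.t_start >= self.T_s:
            self.alarmed = True
            return "alarm"       # notify the traffic management center
        return "tracking"
```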
5. Experimental results
In order to analyze the performance of the proposed RHSS, we utilized real-time streaming videos from a traffic camera at
a freeway site in Denton, Texas, provided by the Texas Department of Transportation (TxDOT). From this traffic camera, four
time periods having different characteristics, one of which contains a stopped vehicle, were chosen for evaluation. All videos are
described in Table 1. While we had access to an MVD (microwave vehicle detector), for greater accuracy we acquired vehicle
volumes manually. Speed, volume, and stopped-vehicle statistics were collected. For comparison, we mounted a commercial
highway traffic monitoring system in parallel with ours. We compared systems based on traffic data collected for the direc-
tion with lanes closer to the camera, since speed from the MVD is less reliable for the direction with lanes farther from the
camera.
Parameter settings for the initialization phase are very important; settings for these four videos are listed in Table 2. Top,
bottom, left, and right are parameters specifying the boundaries of the ROI. The perpendicular angle is the angle of the front
edge of an exemplary car.
Fig. 7. Volume estimation (a) for May 9, 2011; (b) for May 17, 2011; (c) for May 20, 2011 with light rain; and (d) for May 20, 2011 with heavy rain.
Table 3. Traffic data estimation accuracy.
5.3. Evaluation
RHSS was evaluated on the basis of speed, volume, and incident detection. Three videos from an interstate highway were
utilized for system evaluation in terms of speed and volume. For speed we used data from an MVD as ground truth. For vol-
umes we generated the ground truth by manual counting since an MVD could not be more accurate than human observation.
The rainy subsequence of the third time period is presented in a separate chart, Fig. 7(d). The accuracy of traffic data esti-
mation is illustrated in Figs. 6 and 7, and statistical results are given in Table 3.
While a stopped vehicle is the most common incident, it is relatively rare. Across the three time periods characterized in
Table 1, only the last contains a stopped vehicle. As shown in Fig. 8, the stopped car is detected successfully. For stopped-vehi-
cle detection, a stop-duration tolerance parameter must be entered, since one wishes to ignore vehicles stopped
momentarily. We entered a stop duration tolerance of 10 s. The stopped vehicle was successfully detected at the 10-s mark
and continued to be detected for approximately 34-35 s, until it left the scene. No redundant incident detec-
tion warnings were generated.
6. Conclusion
We have described a real-time highway traffic surveillance system (RHSS), which operates autonomously to collect sta-
tistics (speed and volume) and generate alerts (stopped vehicle detection) when incidents occur. The background is updated
for foreground vehicle detection by image differencing. Then the detected vehicles are tracked using the described 3-D Hun-
garian algorithm together with a Kalman filter to project positions of occluded vehicles. In the initialization phase, the pro-
posed practical mechanism for camera calibration is applied until the road structure is acquired. In the operational phase
trajectories collected by the tracker are mapped/aligned onto the learned dominant paths utilizing the estimated perpendic-
ular vanishing point for better speed estimation accuracy. Intensive evaluation of RHSS under different scenarios, such as
rain and low- and high-contrast lighting, demonstrated its effectiveness in practical settings.
Acknowledgement
This work was performed as part of the Texas Department of Transportation (TxDOT) research program under Grant No.
0-6432-1. The contents do not express official TxDOT policy.
References
Babari, R., Hautière, N., Dumont, E., Paparoditis, N., Misener, J., 2012. Visibility monitoring using conventional roadside cameras – emerging applications.
Transport. Res. Part C: Emer. Technol. 22, 17–28.
Beymer, D., McLauchlan, P., Coifman, B., Malik, J., 1997. A real-time computer vision system for measuring traffic parameters. In: IEEE Conf. Comput. Vis.
Pattern Recog. Institute of Electrical and Electronic Engineers, San Juan, PR, pp. 496–501.
Cathey, F.W., Dailey, D.J., 2005. A novel technique to dynamically measure vehicle speed using uncalibrated roadway cameras. In: Proc of IEEE Intelligent
Vehicles Symposium. Institute of Electrical and Electronic Engineers, pp. 777–782.
Coifman, B., Beymer, D., McLauchlan, P., Malik, J., 1998. A real-time computer vision system for vehicle tracking and traffic surveillance. Transport. Res. Part
C: Emer. Technol. 6 (4), 271–288.
Gupte, S., Masoud, O., Martin, R.F.K., Papanikolopoulos, N.P., 2002. Detection and classification of vehicles. IEEE Trans. Intell. Transport. Syst. 3 (1), 37–47.
Hsu, W.L., Liao, H.Y., Jeng, B.S., Fan, K.C., 2004. Real-time traffic parameter extraction using entropy. Proc. Inst. Electr. Eng. Vis. Image Signal Process. 151 (3),
194–202.
Kanhere, N.K., Birchfield, S.T., 2010. A taxonomy and analysis of camera calibration methods for traffic monitoring applications. IEEE Trans. Intell. Transport.
Syst. 11 (2), 441–452.
Kuhn, H.W., 1955. The Hungarian method for the assignment problem. Naval Res. Logis. Quart. 2, 83–97.
Mao, H., Ye, C., Song, M., Bu, J., Li, N., 2009. Viewpoint independent vehicle speed estimation from uncalibrated traffic surveillance cameras. In: Proceeding of
IEEE International Conference on Systems, Man and Cybernetics. Institute of Electrical and Electronic Engineers, San Antonio, TX, USA.
Pan, X., Guo, Y., Men, A., 2010. Traffic surveillance system for vehicle flow detection. International Conference on Computer Modeling and Simulation
(ICCMS), vol. 1. Conference Publishing Service, pp. 314–318.
Pavan, M., Pelillo, M., 2007. Dominant sets and pairwise clustering. IEEE Trans. Pattern Anal. Mach. Intell. 29 (1).
Punzo, V., Borzacchiello, M.T., Ciuffo, B., 2011. On the assessment of vehicle trajectory data accuracy and application to the Next Generation SIMulation
(NGSIM) program data. Transport. Res. Part C: Emer. Technol. 19 (6), 1225–1242.
Schoepflin, T.N., Dailey, D.J., 2003. Dynamic camera calibration of roadside traffic management cameras for vehicle speed estimation. IEEE Trans. Intell.
Transport. Syst. 4 (2), 90–98.
Veenman, C.J., Reinders, M.J.T., Backer, E., 2007. Resolving motion correspondence for densely moving points. IEEE Trans. Pattern Anal. Mach. Intell. 23 (1),
54–72.
Xiao, M., Han, C.-Z., Zhang, L., 2007. Moving shadow detection and removal for traffic sequences. Int. J. Autom. Comput. 4 (1), 38–46.
Yilmaz, A., Javed, O., Shah, M., 2006. Object tracking: a survey. ACM Comput. Surv. 38 (4), 1–45.
Zhang, G., Avery, R.P., Wang, Y., 2007. Video-based vehicle detection and classification system for real-time traffic data collection using uncalibrated video
cameras. Transport. Res. Board: J. Transport. Res. Board 1993, 138–147.
Zou, Y., Shi, G., Shi, H., Wang, Y., 2009. Image sequences based traffic incident detection for signaled intersections using HMM. Intern. Conf. on Hybrid
Intelligent Systems (HIS), vol. 1. Institute of Electrical and Electronics Engineers, pp. 257–261.