
The World of Fast Moving Objects

Denys Rozumnyi¹,³   Jan Kotěra²   Filip Šroubek²   Lukáš Novotný¹   Jiří Matas¹
¹ CMP, Czech Technical University in Prague
² UTIA, Czech Academy of Sciences
³ SGN, Tampere University of Technology

arXiv:1611.07889v1 [cs.CV] 23 Nov 2016

Abstract

The notion of a Fast Moving Object (FMO), i.e. an object that moves over a distance exceeding its size within the exposure time, is introduced. FMOs may, and typically do, rotate with high angular speed. FMOs are very common in sports videos, but are not rare elsewhere. In a single frame, such objects are often barely visible and appear as semi-transparent streaks.

A method for the detection and tracking of FMOs is proposed. The method consists of three distinct algorithms, which form an efficient localization pipeline that operates successfully in a broad range of conditions. We show that it is possible to recover the appearance of the object and its axis of rotation, despite its blurred appearance. The proposed method is evaluated on a new annotated dataset. The results show that existing trackers are inadequate for the problem of FMO localization and a new approach is required. Two applications of localization, temporal super-resolution and highlighting, are presented.

Figure 1: Fast moving objects appear as semi-transparent streaks larger than their size. Examples (left-to-right, top-to-bottom) from table tennis, archery, softball, tennis, hailstorm and fireworks.
1. Introduction
Object tracking has received enormous attention by the computer vision community. Methods based on various principles have been proposed and several surveys have been compiled [2, 3, 11]. Standard benchmarks, some comprising hundreds of videos, such as ALOV [22], VOT [15, 16] and OTB [27], are available. Yet none of them include objects that move so fast that they appear as streaks much larger than their size. This is a surprising omission considering the fact that such objects commonly appear in diverse real-world situations, in which sports play undoubtedly a prominent role; see examples in Fig. 1.¹

To develop algorithms for the detection and tracking of fast moving objects, we had to collect and annotate a new dataset. The substantial difference between the FMO dataset and the standard ones was confirmed by ex-post analysis of inter-frame motion statistics. The most common overlap of ground-truth bounding boxes in two consecutive frames is zero for the FMO set, while it is close to one for ALOV, OTB and VOT [22, 27, 15]. The speed of the tracked object projected to image coordinates, measured as the distance of object centers in two consecutive frames, is on average ten times higher in the new dataset; see Fig. 2. Given the difference in the properties of the sequences, it is not surprising that state-of-the-art trackers designed for the classical problem do not perform well on the FMO dataset. The two "worlds" are so different that on almost all sequences the classical state-of-the-art methods fail completely, their output bounding boxes achieving a 50% overlap with the ground truth in zero frames; see Tab. 3.

In the paper, we propose an efficient method for FMO localization and a method for the estimation of their extrinsic properties – the trajectory and the axis and angular velocity of rotation – and intrinsic properties – the size and color of the object. In specific cases we can go further and estimate the full appearance model of the object. Properties like the rotation axis, angular velocity and object appearance require precise modeling of the image formation (acquisition) process. The proposed method thus proceeds by solving a blind space-variant deconvolution problem with occlusion.

Detection, tracking and appearance reconstruction of FMOs allows performing tasks with applications in diverse areas. We show, for instance, the ability to synthesize realistic videos with higher frame rates, i.e. to perform temporal super-resolution. The extracted properties of the FMO, such as trajectory, rotation angle and velocity, have applications, e.g. in sports analytics.

The rest of the paper is organized as follows: Related work is discussed in Section 2. Section 3 defines the main concepts arising in the problem. Section 4 explains in detail the proposed method for FMO localization. The estimation of intrinsic and extrinsic properties, formulated as an optimization problem, is presented in Section 5. In Section 6, the FMO annotated dataset of 16 videos is introduced. Last, the method is evaluated and its different applications are demonstrated in Section 7.

¹ Fast moving objects are often poorly visible; for improved understanding, the reader is referred to the videos in the supplementary material.
Figure 2: The FMO dataset includes motions that are an order of magnitude faster than in three standard datasets – ALOV, VOT, OTB [22, 15, 27]. Normalized histograms of projected object speeds (left) and intersection over union (IoU) of bounding boxes (right) between adjacent frames.
2. Related Work

Tracking is a key problem in video processing. A range of methods has been proposed based on diverse principles, such as correlation [4, 9, 8], feature point tracking [24], mean-shift [6, 25], and tracking-by-detection [29, 12]. The literature sometimes refers to fast moving objects, but only the case with no significant blur is considered, e.g. [28, 17].

Object blur is a cue for object motion, since the blur size and shape encode information about the motion. However, classical tracking methods suffer from blur, yet FMOs consist predominantly of blur. Most motion deblurring methods assume that the degradation can be modeled locally by a linear motion. One category of methods works with occlusion and considers the object's blurred transparency map [13]. Blind deconvolution of the transparency map is easier, since the latent sharp map is a binary image. The same idea applied to rotating objects was proposed in [21]. An interesting variation was proposed in [7], where linear motion blur is estimated locally using a relation similar to optical flow. The main drawback of these methods is that an accurate estimation of the transparency map using alpha matting algorithms [18] is necessary.

Methods exploiting the fact that autocorrelation increases in the direction of blur were proposed to deal with objects moving over static backgrounds [5, 14]. Similarly, in [19, 23] autocorrelation was considered for motion detection of the whole scene due to camera motion. However, all these methods require a relatively large neighborhood to estimate the blur parameters, which means that they are not suitable for small moving objects. Simultaneously dealing with the rotation of objects has not been considered in the literature so far.
3. Problem definition

FMOs are objects that move over a large distance compared to their size during the exposure time of a single frame, and possibly also rotate along an arbitrary axis with an unknown angular speed. For simplicity, we assume a single object F moving over a static background B; an extension to multiple objects is relatively straightforward. To get close to the static-background state, camera motion is assumed to be compensated by video stabilization.

Let a recorded video sequence consist of frames I_1(x), ..., I_n(x), where x ∈ R² is a pixel coordinate. Frame I_t is formed as

I_t(x) = (1 − [H_t M](x)) B(x) + [H_t F](x),   (1)

where M is the indicator function of F. In general, the operator H_t models the blur caused by object motion and rotation, and performs the 3D→2D projection of the object representation F onto the image plane. This operator depends mainly on three parameters, {P_t, a_t, φ_t}, which are the FMO trajectory (path), and the axis and angle of rotation, respectively. The [H_t M](x) function corresponds to the object visibility map (alpha matte, the relative duration of object presence during exposure) and appears in (1) to merge the blurred object and the partially visible background.

The object trajectory P_t can be represented in the image plane as a path (set of pixels) along which the object moves during the frame exposure. In the case of no rotation, or when F is homogeneous, i.e. the surface is uniform and thus rotation is not perceivable, H_t simplifies to a convolution in the image plane, i.e. [H_t F](x) = (1/|P_t|)[P_t ∗ F](x), where |P_t| is the path length – F can then be viewed as a 2D image.
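To make the formation model concrete, the following minimal sketch renders a frame according to (1) in the homogeneous, no-rotation case, where H_t reduces to averaging copies of the object image placed along a discretized path. The function name, the dense path sampling and the array conventions are our own illustration, not code from the paper.

```python
import numpy as np

def render_fmo_frame(B, F, M, path_rc):
    """Render a frame by the formation model (1) in the homogeneous,
    no-rotation case: H_t averages copies of F placed along the
    discretized path P_t. B is HxWx3, F is hxwx3, M is an hxw binary
    support mask, path_rc is a list of integer (row, col) positions of
    the object's top-left corner (assumed to stay inside the image)."""
    H, W = B.shape[:2]
    h, w = M.shape
    acc_F = np.zeros((H, W, 3))
    acc_M = np.zeros((H, W))
    for r, c in path_rc:                 # stamp F at every path sample
        acc_F[r:r + h, c:c + w] += F * M[..., None]
        acc_M[r:r + h, c:c + w] += M
    alpha = acc_M / len(path_rc)         # [H_t M](x): the alpha matte
    blurred_F = acc_F / len(path_rc)     # [H_t F](x): the blurred object
    return (1.0 - alpha[..., None]) * B + blurred_F   # equation (1)
```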
Finding all the intrinsic and extrinsic properties of arbitrary FMOs means estimating both F and H_t, which is, at this moment, an intractable task. To alleviate this problem, some prior knowledge of F is necessary. In our case, the prior is in the form of object shape. Since in most sports videos the FMOs are spheres (balls), we continue our theoretical analysis focusing on spherical objects, although, as we further demonstrate, the proposed localization method can also successfully handle objects of significantly different shapes.

We propose methods for two tasks: (i) efficient and reliable FMO localization, i.e. detection and tracking, and (ii) reconstruction of the FMO appearance, and the axis and angle of the object rotation, which requires the precise output of (i). For tracking, we use a simplified version of (1) and approximate the FMO by a homogeneous circle determined by two parameters: color µ and radius r. The tracker output (trajectory P_t and radius r) is then used to initialize the precise estimation of appearance using the full model (1).

Table 1: Inputs, outputs and assumptions of each algorithm. The image frame at time t is denoted by I_t. Symbols µ and r are used for the FMO intrinsics – mean color and radius. The FMO trajectory in I_t is marked by P_t, the camera exposure fraction by ε.

       | Detector                        | Redetector                               | Tracker
  IN   | I_{t-1}, I_t, I_{t+1}           | I_{t-1}, I_t, I_{t+1}, µ, r, P_{t-1}, ε  | I_t, ε, µ, r, P_{t-1}
  OUT  | µ, r, P_t                       | P_t                                      | P_t
  ASM  | high contrast, fast movement,   | high contrast, fast movement, model      | linear traj., model
       | no contact with moving objects, |                                          |
       | no occlusion                    |                                          |

4. Localization of FMOs

The proposed FMO localization pipeline consists of three algorithms that differ in their prerequisites and speed. First, the pipeline runs the fastest algorithm and terminates if a fast moving object is localized; otherwise, it proceeds to run the remaining two more complex and general algorithms. This strategy produces an efficient localization method that operates successfully in a broader range of conditions than any of the three algorithms alone. We call them detector, re-detector, and tracker, and their basic properties are outlined in Tab. 1.

The first algorithm, the detector, discovers previously unseen FMOs and establishes their properties. It requires sufficient contrast between the object and the background, an unoccluded view in three consecutive frames, and no interference with other moving objects. The FMO can then be tracked by either of the other two algorithms. The second algorithm, the re-detector, is applied in a region predicted by the FMO trajectory in the previous frames. It handles the problems of partial occlusions and object-background appearance similarity while being as fast as the detector. Finally, the tracker searches for the object by synthesizing its appearance with the background at the predicted locations.

All three algorithms require a static background or a registration of consecutive frames. To this end, we apply video stabilization by estimating the affine transformation between frames using RANSAC [10] by matching FREAK descriptors [1] of FAST features [20].

The detector also updates the FMO model properties required by the re-detector and tracker, namely the FMO's color µ and radius r. For increased stability, the new value of either parameter is a weighted average of the detected value and the previous value using a forgetting function proposed in [26]. For each video sequence we also need to determine the so-called exposure fraction ε, which is the ratio of the exposure period to the time difference between consecutive frames (e.g. a 25fps video with 1/50s exposure has ε = 0.5). This can be done from any two subsequent FMO detections, and we use the average over multiple observations.

We need three consecutive video frames to localize the FMO in the second of the three frames, which causes a constant delay of one frame in real-time processing, but this does not present any obstacle for practical use.

4.1. Detector

The detector is the only generic algorithm for FMO localization that requires no input except for three consecutive image frames I_{t-1}, I_t, I_{t+1}. First we compute differential images ∆_+ = |I_t − I_{t-1}|, ∆_0 = |I_{t+1} − I_{t-1}|, and ∆_− = |I_t − I_{t+1}|. These are binarized (denoted by superscript b) by thresholding, and the resulting images are combined by a boolean operation into a single binary image

∆ = ∆_+^b ∧ ∆_−^b ∧ ¬∆_0^b.   (2)

This image contains all objects which are present in the frame I_t, but not in the frames I_{t-1} and I_{t+1} (i.e. moving objects in I_t).
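As an illustration of this first step, a minimal sketch follows; the binarization threshold value is our assumption (the paper does not fix one here), and grayscale uint8 frames are assumed.

```python
import numpy as np

def binary_delta(I_prev, I_cur, I_next, thresh=25):
    """Sketch of the detector's first step, eq. (2): threshold absolute
    frame differences and combine them so that only objects present in
    I_t but absent from both neighboring frames survive."""
    d_plus  = np.abs(I_cur.astype(int)  - I_prev.astype(int)) > thresh
    d_minus = np.abs(I_cur.astype(int)  - I_next.astype(int)) > thresh
    d_zero  = np.abs(I_next.astype(int) - I_prev.astype(int)) > thresh
    return d_plus & d_minus & ~d_zero    # Delta = d+^b AND d-^b AND NOT d0^b
```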
The second step is to identify all objects which can be explained by the FMO motion model. We calculate the trajectory P_t and radius r for each FMO candidate and determine if it satisfies the motion model. For each connected component C in ∆, we compute the distance transform to get the minimal distance d(x) from each inner pixel x ∈ C to a pixel on its component's contour. Then the maximum of such distances for each component is its radius, r = max d(x), x ∈ C. Next, we determine the trajectory by morphologically thinning the pixels x that satisfy d(x) > ψr, where the threshold ψ is set to 0.7. Now we decide whether the object satisfies the FMO motion model by verifying two conditions: (i) the trajectory P_t must be a single connected stroke, and (ii) the area a covered by the component C must correspond to the area â expected according to the motion model, that is â = 2r|P_t| + πr². We say that the areas correspond if |a/â − 1| < γ, where γ is a chosen threshold 0.2. All components which satisfy these two conditions are then marked as FMOs. The whole algorithm is pictorially described in Fig. 4.
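A compact sketch of this candidate test, under the stated thresholds ψ = 0.7 and γ = 0.2, might look as follows; the helper names are ours, the single-stroke condition (i) is left as a comment, and scipy/scikit-image are assumed for the distance transform and thinning.

```python
import numpy as np
from scipy.ndimage import label, distance_transform_edt
from skimage.morphology import thin

def fmo_candidates(delta, psi=0.7, gamma=0.2):
    """For every connected component C of the binary image Delta:
    radius r = max inner distance to the contour, trajectory P_t by
    thinning the pixels with d(x) > psi * r, and the area test
    |a / a_hat - 1| < gamma with a_hat = 2 * r * |P_t| + pi * r**2."""
    labeled, n = label(delta)
    detections = []
    for i in range(1, n + 1):
        comp = labeled == i
        d = distance_transform_edt(comp)   # distance to the background
        r = d.max()
        traj = thin(d > psi * r)           # trajectory pixels P_t
        path_len = traj.sum()              # |P_t| in pixels
        a_hat = 2.0 * r * path_len + np.pi * r ** 2
        # Condition (i), a single connected stroke, is omitted here;
        # it would reject branched or broken trajectories.
        if abs(comp.sum() / a_hat - 1.0) < gamma:
            detections.append((r, traj))
    return detections
```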

Figure 3: (a) FMO detection, (b) re-detection where detection failed because the FMO is not a single connected component, (c) tracking where both algorithms failed due to an imprecise ∆. Top row: cropped I_t with P_{t-1} (blue) and P_t (green) with contours. Bottom row: binary differential image ∆.

Figure 4: Detection of FMOs. Three differential images of three consecutive frames are binarized, segmented by a boolean operation, and the connected components are checked for whether they satisfy the FMO model. The two detected FMOs in this frame are the ball and the white stripe on the player's t-shirt. However, only the ball passed the check and was marked as an FMO.

4.2. Re-detector

The re-detector requires the knowledge of the FMO, but allows one FMO occurrence to be composed of several connected components in ∆ (e.g. when the FMO passes in front of background with a similar color). Fig. 3 shows an example where the re-detector finds an FMO missed by the detector.

The re-detector operates on a rectangular window of the binary differential image ∆ in (2), restricted to the local neighborhood of the previous FMO localization. Let P_{t-1} be the trajectory from the previous frame I_{t-1}; the re-detector then works in the square neighborhood with side 4·(1/ε)|P_{t-1}|, centered on the position of the previous localization. Note that (1/ε)|P_{t-1}|, where ε is the exposure fraction, is the full trajectory length between I_{t-1} and I_t. For each connected component in this region, the trajectory P_t and radius r are computed in the same way as in the detection algorithm. The mean color µ is obtained by averaging all pixel values on the trajectory. In this region, connected components with model parameters (µ, r) are selected if the Euclidean distance in RGB ‖µ − µ₀‖₂ and the normalized difference |r − r₀|/r₀ are below the prescribed thresholds 0.3 and 0.4, respectively. Here, the previous FMO parameters are denoted by (µ₀, r₀).
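The search region and the (µ, r) matching rule can be sketched as follows; normalized RGB values in [0, 1] are assumed for the color threshold, which the paper does not state explicitly.

```python
import numpy as np

def search_window(center_rc, len_prev, eps):
    """Square search region of side 4 * (1/eps) * |P_{t-1}|, centered
    on the previous localization; returned as (top, left, side)."""
    side = 4.0 * len_prev / eps
    return center_rc[0] - side / 2, center_rc[1] - side / 2, side

def matches_model(mu, r, mu0, r0, t_color=0.3, t_radius=0.4):
    """Accept a component whose mean color and radius are close to the
    stored model (mu0, r0); thresholds 0.3 and 0.4 follow the paper."""
    return (np.linalg.norm(np.asarray(mu) - np.asarray(mu0)) < t_color
            and abs(r - r0) / r0 < t_radius)
```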
4.3. Tracker

The final attempt to find the FMO, after both the detector and re-detector have failed, is the tracker, which uses image synthesis. The tracker is based on the simplified formation model (1), assuming an object F with color µ and radius r moving along a linear trajectory P_t. The indicator function M is then a ball of radius r, and given the trajectory P_t, the alpha value [H_t M](x) from (1) is

A(x|P_t) = (1/|P_t|) [P_t ∗ M](x) = (1/|P_t|) ∫_{|z|≤r} P_t(x − z) dz.   (3)

For linear trajectories this integral can be solved analytically. Let D(x, P_t) denote the distance function from x to the trajectory P_t; then A is

A(x|P_t) ≈ (2/|P_t|) √(max(r² − D²(x, P_t), 0)).   (4)
This approximation is inaccurate only in the neighborhood of the starting and ending points of the trajectory, and for FMOs this area is small compared to the central section of the trajectory. Using the above relation, I_t in (1) can be written in a simpler form as

Î_t(x|P_t) = (1 − A(x|P_t)) B(x) + µ A(x|P_t).   (5)
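A direct implementation of (4) and (5) is straightforward; the sketch below renders Î_t for a candidate segment, clamping the matte to [0, 1] near the endpoints where the approximation (4) is inaccurate (the clamp is our addition, not part of the paper's formulation).

```python
import numpy as np

def synthesize_frame(B, mu, r, s, e):
    """Sketch of eqs. (4)-(5): render the expected frame I_hat for a
    homogeneous ball (color mu, radius r) moving along the segment
    s -> e over background B. D(x, P_t) is the point-to-segment
    distance, computed here on a full pixel grid for clarity."""
    H, W = B.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    p = np.stack([ys, xs], axis=-1).astype(float)
    s, e = np.asarray(s, float), np.asarray(e, float)
    seg = e - s
    L = np.linalg.norm(seg) + 1e-9                  # path length |P_t|
    t = np.clip(((p - s) @ seg) / L**2, 0.0, 1.0)   # projection onto segment
    D = np.linalg.norm(p - (s + t[..., None] * seg), axis=-1)
    A = (2.0 / L) * np.sqrt(np.maximum(r**2 - D**2, 0.0))   # eq. (4)
    A = np.minimum(A, 1.0)                          # keep the matte in [0, 1]
    return (1.0 - A[..., None]) * B + A[..., None] * np.asarray(mu)  # eq. (5)
```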

The tracker now looks for the trajectory P_t that best explains the frame I_t using the approximation Î_t. This is equivalent to solving

P_t = arg min_{P_t} ‖Î_t(·|P_t) − I_t‖₂.   (6)

As in the other two algorithms, instead of the background B we can use one of the previous frames I_{t-1} or I_{t-2}, since a proper FMO should not occupy the same region in several consecutive frames, and thus the previous frame can locally serve as the background.

A linear trajectory P_t is given by its starting point s_t, orientation β_t and length |P_t| (equivalently, its ending point e_t). We minimize (6) over these parameters by a coordinate descent search.
First, we find the best orientation. We extrapolate the starting point linearly from the previous detection and assume that the length remains the same, s_t = e_{t-1} + (1/ε − 1)|P_{t-1}| u_{β_t} and |P_t| = |P_{t-1}|, where u_β = (cos β, sin β) is a unit vector with orientation β. Next we sample the space of β_t's that differ from β_{t-1} by up to 15° and choose the one that minimizes the cost (6).

The minimization w.r.t. s_t and |P_t| is done in a similar manner. For s_t, we sample points in the ½|P_{t-1}| neighborhood of the extrapolated s_t from the previous detection, and for |P_t| we again use the range |P_{t-1}| ± 50%. The three minimization stages are illustrated in Fig. 5.

Figure 5: Tracking steps. (1) detection of orientation, (2) detection of starting point, (3) detection of ending point. The previous detection is in blue. The green cross denotes the minimizer, red crosses the initial guess. All sampled points (gray) are scaled by their cost (6) (the darker, the higher the cost).
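The three-stage search could be sketched as follows, reusing synthesize_frame from the previous sketch. The sampling densities are our choice, the cost is evaluated over the full frame for brevity (a local window would be used in practice), and row/column image coordinates are assumed throughout.

```python
import numpy as np

def u(beta):
    """Unit vector u_beta = (cos beta, sin beta)."""
    return np.array([np.cos(beta), np.sin(beta)])

def track_step(B, I_t, mu, r, e_prev, beta_prev, len_prev, eps):
    """Sketch of the tracker's coordinate descent on eq. (6)."""
    def cost(s, beta, length):
        I_hat = synthesize_frame(B, mu, r, s, s + length * u(beta))
        return np.sum((I_hat - I_t) ** 2)            # eq. (6)

    # (1) orientation: for each candidate beta, extrapolate the start as
    # s_t = e_{t-1} + (1/eps - 1) |P_{t-1}| u_beta, keep |P_t| = |P_{t-1}|
    betas = beta_prev + np.deg2rad(np.linspace(-15, 15, 31))
    beta = min(betas, key=lambda b: cost(
        e_prev + (1.0 / eps - 1.0) * len_prev * u(b), b, len_prev))
    s0 = e_prev + (1.0 / eps - 1.0) * len_prev * u(beta)

    # (2) starting point: sample a |P_{t-1}|/2 neighborhood around s0
    offs = np.linspace(-0.5 * len_prev, 0.5 * len_prev, 7)
    s = min((s0 + np.array([dy, dx]) for dy in offs for dx in offs),
            key=lambda c: cost(c, beta, len_prev))

    # (3) length: search the range |P_{t-1}| +- 50%
    length = min(np.linspace(0.5, 1.5, 31) * len_prev,
                 key=lambda L: cost(s, beta, L))
    return s, beta, length
```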
5. Estimation of appearance

Let us consider a video frame I_t acquired according to (1), with the object trajectory P_t and size r as determined by the FMO detector. The objective is to estimate the appearance F, which is essentially a (modified) blind image deblurring task. One has to first estimate the blur-and-projection operator H, and then solve the non-blind deblurring task for F. As mentioned in Sec. 3, to make the estimation of H tractable, we focus on ball-like objects moving (approximately) parallel to the camera while undergoing arbitrary 3D rotation. As in FMO tracking, if the object rotation is negligible or unperceivable, the H operator is fully determined by the object trajectory and we can proceed directly to the non-blind estimation of F. Let us first focus on the estimation of F and then on the problem of obtaining H.

Let F denote some representation of the object appearance – in the absence of rotation, this can directly be the image of the object projected into the video frame, and when 3D rotation is present we use a spherical parametrization to capture the whole surface. Following the model (1), we solve the problem

min_F ‖(1 − [HM])B + [HF] − I‖₁ + α‖DF‖₁,   (7)

where D is the derivative operator (gradient magnitude) and α is a weighting parameter proportional to the level of noise in I. The L₁-norm, while increasing robustness, leads to nonlinear equations. We therefore apply the method of iteratively re-weighted least squares to convert the optimization problem to a linear system and use conjugate gradients to solve it. For object sizes in the FMO dataset (r < 100 pixels) this can be done in less than a second.

In the case of object rotation, the blur operator H encodes the object pose (orientation in space) as well as the location at each fractional moment during the camera exposure. Trajectory aside, this is fully determined by the object's angular velocity, which we assume constant throughout the exposure. Angular velocity (in 3D) is given by three parameters (two for axis orientation, one for velocity). The functional in (7) is non-convex w.r.t. the angular velocity parameters. However, we can solve it with an exhaustive search, since the parametric space is not that large. We thus construct H for each point in the discretized space of possible angular velocities, estimate F, and then measure the error given by the functional in (7). The parametrization which gives the lowest error is our solution.
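The no-rotation case of (7) admits a compact sketch: H is convolution with the normalized path kernel, and the L1 terms are handled by IRLS reweighting. For brevity, the reweighted inner problem below is driven by plain gradient steps where the paper uses conjugate gradients; single-channel images and all parameter values are our assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def estimate_appearance(I, B, k, M, alpha=0.05, iters=100, step=0.5):
    """Estimate F in the no-rotation case of (7): H is convolution with
    the normalized path kernel k, and M is the object support mask.
    IRLS reweighting handles both L1 terms; gradient steps replace the
    conjugate-gradient inner solver of the paper for brevity."""
    F = np.zeros_like(B)
    kT = k[::-1, ::-1]                        # adjoint of 'same' convolution
    a = fftconvolve(M.astype(float), k, mode='same')   # [HM], alpha matte
    for _ in range(iters):
        res = (1.0 - a) * B + fftconvolve(F, k, mode='same') - I
        w = 1.0 / np.maximum(np.abs(res), 1e-3)        # IRLS weights (data)
        gy, gx = np.gradient(F)
        wg = 1.0 / np.maximum(np.hypot(gy, gx), 1e-3)  # IRLS weights (prior)
        grad = fftconvolve(w * res, kT, mode='same')
        grad -= alpha * (np.gradient(wg * gy, axis=0)
                         + np.gradient(wg * gx, axis=1))
        F -= step * grad
    return F
```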
In Fig. 8 we illustrate the result of FMO deblurring in the form of temporal super-resolution. The left side (a) shows a frame captured by a conventional video camera (25fps), which contains a volleyball that is severely motion blurred. On the right side (b), the top row shows several frames captured by a high-speed video camera (250fps) spanning approximately the same time frame – the volleyball flies from left to right while rotating clockwise. In the bottom row of (b) we show the result of FMO deblurring, computed solely from the single frame in (a), at times corresponding to the high-speed frames above. The restoration is on par with the high-speed ground truth; it significantly enhances the video information content merely by post-processing. For comparison, we also display the calculated rotation axis and the one estimated from the high-speed video. Both are close to each other; compare the blue cross and red circle in (b). Note that for a human observer it is impossible to determine the ball rotation from the blurred images, while the proposed algorithm with the temporal super-resolution output provides this insight. Another appearance estimation example is in Fig. 9, where we use the simplified model of pure translational motion for the table-tennis ball (top) and frisbee (bottom).

Figure 6: The FMO dataset – one example image per sequence. Red polygons delineate ground-truth regions with fast moving objects. For clearer visualization, two frames do not show annotations because their area consists of only several pixels. The sequences are sorted in natural reading order from left to right and top to bottom as in Tab. 2.

Figure 7: FMO detection and tracking. Each blue region refers to the object trajectory and contour in previous frames.

6. Dataset

The FMO dataset contains videos of various activities involving fast moving objects, such as ping pong, tennis, frisbee, volleyball, badminton, squash, darts, arrows and softball, as well as others. Acquisition of the videos differs: some are taken from a tripod with mostly static backgrounds, some have severe camera motions and dynamic backgrounds; some FMOs are nearly homogeneous, while some have colored texture. All the sequences are annotated with ground-truth locations of the object (even in cases when the object of interest does not strictly satisfy the notion of an FMO).

None of the public tracking datasets contain objects moving fast enough to be considered FMOs – with significant blur and large frame-to-frame displacement. We analyzed three of the most widely used tracking datasets, ALOV [22], VOT [15], and OTB [27], and compared them with the proposed dataset in terms of the motion of the object of interest. For example, in the conventional datasets the object frame-to-frame displacement is below 10 pixels in 91% of cases, while in the FMO dataset the displacement is uniformly spread between 0 and 150 pixels. Similarly, the intersection over union (IoU) of bounding boxes between adjacent frames is above 0.5 in 94% of cases for the conventional datasets, whereas the proposed dataset has zero intersection nearly every time. Fig. 2 summarizes these findings.

An overview of the FMO dataset is in Fig. 6, showing some of the included activities and the ground-truth annotations. The dataset and annotations will be made publicly available.
Table 2: Performance of the proposed method on the FMO dataset. We report precision, recall and F-score. The number of frames is indicated by #.

 n   Sequence name        #     Pr.    Rc.    F-sc.
 1   volleyball           50    100.0  45.5   62.5
 2   volleyball passing   66    21.8   10.4   14.1
 3   darts                75    100.0  26.5   41.7
 4   darts window         50    25.0   50.0   33.3
 5   softball             96    66.7   15.4   25.0
 6   archery              119   0.0    0.0    0.0
 7   tennis serve side    68    100.0  58.8   74.1
 8   tennis serve back    156   28.6   5.9    9.8
 9   tennis court         128   0.0    0.0    0.0
10   hockey               350   100.0  16.1   27.7
11   squash               250   0.0    0.0    0.0
12   frisbee              100   100.0  100.0  100.0
13   blue ball            53    100.0  52.4   68.8
14   ping pong tampere    120   100.0  88.7   94.0
15   ping pong side       445   12.1   7.3    9.1
16   ping pong top        350   92.6   87.8   90.1
     Average              –     59.2   35.5   40.6

Table 3: Performance of baseline methods on the FMO dataset. Percentage of frames with FMOs present where tracking was successful (IoU > 0.5).

Sequence name        ASMS [25]  DSST [9]  MEEM [29]  SRDCF [8]  STRUCK [12]  Proposed
volleyball           80         0         50         0          10           46
volleyball passing   12         6         95         88         8            10
darts                3          0         6          0          0            27
darts window         0          0         0          0          0            50
softball             0          0         0          0          0            15
archery              5          5         5          5          0            0
tennis serve side    7          0         0          0          6            59
tennis serve back    5          0         0          0          3            6
tennis court         0          0         3          3          0            0
hockey               0          0         0          0          0            16
squash               0          0         0          0          0            0
frisbee              65         0         6          6          0            100
blue ball            30         0         0          0          25           52
ping pong tampere    0          0         0          0          0            89
ping pong side       1          0         0          0          0            7
ping pong top        0          0         0          0          1            88
Average              17         1         1          1          3            36
7. Evaluation

The proposed localization pipeline was evaluated on the FMO dataset. The performance criteria are precision TP/(TP + FP), recall TP/(TP + FN) and F-score 2TP/(2TP + FN + FP), where TP, FP and FN are the numbers of true positives, false positives and false negatives, respectively. A true positive detection has an intersection over union (IoU) with the ground-truth polygon greater than 0.5 and an IoU larger than that of other detections. The second condition ensures that multiple detections of the same object generate only one TP. False negatives are FMOs in the ground truth with no associated TP detection.
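These criteria translate directly into code; the short sketch below assumes binary numpy masks for detections and ground-truth polygons.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two binary masks; a detection counts
    as TP when its IoU with the ground-truth polygon exceeds 0.5 and is
    the highest among all detections of that object."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def prf(tp, fp, fn):
    """Precision, recall and F-score exactly as defined above."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * tp / (2 * tp + fn + fp) if tp else 0.0
    return precision, recall, f_score
```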
Quantitative results for the individual video sequences are listed in Tab. 2. All results were achieved with the same set of parameters in the localization pipeline, as discussed in Sec. 4. Performance varies widely, ranging from an F-score of 0% (complete failure) for the archery, tennis court, and squash sequences, to 100% (complete success) for the frisbee sequence. The sequences with the best results contain objects with prominent FMO characteristics, i.e. a large motion against a contrasting background. False negatives occur in three types of situations: (i) the object motion is too small (archery, volleyball), (ii) the object itself is too small (tennis court, squash), and (iii) the background is too similar to the object color (e.g., the table tennis net, the white edge of the table). Problem (i) can be addressed by combining the FMO detector with a state-of-the-art "slow" short-term tracker. False positives usually occur when local movements of larger objects, such as players' body parts, can be partially explained by the FMO model, or due to imprecise camera stabilization. Note that none of the test sequences contain multiple FMOs in a single frame, but the algorithm is not constrained to detect a fixed number of objects. The detection results are included in the supplementary material. Some examples are shown in Fig. 7.
Figure 8: Reconstruction of an FMO blurred by motion and rotation. a) Input video frame. b) Top row: actual frames from a high-speed camera (250fps). Bottom row: frames at corresponding times reconstructed from a single frame of a regular camera (25fps), i.e. 10× temporal super-resolution. The top left image shows the rotation axis position estimated from the blurred frame (blue cross) and from the high-speed video (red circle).

Next, we compare the results of the FMO localization pipeline to those of several standard state-of-the-art trackers, namely ASMS [25], DSST [9], SRDCF [8], MEEM [29], and STRUCK [12]. For a fair comparison, only frames containing exactly one FMO were included. Since these trackers always output exactly one detection per frame and the proposed method can return any number of detections, including none, the proposed method would have an advantage on the full set of frames. The results are presented in Tab. 3 in terms of the percentage of frames with a successful detection. Some of the standard trackers performed reasonably well on the volleyball sequences, where the motions are relatively slow, but the overall results are very poor. The proposed method performs significantly better. This is explainable, because the compared methods were not designed for scenarios involving FMOs, but it highlights the need for a specialized FMO tracker.
Besides FMO localization, the proposed model and estimator enable several applications which may be useful in processing videos containing FMOs. In Sec. 5 on appearance estimation, we suggested the task of temporal super-resolution, which increases the video frame-rate by filling in the gaps between existing frames and artificially decreases the exposure period of existing frames. The naive approach is the interpolation of adjacent frames, which is inadequate for videos containing FMOs. A more precise approach requires moving objects to be localized, deblurred, and their motions modeled, which the proposed method accomplishes (see Sec. 5), so that new frames can be synthesized at the desired frame-rate. Figs. 8 and 9 show example results of the temporal super-resolution.

Another popular use case is highlighting FMOs in sports videos. Due to the extreme blur, FMOs are often hard to localize, even for humans, despite the context provided by perfect semantic scene understanding. Simple highlighting, like recoloring or scaling, enhances the viewer's experience. Fig. 9 top-right demonstrates temporal super-resolution with highlighting.

Figure 9: Temporal super-resolution using plain interpolation (left) and the appearance estimation model (right). The top right image shows the possibility of FMO highlighting.
8. Conclusions

Fast moving objects are a common phenomenon in real-life videos, especially in sports. We proposed a generic algorithm (i.e. one not requiring prior knowledge of appearance) for their fast localization and tracking, and a blind deblurring algorithm for the estimation of their appearance. We created a new dataset consisting of 16 sports videos with ground-truth annotations. Tracking FMOs is considerably different from the standard object tracking targeted by state-of-the-art algorithms and thus requires a specialized approach. The proposed method is the first attempt in this direction and outperforms the baseline methods by a wide margin. The estimated FMO appearance could support applications useful in sports analytics, such as a realistic increase of video frame-rate (temporal super-resolution), artificial object highlighting, visualization of the rotation axis, and measurement of speed and angular velocity.

Acknowledgments. The authors were supported by the Technology Agency of the Czech Republic project TE01020415 V3C, the MSMT LL1303 ERC-CZ project and the Grant Agency of the Czech Republic under project GA13-29225S.
References

[1] A. Alahi, R. Ortiz, and P. Vandergheynst. FREAK: Fast retina keypoint. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 510–517, June 2012.
[2] S. Avidan. Ensemble tracking. IEEE Trans. Pattern Anal. Mach. Intell., 29(2):261–271, Feb. 2007.
[3] B. Babenko, M. H. Yang, and S. Belongie. Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1619–1632, Aug. 2011.
[4] T. A. Biresaw, A. Cavallaro, and C. S. Regazzoni. Correlation-based self-correcting tracking. Neurocomputing, 152(C):345–358, Mar. 2015.
[5] A. Chakrabarti, T. Zickler, and W. T. Freeman. Analyzing spatially-varying blur. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 2512–2519, San Francisco, CA, USA, June 2010.
[6] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-based object tracking. IEEE Trans. Pattern Anal. Mach. Intell., 25(5):564–575, May 2003.
[7] S. Dai and Y. Wu. Motion from blur. In Computer Vision and Pattern Recognition (CVPR), 2008 IEEE Conference on, pages 1–8, June 2008.
[8] M. Danelljan, G. Häger, F. Shahbaz Khan, and M. Felsberg. Learning spatially regularized correlation filters for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, pages 4310–4318, 2015.
[9] M. Danelljan, G. Häger, F. Shahbaz Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In Proceedings of the British Machine Vision Conference. BMVA Press, 2014.
[10] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, June 1981.
[11] M. Godec, P. M. Roth, and H. Bischof. Hough-based tracking of non-rigid objects. Comput. Vis. Image Underst., 117(10):1245–1256, Oct. 2013.
[12] S. Hare, S. Golodetz, A. Saffari, V. Vineet, M. M. Cheng, S. L. Hicks, and P. H. S. Torr. Struck: Structured output tracking with kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10):2096–2109, Oct. 2016.
[13] J. Jia. Single image motion deblurring using transparency. In Computer Vision and Pattern Recognition (CVPR), 2007 IEEE Conference on, pages 1–8, 2007.
[14] T. H. Kim and K. M. Lee. Segmentation-free dynamic scene deblurring. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 2766–2773, 2014.
[15] M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Čehovin, G. Fernandez, T. Vojir, G. Häger, G. Nebehay, and R. Pflugfelder. The visual object tracking VOT2015 challenge results. In The IEEE International Conference on Computer Vision (ICCV) Workshops, December 2015.
[16] M. Kristan, J. Matas, A. Leonardis, T. Vojir, R. Pflugfelder, G. Fernandez, G. Nebehay, F. Porikli, and L. Čehovin. A novel performance evaluation methodology for single-target trackers, Jan. 2016.
[17] A. V. Kruglov and V. N. Kruglov. Tracking of fast moving objects in real time. Pattern Recognition and Image Analysis, 26(3):582–586, 2016.
[18] A. Levin, D. Lischinski, and Y. Weiss. A closed-form solution to natural image matting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):228–242, Feb. 2008.
[19] J. Oliveira, M. Figueiredo, and J. Bioucas-Dias. Parametric blur estimation for blind restoration of natural images: Linear motion and out-of-focus. IEEE Transactions on Image Processing, 23(1):466–477, 2014.
[20] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. Pages 430–443. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.
[21] Q. Shan, W. Xiong, and J. Jia. Rotational motion deblurring of a rigid object from a single image. In Proc. IEEE 11th International Conference on Computer Vision (ICCV 2007), pages 1–8, Oct. 2007.
[22] A. W. M. Smeulders, D. M. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah. Visual tracking: An experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1442–1468, July 2014.
[23] J. Sun, W. Cao, Z. Xu, and J. Ponce. Learning a convolutional neural network for non-uniform motion blur removal. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pages 769–777, 2015.
[24] C. Tomasi and T. Kanade. Detection and tracking of point features. School of Computer Science, Carnegie Mellon Univ., Pittsburgh, 1991.
[25] T. Vojir, J. Noskova, and J. Matas. Robust scale-adaptive mean-shift for tracking. Pages 652–663. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
[26] K. G. White. Forgetting functions. Animal Learning & Behavior, 29(3):193–207, 2001.
[27] Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[28] M. A. Zaveri, S. N. Merchant, and U. B. Desai. Small and fast moving object detection and tracking in sports video sequences. In Multimedia and Expo, 2004 (ICME '04), 2004 IEEE International Conference on, volume 3, pages 1539–1542, June 2004.
[29] J. Zhang, S. Ma, and S. Sclaroff. MEEM: Robust tracking via multiple experts using entropy minimization. Pages 188–203. Springer International Publishing, Cham, 2014.
