
Multimedia Capturing, Mining, and Streaming

Object-Detection-Based Video Compression for Wireless Surveillance Systems

Lingchao Kong and Rui Dai
University of Cincinnati

To obtain better object-detection performance on compressed videos, this standard-compliant video-encoding scheme introduces new mode-decision strategies to suppress unnecessary temporal fluctuation in stable background areas while maintaining acceptable rate-distortion performance.

Wireless embedded camera sensors play crucial roles in various distributed surveillance applications, including those for border patrol, traffic monitoring, and environmental monitoring. In many distributed wireless surveillance systems,1 camera sensors report their video observations to a central base station through wireless communication. Given the embedded cameras' low computing power and limited energy and bandwidth, raw videos acquired by cameras are usually preprocessed, encoded, and compressed before being delivered to the base station.2,3 A powerful central server or data center at the base station can fully utilize its computing capability to perform data fusion on videos from multiple cameras, producing a much better understanding of the surveillance videos than what is available from individual cameras.4

A typical automatic surveillance system includes five stages: object detection, object classification, object tracking, understanding and description of behaviors, and human identification.4 Object detection is the first and most essential step of the entire procedure because detecting the object provides a focus of attention for later processes, such as tracking and behavior analysis. However, the inevitable degradation of video quality caused by lossy compression at embedded cameras significantly impacts object detection.5,6 To address this, video encoders for surveillance systems should be designed to improve object-detection performance.

In our recent work, we studied the effects of lossy compression on object detection.7 Unlike human beings, who can easily extract and focus on a moving object in a blurred video, computer vision algorithms can be significantly affected by temporal fluctuation in background areas. All modern video compression standards—such as H.264/Advanced Video Coding (H.264/AVC) and the latest High Efficiency Video Coding (HEVC, also known as H.265)—use the block-based hybrid approach, which includes intra- and interpicture prediction and 2D transform coding.8 As Figure 1 shows, this approach measures the encoding distortion by comparing the encoded video with the original video (A direction), but it does not measure the temporal-domain fluctuation in the encoded video (B direction). This strategy results in temporal fluctuation when colocated regions of consecutive frames—such as f_{t-1} to f_t—are not consistently encoded, especially when intraframes are periodically inserted at low and medium bit rates.

As the "Related Work on Temporal Fluctuation" sidebar shows, existing approaches are designed to optimize human visual perception; in contrast, we aim to address the temporal fluctuation problem itself to improve object-detection performance. This approach is worthwhile because human and computer vision systems might respond differently to an encoded video. Our new temporal-fluctuation-reduced video-encoding (TFRE) scheme suppresses unnecessary temporal fluctuations in stable background areas. TFRE is designed to comply with the standardized hybrid block-based video-coding architecture. It uses the sum-of-absolute frame difference (SFD) to measure the degree of temporal fluctuation for stable background blocks7 and introduces new mode-decision strategies for both intra- and interframes to reduce SFD. We have conducted extensive experiments on numerous video datasets to evaluate TFRE. This article extends our recent conference report9 with more comprehensive experimental results and discussions.

Related Work on Temporal Fluctuation
Many researchers have investigated the problem of temporal fluctuation with the objective of improving the perceptual quality of compressed videos. The temporal fluctuation humans perceive is defined as flicker, which usually refers to frequent luminance or chrominance perceptual changes that do not appear in uncompressed raw videos.1

Researchers have proposed a temporal low-pass filtering scheme that smooths the luminance changes on a block-by-block basis,1 as well as a two-pass coding scheme that includes a first pass of simplified P-frame coding to derive a no-flicker reference of the current frame and a second pass of actual I-frame coding with small quantization parameters to closely approach the no-flicker reference.2 Other researchers propose a modified distortion measure to reduce flicker; this approach considers the distortions in both the A and B directions (see Figure 1 in the main text), applying the measure during the rate-distortion optimized selection of the intraprediction mode.3 To reduce the flicker artifact in High Efficiency Video Coding, researchers have proposed a region-classification-based rate control for coding tree units in I-frames that improves the reconstructed quality of I-frames.4

References
1. A. Jimenez-Moreno et al., "Standard-Compliant Low-Pass Temporal Filter to Reduce the Perceived Flicker Artifact," IEEE Trans. Multimedia, vol. 16, no. 7, 2014, pp. 1863–1873.
2. H. Yang, J.M. Boyce, and A. Stein, "Effective Flicker Removal from Periodic Intra Frames and Accurate Flicker Measurement," Proc. Int'l Conf. Image Processing (ICIP), 2008, pp. 2868–2871.
3. S.S. Chun, J.-R. Kim, and S. Sull, "Intra Prediction Mode Selection for Flicker Reduction in H.264/AVC," IEEE Trans. Consumer Electronics, vol. 52, no. 4, 2006, pp. 1303–1310.
4. P. Wang et al., "Region-Classification-Based Rate Control for Flicker Suppression of I-Frames in HEVC," Proc. Int'l Conf. Image Processing (ICIP), 2013, pp. 1986–1990.

Figure 1. Schematic diagram of temporal fluctuation. The encoder compares the encoded sequence with the original sequence (frames f_{t-2}, f_{t-1}, f_t) in the A direction, but the traditional block-based hybrid approach does not measure the temporal-domain fluctuation in the encoded video (the B direction, between consecutive encoded frames).

Preliminary Study
We constructed a distorted video database to study the impact of lossy compression on object-detection performance.7 Eight raw video sequences with different spatial and temporal details were selected, including three traffic videos—container, GRAM Road-Traffic Monitoring (GR), and GRAM Road-Traffic Monitoring HD (GRHD)—and three indoor videos—hall, horizontal, and overlook. We also included two outdoor videos—people and vehicle. Figure 2 shows snapshots of these videos. The open source H.264/AVC encoder x264 (www.videolan.org/developers/x264.html) was used to compress the raw videos. The one-pass constant quantization parameter (QP) mode with medium speed was applied in the x264 encoder, and the length of the group of pictures (GOP) was set to 20 with the IPPP structure (where I denotes an intraframe and P a predictive frame). Each raw video was compressed using 19 different QPs ranging from 22 to 40, which resulted in a total of 152 compressed videos. We chose three object-detection algorithms from different categories10 to be executed on the compressed videos: the Gaussian mixture model (GMM) algorithm, the algorithm that combines statistical background estimation and per-pixel Bayesian segmentation (referred to as the GMG algorithm), and the adaptive background learning (ABL) algorithm.

Figure 2. Eight video sequences with different spatial and temporal details were used for this study. There are three traffic videos: (a) container, (b) GRAM Road-Traffic Monitoring (GR), and (c) GRAM Road-Traffic Monitoring HD (GRHD). There are also three indoor videos: (d) hall, (e) horizontal, and (f) overlook. Finally, there are two outdoor videos: (g) people and (h) vehicle.

Object-detection results from uncompressed raw videos are regarded as the ground truth, and object-detection results on compressed videos are compared against this ground truth. We use the commonly known Recall and Precision measures to quantify object-detection performance: Recall = TP/(TP + FN) and Precision = TP/(TP + FP), where TP, FN, and FP stand for the numbers of true positive, false negative, and false positive pixels, respectively. Recall and Precision selectively evaluate the levels of missed and mistaken detections; we measure the overall performance of a detection algorithm using the F1 score,11 defined as

$$F_1 = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}.$$

The performance of object-detection algorithms can be affected by the background's quality. However, the video-coding procedure might introduce temporal fluctuations in the background that can cause FP. To describe the degree of temporal fluctuation in stable background areas, we introduce the SFD at the macroblock (MB) level between the current frame and the previous frame:

$$\mathrm{SFD} = \sum_{i,j=1}^{16} \left| m_t(i,j) - m_{t-1}(i,j) \right|,$$

where m_t(i,j) is the reconstructed pixel value at location (i,j) in an MB of the current frame and m_{t-1}(i,j) is the reconstructed pixel value in the previous frame's corresponding MB.

We collected SFD and FP samples from stable background areas for all the compressed videos in the aforementioned dataset. Figure 3 shows the relationship between SFD and FP from our test data: FP grows when SFD increases. We conducted analysis of variance (ANOVA) on the pairs of FP and SFD, where a small p-value (p ≤ 0.01) means significant correlation.12 The resulting p-values are close to 0 and much smaller than 0.01, indicating that FP is closely associated with SFD.

Figure 3. False positive (FP) pixels versus the sum-of-absolute frame difference (SFD), for QPs ranging from 22 to 40. FP increases when SFD increases.
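To make these two measures concrete, the following minimal NumPy sketch computes the per-MB SFD and the pixel-level F1 score exactly as defined above; the function names and array conventions are ours, not from the article.

# Minimal sketch of the two measures defined above, using NumPy.
import numpy as np

def macroblock_sfd(curr, prev, x, y):
    """Sum-of-absolute frame difference for the 16 x 16 MB whose top-left
    corner is (x, y), between reconstructed frames curr and prev."""
    c = curr[y:y + 16, x:x + 16].astype(np.int64)
    p = prev[y:y + 16, x:x + 16].astype(np.int64)
    return int(np.abs(c - p).sum())

def f1_score(detected, ground_truth):
    """Pixel-level F1 from boolean foreground masks; the ground truth is the
    detection result on the uncompressed video, as in the article."""
    tp = np.logical_and(detected, ground_truth).sum()
    fp = np.logical_and(detected, ~ground_truth).sum()
    fn = np.logical_and(~detected, ground_truth).sum()
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)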

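The ANOVA check can be reproduced along the following lines. The article does not describe how the (SFD, FP) pairs were grouped for the test, so the quartile binning below is an illustrative assumption rather than the authors' procedure.

# Hedged sketch of the ANOVA check: bin SFD samples into levels, group the
# FP counts by level, and test whether mean FP differs across levels.
import numpy as np
from scipy import stats

def fp_sfd_anova(sfd, fp, bins=4):
    """One-way ANOVA p-value for FP counts grouped by binned SFD level.
    Quantile binning is an assumption; the article does not specify it."""
    edges = np.quantile(sfd, np.linspace(0, 1, bins + 1))
    labels = np.digitize(sfd, edges[1:-1])        # bin indices 0 .. bins-1
    groups = [fp[labels == k] for k in range(bins)]
    return stats.f_oneway(*groups).pvalue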
The TFRE Approach
Our research goal is to suppress unnecessary temporal fluctuation in stable background areas to obtain better object-detection performance. We propose TFRE to encode the stable background areas. TFRE is designed to comply with the hybrid block-based video-encoding architecture. Because many surveillance applications implement the H.264/AVC standard, we use it as the basis of the current TFRE implementation.

Here, we consider the following case: a coarse-grain classification of dynamic foreground and stable background MBs is obtained using a simple frame-differencing-based method at the encoder. For stable background MBs in intraframes, we calculate SFD and jointly optimize it with the rate-distortion optimization (RDO) cost during the RDO process to decide the type and prediction modes. For the interframe analysis process, we introduce new strategies in the analysis of the P_SKIP and P_16×16 type modes, with the goal of reducing temporal fluctuation for interblocks while maintaining acceptable distortion. Figure 4's flowchart shows the entire process for encoding a stable background MB; the gray blocks highlight the proposed scheme.

Figure 4. The temporal-fluctuation-reduced video-encoding (TFRE) scheme flowchart. TFRE introduces new mode-decision strategies for both intra- and interframes to reduce temporal fluctuation while maintaining acceptable rate-distortion performance.
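The article relies on the frame-differencing-based classifier without spelling it out, so the following sketch should be read as one plausible version rather than the authors' method; the mean-absolute-difference rule and the threshold value are assumptions made for illustration.

# One plausible frame-differencing classifier for the coarse-grain MB labeling
# step; the rule and threshold are assumptions, not from the article.
import numpy as np

MB = 16  # macroblock size in H.264/AVC

def stable_background_mask(curr, prev, thresh=2.0):
    """Return a boolean map with one entry per MB that is True where the mean
    absolute difference between colocated input MBs falls below thresh."""
    h, w = curr.shape
    rows, cols = h // MB, w // MB
    mask = np.zeros((rows, cols), dtype=bool)
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    for r in range(rows):
        for c in range(cols):
            block = diff[r * MB:(r + 1) * MB, c * MB:(c + 1) * MB]
            mask[r, c] = block.mean() < thresh
    return mask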
Intraframe Coding/Mode Selection
The intraframe RDO process of H.264/AVC consists of two steps: a type-mode decision from I_16×16, I_8×8, I_4×4, and I_PCM, based on RDO cost; and then a prediction-mode decision from nine prediction options—vertical prediction, horizontal prediction, and so on—also based on RDO cost. We calculate the RDO cost C by

$$C = D + \lambda \cdot R,$$

where D denotes the distortion of a candidate encoding option, R denotes the total bits of this option, and λ is the Lagrange multiplier that controls the tradeoff of rate and distortion.

We formulate a joint temporal-fluctuation and rate-distortion—or joint T-RD—mode selection problem as follows:

Given: {M_i, C_i, SFD_i}
Find: M*
Minimize: C
Subject to: SFD_i ≤ SFD_th,

where M_i denotes the ith available type mode or prediction mode, C_i is the corresponding RDO cost, and SFD_i is the SFD value of mode i. The problem seeks to minimize the RDO cost C over the set of available modes that satisfy the SFD constraint SFD_i ≤ SFD_th. SFD_th is the N_top-th SFD in the ascending-order sorted array of SFD_i, and N_top is given by

$$N_{\mathrm{top}} = \lceil N \times P_{\mathrm{top}} \rceil,$$

where N is the total number of available modes and P_top is a custom parameter that specifies the top fraction of the N available modes that the joint T-RD selection will consider.

Algorithm 1 (Figure 5) is designed to solve this problem for both type-mode and prediction-mode selection. For a stable background MB, first, all available type modes are tried and the corresponding RDO costs and SFD values are recorded (lines 2–5 in Algorithm 1); then the best type mode is determined based on the SFD threshold (lines 6–9). Second, all available prediction modes of the selected type mode are tried and the corresponding RDO costs and SFD values are recorded (lines 10–13); then the best prediction mode is determined based on the SFD threshold (lines 14–17).

1:  if current MB belongs to stable background then
2:    for each available type mode Mt_i do
3:      encode current MB and store Ct_i
4:      calculate and store SFDt_i
5:    end for
6:    sort the records in ascending order of SFD value, and obtain the valid number of records (Nt)
7:    obtain SFDt_th based on Nt_top = ⌈Nt × Ptop⌉
8:    find the minimum Ct, subject to SFDt_i ≤ SFDt_th
9:    output the corresponding Mt* as the selected type mode
10:   for each available prediction mode Mp_i of the selected type Mt* do
11:     encode current MB and store Cp_i
12:     calculate and store SFDp_i
13:   end for
14:   sort the records in ascending order of SFD value, and obtain the valid number of records (Np)
15:   obtain SFDp_th based on Np_top = ⌈Np × Ptop⌉
16:   find the minimum Cp, subject to SFDp_i ≤ SFDp_th
17:   output the corresponding Mp* as the selected prediction mode
18: end if

Figure 5. Algorithm 1: Intraframe joint temporal-fluctuation and rate-distortion mode selection. Joint T-RD selection first determines the best type mode and then determines the best prediction mode.

Interframe Coding/Mode Selection
A typical interframe analysis process includes three steps:

1. Probe the P_SKIP mode—that is, encode the current MB assuming no encoding residuals and no motion vector (MV) difference, using only the predictive MV. A decimate score is computed from the DCT coefficients produced by actually encoding this inter-MB, indicating whether all the discrete cosine transform (DCT) coefficients could be set to 0.13 If the decimate score of the current MB is less than 6, the current MB can be encoded as P_SKIP and the analysis returns.

2. Otherwise, the other interprediction modes—including P_16×16, P_8×16, P_16×8, P_8×8, P_4×8, P_8×4, and P_4×4—are all tried, the corresponding MVs are estimated, and a search is also performed over the intramodes.

3. Run the RDO process and determine the best mode from all available modes.

However, the typical interframe analysis process can result in temporal fluctuation for stable background areas, which will reduce the accuracy of object detection. For example, the first row of Figure 6 shows three consecutive interframes of the GR video clip (Figure 6a–c). In this figure, each block represents one MB unit; the yellow, blue, and red colors denote the P_SKIP mode, other intermodes, and intramodes, respectively. The P_SKIP distribution clearly fluctuates across these consecutive interframes. For an MB in the stable background area, when the intermode changes from P_SKIP to another interprediction mode, or vice versa, in consecutive frames, a temporal fluctuation will occur in the encoded frames, which might result in FP for object detection because the fluctuation is mistaken for the appearance of a new object.

We propose reducing temporal fluctuation by designing new criteria in the analysis of intertype modes. Specifically, we aim to classify more MBs in stable background areas as P_SKIP—or set the MVs of these MBs to zero—while maintaining an acceptable level of the traditional distortion measure, the sum of squared differences (SSD), which is the difference between the intensities of an original MB and the intensities of its encoded version. Based on the typical inter-MB analysis process, we designed new schemes in the probe P_SKIP process and in the analysis of the P_16×16 mode.

For MBs that do not satisfy the original criterion in the probe P_SKIP process,13 we compare the encoding option of P_SKIP with the encoding option of using the predictive MV; if the P_SKIP option brings less SFD while maintaining acceptable SSD, the current MB is set as P_SKIP. Algorithm 2 (see Figure 7) shows the steps, where SSDr and SFDr are the SSD and SFD of the reconstructed MB based on the predictive MV; SSDs and SFDs are the SSD and SFD of the current MB assuming P_SKIP encoding; and d_w and s_w are weight variables that can be customized by encoders.

Furthermore, to analyze the P_16×16 mode, we design an interframe P_16×16 direct copy mode—directly copying from the corresponding MB in the previous frame, exploiting the negligible motion in the stable background area. If the distortion incurred by assuming no motion is comparable to the distortion of the reconstructed MB after motion estimation, the process skips the other intermode analyses and jumps to the encode-current-MB step without the RDO process, as Figure 4 shows. Algorithm 3 (Figure 8) describes the detailed steps of the interframe P_16×16 direct copy mode, where SSDme is the MB distortion based on MVme after motion estimation, SSDdc is the MB distortion based on the assumption that there is no motion and that a direct copy from the previous frame's corresponding MB is applied, and d_w is a custom weight parameter that restricts SSDdc to within a threshold of d_w × SSDme.

Figure 6. Fluctuation of the P_SKIP distribution in the GR video (frames 8–10). The top row (a–c) shows the results of the x264 implementation; the bottom row (d–f) shows results from the TFRE scheme.

1: Input: decimate score of current MB
2: if decimate score of current MB < 6 then
3:   current MB is set as P_SKIP
4:   return
5: else if current MB belongs to stable background then
6:   encode current MB based on the predictive MV
7:   calculate SSDr and SFDr based on the reconstructed MB
8:   calculate SSDs and SFDs assuming the current MB is P_SKIP
9:   if SSDs ≤ d_w × SSDr and SFDs ≤ s_w × SFDr then
10:    current MB is set as P_SKIP
11:    return
12:  end if
13: end if

Figure 7. Algorithm 2: Interframe probe P_SKIP algorithm. It compares the encoding option of P_SKIP with the encoding option of using the predictive MV, seeking less temporal fluctuation while maintaining acceptable distortion.

1: Input: MVme after motion estimation in P_16×16 inter analysis
2: if current MB belongs to stable background then
3:   encode current MB based on MVme
4:   calculate SSDme based on the reconstructed MB
5:   calculate SSDdc assuming the current MB uses the direct copy mode
6:   if SSDdc ≤ d_w × SSDme then
7:     current MB is set as P_16×16 direct copy mode
8:     return
9:   end if
10: end if

Figure 8. Algorithm 3: Interframe P_16×16 direct copy mode. It directly copies from the corresponding MB in the previous frame when there is only negligible motion in the stable background area.

To demonstrate how effectively Algorithms 2 and 3 reduce temporal fluctuation, Figure 6d–f shows an example of the proposed intercoding scheme over three consecutive interframes of the GR clip. Compared with the images in Figure 6a–c, which show results from the standard interanalysis process, the proposed scheme encodes more background MBs as P_SKIP, and the P_SKIP distribution remains stable across consecutive frames.
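Condensed into Python, the two new acceptance tests of Algorithms 2 and 3 look as follows; this is our condensation, the SSD and SFD inputs are assumed to come from trial encodings of the current MB, and the weight values follow those later given in Table 1.

# The two interframe acceptance tests of Algorithms 2 and 3, condensed into
# Python. Inputs are assumed to come from trial encodings of the current MB;
# the weights d_w and s_w follow the values in Table 1.

D_W, S_W = 6.0, 0.1

def accept_pskip(ssd_r, sfd_r, ssd_s, sfd_s):
    """Algorithm 2: set the MB to P_SKIP when skipping costs bounded extra SSD
    (relative to the predictive-MV encoding) but clearly reduces SFD."""
    return ssd_s <= D_W * ssd_r and sfd_s <= S_W * sfd_r

def accept_direct_copy(ssd_me, ssd_dc):
    """Algorithm 3: copy the colocated MB from the previous frame when the
    no-motion distortion stays within d_w times the motion-estimated one."""
    return ssd_dc <= D_W * ssd_me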
Performance Evaluation
We compared the proposed TFRE scheme's performance to the H.264/AVC-based open source encoder x264 and to the reducing-flicker video-coding approach (RFC).14 The objective of RFC is to improve perceptual video quality by reducing flicker effects; it considers the distortions not only between the encoded video and the original video but also in the encoded video's temporal domain during the intra-RDO process. We used the eight raw videos shown in Figure 2 for this test; Table 1 summarizes the compression settings. The x264 encoder (version 0.142.x) was configured to encode videos using the one-pass constant QP mode with medium speed. We applied the aforementioned three object-detection algorithms (GMM, GMG, and ABL) to these compressed videos.

We first evaluate object-detection performance in terms of TP and FP levels. Figure 9 shows the average TP and FP (in number of pixels) for the eight videos. The general trend is that TP numbers drop and FP numbers rise quickly as QP grows and quality deteriorates. All three algorithms produce similar TP numbers. TFRE results in smaller numbers of FP pixels than both x264 and RFC. Because the RFC scheme reduces temporal fluctuation to some extent, RFC's FP numbers are slightly lower than x264's. The advantage of TFRE is more evident in higher QP scenarios, which correspond to more blurred background areas in an encoded video. Generally speaking, TFRE significantly suppresses the FP level and has little effect on the TP level.

Next, we evaluate the overall performance of object detection in terms of F1. Figure 10 shows each video's F1 scores for the three object-detection algorithms. Regardless of which object-detection algorithm is used, the curves of RFC nearly overlap with those of x264 for every video. The three object-detection algorithms have different ranges of F1, but regardless of these differences, detection performance degrades as QP increases. TFRE yields the largest F1 scores across QP values, and TFRE's gain grows with larger QP values. More specifically, there is a noticeable gain for ABL (an average of 0.94–4.79 percent) and modest gains for GMG (0.94–2.54 percent) and GMM (0.54–2.57 percent).

We summarize the average Recall, Precision, and F1 scores over the eight videos for the three object-detection algorithms in Table 2. The numbers in the D1 and D2 columns denote the gains of TFRE over the x264 encoder and RFC, respectively. Three points can be made based on the results for 10 different QP values:

- RFC's performance is comparable with that of x264, regardless of which measure is used.
- TFRE's Recall values are comparable with those of x264 and RFC.
- TFRE's Precision results improve upon those of x264 and RFC; this is because the number of FP pixels is suppressed, thereby improving the overall performance evaluated by F1.

These results indicate that, by reducing temporal fluctuation in stable background areas, TFRE can effectively improve the accuracy of object detection for different types of detection algorithms.

Finally, we investigate rate-distortion performance for the three schemes. We summarize the video quality and corresponding bit rates for the different video categories: traffic, indoor, and outdoor videos. We applied the industry-standard peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index for video quality assessment. Figure 11 shows the resulting PSNR and SSIM values with the corresponding bit rates. RFC's rate-distortion performance is nearly identical to that of x264 for every video. The TFRE scheme achieves obvious savings in bit rate when the bit rate is higher than 150 kilobits per second (kbps), with a small quality drop in the traffic videos container and GR and the indoor video hall.

Table 1. Video compression parameters.

- Frame rate: 25 frames per second
- GOP structure: IPPP
- Rate control: constant quantization parameter (QP)
- Resolution: Common Intermediate Format (352 × 288)
- Duration: 12 sec.
- GOP size: 20
- QP range: 28–46
- Intraframe custom parameter (Ptop): 0.1
- Interframe distortion weight (d_w): 6
- Interframe SFD weight (s_w): 0.1

Figure 9. True positive (TP) pixels (top row) and false positive (FP) pixels (bottom row) versus QP under RFC, x264, and TFRE for three object-detection algorithms: (a) the adaptive background learning (ABL) algorithm, (b) the GMG algorithm, and (c) the Gaussian mixture model (GMM) algorithm.

Figure 10. F1 scores versus QP for the eight test videos under RFC, x264, and TFRE with the three object-detection algorithms: (a) container, (b) GR, (c) GRHD, (d) hall, (e) horizontal, (f) overlook, (g) people, and (h) vehicle.

Table 2. Average object-detection results for various algorithms.

         Recall (%)                        Precision (%)                     F1 score (%)
QP   x264   RFC*   TFRE†   D1    D2    x264   RFC    TFRE   D1    D2    x264   RFC    TFRE   D1    D2
28   81.92  81.97  82.02  0.10  0.05   73.66  73.90  74.22  0.56  0.32   77.03  77.21  77.43  0.40  0.22
30   80.35  80.37  80.48  0.13  0.11   69.59  69.83  70.32  0.73  0.49   73.80  73.97  74.34  0.54  0.37
32   78.75  78.76  78.87  0.12  0.11   65.46  65.78  66.59  1.13  0.81   70.51  70.74  71.34  0.83  0.60
34   77.43  77.37  77.52  0.09  0.15   60.94  61.36  62.43  1.49  1.07   67.06  67.33  68.14  1.08  0.81
36   75.63  75.60  75.68  0.05  0.08   56.56  57.05  58.50  1.94  1.45   63.45  63.77  64.81  1.36  1.04
38   73.74  73.73  73.85  0.11  0.12   51.43  52.02  53.82  2.39  1.80   59.25  59.66  60.96  1.71  1.30
40   71.96  71.90  72.09  0.13  0.19   47.26  47.86  50.32  3.06  2.46   55.62  56.06  57.87  2.25  1.81
42   69.90  69.87  70.00  0.10  0.13   43.16  43.78  47.03  3.87  3.25   51.95  52.42  54.83  2.88  2.41
44   67.72  67.71  67.77  0.05  0.06   39.09  39.60  43.40  4.31  3.80   48.17  48.56  51.46  3.29  2.90
46   65.20  65.06  65.31  0.11  0.25   36.29  37.10  41.27  4.98  4.17   45.31  45.92  49.20  3.89  3.28

* Reducing-flicker video coding.
† Temporal-fluctuation-reduced video encoding. D1 and D2 denote the gains of TFRE over x264 and RFC, respectively.

Figure 11. Rate-distortion curves for all the videos under RFC, x264, and TFRE: (a) peak signal-to-noise ratio (PSNR) versus bit rate and (b) structural similarity (SSIM) versus bit rate.

For other video sequences, TFRE performs similarly to x264. Compared with x264 encoding, TFRE's PSNR and SSIM values decrease slightly, by 0.237 dB and 0.0046 on average, respectively, whereas TFRE decreases the bit rate by 2.45 kbps on average. The slight decrease in bit rate occurs because TFRE encodes more inter-MBs as P_SKIP. Generally speaking, TFRE's rate-distortion performance is comparable with that of the x264 encoder.
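As a reference point for the quality measures in Figure 11, here is a small sketch of PSNR for 8-bit frames and of SSIM delegated to scikit-image; the article does not name an implementation, so the library choice is our assumption.

# Sketch of the two quality measures used in Figure 11. PSNR follows the
# standard definition for 8-bit grayscale frames; SSIM uses scikit-image,
# one common implementation choice (the article does not name a library).
import numpy as np
from skimage.metrics import structural_similarity

def psnr(original, encoded):
    """Peak signal-to-noise ratio in dB between two 8-bit grayscale frames."""
    mse = np.mean((original.astype(np.float64) - encoded) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def ssim(original, encoded):
    """Structural similarity index via scikit-image."""
    return structural_similarity(original, encoded, data_range=255)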

Our experimental results indicate that, compared with traditional encoding schemes, TFRE improves object-detection accuracy and offers a lower bit rate with comparable video quality. The standard-compliant, low-complexity TFRE scheme can promote the development and application of distributed wireless surveillance systems. In the future, we plan to extend the proposed encoding scheme to comply with the newly developed HEVC standard. MM

Acknowledgments
This special issue is a collaboration between the 2016 IEEE International Symposium on Multimedia (ISM 2016) and IEEE MultiMedia. This article is an extended version of "Temporal-Fluctuation-Reduced Video Encoding for Improving the Accuracy of Object Detection," presented at ISM 2016. This work was supported in part by a US National Science Foundation grant (CNS-1644946).

References
1. B. Tavli et al., "A Survey of Visual Sensor Network Platforms," Multimedia Tools and Applications, vol. 60, no. 3, 2012, pp. 689–726.
2. T. Ma et al., "A Survey of Energy Efficient Compression and Communication Techniques for Multimedia in Resource Constrained Systems," IEEE Comm. Surveys & Tutorials, vol. 15, no. 3, 2013, pp. 963–972.
3. Y. Ye et al., "Wireless Video Surveillance: A Survey," IEEE Access, vol. 1, 2013, pp. 646–660.
4. W. Hu et al., "A Survey on Visual Surveillance of Object Motion and Behaviors," IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 34, no. 3, 2004, pp. 334–352.
5. T. Hase et al., "Influence of Image/Video Compression on Night Vision Based Pedestrian Detection in an Automotive Application," Proc. IEEE Vehicular Technology Conf., 2011, pp. 1–5.
6. E. Kafetzakis et al., "The Impact of Video Transcoding Parameters on Event Detection for Surveillance Systems," Proc. 2013 IEEE Int'l Symp. Multimedia (ISM), 2013, pp. 333–338.
7. L. Kong, R. Dai, and Y. Zhang, "A New Quality Model for Object Detection Using Compressed Videos," Proc. Int'l Conf. Image Processing (ICIP), 2016, pp. 3797–3801.
8. G.J. Sullivan et al., "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Trans. Circuits and Systems for Video Technology, vol. 22, no. 12, 2012, pp. 1649–1668.
9. L. Kong and R. Dai, "Temporal-Fluctuation-Reduced Video Encoding for Improving the Accuracy of Object Detection," Proc. Int'l Symp. Multimedia (ISM), 2016, pp. 126–132.
10. A. Sobral and A. Vacavant, "A Comprehensive Review of Background Subtraction Algorithms Evaluated with Synthetic and Real Videos," Computer Vision and Image Understanding, vol. 122, 2014, pp. 4–21.
11. S.-C. Huang and B.-H. Chen, "Automatic Moving Object Extraction Through a Real-World Variable-Bandwidth Network for Traffic Monitoring Systems," IEEE Trans. Industrial Electronics, vol. 61, no. 4, 2014, pp. 2099–2112.
12. R.V. Hogg and J. Ledolter, Engineering Statistics, MacMillan, 1987.
13. "Working Draft Number 2, Revision 0 (WD-2)," Joint Video Team, ISO/IEC MPEG and ITU-T VCEG, 2002.
14. S.S. Chun, J.-R. Kim, and S. Sull, "Intra Prediction Mode Selection for Flicker Reduction in H.264/AVC," IEEE Trans. Consumer Electronics, vol. 52, no. 4, 2006, pp. 1303–1310.

Lingchao Kong is a doctoral student in the Department of Electrical Engineering and Computing Systems at the University of Cincinnati. His research interests include video coding, video processing, and multimedia communications. Kong received an MS in control engineering from Harbin Institute of Technology. Contact him at [email protected].

Rui Dai is an assistant professor in the Department of Electrical Engineering and Computing Systems at the University of Cincinnati. Her research interests include multimedia communications, multimedia sensor networks, and cyberphysical systems. Dai received a PhD in electrical and computer engineering from Georgia Institute of Technology, Atlanta. Contact her at [email protected].
