Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
Article history:
Received 28 October 2019
Received in revised form 28 January 2020
Accepted 30 January 2020
Available online xxxx

Keywords:
Detection
Convolutional neuronal networks
One-Versus-All
One-Versus-One

Abstract

The capability of distinguishing between small objects when manipulated with hand is essential in many fields, especially in video surveillance. To date, the recognition of such objects in images using Convolutional Neural Networks (CNNs) remains a challenge. In this paper, we propose improving robustness, accuracy and reliability of the detection of small objects handled similarly using binarization techniques. We propose improving their detection in videos using a two level methodology based on deep learning, called Object Detection with Binary Classifiers. The first level selects the candidate regions from the input frame and the second level applies a binarization technique based on a CNN-classifier with One-Versus-All or One-Versus-One. In particular, we focus on the video surveillance problem of detecting weapons and objects that can be confused with a handgun or a knife when manipulated with hand. We create a database considering six objects: pistol, knife, smartphone, bill, purse and card. The experimental study shows that the proposed methodology reduces the number of false positives with respect to the baseline multi-class detection model.

© 2020 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.knosys.2020.105590
0950-7051/© 2020 Elsevier B.V. All rights reserved.
Please cite this article as: F. Pérez-Hernández, S. Tabik, A. Lamas et al., Object Detection Binary Classifiers methodology based on deep learning to identify small objects handled similarly: Application in video surveillance, Knowledge-Based Systems (2020) 105590, https://doi.org/10.1016/j.knosys.2020.105590.
convert a multi-class problem into several expert binary models and calculate the final class using an aggregation method. These techniques are often used to reduce the instability in imbalanced problems [18,19] and they present a good potential for the problem of similar objects detection.

This work proposes an accurate and robust methodology, Object Detection with Binary Classifiers based on deep learning (ODeBiC methodology), for the detection of small objects manipulated similarly with hand, applied to surveillance videos.

The first model for weapon detection in videos was proposed by Olmos et al. [20]. The authors formulated the problem as a two-class (pistol and background) problem, built a training database using images from the Internet and used Faster-RCNN based on VGG16 [20] as detection model. In general, this model reaches good results, but confuses the pistol with objects that can be handled similarly, for example, knife, smartphone, bill, purse and card. Fig. 1 shows some of these false positives. These results show that the way in which pistols are handled is considered by the model as a key feature of the pistol class, which is a problem from the video surveillance point of view. We address this case study with the ODeBiC methodology, with the aim of improving the detection among small objects handled similarly.

The main contributions of this paper are:

• We propose and evaluate a two level methodology called ODeBiC, based on the use of deep learning, to improve the detection of small objects that can be handled similarly. The first level uses a detector to select from each input frame the candidate regions with a specific confidence about the presence of each object. Then, the second level analyses these proposals using a binarization technique to identify the objects with higher accuracy. The ODeBiC methodology maintains a good accuracy for the detection of large objects as well.
• We analyse the potential of binarization techniques, such as OVA and OVO, to improve the detection of small objects, manipulated with hand, that can be confused with a weapon. As far as we know, this is the first study analysing such potential.
• We build a new dataset called Sohas_weapon (small objects handled similarly to a weapon) for the case study of six small objects that are often handled in a similar way to a weapon: pistol, knife, smartphone, bill, purse and card. We used different camera and surveillance camera technologies to take the images. 10% of the images were downloaded from the Internet. All these images were manually annotated for the detection task. This useful dataset will be available for other studies.²

Our experimental study on the database Sohas_weapon applying the ODeBiC methodology overcomes the baseline detection model by up to 19,57% in precision and reduces the number of false positives by up to 56,50%.

This paper is organised as follows. Section 2 includes related works and preliminaries of the binarization techniques and object detection. Section 3 provides a description of the database construction and the test surveillance videos used to analyse the methodology, and the proposed ODeBiC methodology. Section 4 gives the experimental analysis and comparison of the ODeBiC methodology with different classification approaches. Finally, conclusions and future works are given in Section 5.

² http://sci2s.ugr.es/weapons-detection

2. Binarization techniques and object detection

This section is organised into two parts. Section 2.1 provides a summary of related works that use binarization strategies, the state-of-the-art in object detection in images and the studies that address weapon detection in videos. Then, it presents a brief summary of the OVA and OVO binarization methods in Sections 2.2 and 2.3 respectively.

2.1. Related works on binarization for objects detection in images

Related works can be divided into three categories: previous works that use OVA and OVO binarization strategies in classification, detection or segmentation, the state-of-the-art of object detection models in images, and previous works that address weapon detection in videos.

Most prior works that analysed OVA and OVO in visual tasks (object recognition, image classification and image segmentation) only use classical models such as Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and k-Nearest Neighbours (kNN):

• In image classification, the authors in [21] analysed the OVA and OVO approaches to reduce the feature space on three well known benchmarks: MNIST, the Amsterdam Library of Object Images (ALOI) and Australian Sign Language (Auslan).
• For pose estimation using image segmentation, the authors in [22] compared an individual CNN-based classifier with OVA and OVO based on SVM and showed that CNNs achieve slightly better performance than OVA and OVO based on SVM.
• Similarly, in the task of scene classification in remote sensing images, the authors in [23] also compared OVA and OVO based on SVM and 1-NN (Nearest Neighbour) and concluded that OVA provided worse results due to the unbalance between classes. The best results were obtained by OVO based on SVM.
• In face recognition, the authors in [24] used a CNN-based model for feature extraction and an SVM with OVA and OVO for classification. The best results were obtained by CNN combined with SVM.
• The authors in [25] compared the Half-Against-Half (HAH) technique with OVA and OVO in image classification and found that HAH provides similar or worse results on the evaluated benchmarks.

On the other hand, we must highlight that the state-of-the-art detection models are end-to-end CNNs that combine a detection meta-architecture with a classification model. The most influential meta-architectures are Faster-RCNN [8], R-FCN [9] and SSD [26]. According to [27], Faster-RCNN based on Inception ResNet V2 obtains the highest accuracy on large objects while Faster-RCNN ResNet 101 provides the highest accuracy on small objects. SSD is the fastest detection approach but offers lower accuracies. The model that provides the best trade-off between accuracy and execution time is Faster-RCNN ResNet 101.

In video surveillance, the first pistol detection model was proposed in [20]; it provides good results but produces an important number of false positives in the background class due to the fact that the model confuses the pistol with objects that are handled similarly to a pistol. The authors in [28] propose a fusion technique with the support of two symmetric cameras to calculate the disparity map, then subtract the background and consequently decrease the number of false negatives in the background. In the same direction, the authors in [29] reduce the number of false negatives produced by extreme light conditions using a brightness guided pre-processing method.
Fig. 1. False positives committed by the proposed model in [20], where the objects are (a) bill, (b) purse, (c) smartphone and (d) card.
Our current work is different from all the previously cited works in that it aims at developing a methodology that reduces the number of false positives and improves the overall performance in the detection of small objects handled similarly. As case study, we address the problem of identifying small objects handled similarly to a weapon in surveillance videos. As far as we know, this work is the first in applying OVA and OVO to deep learning models for object detection in images and videos.

2.2. One-Versus-All (OVA)

The OVA strategy [13,14] reformulates the multi-class classification problem into a set of binary classifiers, where each classifier learns how to distinguish each individual class versus all the rest of the classes together. This approach produces as many classifiers as the number of classes in the original problem. The final prediction is calculated by combining the predictions of the individual classifiers using an aggregation method called Maximum confidence strategy (MAX). The class with the largest vote is considered as the predicted class. Formally, the MAX decision rule can be expressed as

Class = arg max_{i=1,...,m} r_i,    (1)

where r_i ∈ [0, 1] is the confidence for class i and m is the number of classes.

2.3. One-Versus-One (OVO)

• VOTE random: select one class randomly.
• VOTE by weight: sum the predictions and select the class with the maximum value as the final class.

Formally, the decision rule can be written as

Class = arg max_{i=1,...,m} Σ_{1≤j≠i≤m} s_ij,    (2)

where s_ij is 1 if r_ij > r_ji and 0 otherwise.

Weighted voting strategy (WV)

The aim of this technique [31] is to obtain the class with the largest probability. Hence, each class sums its predictions and the class with the maximum value is the final result. The decision rule is

Class = arg max_{i=1,...,m} Σ_{1≤j≠i≤m} r_ij.    (3)

Learning valued preference for classification (LVPC)

The Learning valued preference for classification (LVPC) technique calculates new values from the initial probabilities obtained by the binary classifiers. LVPC is a weighted voting scheme; it penalises the classifiers that do not reach a threshold confidence in their decision. More details on this rule are provided in [32,33]. This decision rule can be expressed as:
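As an illustration, the OVA MAX rule of Eq. (1) and the OVO voting and weighted-voting rules of Eqs. (2) and (3) can be sketched in NumPy; the confidence vector and score matrix below are hypothetical inputs, not code from this work:

```python
import numpy as np

def ova_max(r):
    """MAX rule, Eq. (1): r[i] is the confidence of the i-th
    One-Versus-All classifier for its own class."""
    return int(np.argmax(r))

def ovo_vote(r):
    """Voting rule, Eq. (2): r[i, j] = r_ij is the confidence for
    class i given by the pairwise (i, j) classifier; each pair
    casts one vote for the class it prefers."""
    s = (r > r.T).astype(int)        # s_ij = 1 if r_ij > r_ji, else 0
    return int(np.argmax(s.sum(axis=1)))

def ovo_weighted_vote(r):
    """Weighted voting (WV), Eq. (3): each class sums its raw
    pairwise confidences and the largest sum wins."""
    w = r.copy()
    np.fill_diagonal(w, 0.0)         # ignore the (i, i) entries
    return int(np.argmax(w.sum(axis=1)))
```

For instance, with three classes and r = [[0, 0.9, 0.8], [0.1, 0, 0.6], [0.2, 0.4, 0]], class 0 wins both of its pairwise duels and also has the largest confidence sum, so both OVO rules return class 0.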
(a) Compute p̂:

p̂_i = p̂_i · (Σ_{1≤j≠i≤m} n_ij r_ij) / (Σ_{1≤j≠i≤m} n_ij μ̂_ij)  for all i = 1, ..., m,    (10)

where n_ij is the number of training data in the ith and jth classes.

(b) Normalise p̂:

p̂_i = p̂_i / Σ_{i=1}^{m} p̂_i  for all i = 1, ..., m    (11)

(c) Recompute μ̂_ij:

μ̂_ij = p̂_i / (p̂_i + p̂_j)  for all i, j = 1, ..., m    (12)

Finally, the output class is:

Class = arg max_{i=1,...,m} p̂_i    (13)

Wu, Lin and Weng probability estimates by pairwise coupling approach (PE)

The PE technique, also called Wu, Lin and Weng probability, is similar to PC. It uses the pairwise coupling approach to calculate the predictions [37]. The probabilities (p) of each class are estimated starting from the pairwise probabilities. PE optimises the following problem:

min_p Σ_{i=1}^{m} Σ_{1≤j≠i≤m} (r_ji p_i − r_ij p_j)²  subject to  Σ_{i=1}^{k} p_i = 1, p_i ≥ 0, ∀i    (14)

Distance-based relative competence weighting combination for One-Versus-One (DRCW-OVO)

Distance-based relative competence weighting combination for the One-Versus-One strategy in multi-class problems (DRCW-OVO) [38] is one of the variations [39] of the OVO technique that intends to alleviate the problem of imbalanced classes using the distance to the k nearest elements of the new instance. Once the score-matrix has been obtained, DRCW-OVO entails the following:

1. Calculate the average distance of the k nearest neighbours of each class in a vector d.

where P_i is the histogram of the new instance and Q_i is the average of the histogram of the k nearest neighbours.

3. Sohas_weapon database and ODeBiC methodology based on deep learning

We propose the ODeBiC methodology based on deep learning for binary classifiers with the aim of detecting small objects that can be confused because they are handled similarly. As case study, we select a problem from the field of video surveillance: the detection of small objects that can be confused with a pistol or knife. We create the dataset called Sohas_weapon.

In this section, first we describe the process we used to build a dataset of small objects that can be held similarly (Section 3.1). Then, we present the ODeBiC methodology (Section 3.2).

3.1. Sohas_weapon database construction for detection in surveillance videos

The quality of the learning of a CNN model depends strongly on the quality of the training database. The database must allow the classification model to correctly distinguish between objects handled similarly.

We built four databases for training the classification models, Database-1, 2, 3 and 4, using different types of images. These databases are based on the case study of similarly handled objects like pistol, knife, smartphone, bill, purse and card:

1. In the first step, we used the pistol images from the database³ built in [20] and the knife images from the database built in [29]. Most images were downloaded from the Internet. We added the images of common objects that can be handled similarly to a pistol and a knife: smartphone, bill, purse and card. This database will be called Database-1.
2. In a second step, we added to each class images taken in diverse conditions by a reflex camera, a Nikon D5200. The obtained database will be called Database-2.
3. In a third step, we added to each object class images taken by two surveillance cameras with different qualities and resolutions, a Hikvision DS-2CD2420F-IW and a Samsung SNH-V6410PN, under diverse conditions. The obtained database will be called Database-3.

³ http://sci2s.ugr.es/weapons-detection
Table 1
Databases built to analyse the performance on objects that are manipulated similarly with hand.
Database- # img Pistol Knife Smartphone Bill Purse Card
1 4710 3394 1879 866 134 137 179
2 5454 3523 1879 1022 287 315 307
3 6658 3681 1879 1069 654 710 544
Sohas_weapon 5680 1580 1879 755 545 581 340
Sohas_weapon-Without_Pistol&Knife 2221 0 0 755 545 581 340
Sohas_weapon-Detection 3255 1425 1825 575 425 530 300
Sohas_weapon-Test 1170 294 470 115 123 104 64
Sohas_weapon-Test_Without_Pistol&Knife 406 0 0 115 123 104 64
Table 2
Four test surveillance videos created to analyse the performance of ODeBiC methodology.
Video # Frames Pistol Knife Smartphone Bill Purse Card Scenario
1 1962 235 289 217 302 342 391 Small office
2 2083 269 256 477 282 294 417 Hall view Left far
3 2070 329 274 284 294 330 356 Hall view Left near
4 2188 315 246 458 323 331 504 Hall wall
4. In the last step, we eliminated blurry images due to motion and images where the human eye cannot recognise the object class. As we have mentioned, the final database will be called Sohas_weapon.

To evaluate the quality of the databases, guided by the quality of the learning of the classification approaches, we built a database called Database-Sohas_weapon-Test. The characteristics of all the built databases are provided in Table 1. Besides, we used a database without the pistol and knife classes, Database-Sohas_weapon-Without_Pistol&Knife and Database-Sohas_weapon-Test_Without_Pistol&Knife, to analyse the behaviour of the proposed classification approaches on the objects that have a higher similarity in shape and in the way in which they are handled: smartphone, bill, purse and card.

To train the detection models, we used Database-Sohas_weapon-Detection, whose characteristics are summarised in Table 1. This database contains the entire images (objects and background) from which we cropped the images used to build the database Sohas_weapon.

To analyse the ODeBiC methodology, we created four test surveillance videos whose characteristics are summarised in Table 2. These four surveillance videos were recorded in different scenarios, in a small office and in a hall at the entrance of a building with different viewpoints of the hall, with a Samsung SNH-V6410PN camera.

3.2. ODeBiC methodology based on deep learning

One of the main issues in object detection in surveillance videos is that the objects that can be handled similarly can be confused. This was shown in the pistol against background detection model developed in our previous work [20].

Herein, we propose using the ODeBiC methodology based on deep learning to improve the reliability, robustness and accuracy to identify small objects handled similarly. The ODeBiC methodology has two levels: the first level obtains candidate regions that contain the target objects, and the second level classifies each region with the binarization technique followed by an aggregation method to finally produce the output frame with the detection results. In particular, the ODeBiC methodology works as follows:

• The first level analyses the input frame using a relaxed CNN-detection model that outputs all the region proposals with a probability of having one or more target objects higher than 10%. This process could be seen as a candidate selection technique with an important knowledge of the target object categories. We will consider Faster-RCNN based on the ResNet101 feature extractor as it provides a good trade-off between accuracy and execution time. These candidates will be analysed by the second level.
• Each output box will be analysed by a binarization technique, then an aggregation method is applied to calculate the final prediction. We will consider two binarization techniques, OVA and OVO, in combination with different aggregation methods. An illustration of OVA and OVO in the context of the pistol or knife and similar objects problem is depicted in Figs. 2 and 3 respectively.

The proposed two level methodology is depicted in Fig. 4.

Fig. 2. OVA process in the problem of recognising small objects that can be manipulated with hand in a similar way.
Fig. 3. OVO process in the problem of recognising small objects that can be manipulated with hand in a similar way.
Fig. 4. The structure of the proposed two level methodology, ODeBiC: first detection, then binarization.

4. Experimental study

The purpose of this section is to analyse the performance of different classification approaches, the baseline multi-classifier, OVA and OVO with several aggregation rules in Section 4.1, the study of similar objects in Section 4.2 and the evaluation of our methodology ODeBiC using four surveillance videos in Section 4.3.

4.1. Evaluation of different classification approaches

In this subsection we analyse the performance of different classification approaches: the baseline multi-classifier, OVA, and OVO with different aggregation rules (VOTE random, VOTE by weight, WV, LVPC, ND, PC, PE, and DRCW with k = 1, 2, 3 and 4), trained on Databases-1, 2, 3 and Sohas_weapon and tested on Sohas_weapon-Test. All the analysed CNN models are based on the ResNet-101 architecture [10] initialised with the pre-trained weights on ImageNet [41]. We used TensorFlow [42] and an NVIDIA Titan Xp for all the experiments. The training process takes approximately two hours. The results are plotted in Fig. 5 and summarised in Table 3.

As it can be observed from Fig. 5, in general, the performance of all the approaches increases from Database-1 to Database-Sohas_weapon. In particular, when trained on Database-1, OVA and OVO provide similar performance to the baseline multi-classifier. On Database-2, OVA obtains the best performance over all the methods. On Database-3 and Database-Sohas_weapon, all the OVO aggregation methods provide better performance than the baseline multi-classifier.

DRCW-OVO with k = 1 gets the best results on Database-3. On Database-Sohas_weapon, OVO ND provides the best results with a precision of 93,87%, recall of 93,09% and F1 of 93,43%. The improvements with respect to the baseline multi-classifier on Database-Sohas_weapon are 2,57% in precision, 2,06% in recall and 2,34% in F1.

However, in terms of execution time, DRCW-OVO takes 4,04 s per frame as it calculates the distance between all the images in the database. This makes DRCW-OVO inappropriate for real time processing. Therefore, for evaluating our proposal, we selected only the models that provide a good accuracy/execution time trade-off: OVO with the aggregation rules VOTE random, VOTE by weight, WV, LVPC, ND, PC and PE.

As a conclusion of this evaluation, the use of binarization techniques produces better results than the baseline multi-classifier. Besides, this kind of techniques could be used in real time.
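The two-level flow of the ODeBiC methodology (Section 3.2) can be sketched as follows; `detect`, `binary_scores` and `aggregate` are hypothetical stand-ins for the Faster-RCNN proposal stage, the binary CNN classifiers and an OVO aggregation rule, not the authors' released code:

```python
def odebic_frame(frame, detect, binary_scores, aggregate, threshold=0.10):
    """First level: keep every region proposal whose confidence of
    containing some target object exceeds the relaxed 10% threshold.
    Second level: re-classify each surviving box with the binarization
    technique plus an aggregation rule."""
    results = []
    for box, score, crop in detect(frame):
        if score < threshold:        # relaxed candidate selection
            continue
        r = binary_scores(crop)      # e.g. pairwise confidences r_ij
        results.append((box, aggregate(r)))
    return results
```

Each returned pair is a bounding box and the final class decided by the second level.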
Table 3
Results of all the classification approaches trained on Database-1, 2, 3 and Sohas_weapon and tested on Database-Sohas_weapon-Test.
Database-1 Database-2 Database-3 Database-Sohas_weapon Time (s)
Precision Recall F1 Precision Recall F1 Precision Recall F1 Precision Recall F1
Baseline multi-classifier 86.38% 77.35% 79.88% 86.87% 82.67% 84.18% 90.02% 89.42% 89.62% 91.30% 91.03% 91.09% 0,02821
OVA 87.02% 75.56% 78.71% 88.29% 82.40% 84.50% 90.49% 89.15% 89.72% 92.76% 92.03% 92.29% 0,03081
OVO VOTE random 85.29% 74.94% 78.23% 83.79% 78.87% 80.30% 91.59% 91.32% 91.38% 93.68% 93.16% 93.35% 0,02824
OVO VOTE weight 86.18% 75.40% 78.61% 85.35% 79.94% 81.67% 92.00% 91.54% 91.70% 93.85% 92.96% 93.35% 0,02823
OVO WV 85.95% 75.44% 78.60% 85.69% 80.23% 81.97% 91.44% 91.27% 91.29% 93.45% 92.68% 93.01% 0,02822
OVO LVPC 86.20% 74.15% 77.69% 85.35% 79.24% 81.28% 92.25% 91.32% 91.70% 93.55% 92.55% 93.00% 0,02828
OVO ND 85.50% 74.67% 77.81% 85.24% 80.25% 81.86% 91.86% 91.38% 91.55% 93.87% 93.09% 93.43% 0,02827
OVO PC 86.15% 73.98% 77.12% 84.70% 80.00% 81.34% 91.25% 90.97% 91.04% 93.41% 92.84% 93.07% 0,04493
OVO PE 84.84% 74.27% 77.37% 85.09% 79.84% 81.56% 91.72% 91.37% 91.47% 93.74% 92.96% 93.29% 0,02830
DRCW k = 1 85.23% 72.60% 76.33% 86.03% 78.68% 80.91% 92.74% 92.00% 92.32% 91.78% 91.42% 91.51% 4,02127
DRCW k = 2 85.99% 72.60% 76.51% 85.94% 78.19% 80.47% 92.36% 91.66% 91.94% 91.88% 91.48% 91.56% 4,02127
DRCW k = 3 85.68% 72.47% 76.36% 86.45% 78.80% 81.08% 92.24% 91.48% 91.79% 92.38% 91.81% 91.99% 4,02127
DRCW k = 4 85.62% 72.54% 76.40% 86.13% 78.76% 80.97% 92.09% 91.33% 91.65% 92.83% 91.93% 92.26% 4,02127
Fig. 5. Results of each classification approach trained on Database-1, 2, 3 and Sohas_weapon and tested on Database-Sohas_weapon-Test.
• In terms of false positives, it reduces the number of false positives by between 34,64% and 56,50% for a threshold of 10%, between 22,14% and 47,39% for a threshold of 50%, between 12,76% and 43,01% for a threshold of 70%, and between −16,54% and 33,89% for a threshold of 90%.
• In terms of execution time, the baseline detection model takes 0,12341 s (equivalent to 8 fps) and the ODeBiC methodology with OVO PC takes around 0,16834 s (equivalent to 6 fps), which is appropriate for a near real time system.

In summary, the ODeBiC methodology runs in near real time and achieves an improvement of up to 56,50% using an aggregation method of OVO.

5. Conclusions and future work

This work presents the two level methodology ODeBiC based on deep learning for the detection of small objects that can be handled similarly. We considered as case study the detection of small objects that can be confused with a handgun or a knife in surveillance videos. We built a training database, called Sohas_weapon, which includes six objects that can be confused with a weapon as they are commonly handled in a similar way: pistol, knife, smartphone, bill, purse or card.

Our experiments showed that the ODeBiC methodology based on an aggregation method of OVO reduced the number of false positives by up to 56,50% and between −2,19% and 19,57%
Table 5
Confusion matrices of the best approaches on Database-Sohas_weapon (OVO ND) and on Database-Sohas_weapon-Without_Pistol&Knife (DRCW-OVO k = 1), respectively.
Database-Sohas_weapon
Bill Knife Purse Pistol Smartphone Card Precision Recall F1
Bill 118 1 0 0 0 2 97,52% 95,93% 96,72%
Knife 0 457 2 2 4 0 98,28% 97,23% 97,75%
Purse 3 0 94 3 10 0 85,45% 90,38% 87,85%
Pistol 0 10 4 289 1 3 94,14% 98,30% 96,17%
Smartphone 0 1 4 0 99 1 94,29% 86,09% 90,00%
Card 2 1 0 0 1 58 93,55% 90,63% 92,06%
93,87% 93,09% 93,43%
Database-Sohas_weapon-Without_Pistol&Knife
Bill Knife Purse Pistol Smartphone Card Precision Recall F1
Bill 120 – 1 – 0 2 97,56% 97,56% 97,56%
Purse 2 – 101 – 13 0 87,07% 97,12% 91,82%
Smartphone 0 – 2 – 100 3 95,24% 86,96% 90,91%
Card 1 – 0 – 2 59 95,16% 92,19% 93,65%
93,76% 93,46% 93,48%
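As a worked check, the per-class metrics in Table 5 follow directly from the confusion matrix when rows are read as predicted classes and columns as actual classes; the pistol row reproduces the reported 94,14% precision, 98,30% recall and 96,17% F1:

```python
import numpy as np

# Confusion matrix from Table 5 (Database-Sohas_weapon): rows are
# predicted classes, columns are actual classes.
classes = ["bill", "knife", "purse", "pistol", "smartphone", "card"]
cm = np.array([
    [118,   1,  0,   0,  0,  2],
    [  0, 457,  2,   2,  4,  0],
    [  3,   0, 94,   3, 10,  0],
    [  0,  10,  4, 289,  1,  3],
    [  0,   1,  4,   0, 99,  1],
    [  2,   1,  0,   0,  1, 58],
])

i = classes.index("pistol")
tp = cm[i, i]
precision = tp / cm[i, :].sum()   # TP over everything predicted as pistol
recall = tp / cm[:, i].sum()      # TP over all actual pistols
f1 = 2 * precision * recall / (precision + recall)
print(round(100 * precision, 2), round(100 * recall, 2), round(100 * f1, 2))
```

This prints 94.14 98.3 96.17, matching the pistol row of the table.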
Table 6
Results of ODeBiC methodology on four surveillance videos.
                  Threshold 10%        Threshold 50%        Threshold 70%        Threshold 90%
                  TP   FP   Precision  TP   FP   Precision  TP   FP   Precision  TP   FP   Precision

Video 1 (1776 GT)
Baseline          1189 630  65,37%     994  346  74,18%     926  272  77,30%     843  177  82,65%
OVO VOTE Random   1540 279  84,66%     1156 184  86,27%     1037 161  86,56%     900  120  88,24%
OVO VOTE Weight   1535 284  84,39%     1150 190  85,82%     1035 163  86,39%     898  122  88,04%
OVO WV            1545 274  84,94%     1158 182  86,42%     1043 155  87,06%     903  117  88,53%
OVO LVPC          1529 290  84,06%     1148 192  85,67%     1037 161  86,56%     900  120  88,24%
OVO ND            1535 284  84,39%     1150 190  85,82%     1036 162  86,48%     898  122  88,04%
OVO PC            1525 294  83,84%     1137 203  84,85%     1022 176  85,31%     882  138  86,47%
OVO PE            1533 286  84,28%     1155 185  86,19%     1037 161  86,56%     899  121  88,14%

Video 2 (1995 GT)
Baseline          1248 617  66,92%     1064 332  76,22%     992  235  80,85%     870  133  86,74%
OVO VOTE Random   1368 497  73,35%     1084 312  77,65%     972  255  79,22%     821  182  81,85%
OVO VOTE Weight   1385 480  74,26%     1094 302  78,37%     981  246  79,95%     826  177  82,35%
OVO WV            1402 463  75,17%     1101 295  78,87%     985  242  80,28%     828  175  82,55%
OVO LVPC          1367 498  73,30%     1078 318  77,22%     970  257  79,05%     818  185  81,56%
OVO ND            1380 485  73,99%     1091 305  78,15%     978  249  79,71%     823  180  82,05%
OVO PC            1480 385  79,36%     1148 248  82,23%     1022 205  83,29%     848  155  84,55%
OVO PE            1378 487  73,89%     1092 304  78,22%     977  250  79,63%     825  178  82,25%

Video 3 (1867 GT)
Baseline          1250 557  69,18%     1073 298  78,26%     1014 241  80,80%     901  158  85,08%
OVO VOTE Random   1403 404  77,64%     1116 255  81,40%     1041 214  82,95%     911  148  86,02%
OVO VOTE Weight   1417 390  78,42%     1127 244  82,20%     1051 204  83,75%     917  142  86,59%
OVO WV            1421 386  78,64%     1126 245  82,13%     1049 206  83,59%     914  145  86,31%
OVO LVPC          1406 401  77,81%     1118 253  81,55%     1043 212  83,11%     910  149  85,93%
OVO ND            1409 398  77,97%     1120 251  81,69%     1045 210  83,27%     913  146  86,21%
OVO PC            1443 364  79,86%     1139 232  83,08%     1056 199  84,14%     912  147  86,12%
OVO PE            1407 400  77,86%     1118 253  81,55%     1044 211  83,19%     914  145  86,31%

Video 4 (2177 GT)
Baseline          1502 742  66,93%     1301 381  77,35%     1211 292  80,57%     1063 208  83,63%
OVO VOTE Random   1816 428  80,93%     1404 278  83,47%     1266 237  84,23%     1093 178  86,00%
OVO VOTE Weight   1819 425  81,06%     1409 273  83,77%     1270 233  84,50%     1096 175  86,23%
OVO WV            1821 423  81,15%     1409 273  83,77%     1269 234  84,43%     1094 177  86,07%
OVO LVPC          1794 450  79,95%     1392 290  82,76%     1253 250  83,37%     1083 188  85,21%
OVO ND            1819 425  81,06%     1408 274  83,71%     1269 234  84,43%     1095 176  86,15%
OVO PC            1874 370  83,51%     1440 242  85,61%     1296 207  86,23%     1113 158  87,57%
OVO PE            1807 437  80,53%     1397 285  83,06%     1260 243  83,83%     1087 184  85,52%
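As a worked check on the reported figures, precision is TP/(TP + FP) and the false-positive reduction is measured against the baseline at the same threshold; taking the Video 1, threshold 10% rows of Table 6 (baseline versus OVO WV):

```python
# Table 6, Video 1, threshold 10%: TP and FP counts.
base_tp, base_fp = 1189, 630   # baseline detection model
wv_tp, wv_fp = 1545, 274       # ODeBiC with OVO WV

precision = 100 * wv_tp / (wv_tp + wv_fp)          # 84,94% in Table 6
fp_reduction = 100 * (base_fp - wv_fp) / base_fp   # the up-to-56,50% case
print(round(precision, 2), round(fp_reduction, 1))
```

This prints 84.94 56.5: the WV rule cuts the baseline's 630 false positives to 274, the headline reduction of roughly 56,5%.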
in precision, depending on the threshold, with respect to the baseline detection model.

The ODeBiC methodology can be used as a detection model in surveillance videos as it produces robust output, considerably reduces the number of false positives and obtains better precision than the baseline detection model.

As future work, we will design a new pre-processing strategy to filter noisy instances that can cause confusion in the CNN model.

CRediT authorship contribution statement

Francisco Pérez-Hernández: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Siham Tabik: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Alberto Lamas: Conceptualization, Methodology, Writing - review & editing. Roberto Olmos: Conceptualization, Methodology, Writing - review & editing. Hamido Fujita: Conceptualization, Methodology, Writing - review & editing. Francisco Herrera: Conceptualization, Methodology, Writing - review & editing.

Acknowledgements

This research work is partially supported by the Spanish Ministry of Science and Technology under the project TIN2017-89517-P and BBVA under the project DeepSCOP-Ayudas Fundación BBVA a Equipos de Investigación Científica en Big Data
Please cite this article as: F. Pérez-Hernández, S. Tabik, A. Lamas et al., Object Detection Binary Classifiers methodology based on deep learning to identify small objects
handled similarly: Application in video surveillance, Knowledge-Based Systems (2020) 105590, https://doi.org/10.1016/j.knosys.2020.105590.
10 F. Pérez-Hernández, S. Tabik, A. Lamas et al. / Knowledge-Based Systems xxx (xxxx) xxx
2018. Siham Tabik was supported by the Ramon y Cajal Pro- [22] M. Yu, L. Gong, S. Kollias, Computer vision based fall detection by a con-
gramme (RYC-2015-18136). The Titan X Pascal used for this volutional neural network, in: Proceedings of the 19th ACM International
Conference on Multimodal Interaction, ACM, 2017, pp. 416–420.
research was donated by the NVIDIA Corporation.
[23] X. Chen, T. Fang, H. Huo, D. Li, Measuring the effectiveness of various
features for thematic information extraction from very high resolution
References remote sensing imagery, IEEE Trans. Geosci. Remote Sens. 53 (9) (2015)
4837–4851.