Background Subtraction Using Artificial Immune Recognition System and Single Gaussian (AIRS-SG)
https://doi.org/10.1007/s11042-020-08935-1

Wafa Nebili
[email protected]
Brahim Farou
[email protected]
Hamid Seridi
[email protected]
1 LabSTIC, 8 mai 1945 Guelma University, POB 401, 24000 Guelma, Algeria
Abstract
Background subtraction is an essential step in the video monitoring process. Several models
have been proposed to differentiate background pixels from foreground pixels. However,
most of these methods fail to distinguish them in highly dynamic environments. In this
paper, we propose a new, more robust and efficient method for distinguishing moving objects
from static objects in dynamic scenes. For this purpose, we propose to use a bio-inspired
approach based on the Artificial Immune Recognition System (AIRS) as a classification
tool. AIRS separates antibodies, represented by the pixels of the background model, from
the antigens that model foreground pixels representing moving objects. Each pixel is modeled
by a feature vector containing the attributes of a Gaussian. Only the pixels classified
as background are taken into account by the system and updated in the model. This combination
benefits from two advantages: the power of AIRS to provide an online update of the
system parameters, and the ability of Gaussians to adapt to scene variations at the
pixel level. To test the proposed approach, six videos from the dynamic background
category of the CDnet 2014 dataset are selected. The obtained results prove the effectiveness
of this new process in terms of quality and complexity compared to other state-of-the-art
methods.
1 Introduction
Background subtraction (BS), also called motion detection or foreground detection, is a crucial
step in many computer vision applications such as video surveillance [7, 35], multimedia
[11] and optical motion capture [5], etc. BS consists in modeling the background before
detecting the moving objects (foreground). Generally, the moving objects are humans,
cars, text, etc. The intuitive way to model the background is to train the system using
a set of frames devoid of moving objects. After that, an on-line or off-line process is
applied to extract the foreground from the frames. In the on-line process, the background
model is updated during the whole execution to pick up any new changes in the background
within the video sequence, whereas in the off-line process the background model remains
unchanged. An efficient method for detecting moving objects must ensure a good separation
between the background and the foreground, with gains in execution time and memory
space. The Single Gaussian (SG) is among the most popular methods that have achieved great
success in the detection of moving objects, since it is simple, very fast and computationally
inexpensive. However, this method is sensitive to fast pixel variations. For
example, when the background is dynamic, SG cannot memorize all the states of a pixel. Several
works have been proposed to improve the quality of SG results, among them [10, 17, 39].
Recently, with the development of technology, many bio-inspired techniques and intelligent
algorithms have appeared for scientific and engineering computing. One of these
techniques is the Artificial Neural Network (ANN), which is used to mathematically model the
intelligence of the human brain [1]. AIRS is also a bio-inspired technique, proposed by [56],
that describes the recognition of self cells from foreign cells in the human body. Another
algorithm, proposed by Yang [59], models the flight of bats during their search for a food
source. A novel extension of this algorithm has been proposed in the work of Yong et al. [61].
In the context of optimization, Ishibuchi et al. [19] proposed an optimization algorithm whose
objective function simultaneously handles four or more contradictory objectives, known as
many-objective optimization problems (MaOPs). An enhancement of MaOPs was proposed by Li et al.
to improve the selected features and the accuracy [28]. Moreover, there are many other
bio-inspired algorithms such as IBEA-SVM [29] and PSO [8, 18, 23, 60].
In this paper, we propose a new approach for background subtraction in dynamic scenes.
The proposed system combines AIRS and SG for background modeling.
The choice of the AIRS classifier among other classifiers is motivated by three main
factors. The first is the concordance of the model with the treated problem, since AIRS
separates everything that belongs to the body (in our case the background) from everything
that is foreign (in our case the moving objects). The second factor is related to the dynamic
nature of the processed videos. Indeed, most of the methods cited in the state of the art do
not adapt easily to frequent background changes and in most cases require several frames.
The AIRS classifier copes easily with this type of problem thanks to its cloning operations,
which create several valid backgrounds (background models) with dynamic management of the
number of models using only the current frame, which considerably increases the success rate.
The ability to update the system while it is operational is another factor that gives the
AIRS classifier an advantage over other classifiers. Indeed, the online update allows the
system to adapt to possible changes in the background without going through a
re-initialization step, which would require a system shutdown during a long learning period
(for example: a growing tree, a change in the color of a wall, etc.).
We have made improvements to the affinity measure, to the competition for resources and
development of candidate memory cells process, and to the memory cell introduction principle,
so that they meet the requirements of the background subtraction process.
The remainder of the paper is organized as follows: Section 2 presents a state of the art
on change detection in dynamic scenes. Sections 3 and 4 explain respectively the basic
SG and AIRS methods. Section 5 focuses on the description of the proposed approach.
Experiments are discussed in Section 6, and we end with a conclusion and some perspectives
in Section 7.
2 Related work
Background subtraction in dynamic scenes is a difficult problem that requires an effective
method to ensure a good separation between the background and the foreground. Several studies
have been proposed to improve background subtraction and to reduce the noise due to
misidentified pixels or to the presence of haze in the scene [62, 64]. These studies can be
divided into five groups (Fig. 1). One of them focuses on the selection and combination of
good features (color, edge, texture, etc.), while the others try to develop methods that can
deal with all possible scenarios in order to separate the moving objects from the static ones.
The selection of the best characteristics is an essential step that allows even the least
efficient classifier to separate classes well. It is for this reason that several works have
focused on the selection and combination of several characteristics in order to increase
the discriminating power of classifiers. In the color and texture context, St-Charles et al. [47]
used binary spatio-temporal features and color information to detect local variations at
the pixel level. The authors in [53] proposed a new method (M4CD) that exploits color, texture,
and other heterogeneous features to separate foreground from background pixels.
However, the extraction of color and texture features requires a lot of time, which affects
the execution time. Edge-based features are also used in background subtraction:
Allebosch et al. [2] proposed a local ternary pattern descriptor with RGB color information
to identify foreground pixels. The problem with this type of proposal lies in the fact
that all such approaches require a robust edge detection method to achieve a good background
subtraction.

Fig. 1 Taxonomy of background subtraction approaches: feature-based methods, neural network methods, hybrid methods, parametric methods, and non-parametric methods
Neural network methods have been widely used to enhance the performance of background
subtraction systems in dynamic scenes. Babaee et al. [3] proposed a background
subtraction system with a deep Convolutional Neural Network (CNN). Following the same
perspective, the authors in [31] used triplet convolutional neural networks to build a robust
background model. Wang et al. [55] presented a semi-automatic CNN for motion detection
that uses a few examples, which manually describe the moving object, for training. When
the training step is finished, segmentation maps (foreground and background models) are
produced and later used to label the remaining frames of the video sequence. Gregorio
and Giordano [9] proposed a weightless neural network called WiSARDrp to model the
background. The Auto-Adaptive Parallel SOM Architecture (AAPSA) is a self-organizing map
proposed by [42] for background subtraction. This method allows a parallel adaptation of
rates to detect changes in video sequences. The authors in [36] presented a new method that
creates a neural background model without any prior patterns. This model learns and adapts
automatically to scene variations. Furthermore, the authors introduced the notion of spatial
coherence in the background updating process (SC-SOBS) to provide a method robust
to false detections. An extension of this work was proposed by Maddalena and
Petrosino [37], in which they introduce the fuzzy concept into the update process of the
background model.
Despite the high-quality results achieved by deep neural network based methods, the latter
need an offline training mechanism that always requires a large number of ground truth
examples to create a robust model, examples which are in most cases not available in real
applications. In addition, this learning needs very powerful computers and takes a lot of
time to train the system. However, the major problem remains their inability to adapt to
the permanent changes of the background.
Recently, many approaches have focused on the combination of several algorithms to subtract
the background. Laugraud et al. [25] combined information extracted from a semantic
segmentation algorithm with information extracted from any background subtraction algorithm
to identify moving objects. However, semantic segmentation is difficult and requires
a very robust algorithm for the simultaneous detection, localization and segmentation of
objects and semantic regions. Bianco et al. [4] also combined a set of algorithms for video
change detection. They used the principle of genetic programming for the automatic selection
of the best algorithms, the combination of their results and the application of the most
appropriate post-processing operations on the algorithm outputs. Nevertheless, the choice of
the algorithms that constitute the initial population influences the quality of the results.
Furthermore, the evaluation of the fitness function is tied to the dataset used and cannot be
generalized to any case. In the same context, Wang et al. [51] performed a hybridization
between motion information, environmental change and object appearance to detect
moving objects. In [40], the authors proposed a background subtraction system based on a graph
cut method that uses two algorithms, optical flow and GMM, to identify moving
objects. However, this system is not efficient at detecting moving objects that remain
stationary for a period of time.
The Gaussian Mixture Model (GMM) is among the most commonly used algorithms for motion
detection in dynamic scenes. This method represents the history of each pixel
separately with a mixture of Gaussian probability distributions. The first proposal of this
model was made by Friedman and Russell [14]; however, the standard model with well-defined
equations was suggested by Stauffer and Grimson [48]. Since then, many works have
tried to use GMM to solve the problems of background disturbance [22]. Chen et al. [6]
proposed a GMM model shared within a well-defined region. The idea is that each pixel
searches, among the foreground and background models of its neighborhood, for the most
matched and optimal model that can represent its state. This proposition reduces the noise
caused by small local movements. The major drawback of this technique lies in the size of the
search region for shareable models; indeed, an efficient method for creating homogeneous
regions is required. In the same context, Farou et al. [13] proposed a Gaussian mixture with
background spotters, which assigns a spotter (agent) to each frame block to detect changes
in the background. Many extensions have been proposed to improve the model adaptation
speed and to make the number of Gaussians in GMM dynamic, among them: Martins et al. [38]
presented a MOG with a dynamic learning rate (Boosted MOG, BMOG), and Zivkovic [66]
added recursive equations that automatically select the number of Gaussians needed for
each pixel in each observed scene. However, all these improvements remain valid only under
very specific constraints and environments.
Wang and Dudek [52] proposed a new adaptive algorithm for modeling the background
in video scenes. This method represents the history of each pixel with only a few templates
(background values). The update process is performed at each period by removing the least
useful background values from the background model. This method provides a very good
execution speed due to its low algorithmic complexity; however, the number of samples used to
model the background does not represent all the pixel states, especially when pixel variations
are very fast. The authors in [20] proposed a new change detection method that uses the
sliding window technique with a feedback mechanism to control and update the parameters
of the background model. This method builds a robust background model that can
be adjusted dynamically to scene variations. However, the quality of the results depends on
the size of the sliding window. In order to remove the false positives caused by misclassified
pixels, the authors in [26] proposed to perform a matching between the foreground and
the false positive pixels for background subtraction. The development of this method is very
difficult since it requires finding the unique trend of the dynamic background and modeling
it mathematically in order to supervise the background regions. The authors in [43] proposed a
universal background subtraction system based on the notion of a megapixel (MP). It consists
in merging multiple background models into megapixels following a clustering algorithm
to detect moving objects. The use of megapixels reduces the number of isolated
pixels. However, the quality of the results depends on the robustness of the clustering
algorithm in creating homogeneous megapixels.
In the same context, several studies have focused on the development of non-parametric
background subtraction models. The kernel density estimator (KDE) is one of these models;
it estimates the probability density function of the recent N values of each pixel
with a kernel estimator. The model adapts quickly to scene variations, which
allows a very sensitive detection of moving objects without any parameters [12]. The authors
in [46] used an approach based on a self-adjusted dictionary of words to model the background.
In this approach, the current state of each pixel is described by three values
(a visual word), which are estimated during the learning stage. After each pixel classification,
internal parameters are adjusted with a feedback mechanism to regularly update the
background model. Krungkaew and Kusakunniran [24] proposed a new visual word dictionary
method that uses the Lab color space (lightness, color channels) for detecting moving
objects over a dynamic background. In [34], the author presented a multi-scale spatio-temporal
background subtraction method that collects the history of the pixel samples and their
neighbors at different scales to detect moving objects. This method requires a lot of time
and a large memory space to transform the pixel samples at different scales and to save them
with their neighbors. The method proposed by [21] uses the notion of weighted samples
to build an effective background model. The model is initialized with a small number of
samples that have a variable weight. During the execution, the system replaces the samples
having a low weight with other samples. However, an efficient method is required
for initializing the weights. Furthermore, the sample size differs from one scene to another;
indeed, when pixel variations are very fast, a small sample size increases false detections.
3 Single Gaussian (SG)

Background modelling with a Single Gaussian (SG) was proposed by Wren et al. [58]. The
model assumes independence between pixels and represents the history of
the last n pixel values with the probability density function (1).

$$P(P_t) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(P_t - u)^2}{2\sigma^2}} \qquad (1)$$

This method creates a model for each pixel, composed of the mean, the variance
and the current pixel value. The mean describes the dominant color of the current pixel and the
variance represents the variability of that pixel around the mean.
For each pixel Pt of the current frame, it is possible to determine whether it is a background
or a foreground pixel using (2). If (2) is verified, the pixel belongs to the background
pixel set; otherwise, the pixel represents the foreground.

$$\frac{|P_t - u_t|}{\sigma_t} < 2.5 \qquad (2)$$
To take into account the illumination changes of the scene, the Gaussian
parameters are updated according to (3) and (4).

$$u_t = (1 - \alpha)\, u_{t-1} + \alpha P_t \qquad (3)$$

$$\sigma_t^2 = (1 - \alpha)\, \sigma_{t-1}^2 + \alpha (P_t - u_t)(P_t - u_t)^T \qquad (4)$$

where ut and σt denote respectively the mean and the variance of pixel P at time t, and the
learning rate α determines the speed of adaptation.
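As an illustration, the following minimal sketch (assuming a single-channel frame stored as a float NumPy array, with illustrative function and parameter names) applies the background test (2) and the updates (3) and (4); the update is restricted to pixels classified as background, in line with the strategy described in the abstract, and σ is taken here as the standard deviation of the model.

```python
import numpy as np

def sg_step(frame, mean, var, alpha=0.01, k=2.5):
    """One Single Gaussian step for a whole frame (all arrays share the same shape).

    Returns the foreground mask and the updated mean and variance.
    """
    std = np.sqrt(var)
    # Equation (2): a pixel is background if |P_t - u_t| / sigma_t < k
    background = np.abs(frame - mean) / (std + 1e-6) < k
    # Equations (3) and (4): update only the pixels classified as background
    new_mean = np.where(background, (1 - alpha) * mean + alpha * frame, mean)
    diff = frame - new_mean
    new_var = np.where(background, (1 - alpha) * var + alpha * diff * diff, var)
    return ~background, new_mean, new_var
```

The mean and variance can, for instance, be initialized from the first frame with mean = first_frame.astype(float) and a small constant variance.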
4 Artificial Immune Recognition System (AIRS)

The natural immune system can determine which cells belong to the body and which do
not, and it protects the human body from external objects. In 2001, the first supervised
artificial immune system was proposed by [56]. The model is based on an antigen-antibody
representation, which measures a degree of "closeness" or similarity, also called affinity or
stimulation degree, between the training data (antigens) and B cells. The algorithm takes
antigens as input and produces a set of memory cells. The memory cells represent the model
that can be used in the classification phase.
The AIRS process is described in four stages:
– Initialization This phase can be considered as a data pre-processing stage, because
all data set items are normalized to ensure that the affinity between any two feature
vectors of the dataset is always in the range [0, 1]. After normalization, the Affinity
Threshold (AT) is calculated. AT represents the average affinity value over all training
data items (see (5)).

$$AT = \frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} Affinity(ag_i, ag_j)}{\frac{n(n-1)}{2}} \qquad (5)$$
Where n represents the number of antigens in the training data and Affinity(agi, agj)
is the Euclidean distance between a pair of antibodies or antigens. The affinity threshold
will be used later in the memory cell replacement process. The final step in this
stage consists in initializing the memory cell set MC and the ARB population with a set of
antigens ag randomly selected from the training data AG. This step is optional; the ARB
and MC sets can also be initialized with ∅. All the following stages are applied to a single
antigen ag.
– Memory cell identification and ARB generation This stage is composed of two
mechanisms. The first mechanism identifies the memory cell mcmatch that is the
most similar to the antigen ag and has the same class as the latter. mcmatch is
calculated using (6).

$$mc_{match} = \underset{mc \in MC_{ag.c}}{\arg\max}\; Stimulation(ag, mc) \qquad (6)$$

$$Stimulation(ag, mc) = 1 - Affinity(ag, mc) \qquad (7)$$

If MCag.c = ∅, then mcmatch ← ag and MCag.c ← MCag.c ∪ ag. After identifying
the memory cell mcmatch, AIRS generates new ARB clones from mcmatch. It creates
NumClones new clones, each clone being the feature vector of mcmatch mutated with
an empirically fixed rate between 0 and 1. The number of clones is calculated with (8):

$$NumClones = Hmr \times Clonal\ rate \times Stimulation(ag, mc_{match}) \qquad (8)$$

Where the Hyper mutation rate (Hmr) and the Clonal rate are two integer values
chosen empirically. The mutation function is defined in [57].
– Competition for resources and development of a candidate memory cell All the new
clones generated in the previous stage, mcmatch, and all ARBs remaining from previous
antigen reactions are added to the ARB set (AB). To generate better representatives,
the method has a mechanism that organizes the survival of individuals within the ARB
population. This mechanism involves a resource allocation process based on sharing
cumulative resources according to the class of the antigen.
The ab that have the same class as the antigen get half of the cumulative resources;
the remaining half is split among the ab belonging to the other classes.
The ARBs of each class have a maximum allocation; if the sum of the resources allocated
to a class exceeds this maximum, resources are removed from the least stimulated ab.
If the resources of an ab are reduced to zero by this process, the ab is removed from
the ARB population.
The number of resources of each ab ∈ AB is calculated with (9).

$$ab.resources = ab.stimulation \times Clonal\ rate \qquad (9)$$

The competition for resources process is repeated until the average stimulation of each class
is greater than a stimulation threshold. This stopping criterion gives the ARBs the
opportunity to produce mutated offspring. Once the stopping criterion is met, the remaining
ab are used to select a candidate memory cell (mccandidate). mccandidate is the
ab most stimulated by ag that has the same class as the latter.
– Memory cell introduction This stage is the last step in the training process for one
antigen. A candidate memory cell mccandidate whose stimulation is greater than that
of mcmatch is introduced into MC as a new memory cell. If the affinity between
mccandidate and mcmatch is less than AT × ATS, mccandidate replaces mcmatch in the MC
set. The Affinity Threshold Scalar (ATS) is a value between 0 and 1 chosen by the user.
A compact sketch of this training loop for a single antigen is given below.
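To make the four stages concrete, here is a minimal single-class sketch of the training loop for one antigen. It assumes feature vectors are NumPy arrays normalized to [0, 1], replaces the exact resource-allocation bookkeeping of [56, 57] with a bounded survival-of-the-most-stimulated loop, and uses a simplified mutation; every function and parameter name is illustrative.

```python
import numpy as np

def affinity(x, y):
    """Euclidean distance between two normalized feature vectors."""
    return float(np.linalg.norm(x - y))

def stimulation(ag, cell):
    """Equation (7); assumes features normalized so that affinity stays in [0, 1]."""
    return 1.0 - affinity(ag, cell)

def mutate(cell, rate, rng):
    """Simplified mutation: each feature is replaced with probability `rate`."""
    clone = cell.copy()
    mask = rng.random(clone.shape) < rate
    clone[mask] = rng.random(int(mask.sum()))
    return clone

def train_one_antigen(ag, mc_set, clonal_rate=10, hyper_mutation_rate=4,
                      mutation_rate=0.1, stim_threshold=0.9, at=0.2, ats=0.01,
                      rng=None):
    """Train the memory cell pool on a single antigen (single-class sketch)."""
    rng = rng or np.random.default_rng()
    if not mc_set:                                   # empty pool: the antigen seeds MC
        mc_set.append(ag.copy())
        return mc_set
    # Stage 2: identify mc_match (6) and generate NumClones mutated ARBs (8)
    i_match = max(range(len(mc_set)), key=lambda i: stimulation(ag, mc_set[i]))
    mc_match = mc_set[i_match]
    num_clones = int(hyper_mutation_rate * clonal_rate * stimulation(ag, mc_match))
    arbs = [mc_match] + [mutate(mc_match, mutation_rate, rng)
                         for _ in range(max(num_clones, 1))]
    # Stage 3 (simplified): the least stimulated ARBs lose their resources,
    # the survivors produce mutated offspring (bounded number of rounds)
    for _ in range(10):
        arbs.sort(key=lambda ab: stimulation(ag, ab), reverse=True)
        arbs = arbs[:max(1, len(arbs) // 2)]
        if np.mean([stimulation(ag, ab) for ab in arbs]) >= stim_threshold:
            break
        arbs += [mutate(ab, mutation_rate, rng) for ab in list(arbs)]
    mc_candidate = max(arbs, key=lambda ab: stimulation(ag, ab))
    # Stage 4: memory cell introduction / replacement
    if stimulation(ag, mc_candidate) > stimulation(ag, mc_match):
        if affinity(mc_candidate, mc_match) < at * ats:
            mc_set[i_match] = mc_candidate           # replaces mc_match
        else:
            mc_set.append(mc_candidate)
    return mc_set
```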
5 Proposed approach

This paper proposes a new method for subtracting the background seen through a fixed camera
in a video surveillance system by combining AIRS and SG.
The AIRS algorithm is used as a classification tool that separates antibodies, represented by
pixels belonging to the background model, from antigens, which model pixels belonging to the
foreground and representing moving objects.
AIRS generates a set of background representatives with an iterative process
that searches among the existing models for the Gaussians whose degree of similarity
is closest to the current value of the pixel. In the mutation step, the best representative among
all previously generated Gaussians is mutated according to an empirically defined rate in order
to predict the values that a pixel can take. After creating new clones, a filtering process is
applied to these clones to increase precision. The current pixel is classified as a background
pixel if it has at least one similar element among all the representatives. This combination
allowed us to benefit from the advantages of AIRS, which provides an online update of the
system parameters, and from the power of Gaussians to adapt to scene variations at the pixel level.
Figure 2 presents the global architecture of our system and details the different modules that
compose it.
5.1 Pre-processing
This is a preliminary step in any video processing. During this step, the video is split into
sequences of 2D images called frames, which represent the input of our system. These
frames are encoded in RGB mode; unfortunately, this mode is sensitive to light effects
[15], and its use does not ensure the creation of an efficient model. HSV is a mode
more robust to lighting changes, since it separates the luminosity, represented by the
V component, from the chromatic properties, represented by the H and S components.
In most cases, the V component is ignored to reduce light effects [50, 65]. After many
experiments, we noticed that the use of the S component did not add significant quality to the
system compared with the gain in processing time obtained by removing it. Indeed, the
S component expresses the shadow effect on color. For this purpose, only the H component is
taken into consideration.

Fig. 2 Global architecture of the proposed system: the video is fragmented into frames, the frames are converted from RGB to HSV (pre-processing), AIRS-SG classifies pixels using and updating the background models, and a post-processing step produces the final decision
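A minimal sketch of this pre-processing step with OpenCV (the function name is illustrative); it converts a BGR frame, as read by OpenCV, to HSV and keeps only the H channel.

```python
import cv2

def preprocess(frame_bgr):
    """Convert an input frame to HSV and keep only the H (hue) channel."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    return h  # only the H component is used by AIRS-SG
```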
5.2 AIRS-SG
In dynamic scenes, the Single Gaussian is sensitive to fast pixel variations; indeed, a single
Gaussian cannot memorize the old states of a pixel. To overcome this problem, we propose a
new background subtraction method called AIRS-SG. This method uses the principle of the SG
approach and the robustness of the AIRS algorithm to create a model that is efficient under
fast pixel variations. The proposed model considers the pixel value H as an antigen (ag) and
the corresponding Gaussians as memory cells (mc) that have recognized this antigen. In the
following, we detail the principle of the proposed AIRS-SG model.
Initialization is a standard step in any background subtraction algorithm. It plays a very
important role in the measurement, definition and identification of parameters. In our case,
it consists in applying the SG approach to the first frames of the video in order to
initialize the MC set of each pixel. This step is essential for a faster convergence of the
recognition system. The ARB population remains empty during the initialization stage, and
it is built gradually as the system operates.
The affinity measure defined in the basic AIRS algorithm is not adequate in our context. For
this purpose, we propose a new affinity measure that represents the distance between
each pixel and its models according to (10).

$$\frac{|P_t - u_t|}{\sigma_t} \qquad (10)$$
$$AT = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m} \frac{|P_{i,j} - u_{i,j}|}{\sigma_{i,j}}}{n \times m} \qquad (11)$$

Where n and m are the number of rows and columns of the image, and Pi,j, ui,j, σi,j are respectively
the pixel value, the mean and the variance of the model provided by SG.
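A short sketch of (10) and (11), assuming the per-pixel means and sigmas produced by the SG initialization are stored as NumPy arrays with the same shape as the frame; function names are illustrative and a small constant avoids division by zero.

```python
import numpy as np

def pixel_affinity(pixel, mean, sigma):
    """Proposed affinity (10): normalized distance between a pixel and one of its models."""
    return np.abs(pixel - mean) / (sigma + 1e-6)

def affinity_threshold(frame, means, sigmas):
    """Affinity threshold (11): mean affinity over the n x m pixels of a frame."""
    return float(np.mean(np.abs(frame - means) / (sigmas + 1e-6)))
```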
For each pixel Pt (agt), we generate a subset named MCbackground containing the memory cells
of MC that verify (12). Each mc ∈ MCbackground is represented by a Gaussian (pixel value, mean u,
variance σ). A pixel with an empty MCbackground is defined as a foreground pixel; when
MCbackground is not empty, the pixel belongs to the background pixels.
After this step, AIRS-SG searches for the closest mc, called mcmatch, in MCbackground
according to (13). mcmatch is mutated with a rate between 0 and 1 to generate new clones.
The clones are grouped together and placed with mcmatch in the AB set.

$$MC_{background} = \left\{ mc \in MC \;:\; \frac{|ag_t - u_{mc}|}{\sigma_{mc}} < 2.5 \right\} \qquad (12)$$

$$mc_{match} = \underset{mc \in MC_{background}}{\arg\min}\; \frac{|ag_t - u_{mc}|}{\sigma_{mc}} \qquad (13)$$
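The following sketch (illustrative names) applies (12) and (13) to one pixel, with each memory cell stored as a (mean, sigma) pair; the 2.5 threshold mirrors (2) and (14), and the perturbation of the mean is only a stand-in for the adapted mutation function.

```python
import numpy as np

def airs_sg_classify(ag, mc_set, rng, num_clones=4, mutation_rate=0.1, threshold=2.5):
    """Classify one pixel value `ag` given its memory cells [(mean, sigma), ...].

    Returns the label (255 = foreground, 0 = background) and the AB set of clones.
    """
    # Equation (12): memory cells whose Gaussian explains the current pixel value
    affinities = [abs(ag - u) / (s + 1e-6) for u, s in mc_set]
    mc_background = [i for i, a in enumerate(affinities) if a < threshold]
    if not mc_background:
        return 255, []                        # foreground pixel, no clones generated
    # Equation (13): the closest memory cell becomes mc_match
    i_match = min(mc_background, key=lambda i: affinities[i])
    u, s = mc_set[i_match]
    # mc_match is mutated to predict the values the pixel may take (AB set)
    ab_set = [(u, s)] + [(u + rng.normal(0.0, mutation_rate * (s + 1e-6)), s)
                         for _ in range(num_clones)]
    return 0, ab_set
```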
The mutation function has also been adapted to take the proposed feature vector into
consideration. During the competition for resources and development of candidate memory
cells stage, the ab generated in the previous step are filtered to keep only those that
represent the background according to (14).

$$\frac{|P_{ab} - u_{ab}|}{\sigma_{ab}} < 2.5 \qquad (14)$$

Pab, uab and σab are respectively the pixel value, the mean and the variance of the
Gaussian ab. The selected ab are grouped into the MCcandidate set. We propose to select all
the ab that satisfy (14) instead of keeping only the best ab, as done in the basic AIRS. This
choice allows us to manage the multi-modality of the background.
In the memory cell introduction stage, each element of MCcandidate that has a lower affinity
than that of mcmatch is introduced into the MC set as a new memory cell. If the affinity between
mccandidate and mcmatch is less than AT × ATS, mcmatch is removed from the MC set. This step
also benefits from a small modification of the number of memory cells introduced into the
global MC model: AIRS, as originally proposed, introduces only one memory cell per iteration,
whereas this new mechanism gives the system a better chance of producing a more representative
model. Figure 3 illustrates the AIRS-SG flowchart.
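A possible sketch of the filtering step (14) and of the memory cell introduction for one pixel, continuing the representation above; the cell-to-cell affinity reuses the normalized distance form of (10), which is an assumption, and all names are illustrative.

```python
def airs_sg_update(ag, ab_set, mc_set, i_match, at, ats=0.01, threshold=2.5):
    """Filter the AB set with (14) and introduce the resulting candidates into MC."""
    u_match, s_match = mc_set[i_match]
    aff_match = abs(ag - u_match) / (s_match + 1e-6)
    # Equation (14): keep only the ab that still represent the background (MC_candidate)
    mc_candidate = [(u, s) for u, s in ab_set if abs(ag - u) / (s + 1e-6) < threshold]
    for u, s in mc_candidate:
        if abs(ag - u) / (s + 1e-6) < aff_match:       # better than mc_match
            mc_set.append((u, s))
            # cell-to-cell affinity below reuses the normalized distance of (10) (assumption)
            if abs(u - u_match) / (s_match + 1e-6) < at * ats:
                mc_set[i_match] = None                  # mark mc_match for removal
    return [mc for mc in mc_set if mc is not None]
```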
The proposed approach produces binary images containing only black and white pixels:
black pixels represent the background and white pixels define the foreground (the moving
objects). However, all background subtraction methods create cuts in the detected objects,
isolated pixels and some gaps generated by false detections. In order to overcome these
problems, we apply morphological operations to these images.
The dilation operation covers the gaps inside foreground objects, whereas the erosion
operation eliminates isolated pixels, mainly generated by the presence of dust, illumination
changes, etc. After the morphological operations, we apply a median filter to
correct the edges of the moving objects.
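A possible post-processing sketch with OpenCV, applying dilation, erosion and a median filter to the binary mask produced by AIRS-SG (0 for background, 255 for foreground, stored as uint8); the kernel sizes are illustrative choices.

```python
import cv2
import numpy as np

def post_process(mask, kernel_size=3, median_size=5):
    """Clean a binary foreground mask (uint8, 0 = background, 255 = foreground)."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    mask = cv2.dilate(mask, kernel)           # cover the gaps inside foreground objects
    mask = cv2.erode(mask, kernel)            # remove isolated pixels (dust, illumination noise)
    return cv2.medianBlur(mask, median_size)  # smooth the edges of moving objects
```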
Fig. 3 Flowchart of the AIRS-SG process for a pixel Pt: the memory cells MC = {mc, mc ≡ Gaussian} are tested to build MCbackground; if MCbackground = ∅ the pixel is labeled foreground (Pt ← 255), otherwise it is labeled background (Pt ← 0) and the model is updated when Affinitycell < AT × ATS and mcmatch ∈ MC
6 Experiments and results

Note that in this paper we are interested in the problems related to moving object detection
in dynamic environments; indeed, such environments pose problems for the majority
of background subtraction systems. For this purpose, we selected from the CDnet 2014
dataset only the videos of the dynamic background (DB) category. Table 1 shows some details
about the sequences used in the experiments.
In addition to the CDnet 2014 dataset, the proposed approach was also tested on the Wallflower
(WavingTrees) [49], WaterSurface [27], UCSD [45] and Fountain [27] datasets. The
parameters used for all the datasets, namely the learning rate (α), the mutation rate, the
Clonal rate, the Hyper mutation rate and the Affinity Threshold Scalar (ATS), were set
respectively to 0.01, 0.1, 10, 4 and 0.01. These values were fixed empirically after several
tests assessing the influence of the parameters on the quality of the results.
The AIRS algorithm, like many other classification tools, needs its parameters to be set
empirically, and we did not find any work providing a method to determine these values
correctly. We noticed that the mutation rate should be kept very low, otherwise convergence
may be delayed unnecessarily. The Clonal rate controls the number of clones generated at each
iteration; in our case we used 100 times the mutation rate because of the small population used
in the system. The Hyper mutation rate controls the number of mutated clones; it is set
to 4 because of the number of permutations between features (3 possibilities when the
mutation is done 2 by 2, and 1 when all features are mutated at the same time). The Affinity
Threshold Scalar is set very low to ensure that the dissimilarity between foreground and
background is preserved, since our feature vector contains only 3 characteristics and the
mutation can easily bring the background closer to the foreground.
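For reference, the empirical setting above can be gathered in a single configuration; the dictionary and its keys are only an illustrative way of passing these values to the system.

```python
# Empirical parameter setting used for all datasets (Section 6)
AIRS_SG_PARAMS = {
    "learning_rate_alpha": 0.01,       # SG adaptation speed
    "mutation_rate": 0.1,              # kept low to avoid delaying convergence
    "clonal_rate": 10,                 # about 100x the mutation rate (small population)
    "hyper_mutation_rate": 4,          # number of feature permutation patterns
    "affinity_threshold_scalar": 0.01, # ATS, kept low to preserve fg/bg dissimilarity
}
```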
This subsection presents some images of the segmentation obtained on the datasets (CDnet
2014 Dynamic Background, Wallflower (WavingTrees), WaterSurface, UCSD and Fountain)
compared with the ground truth (Tables 2 and 3).

Table 2 Some images of the obtained results for the Dynamic Background category of the CDnet 2014 dataset (test images, ground truth and results)

Table 3 Some results on videos of the Wallflower (WavingTrees), WaterSurface, Fountain and UCSD datasets, including the Boats, Birds, Freeway, Peds and Rain videos (test images, ground truth and results)

However, this visual comparison is not sufficient, since the observable results do not give a
precise picture of system performance. Therefore, quantitative tests are required for this
type of system to describe objectively the robustness of the entire system based on a set
of criteria.
The aim of the experiments is to prove the efficacy of our system in background subtraction.
To evaluate the performance of the proposed system, the authors of CDnet 2014 proposed
to use seven measures:
1. Recall (Re): TP / (TP + FN)
2. Specificity (Sp): TN / (TN + FP)
3. False Positive Rate (FPR): FP / (FP + TN)
4. False Negative Rate (FNR): FN / (TP + FN)
5. Percentage of Wrong Classifications (PWC): 100 × (FN + FP) / (TP + FN + FP + TN)
6. F-Measure: 2 × Precision × Recall / (Precision + Recall)
7. Precision (Pre): TP / (TP + FP)
With:
– True positive (TP): the result is positive (255) and the ground truth is also positive (255)
– False positive (FP): the result is positive (255) but the ground truth is negative (0)
– True negative (TN): the result is negative (0) and the ground truth is also negative (0)
– False negative (FN): the result is negative (0) but the ground truth is positive (255)
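The seven measures can be computed directly from these pixel counts, as in the following sketch (illustrative function name; the counts are assumed to be non-zero where they appear as denominators).

```python
def cdnet_metrics(tp, fp, tn, fn):
    """Compute the seven CDnet 2014 evaluation measures from pixel counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    fpr = fp / (fp + tn)
    fnr = fn / (tp + fn)
    pwc = 100.0 * (fn + fp) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"Re": recall, "Sp": specificity, "FPR": fpr, "FNR": fnr,
            "PWC": pwc, "F": f_measure, "Pre": precision}
```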
In addition to these measures, we calculated the complexity (Table 4) and the accuracy
(Table 5) to better evaluate our proposal.
– Accuracy = number of pixels correctly detected / total number of pixels
The complexity of AIRS-SG is O(n × m), with
– T(n × m) = |AG| × (5 × |MCcandidate| + |AB| + 14 × NumClones + 8) + n × m + 1,
where |AG| = n × m.
To properly evaluate the proposed system, the results were compared with hand-segmented
images called ground truth, which are provided with the videos by the CDnet 2014
authors (Tables 6, 7, 8, 9, 10 and 11).
6.2.3 Discussions
In this section we will discuss the obtained results on the dynamic background category of
the CDnet 2014 dataset.
Our system achieved good results on the Canoe and Boats videos, and acceptable results
on the Fountain01, Fountain02 and Fall videos. However, our system failed on the Overpass
video due to the similarity in color between the foreground and the background, which leads the
system to make errors and to obtain lower rates than other state-of-the-art methods.
This failure of our system is related to the use of the single H component of the HSV color space.
Note that our objective is to detect moving objects. Despite the low rate of some evaluation
criteria, our system succeeded in its task (detection, even partial, with marginal
false detections), and the qualitative results show the effectiveness of the proposed system in
detecting moving objects in dynamic scenes. The partial detection of moving objects affects
performance measurements that are based on the total number of detected pixels relative to the
ground truth.
Deep learning methods exceed our system thanks to their in-depth learning without handcrafted
features. Even with this success, they require a lot of time, a large amount of training data
and very powerful computers to achieve such results. In addition, the application of deep learning
methods in a real environment presents several problems. Indeed, the nature of the environment
can include changes in the structure itself (a growing tree, a change in the color of a
wall, a new building, etc.), which forces the system to go through a re-initialization step that
requires a system shutdown during a long learning period (several days or months). This type
Table 4 Algorithmic complexity of AIRS-SG and of state-of-the-art methods

Algorithm Complexity
FgSegNet v2 [32] O(l × hN) with h hidden neurons, N inputs, and l hidden layers
Cascade CNN [55] O(l × hN) with h hidden neurons, N inputs, and l hidden layers
IUTIS-5 [4] O(n²)
DeepBS [3] O(l × hN) with h hidden neurons, N inputs, and l hidden layers
WisenetMD [26] log2(RRegions)
SuBSENSE [47] O(n × m)
CwisarDRP [9] O(l × hN) with h hidden neurons, N inputs, and l hidden layers
M4CD V2.0 [53] O(n × m)
SWCD [20] O(n × m)
MBS [43] O(n × m)
GMM [48] O(n × m)
BMN-BSN [41] O(l × hN) with h hidden neurons, N inputs, and l hidden layers
KDE [12] O(n × m)
AMBER [52] O(n × m)
FCFNE [63] logN(n × m)
LCBC [16] O(n × m)
AAPSA [42] O(rneurons × n)
CL-VID [33] O(M × N × D) with M the dataset size, N the number of neurons and D the input dimension
CP3-Online [30] O(n × m × kclusters)
SSOBS [44] O(rneurons × n)
AIRS-SG O(n × m) with n the width and m the height of the video frame
of problem is shared by all methods that use offline learning. In the same context, most of
the methods that have published their results on the CDnet 2014 dataset use supervised learning,
and their systems have learned the videos very well, which explains their high recognition rates.
AIRS-SG has a linear complexity in the number of pixels of a video frame. Our system has a
better complexity than the deep learning methods, and we are at the same level of complexity
as the majority of the methods cited in the state of the art, except for the WisenetMD
and FCFNE techniques, which have a better (logarithmic) complexity since they use block
processing rather than pixel processing.
Table 5 Accuracy, Re, Sp, FPR, FNR, PWC, precision, F-measure of CDnet 2014 dynamic background
category
Table 6 Comparison of quantitative results with well-known background subtraction methods on the Fountain01
video
Table 7 Comparison of quantitative results with well-known background subtraction methods on the Fountain02
video
Table 8 Comparison of quantitative results with well-known background subtraction methods on Canoe
video
Table 9 Comparison of quantitative results with well-known background subtraction methods on Boats video
Table 10 Comparison of quantitative results with well-known background subtraction methods on Overpass
video
Table 11 Comparison of quantitative results with well-known background subtraction methods on Fall video
7 Conclusion

In this paper, we have presented a new approach for background subtraction in dynamic
scenes. We proposed a combination of SG and AIRS in order to overcome the background
variation problems that are a deficiency of most of the subtraction systems proposed in the
state of the art. The idea is to exploit the AIRS recognition mechanism, which is capable
of generating a wide variety of solutions similar to the pixel being processed and, as a
consequence, of generating several valid and robust models using only one frame; indeed, the
other methods require several samples to achieve the same purpose. Furthermore, AIRS allows
a rapid adaptation to the environment through an online update.
On the other hand, the SG approach is very efficient for detecting pixel changes but suffers
from problems related to dynamic backgrounds. Through the memory cell identification
and ARB generation phase, our method predicts and generates new background models.
These models are filtered by the competition for resources and development of candidate
memory cells process before being selected as new background models.
Evaluations of the proposed method on the 6 videos of the dynamic background category of
the CDnet 2014 dataset prove the capacity and precision of our system (AIRS-SG) under a wide
variety of pixel variations. The obtained results are very acceptable with respect to the state
of the art, and our system does not require any human intervention after its launch, thus
overcoming the drawbacks of deep learning approaches in the case of a change in the
environmental infrastructure.
In future work, we propose to increase the number of features used, in order to gain additional
discriminative power, since we have tested only one feature represented by the H value of
the HSV color space. We will also extend our bio-inspired approach and methodology to other
multimedia applications, such as image processing.
References
14. Friedman N, Russell S (1997) In: Proceedings of the Thirteenth conference on Uncertainty in artificial
intelligence. Morgan Kaufmann Publishers Inc., pp 175–181
15. Haq AU, Gondal I, Murshed M (2010) In: 2010 IEEE Symposium on Computers and Communications
(ISCC). IEEE, pp 529–534
16. He W, Yong K, Kim W, Ko HL, Wu J, Li W, Tu B (2019) IEEE Access 7:92329
17. Hongwei X, Qian C, Weixian Q (2016) Laser Optoelectron Progress 4(1):3
18. Hou N, He F, Zhou Y, Chen Y (2019) Front Comput Sci 19:1
19. Ishibuchi H, Tsukamoto N, Nojima Y (2008) In: 2008 IEEE Congress on Evolutionary Computation
(IEEE World Congress on Computational Intelligence). IEEE, pp 2419–2426
20. Işık Ş, Özkan K, Günal S, Gerek ÖN (2018) vol 27
21. Jiang S, Lu X (2017) IEEE Transactions on Circuits and Systems for Video Technology
22. Jianzhao C, Victor OC, Gilbert OM, Changtao W (2017) In: 2017 9th International Conference on
Modelling, Identification and Control (ICMIC). IEEE, pp 133–138
23. Khan SA, Ishtiaq M, Nazir M, Shaheen M (2018) J Comput Sci 28:94
24. Krungkaew R, Kusakunniran W (2016) In: 2016 13th International Conference on Electrical Engineer-
ing/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). IEEE,
pp 1–6
25. Laugraud B, Piérard S, Van droogenbroeck M (2018) J Imaging 4(7):86
26. Lee SH, Kwon SC, Shim JW, Lim JE, Yoo J (2018) arXiv:1805.09277
27. Li L, Huang W, Gu IYH, Tian Q (2004) IEEE Trans Image Process 13(11):1459
28. Li H, He F, Liang Y, Quan Q (2019) Soft Comput 24(9):1–20
29. Li H, He F, Yan XH (2019) Appl Math-A J Chin Universities 34(1):1
30. Liang D, Hashimoto M, Iwata K, Zhao X et al (2015) Pattern Recogn 48(4):1374
31. Lim LA, Keles HY (2018) arXiv:1801.02225
32. Lim LA, Keles HY (2018) arXiv:1808.01477
33. López-Rubio E, Molina-Cabello MA, Luque-Baena RM, Domínguez E (2018) vol 28
34. Lu X (2014) In: 2014 IEEE International Conference on Image Processing (ICIP). IEEE, pp 3268–3271
35. Ma C, Liu D, Peng X, Li L, Wu F (2019) J Vis Commun Image Represent 60:426
36. Maddalena L, Petrosino A (2012) In: 2012 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition Workshops. IEEE, pp 21–26
37. Maddalena L, Petrosino A (2010) Neural Comput Appl 19(2):179
38. Martins I, Carvalho P, Corte-Real L, Alba-Castro JL (2017) In: Iberian Conference on Pattern
Recognition and Image Analysis. Springer, pp 50–57
39. Men Y, Zheng J, Meng L (2016) Video Eng 24(4):5
40. Miron A, Badii A (2015) In: 2015 International Conference on Systems, Signals and Image Processing
(IWSSIP). IEEE, pp 273–276
41. Mondéjar-Guerra V, Rouco J, Novo J, Ortega M (2019) In: British machine vision conference (BMVC),
Cardiff
42. Ramírez-Alonso G, Chacón-Murguía MI (2016) Neurocomputing 175:990
43. Sajid H, Cheung SCS (2017) IEEE Trans Image Process 26(7):3249
44. Sehairi K, Chouireb F, Meunier J (2017) J Electron Imaging 26(2):023025
45. Srivastava S, Ng KK, Delp EJ (2011) In: 2011 8th IEEE International Conference on Advanced Video
and Signal Based Surveillance (AVSS). IEEE, pp 60–65
46. St-Charles PL, Bilodeau GA, Bergevin R (2015) In: 2015 IEEE Winter Conference on Applications of
Computer Vision (WACV). IEEE, pp 990–997
47. St-Charles PL, Bilodeau GA, Bergevin R (2015) IEEE Trans Image Process 24(1):359
48. Stauffer C, Grimson WEL (1999) In: Proceedings. 1999 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (Cat. No PR00149), vol 2. IEEE, pp 246–252
49. Toyama K, Krumm J, Brumitt B, Meyers B (1999) In: Proceedings of the Seventh IEEE International
Conference on Computer Vision, vol 1. IEEE, pp 255–261
50. Wang HY, Ma KK (2003) In: Proceedings of the 2003 International Conference on Image
Processing (ICIP 2003), vol 1. IEEE, pp I–153
51. Wang R, Bunyak F, Seetharaman G, Palaniappan K (2014) In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops, pp 414–418
52. Wang B, Dudek P (2014) In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops, pp 395–398
53. Wang K, Gou C, Wang FY (2018) IEEE Access 6:15505
54. Wang Y, Jodoin PM, Porikli F, Konrad J, Benezeth Y, Ishwar P (2014) In: Proceedings of the IEEE
conference on computer vision and pattern recognition workshops, pp 387–394
55. Wang Y, Luo Z, Jodoin PM (2017) Pattern Recogn Lett 96:66
56. Watkins A (2001) Airs: a resource limited artificial immune classifier. Ph.D. thesis, Mississippi State
University Mississippi
57. Watkins A, Timmis J, Boggess L (2004) Genet Program Evolvable Mach 5(3):291
58. Wren CR, Azarbayejani A, Darrell T, Pentland AP (1997) IEEE Trans Pattern Anal Mach Intell
19(7):780
59. Yang XS (2010) In: Nature inspired cooperative strategies for optimization (NICSO 2010). Springer,
pp 65–74
60. Yang XS (2010) arXiv:1003.1409
61. Yong JS, He F, Li H, Zhou WQ (2019) Appl Math-A J Chin Univ 34(4):480
62. Yu H, He F, Pan Y (2019) Multimed Tools Appl 36(2):1–23
63. Yu T, Yang J, Lu W (2019) IEEE Access 7:14671
64. Zhang S, He F, Ren W, Yao J (2018) Vis Comput 79(9):1–12
65. Zhao M, Bu J, Chen C (2002) In: Multimedia systems and Applications V, vol 4861. International Society
for Optics and Photonics, pp 325–333
66. Zivkovic Z (2004) In: Proceedings of the 17th International Conference on Pattern
Recognition (ICPR 2004), vol 2. IEEE, pp 28–31