Enabling AI Applications in Data Science
Aboul-Ella Hassanien
Mohamed Hamed N. Taha
Nour Eldeen M. Khalifa
Editors
Studies in Computational Intelligence
Volume 911
Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence, quickly and with high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics, and the life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes, and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution, which enable both wide and rapid dissemination of research output.
The books of this series are submitted for indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar, and SpringerLink.
Editors
Aboul-Ella Hassanien
Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Egypt
Chair of the Scientific Research Group in Egypt, Cairo University, Giza, Egypt
Mohamed Hamed N. Taha
Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Egypt
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Artificial Intelligence and Data Science are among the technologies with the greatest potential to improve human life, a potential best realised by merging the two fields to solve complex real-world problems. This book provides a detailed overview of the latest advancements and applications in Artificial Intelligence and Data Science. AI applications have achieved great accuracy and performance thanks to advancements in data processing and storage; they have also gained power through the quantity and quality of data, which is the core of data science. This book aims to present the state of the art in research on Artificial Intelligence combined with Data Science. We accepted 28 chapters, covering the following four parts:
• Part I—Artificial Intelligence and Optimization
• Part II—Big Data and Artificial Intelligence Applications
• Part III—IoT within Artificial Intelligence and Data Science
• Part IV—Artificial Intelligence and Security
We thank and acknowledge everyone involved in all stages of publishing, including the authors, reviewers, and the publishing team. We deeply value their engagement and support, which were essential to the success of the edited book “Enabling AI Applications in Data Science”. We hope readers will enjoy the chapters and their contents, and appreciate the efforts that went into bringing this book to reality.
1 Introduction
where $\xi$ is a random variable associated with the probability space $(\Omega, \mathbb{P})$, the function $f(w) := \mathbb{E}_{\xi}[f(w;\xi)]$ is smooth, and $h(w) := \mathbb{E}_{\xi}[h(w;\xi)]$ is convex and nonsmooth. Most learning models over a population can be formulated through the structure of (1), using a proper decomposition dictated by the nature of the prediction loss and the regularization.
Let the component $f(\cdot;\xi) \equiv \ell(\cdot;\xi)$ be a convex smooth loss function, such as the quadratic loss, and $h \equiv r$ a “simple” convex regularization, such as $\|w\|_1$; then the resulting model:
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science, Studies in Computational Intelligence 911, https://doi.org/10.1007/978-3-030-52067-0_1
4 A. Pătraşcu et al.
has been considered in several previous works [15, 23, 30, 34], which analyzed the iteration complexity of stochastic proximal gradient algorithms. Here, the proximal map of r is typically assumed to be computable in closed form or in linear time, as in $\ell_1/\ell_2$ Support Vector Machines (SVMs). In order to approach more complicated regularizers, expressed as sums of simple convex terms, as required by machine learning models such as group lasso [11, 35, 43], CUR-like factorization [42], graph trend filtering [33, 38], dictionary learning [7], and parametric sparse representation [36], one has to be able to handle stochastic optimization problems with stochastic nonsmooth regularizations $r(w) = \mathbb{E}[r(w;\xi)]$. For instance, the grouped lasso regularization $r(w) = \sum_{j=1}^m \|D_j w\|_2$ may be expressed as an expectation by considering $r(w;\xi) = \|D_\xi w\|_2$. In this chapter we analyze extensions of the stochastic proximal gradient for this type of models.
Nonsmooth (convex) prediction losses (e.g. the hinge loss, the absolute value loss, the $\epsilon$-insensitive loss) are also covered by (1) by taking $h(\cdot;\xi) = \ell(\cdot;\xi)$. We will use this approach with $f(w) = \frac{\lambda}{2}\|w\|_2^2$ for solving the hinge-loss $\ell_2$-SVM model.
Contributions. (i) We derive novel convergence rates of SPG with minibatches for stochastic composite convex optimization, under a strong convexity assumption.
(ii) Besides the sublinear rates, we provide a computational complexity analysis, which takes into account the complexity of each minibatch iteration, for the stochastic proximal gradient algorithm with minibatches. We obtain $\mathcal{O}\left(\frac{1}{N\epsilon}\right)$ complexity, which highlights the optimal dependency on the minibatch size N and the accuracy $\epsilon$.
(iii) We confirm our theoretical findings empirically through tests on $\ell_2$-SVMs (with hinge loss) on real data, and on parametric sparse representation models on random data.
As an extension of our analysis, scaling the complexity per iteration with the number of machines/processors would guarantee direct improvements in the derived computational complexity. Although the superiority of distributed variants of SGD schemes for smooth optimization is clear (see [23]), our results set up the theoretical foundations for the development of distributed proximal gradient algorithms in fully stochastic environments. Further, we briefly recall the milestone results from the stochastic optimization literature, with a focus on the complexity of stochastic first-order methods.
Much attention has been devoted to the behaviour of stochastic gradient descent (SGD) with minibatches; see [9, 15–19, 22, 32]. In short, an SGD iteration computes the average of the gradients on a small number of samples and takes a step in the negative direction. Although more samples in the minibatch imply a smaller variance in the direction and, for moderate minibatch sizes, bring a significant acceleration, recent evidence shows that increasing the minibatch size over a certain threshold makes the acceleration vanish or even deteriorates the training performance [10, 12].
Since the analysis of SGD naturally requires various smoothness conditions, proper modifications are necessary to attack nonsmooth models. The stochastic proximal point (SPP) algorithm has recently been analyzed under various differentiability assumptions, see [1, 4, 13, 25, 27, 31, 37, 41], and has shown surprising analytical and empirical performance. The works [39, 40] analyzed minibatch SPP schemes with variable stepsizes and obtained $\mathcal{O}\left(\frac{1}{kN}\right)$ convergence rates under proper assumptions. For strongly convex problems, notice that they require multiple assumptions that we avoid in our analysis: strong convexity of each stochastic component, knowledge of the strong convexity constant, and Lipschitz continuity of the objective function. Our analysis is based on the strong convexity of the smooth component f and only convexity of the nonsmooth component h.
A common generalization of SGD and SPP is given by the stochastic splitting methods. Splitting first-order schemes received significant attention due to their natural insight and simplicity in contexts where a sum of two components is minimized (see [3, 20]). Only recently have fully stochastic composite models with stochastic regularizers been properly tackled [33], where almost sure asymptotic convergence is established for a stochastic splitting scheme in which each iteration represents a proximal gradient update using stochastic samples of f and h. The stochastic splitting schemes are also related to the model-based methods developed in [6].
$$f(w) \ge f(v) + \langle \nabla f(v),\, w - v\rangle + \frac{\sigma_f}{2}\|w - v\|^2 \quad \forall w, v \in \mathbb{R}^n. \qquad (2)$$
(ii) There exists a subgradient mapping $g_h : \mathbb{R}^n \times \Omega \to \mathbb{R}^n$ such that $g_h(w;\xi) \in \partial h(w;\xi)$ and $\mathbb{E}[g_h(w;\xi)] \in \partial h(w)$.
(iii) $F(\cdot;\xi)$ has bounded subgradients on the optimal set: there exists $S_F^* \ge 0$ such that $\mathbb{E}[\|g_F(w^*;\xi)\|^2] \le S_F^* < \infty$ for all $w^* \in W^*$;
(iv) For any $g_F(w^*) \in \partial F(w^*)$ there exist bounded subgradients $g_F(w^*;\xi) \in \partial F(w^*;\xi)$ such that $\mathbb{E}[g_F(w^*;\xi)] = g_F(w^*)$ and $\mathbb{E}[\|g_F(w^*;\xi)\|^2] < S_F^*$. Moreover, for simplicity we assume throughout the paper that $g_F(w^*) = \mathbb{E}[g_F(w^*;\xi)] = 0$.
Condition (i) of the above assumption is typical for composite (stochastic) optimization [3, 20, 27]. Condition (ii) guarantees the existence of a subgradient mapping for the functions $h(\cdot;\xi)$. Condition (iii) of Assumption 1 is also standard in the literature on stochastic algorithms.
Discrete case. Let us consider a finite discrete domain $\Omega = \{1,\dots,m\}$. Then [28, Theorem 23.8] guarantees that the finite-sum objective of (1) satisfies (3) if $\bigcap_{\xi\in\Omega} \mathrm{ri}(\mathrm{dom}\, h(\cdot;\xi)) \ne \emptyset$. The relative interior $\mathrm{ri}(\mathrm{dom}(\cdot))$ can be relaxed to $\mathrm{dom}(\cdot)$ for polyhedral components. In particular, let $X_1,\dots,X_m$ be finitely many closed convex sets satisfying the qualification condition $\bigcap_{i=1}^m \mathrm{ri}(X_i) \ne \emptyset$; then (3) also holds, i.e. $N_X(x) = \sum_{i=1}^m N_{X_i}(x)$ (by [28, Corollary 23.8.1]). Again, $\mathrm{ri}(X_i)$ can be relaxed to the set itself for polyhedral sets. As pointed out by [2], this also relates to the (bounded) linear regularity property of $\{X_i\}_{i=1}^m$.
Given a smoothing parameter $\mu > 0$ and $I \subset [m]$, we define the prox operator:
$$\mathrm{prox}_{h,\mu}(w; I) = \arg\min_{z\in\mathbb{R}^n}\; \frac{1}{|I|}\sum_{i\in I} h(z; i) + \frac{1}{2\mu}\|z - w\|^2.$$
In particular, when $h(w;\xi) = \mathbb{I}_{X_\xi}(w)$ the prox operator becomes the projection operator, $\mathrm{prox}_{h,\mu}(w;\xi) = \pi_{X_\xi}(w)$. Further, we denote $[m] = \{1,\dots,m\}$. Given a constant $\mu_0 \ge 0$ and $\gamma \in (0,1)$, a useful inequality for the sequel is:
$$\sum_{i=1}^{T} \frac{\mu_0}{i^\gamma} \le \mu_0\left(1 + \frac{T^{1-\gamma}}{1-\gamma}\right). \qquad (4)$$
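As a quick sanity check, inequality (4) can be spot-checked numerically (an illustrative script, assuming $\gamma \in (0,1)$):

```python
# Spot-check the stepsize-sum bound (4):
#   sum_{i=1}^T mu0 / i^gamma  <=  mu0 * (1 + T^(1-gamma) / (1 - gamma))
def stepsize_sum_bound_holds(mu0: float, gamma: float, T: int) -> bool:
    lhs = sum(mu0 / i ** gamma for i in range(1, T + 1))
    rhs = mu0 * (1 + T ** (1 - gamma) / (1 - gamma))
    return lhs <= rhs

# The bound holds for any mu0 >= 0, gamma in (0, 1), and horizon T,
# since the sum is dominated by 1 + the integral of x^(-gamma) on [1, T].
assert all(stepsize_sum_bound_holds(0.5, g, T)
           for g in (0.1, 0.5, 0.9) for T in (1, 10, 1000))
```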
In the following section we present the Stochastic Proximal Gradient with Minibatches (SPG-M) and analyze the complexity of a single iteration under Assumption 1. Let $w^0 \in \mathbb{R}^n$ be a starting point and $\{\mu_k\}_{k\ge 0}$ a nonincreasing positive sequence of stepsizes.
Stochastic Proximal Gradient with Minibatches (SPG-M):
For $k \ge 0$ compute:
1. Choose randomly an i.i.d. N-tuple $I^k \subset \Omega$ w.r.t. the probability distribution $\mathbb{P}$
2. Update:
$$v^k = w^k - \frac{\mu_k}{N}\sum_{i\in I^k} \nabla f(w^k; i)$$
$$w^{k+1} = \arg\min_{z\in\mathbb{R}^n}\; \frac{1}{N}\sum_{i\in I^k} h(z; i) + \frac{1}{2\mu_k}\|z - v^k\|^2.$$
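For intuition, a minimal Python sketch of SPG-M on a synthetic least-squares instance is given below. It makes the simplifying assumption that every sample shares the same regularizer $h(\cdot;\xi) = \lambda\|\cdot\|_1$, so the minibatch prox step has a closed-form soft-thresholding solution; the data and parameter values are illustrative only.

```python
import numpy as np

def spg_m(A, b, lam=0.05, N=16, mu0=0.5, iters=3000, seed=0):
    """SPG-M sketch for min_w E_xi[0.5*(a_xi^T w - b_xi)^2] + lam*||w||_1.
    Since h(.; xi) = lam*||.||_1 for every sample here, the minibatch
    prox step reduces to soft thresholding (a simplifying assumption)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    w = np.zeros(n)
    for k in range(1, iters + 1):
        mu = mu0 / k                       # decreasing stepsize mu_k
        I = rng.integers(0, m, size=N)     # i.i.d. N-tuple I^k
        v = w - mu * A[I].T @ (A[I] @ w - b[I]) / N   # minibatch gradient step
        w = np.sign(v) * np.maximum(np.abs(v) - mu * lam, 0.0)  # prox step
    return w

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(500)
w = spg_m(A, b)   # approximately recovers the sparse signal, up to the l1 bias
```

In the general case covered by the chapter the prox subproblem has no closed form; the dual approach discussed below then replaces the soft-thresholding line.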
On the other hand, for nonsmooth objective functions, when f = 0, SPG-M is equivalent to a minibatch variant of the SPP scheme analyzed in [1, 27, 37, 39, 40]:
$$w^{k+1} = \arg\min_{z\in\mathbb{R}^n}\; \frac{1}{N}\sum_{i\in I} h(z;\xi_i) + \frac{1}{2\mu}\|z - v^k\|^2. \qquad (5)$$
The cost of the minibatch gradient step is $T_v(N) = \mathcal{O}(N)$.
Even for small N > 0, the solution of the above problem does not in general have a closed form, and an auxiliary iterative algorithm must be used to obtain an approximation of the optimal solution. For the above primal form, stochastic variance-reduction schemes are typically called upon to approach this finite-sum minimization when h obeys certain smoothness assumptions. However, to the best of our knowledge, the applicability of variance-reduction methods leaves out our general convex nonsmooth regularizers. SGD attains a $\delta$-suboptimal point, $\|\tilde z - \mathrm{prox}_{h,\mu}(w; I)\|^2 \le \delta$, at a sublinear rate, in $\mathcal{O}\left(\frac{\mu}{\delta}\right)$ iterations. This sample complexity is independent of N, but to obtain high accuracy a large number of iterations has to be performed.
Stochastic SPG with Minibatches 9
By Fenchel duality, the prox subproblem can be reformulated as follows:
$$\min_{z\in\mathbb{R}^n}\; \frac{1}{N}\sum_{i=1}^{N} h(z;\xi_i) + \frac{1}{2\mu}\|z - w\|^2 = \min_{z\in\mathbb{R}^n}\; \frac{1}{N}\sum_{i=1}^{N}\max_{v_i}\left\{\langle v_i, z\rangle - h^*(v_i;\xi_i)\right\} + \frac{1}{2\mu}\|z - w\|^2$$
$$= \max_{v}\;\min_{z\in\mathbb{R}^n}\; \frac{1}{N}\sum_{i=1}^{N}\langle v_i, z\rangle - \frac{1}{N}\sum_{i=1}^{N} h^*(v_i;\xi_i) + \frac{1}{2\mu}\|z - w\|^2$$
$$= \max_{v\in\mathbb{R}^{Nn}}\; -\frac{\mu}{2N^2}\Big\|\sum_{j=1}^{N} v_j\Big\|^2 + \frac{1}{N}\sum_{j=1}^{N}\langle v_j, w\rangle - \frac{1}{N}\sum_{j=1}^{N} h^*(v_j;\xi_j). \qquad (6)$$
Note that in the interesting particular scenario when the regularizer $h(\cdot;\xi)$ results from the composition of a convex function with a linear operator, $h(w;\xi) = \ell(a_\xi^T w)$, the dual variable reduces from Nn to N dimensions. In this case (6) reduces to
$$\max_{v\in\mathbb{R}^{N}}\; -\frac{\mu}{2N^2}\Big\|\sum_{j=1}^{N} a_{\xi_j} v_j\Big\|^2 + \frac{1}{N}\sum_{j=1}^{N} v_j\langle a_{\xi_j}, w\rangle - \frac{1}{N}\sum_{j=1}^{N} \ell^*(v_j). \qquad (7)$$
Denoting by $z(w)$ the exact solution of the prox subproblem and by $\tilde z(w)$ the primal point recovered from a $\delta$-accurate dual solution $\tilde v$, we have
$$\|\tilde z(w) - z(w)\| = \frac{\mu}{N}\Big\|\sum_{i=1}^{N}\tilde v_i - \sum_{i=1}^{N} v_i^*\Big\| \le \frac{\mu}{N}\sum_{i=1}^{N}\|\tilde v_i - v_i^*\| \le \frac{\mu}{\sqrt{N}}\|v^* - \tilde v\| \le \frac{\sqrt{\mu\delta}}{\sqrt{N}}. \qquad (8)$$
Notice that the Hessian of the smooth component $\frac{\mu}{2N^2}\big\|\sum_{j=1}^{N} v_j\big\|^2$ is upper bounded by $\mathcal{O}(\mu)$. Without any growth properties on $h^*(\cdot;\xi)$, one is able to solve (6) using Dual Fast Gradient schemes with $\mathcal{O}(Nn)$ iteration cost, in $\mathcal{O}\left(\max\left\{Nn,\; Nn\sqrt{\frac{\mu R_d^2}{\delta}}\right\}\right)$ sample evaluations, to get a $\delta$-accurate dual solution [21]. This implies, by (8), that
$$T_w^{in}(N;\delta) = \mathcal{O}\left(\max\left\{Nn,\; N^{3/4} n^{1/2}\,\frac{\mu R_d}{\sqrt{\delta}}\right\}\right)$$
sample evaluations are necessary.
We will use the following elementary relations: for any $a, b \in \mathbb{R}^n$ and $\beta > 0$ we have
$$\langle a, b\rangle \le \frac{1}{2\beta}\|a\|^2 + \frac{\beta}{2}\|b\|^2 \qquad (9)$$
$$\|a + b\|^2 \le \left(1 + \frac{1}{\beta}\right)\|a\|^2 + (1 + \beta)\|b\|^2. \qquad (10)$$
The main recurrences which will finally generate our sublinear convergence rates
are presented below.
Theorem 1. Let Assumption 1 hold and $\mu_k \le \frac{1}{4L_f}$. Assume $\|w^{k+1} - \mathrm{prox}_{h,\mu_k}(v^k; I^k)\| \le \delta_k$ for all $k \ge 0$; then the sequence $\{w^k\}_{k\ge 0}$ generated by SPG-M satisfies:
$$\mathbb{E}[\|w^{k+1} - w^*\|^2] \le \left(1 - \frac{\sigma_f\mu_k}{2}\right)\mathbb{E}[\|w^k - w^*\|^2] + \mu_k^2\,\frac{2\,\mathbb{E}[\|g_F(w^*;\xi)\|^2]}{N} + \left(3 + \frac{2}{\sigma_f\mu_k}\right)\delta_k^2.$$
Proof. Denote $\bar w^{k+1} = \mathrm{prox}_{h,\mu_k}(v^k; I^k)$ and recall that $\frac{1}{\mu_k}(v^k - \bar w^{k+1}) \in \partial h(\bar w^{k+1}; I^k)$, which implies that there exists a subgradient $g_h(\bar w^{k+1}; I^k)$ such that
$$g_h(\bar w^{k+1}; I^k) + \frac{1}{\mu_k}(\bar w^{k+1} - v^k) = 0. \qquad (11)$$
Using this together with relation (9), one derives:
$$\|w^{k+1} - w^*\|^2 \overset{(9)}{\le} \left(1 + \frac{\sigma_f\mu_k}{2}\right)\|w^k - w^*\|^2 + 2\mu_k\left\langle \nabla f(w^k; I_k) + g_h(\bar w^{k+1}; I_k),\, w^* - \bar w^{k+1}\right\rangle - \frac{1}{2}\|\bar w^{k+1} - w^k\|^2 + \left(3 + \frac{2}{\sigma_f\mu_k}\right)\delta_k^2. \qquad (12)$$
By combining (13) with (12) and by taking the expectation with respect to the entire index history we obtain:
$$\mathbb{E}[\|w^{k+1} - w^*\|^2] \le \left(1 - \frac{\sigma_f\mu_k}{2}\right)\mathbb{E}\left[\|w^k - w^*\|^2\right] + 2\mu_k\,\mathbb{E}\left[F(w^*) - F(\bar w^{k+1}; I_k) - \frac{1}{8\mu_k}\|\bar w^{k+1} - w^k\|^2\right]. \qquad (14)$$
A final upper bound on the second term in the right-hand side: let $w^* \in W^*$; then
$$\mathbb{E}\left[F(\bar w^{k+1}; I_k) - F^* + \frac{1}{8\mu_k}\|\bar w^{k+1} - w^k\|^2\right]$$
$$\ge \mathbb{E}\left[\langle g_F(w^*; I_k), \bar w^{k+1} - w^*\rangle + \frac{1}{8\mu_k}\|\bar w^{k+1} - w^k\|^2\right]$$
$$\ge \mathbb{E}\left[\langle g_F(w^*; I_k), w^k - w^*\rangle + \langle g_F(w^*; I_k), \bar w^{k+1} - w^k\rangle + \frac{1}{8\mu_k}\|\bar w^{k+1} - w^k\|^2\right]$$
$$\ge \mathbb{E}\left[\langle g_F(w^*; I), w^k - w^*\rangle + \min_z\left\{\langle g_F(w^*; I), z - w^k\rangle + \frac{1}{8\mu_k}\|z - w^k\|^2\right\}\right]$$
$$\ge \langle g_F(w^*), w^k - w^*\rangle - 2\mu_k\,\mathbb{E}\left[\|g_F(w^*; I)\|^2\right]$$
$$\overset{\text{Assump. 1(iv)}}{=} -2\mu_k\,\mathbb{E}\left[\|g_F(w^*; I)\|^2\right] \overset{\text{Lemma 1}}{=} -2\mu_k\,\frac{\mathbb{E}[\|g_F(w^*;\xi)\|^2]}{N}. \qquad (15)$$
The outer iteration complexity of SPG-M can then be bounded as follows:
$$T \le T_c^{out} := \mathcal{O}\left(\max\left\{1 + \frac{\max\{S_F^*,\, 2/\sigma_f\}\log(2r_0^2/\epsilon)}{\sigma_f^2 N\epsilon},\; 1 + \sqrt{\frac{72\log^2(2r_0^2/\epsilon)}{\sigma_f^3 N\epsilon}}\right\}\right)$$
$$T \le T_m^{out} := \underbrace{\frac{4L_f}{\mu_0\sigma_f}\log\left(\frac{2\|w^0 - w^*\|^2}{\epsilon}\right)}_{T_1} + \underbrace{\mathcal{O}\left(\frac{C}{N\epsilon}\right)}_{T_2},$$
where $C = \frac{\mu_0^2 L_f\sigma_f + \mu_0^2\sigma_f + \mu_0 L_f}{L_f^2\sigma_f^2} + S_F^* + \mu_0 + \frac{1}{\sigma_f}$.
Constant stepsize. For a constant stepsize $\mu_k = \mu$ (with inner accuracies chosen as $\delta_k^2 = \mu^3$), Theorem 1 yields:
$$\mathbb{E}[\|w^k - w^*\|^2] \le \left(1 - \frac{\sigma_f\mu}{2}\right)^k r_0^2 + \frac{1 - (1 - \sigma_f\mu/2)^k}{\sigma_f\mu/2}\,\mu^2\left(\frac{2S_F^*}{N} + 3\mu + \frac{2}{\sigma_f}\right)$$
$$\le \left(1 - \frac{\sigma_f\mu}{2}\right)^k r_0^2 + \frac{2\mu}{\sigma_f}\left(\frac{2S_F^*}{N} + 3\mu + \frac{2}{\sigma_f}\right), \qquad (16)$$
which implies a linear decrease of the initial residual and, at the same time, the linear convergence of $w^k$ towards a neighborhood of the optimum of radius $\frac{2\mu}{\sigma_f}\left(\frac{2S_F^*}{N} + 3\mu + \frac{2}{\sigma_f}\right)$.
The radius decreases linearly with the minibatch size N. Given an integer $K > 0$, let $\mu = \frac{\mu_0}{K}$; then after
$$T = \left\lceil\frac{2K}{\sigma_f\mu_0}\log\left(\frac{2r_0^2}{\epsilon}\right)\right\rceil \qquad (17)$$
SPG-M iterations we obtain
$$\mathbb{E}[\|w^T - w^*\|^2] \le \frac{2\mu_0 S_F^*}{K\sigma_f N} + \frac{6\mu_0^2}{K^2\sigma_f} + \frac{\epsilon}{2} \qquad (18)$$
$$\le \frac{4 S_F^*\log(2r_0^2/\epsilon)}{(T - 1)\sigma_f^2 N} + \frac{12\mu_0\log(2r_0^2/\epsilon)}{\sigma_f(T - 1)} + \frac{\epsilon}{2}.$$
Variable stepsize. Now let $\mu_k = \frac{2\mu_0}{k}$ and $\delta_k = \frac{\mu_k^{3/2}}{N^{1/2}}$; then Theorem 1 leads to:
$$\mathbb{E}[\|w^k - w^*\|^2] \le \prod_{j=1}^{k}\left(1 - \frac{\sigma_f\mu_j}{2}\right) r_0^2 + \sum_{i=1}^{k}\frac{\mu_i^2}{N}\left(2S_F^* + 3\mu_i + \frac{2}{\sigma_f}\right)\prod_{j=i+1}^{k}\left(1 - \frac{\sigma_f\mu_j}{2}\right). \qquad (19)$$
By further using the same (standard) analysis from [26, 27], we obtain:
$$\mathbb{E}[\|w^k - w^*\|^2] \le \underbrace{\mathcal{O}\left(\frac{r_0^2}{k}\right)}_{\text{optimization error}} + \underbrace{\mathcal{O}\left(\frac{S_F^* + \mu_0 + 1/\sigma_f}{Nk}\right)}_{\text{sample error}}. \qquad (20)$$
Mixed stepsize. By combining the constant and variable stepsize policies, we aim to get a better “optimization error” and, overall, a better iteration complexity for SPG-M. Inspired by (17)–(18), we are able to use a constant stepsize policy to bring $w^k$ into a small neighborhood of $w^*$ whose radius is inversely proportional to N.
Let $\mu_k = \frac{\mu_0}{4L_f}$; using similar arguments as in (17)–(18), we have that after
$$T_1 \ge \frac{4L_f}{\mu_0\sigma_f}\log\left(\frac{2r_0^2}{\epsilon}\right)$$
iterations,
$$\mathbb{E}[\|w^{T_1} - w^*\|^2] \le \frac{\mu_0 S_F^*}{4L_f\sigma_f N} + \frac{3\mu_0^2}{4L_f\sigma_f} + \frac{\epsilon}{2}. \qquad (21)$$
Switching then to the variable stepsize policy yields
$$\mathbb{E}[\|w^k - w^*\|^2] \le \mathcal{O}\left(\frac{\mathbb{E}[\|w^{T_1} - w^*\|^2]}{k}\right) + \mathcal{O}\left(\frac{S_F^* + \mu_0 + 1/\sigma_f}{Nk}\right)$$
$$\le \mathcal{O}\left(\frac{\mu_0^2 L_f\sigma_f + \mu_0^2\sigma_f + \mu_0 L_f}{L_f^2\sigma_f^2\, k}\right) + \mathcal{O}\left(\frac{\mu_0^2}{N^2 k}\right) + \mathcal{O}\left(\frac{S_F^* + \mu_0 + 1/\sigma_f}{Nk}\right). \qquad (22)$$
$$T_1 + T_2 = \frac{4L_f}{\mu_0\sigma_f}\log\left(\frac{2r_0^2}{\epsilon}\right) + \mathcal{O}\left(\frac{C}{N\epsilon}\right),$$
where $C = \frac{\mu_0^2 L_f\sigma_f + \mu_0^2\sigma_f + \mu_0 L_f}{L_f^2\sigma_f^2} + S_F^* + \mu_0 + \frac{1}{\sigma_f}$.
Remark 3. For the variable stepsize $\mu_k = \frac{\mu_0}{k}$, the “optimization rate” $\mathcal{O}(r_0^2/k)$ of (20) is optimal (for strongly convex stochastic optimization) and is not affected by the variation of the minibatch size. Intuitively, the stochastic component within the optimization model (1) is not eliminated by increasing N; only a variance reduction is attained. The authors of [42], under bounded gradients of the objective function, obtained an $\mathcal{O}\left(\frac{L^2}{\sigma_f N k}\right)$ convergence rate for their Minibatch-Prox algorithm in the average sequence, using classical arguments. However, their algorithm is based on knowledge of $\sigma_f$, which is used to compute the stepsize sequence $\mu_k = \frac{2}{\sigma_f(k-1)}$. Moreover, in the first step the algorithm has to compute $\mathrm{prox}_{h,+\infty}(\cdot; I^0) = \arg\min_z \frac{1}{N}\sum_{i\in I^0} h(z; i)$, which for small $\sigma_f$ might be computationally expensive. Notice that, under knowledge of $\sigma_f$, we could obtain a similar sublinear rate in $\mathbb{E}[\|w^k - w^*\|^2]$ using similar stepsizes.
Remark 4. We make a few observations about $T_m^{out}$. For a small condition number $\frac{L_f}{\sigma_f}$, the constant stage performs few iterations and the total complexity is dominated by $\mathcal{O}\left(\frac{C}{N\epsilon}\right)$. This bound (of the same order as [42]) presents some advantages: no knowledge of $\sigma_f$, evaluation in the last iterate, and no uniformly bounded gradients assumptions. On the other hand, for a sufficiently large minibatch $N = \mathcal{O}(1/\epsilon)$ and a proper constant $\mu_0$, one could perform a constant number of SPG-M iterations. In this case, the mixed stepsize convergence rate provides a link between the population risk and the empirical risk.
In this section we couple the complexity-per-iteration estimates from Sect. 2.1 with the outer complexity from Sect. 3, and provide upper bounds on the total complexity of SPG-M.
The sample complexity, often used for stochastic algorithms, refers to the entire number of data samples that are used during all iterations of a given algorithmic scheme. In our case, given the minibatch size N and the total number of outer SPG-M iterations $T^{out}$, the sample complexity is given by $NT^{out}$. In the best case $NT^{out}$ is upper bounded by $\mathcal{O}(1/\epsilon)$. We consider the dependency on the minibatch size N and the accuracy $\epsilon$ of high importance, and thus we present below simplified upper bounds of our estimates. In Sect. 2.1 we analyzed the complexity of a single SPG-M iteration for convex components $h(\cdot;\xi)$, denoted by $T_v^{in} + T_w^{in}$. Summing the inner effort $T_v^{in} + T_w^{in}$ over the number of outer iterations provided by Theorem 2 leads us to the total computational complexity of SPG-M. We further derive the total complexity for SPG-M with the mixed stepsize policy, using the same notations as in Theorem 2:
$$T^{total} = \sum_{i=0}^{T_m^{out}}\left(T_v^{in}(N) + T_w^{in}(N;\delta_i)\right) \le \sum_{i=0}^{T_m^{out}}\left[\mathcal{O}(Nn) + \mathcal{O}\left(\max\left\{Nn,\; N^{3/4} n^{1/2}\,\frac{\mu_i R_d}{\sqrt{\delta_i}}\right\}\right)\right]$$
$$\le \sum_{i=0}^{T_m^{out}}\left[\mathcal{O}(Nn) + \mathcal{O}\left(Nn\,\mu_i^{1/4} R_d\right)\right] \overset{(4)}{\le} \mathcal{O}\left(T_m^{out} Nn\right) + \mathcal{O}\left(Nn\,(T_m^{out})^{3/4} R_d\right)$$
$$\le \mathcal{O}\left(\left[\frac{L_f}{\sigma_f}\log(1/\epsilon) + \frac{C}{N\epsilon}\right]Nn\right) + \mathcal{O}\left(Nn\left[\frac{L_f}{\sigma_f}\log(1/\epsilon) + \frac{C}{N\epsilon}\right]^{3/4} R_d\right).$$
Extracting the dominant terms from the right-hand side we finally obtain:
$$T^{total} \le \mathcal{O}\left(\frac{L_f Nn}{\sigma_f}\log(1/\epsilon) + \frac{Cn}{\epsilon}\right) + \mathcal{O}\left(N^{1/4} n\left(\frac{C}{\epsilon}\right)^{3/4} R_d\right).$$
The first term, $\mathcal{O}\left(\left[\frac{L_f}{\sigma_f}\log(1/\epsilon) + \frac{C}{N\epsilon}\right]Nn\right)$, is the total cost of the minibatch gradient step $v^k$ and is highly dependent on the condition number $\frac{L_f}{\sigma_f}$. The second term is brought by solving the proximal subproblem with the Dual Fast Gradient method, and it depicts a weaker dependence on N and $\epsilon$ than the first. Although the complexity order is $\mathcal{O}\left(\frac{Cn}{\epsilon}\right)$, comparable with the optimal performance of single-sample stochastic schemes (with N = 1), the above estimate paves the way towards acceleration techniques based on distributed stochastic iterations. Reducing the complexity per iteration to $\frac{1}{\tau}\left(T_v^{in}(N) + T_w^{in}(N;\delta)\right)$, using $\tau$ machines/processors, would guarantee direct improvements in the total complexity, $\mathcal{O}\left(\frac{Cn}{\tau\epsilon}\right)$. The superiority of distributed variants of SGD schemes for smooth optimization is clear (see [23]), but our results set up the theoretical foundations for distributing the algorithms in the class of proximal gradient methods.
4 Numerical Simulations
1. A dictionary of all possible words in the entire dataset, together with how many times each occurred, is constructed.
2. The indices of the top 200 (= n, the number of features used in our classifier) most-used words in the dictionary are stored.
3. For each email entry i, we count how many occurrences of each of the n words appear in the i-th email's text. Thus, if $X_i$ is the numerical entry characteristic of email i, then $X_{ij}$ contains how many words with index j among the top most-used words occur in the email's text.
The pseudocode for the optimization process is shown below:
For $k \ge 0$ compute:
1. Choose randomly an i.i.d. N-tuple $I^k \subset \Omega$
2. Update:
$$v^k = (1 - \lambda\mu_k)\, w^k$$
$$u^k = \arg\max_{u\in[0,1]^N}\; -\frac{\mu_k}{2N}\|\tilde X_{I_k} u\|_2^2 + u^T\left(e - \tilde X_{I_k}^T v^k\right)$$
$$w^{k+1} = v^k + \frac{\mu_k}{N}\tilde X_{I_k} u^k$$
3. If the stopping criterion holds, then STOP; otherwise set k = k + 1.
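A possible realisation of this pseudocode in Python is sketched below, with the inner arg max over the box $[0,1]^N$ solved approximately by a few projected-gradient ascent steps (the synthetic data, parameter values, and inner solver are illustrative assumptions, not the chapter's experimental setup):

```python
import numpy as np

def svm_spg_m(X, y, lam=0.1, N=16, mu0=1.0, outer=300, inner=20, seed=0):
    """Sketch of the SPG-M scheme above for the hinge-loss l2-SVM
    min_w (lam/2)*||w||^2 + E[max(0, 1 - y_i * x_i^T w)].
    Rows of Xt play the role of the columns of X_tilde, i.e. y_i * x_i."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    Xt = X * y[:, None]
    w = np.zeros(n)
    for k in range(1, outer + 1):
        mu = mu0 / k
        Xi = Xt[rng.integers(0, m, size=N)]
        v = (1.0 - lam * mu) * w                  # v^k = (1 - lam*mu_k) w^k
        u = np.clip(1.0 - Xi @ v, 0.0, 1.0)       # warm start inside the box
        L = mu / N * np.linalg.norm(Xi, 2) ** 2 + 1e-12
        for _ in range(inner):                    # projected gradient ascent on u
            grad = (1.0 - Xi @ v) - mu / N * Xi @ (Xi.T @ u)
            u = np.clip(u + grad / L, 0.0, 1.0)
        w = v + mu / N * Xi.T @ u                 # w^{k+1} = v^k + (mu_k/N) Xt_I^T u^k
    return w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)), rng.normal(-2.0, 1.0, (100, 2))])
y = np.array([1.0] * 100 + [-1.0] * 100)
w = svm_spg_m(X, y)
acc = float(np.mean(np.sign(X @ w) == y))   # training accuracy of the separator
```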
To compute the optimal solution $w^*$, we ran a state-of-the-art binary SVM method on the dataset, SGD with hinge loss [24], for a long time, until we reached the top accuracy of the model (93.2%). Considering this as a performance baseline, we compare the training-process efficiency of the SPG-M model versus SGD with minibatches. The comparison is made w.r.t. three metrics:
1. Accuracy: how well the current set of trained weights performs at classifying spam versus non-spam.
2. Loss: the hinge-loss value on the entire set of data.
3. Error (or optimality measure): how far the current trained weights $w^k$ (at any step k) are from the optimal ones, i.e. $\|w^k - w^*\|^2$.
The comparative results between the two methods for each of the metrics defined above are shown in Figs. 1, 2 and 3. These were obtained by averaging several executions on the same machine, each with a different starting point. Overall, the results show the advantage of the SPG-M method over SGD: while both methods converge to the same optimal results after some time, SPG-M is capable of obtaining better results in all three metrics in a shorter time, regardless of the batch size being used. One interesting observation can be made for the SGD-Const method, whose loss metric tends to perform better (Fig. 2). This is due to a highly tuned constant learning rate chosen to get the best possible result; however, this is not a robust approach to use in practice.
Fig. 1 Comparative results between SPG-M and SGD for the Accuracy metric, using different batch sizes and stepsize-selection functions
Fig. 2 Comparative results between SPG-M and SGD for the Loss metric, using different batch sizes and stepsize-selection functions
Fig. 3 Comparative results between SPG-M and SGD for the Error metric, using different batch sizes and stepsize-selection functions
where T and x correspond to the dictionary and, respectively, the resulting sparse representation, with sparsity being imposed on a scaled subspace $\Phi x$, where $\Phi \in \mathbb{R}^{p\times n}$. In pursuit of $\ell_1$ sparsity, we move to the exact penalty problem $\min_x \frac{1}{2m}\|Tx - y\|_2^2 + \lambda\|\Phi x\|_1$. In order to limit the solution norm, we further regularize the unconstrained objective using an $\ell_2$ term as follows:
$$\min_x\; \frac{1}{2m}\|Tx - y\|_2^2 + \frac{\alpha}{2}\|x\|_2^2 + \frac{\lambda}{p}\|\Phi x\|_1.$$
The decomposition which puts the above formulation into model (1) consists of:
$$f(x;\xi) = \frac{1}{2}(T_\xi x - y_\xi)^2 + \frac{\alpha}{2}\|x\|_2^2 \qquad (24)$$
where $T_\xi$ represents row $\xi$ of the matrix T, and $h(x;\xi) = \lambda|\phi_\xi x|$, where $\phi_\xi$ denotes row $\xi$ of $\Phi$.
To compute the SPG-M iteration for the sparse representation problem, we note that
$$\mathrm{prox}_{h,\mu}(x; I) = \arg\min_z\; \frac{\lambda}{N}\|\Phi_I z\|_1 + \frac{1}{2\mu}\|z - x\|^2.$$
Passing to the dual of this subproblem, if
$$z^k = \arg\min_{-1\le z\le 1}\; \frac{\mu\lambda^2}{2N^2}\|\Phi_I^T z\|^2 - \frac{\lambda}{N}\, z^T\Phi_I x,$$
then we can easily compute $\mathrm{prox}_{h,\mu}(x; I) = x - \frac{\mu\lambda}{N}\Phi_I^T z^k$. We are ready to formulate the resulting particular variant of SPG-M.
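This dual computation can be sketched as follows, with the box-constrained quadratic solved by projected gradient (an illustrative implementation; the inner solver choice and step count are assumptions):

```python
import numpy as np

def prox_scaled_l1(x, Phi_I, lam, mu, steps=200):
    """Approximate prox_{h,mu}(x; I) for h(z; i) = lam*|phi_i^T z| by
    solving the box-constrained dual above with projected gradient.
    Phi_I stacks the |I| selected rows phi_i of the scaling matrix Phi."""
    N = Phi_I.shape[0]
    c = mu * lam ** 2 / N ** 2
    L = c * np.linalg.norm(Phi_I, 2) ** 2 + 1e-12  # Lipschitz const. of dual grad
    z = np.zeros(N)
    for _ in range(steps):
        grad = c * Phi_I @ (Phi_I.T @ z) - lam / N * Phi_I @ x
        z = np.clip(z - grad / L, -1.0, 1.0)       # project onto [-1, 1]^N
    return x - mu * lam / N * Phi_I.T @ z

# Sanity check: with Phi_I = I the prox reduces to soft thresholding at mu*lam/N.
x = np.array([1.0, -0.3, 0.05, 2.0])
p = prox_scaled_l1(x, np.eye(4), lam=1.0, mu=2.0, steps=500)
expected = np.sign(x) * np.maximum(np.abs(x) - 0.5, 0.0)
```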
[Figures (not reproduced): evolution of the error $\|x - x^*\|^2$, on a logarithmic scale, for the sparse representation experiments with m = 400, under different batch sizes and stepsize policies.]
$$x^{k+1} = x^k - \frac{\mu_k}{N}\left(T_{I_k}^T T_{I_k} + N\alpha I_n\right)x^k + \frac{\mu_k}{N}T_{I_k}^T y_{I_k} - \frac{\mu_k\lambda}{N}\sum_{i\in I_k}\mathrm{sgn}(\phi_i x^k)\,\phi_i^T,$$
$$\text{where}\quad \mathrm{sgn}(\phi_i x) = \begin{cases} +1, & \phi_i x > 0 \\ -1, & \phi_i x < 0 \\ 0, & \phi_i x = 0 \end{cases}$$
3. If the stopping criterion holds, then STOP, otherwise k = k + 1.
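The SGDM-SR update above can be sketched in Python as follows (illustrative; sampling the rows of T and of Φ with independent minibatches, and the synthetic data, are simplifying assumptions):

```python
import numpy as np

def sgdm_sr(T, y, Phi, lam=0.1, alpha=0.7, N=16, mu0=0.5, iters=2000, seed=0):
    """Minibatch subgradient method (SGDM-SR) implementing the update above for
    min_x 1/(2m)*||T x - y||_2^2 + (alpha/2)*||x||_2^2 + (lam/p)*||Phi x||_1."""
    rng = np.random.default_rng(seed)
    m, n = T.shape
    p = Phi.shape[0]
    x = np.zeros(n)
    for k in range(1, iters + 1):
        mu = mu0 / k                          # decreasing stepsize mu_k
        I = rng.integers(0, m, size=N)        # rows of T for the smooth part
        PJ = Phi[rng.integers(0, p, size=N)]  # rows of Phi for the l1 part
        TI = T[I]
        grad = TI.T @ (TI @ x - y[I]) / N + alpha * x  # sampled smooth gradient
        grad += lam / N * PJ.T @ np.sign(PJ @ x)       # sampled l1 subgradient
        x = x - mu * grad
    return x

rng = np.random.default_rng(3)
T = rng.standard_normal((400, 30))
x_true = np.zeros(30)
x_true[:4] = [1.0, -2.0, 1.5, 0.5]
y = T @ x_true
x_hat = sgdm_sr(T, y, np.eye(30))

def objective(x):
    return (np.sum((T @ x - y) ** 2) / (2 * 400)
            + 0.35 * np.sum(x ** 2) + (0.1 / 30) * np.sum(np.abs(x)))
```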
When applying SGDM-SR to the same initial data as in our experiment, with $\alpha = 0.7$ and the same parametrization and batch sizes, we obtain the results depicted in Fig. 8. Here the non-minibatch version of SGD is clearly less performant than SPG-M, but what is most interesting is that the minibatch version behaves identically for all batch sizes and takes 100 iterations to recover and reach the optimum around iteration 150.
5 Conclusion
Acknowledgements C. Păduraru and P. Irofti are also with The Research Institute of the University
of Bucharest (ICUB) and were supported by a grant of the Romanian Ministry of Research and
Innovation, CCCDI-UEFISCDI, project number 17PCCDI/2018 within PNCDI III.
References
1. Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: convergence, optimal-
ity, and adaptivity. SIAM J. Optim. 29(3), 2257–2290 (2019)
2. Bauschke, H.H., Borwein, J.M., Li, W.: Strong conical hull intersection property, bounded linear
regularity, Jameson’s property (g), and error bounds in convex optimization. Math. Program.
86(1), 135–160 (1999)
3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse
problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
4. Bianchi, P.: Ergodic convergence of a stochastic proximal point algorithm. SIAM J. Optim.
26(4), 2235–2260 (2016)
5. Dhillon, I.S., Hsieh, C.-J., Si, S.: Communication-efficient distributed block minimization for
nonlinear kernel machines. In: KDD ’17: Proceedings of the 23rd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pp. 245–254 (2017)
6. Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions.
SIAM J. Optim. 29(1), 207–239 (2019)
7. Dumitrescu, B., Irofti, P.: Dictionary Learning Algorithms and Applications. Springer (2018)
8. Elad, M.: Sparse and Redundant Representations: From Theory to Applications in Signal and
Image Processing. Springer Science & Business Media (2010)
9. Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic methods for data fitting. SIAM
J. Sci. Comput. 34(3), 1380–1405 (2012)
10. Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Kyrola, A., Wesolowski, L., Tulloch,
A., Jia, Y., He, K.: Accurate, large minibatch SGD: training ImageNet in 1 hour (2017).
arXiv:1706.02677 [cs.CV]
11. Hallac, D., Leskovec, J., Boyd, S.: Network lasso: clustering and optimization in large graphs.
In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pp. 387–396 (2015)
12. Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization
gap in large batch training of neural networks (2017). arXiv:1705.08741 [stat.ML]
13. Koshal, J., Nedic, A., Shanbhag, U.V.: Regularized iterative stochastic approximation methods
for stochastic variational inequality problems. IEEE Trans. Autom. Control 58(3), 594–609
(2012)
14. Metsis, V., Androutsopoulos, I., Paliouras, G.: Spam filtering with naive Bayes: which naive Bayes? In: CEAS, Mountain View, CA, vol. 17, pp. 28–69 (2006)
15. Moulines, E., Bach, F.R.: Non-asymptotic analysis of stochastic approximation algorithms for
machine learning. In Advances in Neural Information Processing Systems, pp. 451–459 (2011)
16. Nedic, A., Necoara, I.: Random minibatch subgradient algorithms for convex problems with
functional constraints. Appl. Math. Optim. 80, 801–833 (2019)
17. Nedić, A.: Random projection algorithms for convex set intersection problems. In: 49th IEEE
Conference on Decision and Control (CDC), pp. 7655–7660. IEEE (2010)
18. Nedić, A.: Random algorithms for convex minimization problems. Math. Program. 129(2),
225–253 (2011)
19. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach
to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
20. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1),
125–161 (2013)
21. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer
Science & Business Media (2013)
22. Nguyen, L.M., Nguyen, P.H., van Dijk, M., Richtárik, P., Scheinberg, K., Takáč, M.: SGD and Hogwild! convergence without the bounded gradients assumption (2018). arXiv:1802.03801
23. Niu, F., Recht, B., Ré, C., Wright, S.J.: Hogwild!: a lock-free approach to parallelizing
stochastic gradient descent. In: NIPS’11: Proceedings of the 24th International Conference on
Neural Information Processing Systems, pp. 693–701 (2011)
24. Patil, R.C., Patil, D.R.: Web spam detection using SVM classifier. In: 2015 IEEE 9th Interna-
tional Conference on Intelligent Systems and Control (ISCO), pp. 1–4. IEEE (2015)
25. Patrascu, A.: New nonasymptotic convergence rates of stochastic proximal point algorithm for
stochastic convex optimization. To appear in Optimization (2020)
26. Patrascu, A., Irofti, P.: Stochastic proximal splitting algorithm for composite minimization, pp.
1–16 (2020). arXiv:1912.02039v2
27. Patrascu, A., Necoara, I.: Nonasymptotic convergence of stochastic proximal point methods
for constrained convex optimization. J. Mach. Learn. Res. 18(1), 7204–7245 (2017)
28. Rockafellar, R.T.: Convex Analysis. Princeton University Press (1970)
29. Rockafellar, R.T., Wets, R.J.B.: On the interchange of subdifferentiation and conditional expec-
tation for convex functionals. Stoch. Int. J. Probab. Stoch. Process. 7(3), 173–182 (1982)
30. Rosasco, L., Villa, S., Vũ, B.C.: Convergence of stochastic proximal gradient algorithm. Appl.
Math. Optim., 1–27 (2019)
31. Ryu, E.K., Boyd, S.: Stochastic proximal iteration: a non-asymptotic improvement upon
stochastic gradient descent. Author website, early draft (2016)
32. Zhang, H., Ghadimi, S., Lan, G.: Mini-batch stochastic approximation methods for nonconvex
stochastic composite optimization. Math. Program. 155, 267–305 (2016)
33. Salim, A., Bianchi, P., Hachem, W.: Snake: a stochastic proximal gradient algorithm for regu-
larized problems over large graphs. IEEE Trans. Autom. Control 64(5), 1832–1847 (2019)
34. Shalev-Shwartz, S., Singer, Y., Srebro, N., Cotter, A.: Pegasos: primal estimated sub-gradient
solver for SVM. Math. Program. 127(1), 3–30 (2011)
35. Shi, W., Ling, Q., Gang, W., Yin, W.: A proximal gradient algorithm for decentralized composite
optimization. IEEE Trans. Signal Process. 63(22), 6013–6023 (2015)
36. Stoican, F., Irofti, P.: Aiding dictionary learning through multi-parametric sparse representation.
Algorithms 12(7), 131 (2019)
37. Toulis, P., Tran, D., Airoldi, E.: Towards stability and optimality in stochastic gradient descent.
In: Artif. Intell. Stat., 1290–1298 (2016)
38. Varma, R., Lee, H., Kovacevic, J., Chi, Y.: Vector-valued graph trend filtering with non-convex
penalties. IEEE Trans. Signal Inform. Process. Over Netw. (2019)
39. Wang, J., Srebro, N.: Stochastic nonconvex optimization with large minibatches. In: International Conference on Learning Theory (COLT), 98:1–26 (2019)
40. Wang, J., Wang, W., Srebro, N.: Memory and communication efficient distributed stochastic
optimization with minibatch prox. In: In International Conference On Learning Theory (COLT),
pp. 65:1–37 (2017)
41. Wang, M., Bertsekas, D.P.: Stochastic first-order methods with random constraint projection.
SIAM J. Optim. 26(1), 681–717 (2016)
42. Wang, X., Wang, S., Zhang, H.: Inexact proximal stochastic gradient method for convex com-
posite optimization. Comput. Optim. Appl. 68(3), 579–618 (2017)
43. Zhong, W., Kwok, J.: Accelerated stochastic gradient method for composite regularization. In:
Artificial Intelligence and Statistics, pp. 1086–1094 (2014)
The Human Mental Search Algorithm
for Solving Optimisation Problems
1 Introduction
Optimisation plays a crucial role in the performance of most data science algorithms; in many cases, the efficacy of the data science algorithm depends directly on the efficacy of the underlying optimisation algorithm. For example, in a multi-layer neural network, optimal network weights are sought during training, while optimisation is also important for finding suitable parameters in regression problems. Optimisation is thus a necessity for solving data-related problems effectively.
In a data science algorithm, and in particular a machine learning algorithm, representation, optimisation, and generalisation are often considered independently [29]. In other words, when studying representation or generalisation, we often do not consider whether optimisation algorithms can be applied effectively, while when considering optimisation algorithms, we often do not explicitly discuss generalisation error (representation error is sometimes assumed to be zero) [29].
Machine learning algorithms traditionally use conventional optimisation algorithms such as gradient descent. These, however, have several drawbacks, such as stagnation in local optima and sensitivity to initialisation, which limit their ability to find the global optimum.
Stochastic algorithms such as metaheuristics represent a problem-independent alternative to address the limitations of conventional algorithms. Metaheuristic algorithms benefit from stochastic operators applied in an iterative manner and have become popular since they are more robust and do not require the gradient of the cost function.

S. J. Mousavirad (B)
Faculty of Engineering, Sabzevar University of New Technology, Sabzevar, Iran
G. Schaefer
Department of Computer Science, Loughborough University, Loughborough, UK
H. Ebrahimpour-Komleh
Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_2
Generally speaking, metaheuristic algorithms can be divided into two categories:
single-solution-based and population-based algorithms. Single-solution-based algorithms, such as simulated annealing [3], variable neighbourhood search [18], and iterated local search [27], start with a single random candidate solution and try to improve it over a number of iterations. In contrast, population-based algorithms begin with a set of
random candidate solutions which interact with each other to share information. Con-
sequently, population-based algorithms exhibit a higher likelihood of escaping from
local optima compared to single-solution-based approaches. Population-based meta-
heuristics include particle swarm optimisation (PSO) [13], harmony search (HS) [9],
artificial bee colony algorithm (ABC) [12], and biogeography-based optimisation
(BBO) [26].
Population-based metaheuristic algorithms have been employed in various data
science domains. For example, metaheuristics have shown superior ability in finding
neural network weights [1, 20, 21] or architectures [4, 11], while feature selection is another domain, where the aim is to find the optimal features for decision making [10, 19]. Other applications include clustering [5, 22], regression [30], and deep belief
networks [7, 15].
In this chapter, we present the Human Mental Search (HMS) algorithm, a recent member of the class of population-based metaheuristic algorithms. We explain the idea behind HMS and how the algorithm can successfully tackle optimisation problems. HMS is inspired by exploration strategies in the bid space of online auctions and comprises three main operators: mental search, grouping, and movement. Mental search explores the vicinity of candidate solutions based on a Levy flight distribution, grouping partitions the current population using a clustering algorithm to find a promising area of the search space, and during movement, candidate solutions move towards this promising area. We conduct an extensive set of experiments on both normal and large-scale problems and compare HMS to other state-of-the-art algorithms. The obtained results show that HMS yields very competitive performance in tackling optimisation problems.
The remainder of the chapter is organised as follows. Section 2 explains the inspi-
ration, while Sect. 3 introduces the HMS algorithm in detail. Section 4 evaluates the
HMS algorithm on both normal and large-scale problems. Finally, Sect. 5 concludes
the chapter.
2 Inspiration
Radicchi et al. [23, 24] showed that humans employ a Levy flight strategy to explore the bid space of online auctions. In particular, they studied the behaviour of participants in a new type of auction called the Lowest Unique Bid (LUB) auction. Some of their observations were:
3.2 Mental Search
Mental search explores the vicinity of each bid based on a Levy flight mechanism. Levy flight is a type of random walk whose step size is determined by a Levy distribution, so that there are many small steps and occasionally a long one. In comparison to Brownian motion, Levy flight is more effective since this combination of small and long steps improves both exploitation and exploration.
A new bid in mental search is created as
NS = bid + S, (1)
with S calculated as

S = (2 − iter (2/MaxIter)) α ⊕ Levy(β), (2)

where MaxIter is the maximum number of iterations, iter is the current iteration, α is a random number, and ⊕ denotes entry-wise multiplication. The (2 − iter (2/MaxIter)) factor is a descending factor, starting at 2 and ending at 0, to allow high exploration at the beginning and high exploitation capability towards the end.
with

σ_u = [Γ(1 + β) sin(πβ/2) / (Γ((1 + β)/2) β 2^((β−1)/2))]^(1/β), σ_v = 1, (5)
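The mental search step can be sketched in Python using Mantegna's algorithm for the Levy step of Eq. (5). This is a minimal illustration under our reading of the text, not the authors' reference implementation; the function names and the exact combination of the descending factor, α, and the Levy step are our own assumptions.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5, rng=None):
    """One Levy-flight step via Mantegna's algorithm: u / |v|^(1/beta)."""
    rng = rng or np.random.default_rng()
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)   # numerator sample, std sigma_u from Eq. (5)
    v = rng.normal(0.0, 1.0, dim)       # denominator sample, sigma_v = 1
    return u / np.abs(v) ** (1 / beta)

def mental_search(bid, it, max_iter, rng=None):
    """Generate a new bid NS = bid + S in the vicinity of `bid`.

    The factor (2 - it*(2/max_iter)) descends from 2 to 0 over the run,
    shifting the search from exploration towards exploitation.
    """
    rng = rng or np.random.default_rng()
    alpha = rng.random(bid.size)  # random numbers in [0, 1)
    S = (2 - it * (2 / max_iter)) * alpha * levy_step(bid.size, rng=rng)
    return bid + S

bid = np.zeros(5)
new_bid = mental_search(bid, it=1, max_iter=100)
print(new_bid.shape)  # (5,)
```

Note that most steps are small, but occasionally `levy_step` returns a long jump, which is what lets mental search escape local optima.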
3.3 Grouping
The grouping operator is used to cluster the current population. For this, clustering, an unsupervised machine learning technique in which patterns close to each other are organised into groups, is applied; in particular, k-means is employed. After clustering, the mean objective function value of each cluster is calculated, and the cluster with the best mean objective function value is selected as the winner cluster to represent a promising area of the search space. Figure 3 illustrates the grouping operator for 12 candidate solutions. This mechanism differs from other algorithms, which use the best candidate solutions to find a promising area.
Fig. 2 The mental search operator generates several new bids in the neighbourhood of a current
bid
Fig. 3 The grouping operator partitions the bids based on a clustering algorithm. Each red point
indicates a candidate solution. As can be seen, the candidate solutions are clustered into 3 groups
so that each cluster includes candidate solutions close to each other
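The grouping operator can be sketched with a tiny NumPy-only k-means (Lloyd's algorithm); function names and the small helper structure are our own, and minimisation is assumed.

```python
import numpy as np

def group(population, objective, k=3, iters=10, rng=None):
    """Grouping operator: cluster the bids with k-means and pick the winner.

    The cluster with the lowest mean objective value marks the promising
    area; its best bid is the destination used by the movement operator.
    """
    rng = rng or np.random.default_rng(0)
    centers = population[rng.choice(len(population), k, replace=False)]
    for _ in range(iters):  # Lloyd's algorithm: assign, then re-centre
        dists = np.linalg.norm(population[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = population[labels == j].mean(axis=0)
    f = np.array([objective(x) for x in population])
    means = np.array([f[labels == j].mean() if np.any(labels == j) else np.inf
                      for j in range(k)])
    winner = int(means.argmin())
    members = np.flatnonzero(labels == winner)
    best = members[f[members].argmin()]
    return labels, winner, population[best]

# 12 candidate solutions in three loose groups, as in Fig. 3
rng = np.random.default_rng(1)
pop = np.vstack([rng.normal(c, 0.1, (4, 2)) for c in (0.0, 3.0, 6.0)])
labels, winner, dest = group(pop, lambda x: float(np.sum(x ** 2)), k=3)
```

With the sphere function as objective, the winner cluster is the one near the origin, so `dest` lies close to the global optimum.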
3.4 Movement
The movement strategy employs the best bid in the winner cluster as the destination bid. The other candidate solutions then move toward this promising area as

bid_n^{t+1} = bid_n^t + C (r × winner_n^t − bid_n^t), (6)

where bid_n^{t+1} is the n-th bid element at iteration t + 1, winner_n^t is the n-th element of the best bid in the winner cluster, t is the current iteration, C is a constant, and r is a random number between 0 and 1.
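A sketch of this movement step; the names are our own, and we draw the random number r uniformly per element, which is one reading of the description above.

```python
import numpy as np

def move(bid, winner_bid, C=2.0, rng=None):
    """Move a bid toward the best bid of the winner cluster.

    Element-wise update: bid_n <- bid_n + C * (r_n * winner_n - bid_n),
    where r_n is a random number in [0, 1) and C is a constant.
    """
    rng = rng or np.random.default_rng()
    r = rng.random(bid.size)
    return bid + C * (r * winner_bid - bid)

bid = np.array([4.0, -2.0])
winner = np.array([1.0, 1.0])
new_bid = move(bid, winner)
```

Over iterations this update pulls the whole population toward the promising area found by the grouping operator, while the randomness in r preserves some diversity.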
4 Experimental Results
The parameters used for the algorithms are given in Table 1. Each algorithm is run 50 times for each problem, and we report the mean and standard deviation of the obtained results.
Unimodal functions have only one global optimum and no local optima and are thus useful for assessing the exploitation capability of algorithms. The employed benchmark functions are shown in Table 2, while Fig. 4 depicts their 2-D search spaces.
The results for all algorithms are given in Table 3, which also gives a ranking of the algorithms for each test function. As can be seen, HMS is top ranked for 5 of the 7 functions, indicating its strength in solving unimodal functions. As the last row of the table shows, HMS achieves the first overall rank on unimodal functions, while the second and third ranks go to GWO and SFLA, respectively.
Table 2 Unimodal benchmark functions. D is the number of dimensions, Range defines the boundary of the search space, and f_min is the optimum value

Function | D | Range | f_min
F1: Σ_{i=1}^{n} x_i^2 | 30 | [−100, 100] | 0
F2: Σ_{i=1}^{n} |x_i| + Π_{i=1}^{n} |x_i| | 30 | [−10, 10] | 0
F3: Σ_{i=1}^{n} (Σ_{j=1}^{i} x_j)^2 | 30 | [−100, 100] | 0
F4: max_i(|x_i|), 1 ≤ i ≤ n | 30 | [−100, 100] | 0
F5: Σ_{i=1}^{n−1} [100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2] | 30 | [−30, 30] | 0
F6: Σ_{i=1}^{n} ([x_i + 0.5])^2 | 30 | [−100, 100] | 0
F7: Σ_{i=1}^{n} i x_i^4 + random[0, 1) | 30 | [−1.28, 1.28] | 0
Fig. 4 2-D search spaces of the unimodal benchmark functions F1–F7
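For concreteness, a few of the functions in Table 2 written out in Python; these are our own definitions of the standard benchmark functions.

```python
import numpy as np

def f1_sphere(x):
    """F1: sum of squares, f_min = 0 at x = 0."""
    return float(np.sum(x ** 2))

def f5_rosenbrock(x):
    """F5: narrow curved valley, f_min = 0 at x = (1, ..., 1)."""
    return float(np.sum(100 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1) ** 2))

def f6_step(x):
    """F6: step function, [x + 0.5] rounded then squared and summed."""
    return float(np.sum(np.floor(x + 0.5) ** 2))

print(f1_sphere(np.zeros(30)), f5_rosenbrock(np.ones(30)), f6_step(np.zeros(30)))
# 0.0 0.0 0.0 (each evaluated at its optimum)
```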
Table 3 Results for all algorithms on unimodal benchmark functions
Function PSO HS SFLA ABC ICA BBO FA GWO WOA HMS
Rank 7 5 3 9 4 6 8 2 10 1
F4 Avg. 14.0870 21.7955 0.0291 60.5905 8.4321 1.5562 7.1223 7.6278E–07 43.9332 0
Stddev. 3.2558 2.3199 0.0214 3.3712 3.2616 0.1586 2.5367 5.0380E–08 29.5684 0
Rank 7 8 3 10 6 4 5 2 9 1
F5 Avg. 1.9504E04 2.4305E04 25.8845 3.3098E07 5.8725 245.1556 84.0091 27.1047 27.9634 28.2242
Stddev. 2.4883E04 1.2178E04 18.5675 9.0472E06 2.4985 252.8022 32.6139 0.7762 0.4133 0.2476
Rank 8 9 2 10 1 7 6 3 4 5
F6 Avg. 368.2913 253.8506 7.6109E–19 2.6756E04 7.8532 2.4665 8.0577 0.7865 0.4143 1.4690
Stddev. 140.6384 87.7723 1.4446E–18 4.8617E03 2.6143 0.5996 0.1428 0.3173 0.1744 1.1588
Rank 8 8 1 10 6 5 7 3 2 4
F7 Avg. 0.1284 0.2177 0.0097 19.3983 1.1611 0.0141 0.1232 0.0020 0.0033 1.6218E–05
Stddev. 0.0577 0.0567 0.0054 4.4983 1.1418 0.0045 0.0461 9.7713E–04 0.0034 1.7430E–05
Rank 7 8 4 10 9 5 6 2 3 1
Average rank 8 7.71 3 9.86 5.14 5.71 6.43 2.57 4.57 2
Overall rank 9 8 3 10 5 6 7 2 4 1
Fig. 5 Convergence curves of the algorithms on the unimodal benchmark functions F1–F7
The convergence curves for the algorithms can be seen in Fig. 5. As can be observed, the HMS algorithm converges fastest for most functions.
Multi-modal benchmark functions contain several local optima and can consequently be employed to assess the exploration ability of algorithms and how well they are able to escape local optima. The employed multi-modal benchmark functions are listed in Table 4, while 2-D plots of their search spaces are shown in Fig. 6.
Table 5 compares the HMS algorithm with the other algorithms on multi-modal benchmark functions. HMS yields the best results for 3 of the 6 functions and ranks second for 2 further functions, clearly demonstrating its ability to escape local optima. Overall, HMS is ranked first, followed by WOA and FA. Comparing Tables 3 and 5, we observe that while GWO and SFLA rank second and third on unimodal functions, they do not maintain this performance on multi-modal functions.
The convergence curves of the algorithms in Fig. 7 confirm the high convergence speed of HMS; for functions such as F9 and F11, the HMS curve drops steeply.
Table 4 Multi-modal benchmark functions
F11: (1/4000) Σ_{i=1}^{n} x_i^2 − Π_{i=1}^{n} cos(x_i/√i) + 1 | 30 | [−600, 600] | −4.687
F12: (π/n) {10 sin^2(π y_1) + Σ_{i=1}^{n−1} (y_i − 1)^2 [1 + 10 sin^2(π y_{i+1})] + (y_n − 1)^2} + Σ_{i=1}^{n} u(x_i, 10, 100, 4), with y_i = 1 + (x_i + 1)/4 and u(x_i, a, k, m) = k(x_i − a)^m if x_i > a; 0 if −a < x_i < a; k(−x_i − a)^m if x_i < −a | 30 | [−50, 50] | −1
F13: 0.1 {sin^2(3π x_1) + Σ_{i=1}^{n} (x_i − 1)^2 [1 + sin^2(3π x_i + 1)] + (x_n − 1)^2 [1 + sin^2(2π x_n)]} + Σ_{i=1}^{n} u(x_i, 10, 100, 4) | 30 | [−50, 50] | −1
Fig. 6 2-D search spaces of the multi-modal benchmark functions
Large-scale problems occur in many data science domains, and large-scale global optimisation (LSGO), which deals with optimisation problems with more than 100 decision variables [14], is a challenging task: as the number of decision variables increases, the search space grows exponentially, making it difficult for optimisation algorithms to work effectively. Algorithms that work well for both normal and LSGO problems are hence highly sought after.
To evaluate HMS on LSGO problems, we increase the number of dimensions from 30 to 200, while employing the same benchmark functions as in Sects. 4.1
Fig. 7 Convergence curves of the algorithms on the multi-modal benchmark functions F8–F13
and 4.2. Table 6 shows the results for the unimodal functions with D = 200. As can be observed, HMS yields the best performance for most functions, by a large margin. Comparing with Table 3, we can conclude that HMS is even better suited to large-scale unimodal problems than the other algorithms.
Table 7 compares the HMS algorithm with the others on large-scale multi-modal functions. Here, HMS performs best for 3 of the 6 functions and is ranked second overall. While WOA performs best overall here, it did significantly worse on the unimodal functions.
Non-parametric statistical approaches are divided into two groups, namely pairwise comparisons and multiple comparisons. Pairwise comparisons evaluate only two algorithms at a time.
Table 5 Results for all algorithms on multi-modal benchmark functions
Function PSO HS SFLA ABC ICA BBO FA GWO WOA HMS
F8 Avg. –5839.7546 –12405.1614 –7119.1248 –9113.9842 –8512.5212 –8099.3154 –8969.5712 –6266.9171 –9045.2153 –9308.3218
Stddev. 802.4674 64.5581 1475.2159 459.6007 632.2129 576.9913 678.2812 608.7592 1581.2441 600.9386
Rank 10 1 8 3 9 7 5 9 4 2
F9 Avg. 87.9261 19.3135 68.7188 151.9296 3.2897 50.2997 75.5336 4.2873 0 0
Stddev. 16.7137 3.1289 19.7114 13.0229 1.8242 14.7106 13.2396 7.8023 0 0
Rank 9 5 7 10 3 6 8 4 1.5 1.5
F10 Avg. 6.6234 4.9583 0.3039 18.4379 6.0145 0.6089 0.2699 1.0415E-13 3.8488E-15 8.8816E-16
Stddev. 0.9991 0.5705 0.6440 0.2734 2.6799 0.0909 0.3801 1.4897E-14 2.8119E-15 0
Rank 9 7 5 10 8 6 4 3 2 1
F11 Avg. 4.1369 3.5463 0.0102 240.8548 11.5789 1.0024 0.0084 0.0028 0.0114 0
Stddev. 1.2931 0.8018 0.0140 39.8329 3.8413 0.0267 0.0047 0.0074 0.0436 0
Rank 8 7 4 10 9 6 3 2 5 1
F12 Avg. 9.1955 6.4039 0.3153 184237.4110 6.7856 0.0080 3.2959 0.0480 0.0219 0.0178
Stddev. 4.3929 1.6343 0.3836 104481.1280 2.8135 0.0037 0.7816 0.0197 0.0135 0.0281
Rank 9 7 5 10 8 1 6 4 3 2
F13 Avg. 143.5001 220.7775 0.0095 920691.2156 6.4195 0.1236 0.0019 0.6069 0.5574 0.2194
Stddev. 341.9978 250.9874 0.0201 31057.1275 2.3270 0.0315 7.5304E-04 0.2392 0.1735 0.2599
Rank 8 9 2 10 7 3 1 6 5 4
Average rank 8.83 6.00 5.17 8.83 6.83 4.83 4.50 4.67 3.42 2.25
Overall rank 9.5 7 6 9.5 8 5 3 4 2 1
Table 6 Results for all algorithms on unimodal benchmark functions for D = 200
Function PSO HS SFLA ABC ICA BBO FA GWO WOA HMS
F1 Avg. 1.5726E04 1.3116E05 4.4589E05 5.0335E05 9.8617E03 1.1125E03 2.3430E04 9.2015E–08 5.3182E–71 0
Stddev. 1.8113E03 6.5347E03 1.1290E04 9.8706E03 3.2471E03 84.7199 4.2076E03 6.1483E–08 2.5841E–10 0
Rank 6 8 9 10 5 4 7 3 2 1
F2 Avg. 156.8753 296.3713 5.5107E84 6.2941E25 228.348 39.231 227.3975 3.3703E–05 7.8590E–49 0
Stddev. 12.3644 9.6628 1.6213E85 3.01999E26 55.1634 2.9843 14.3488 9.0712E–06 2.8757E–48 0
Rank 5 8 10 9 7 4 6 3 2 1
F3 Avg. 1.9921E05 1.7694E06 1.6958E06 1.7433E06 5.9663E05 2.3123E05 2.8345E08 1.8616E04 4.6119E06 0
Stddev. 4.9198E04 2.3577E05 1.3892E05 2.9197E05 7.8567E04 3.6379E04 3.7962E07 8.4802E03 1.2566E06 0
Rank 3 8 6 7 5 4 10 2 9 1
F4 Avg. 33.1202 82.0031 67.7457 98.8834 94.2710 36.5689 72.0320 24.2905 81.7410 0
Stddev. 2.7025 0.9841 1.0111 0.8226 1.3616 2.0162 5.3872 6.8181 20.9828 0
Rank 3 8 5 10 9 4 6 2 7 1
F5 Avg. 4.1310E06 3.2875E08 1.5132E09 1.9762E09 8.5196E06 4.4911E04 8.0270E06 198.0339 197.7540 197.0797
Stddev. 1.1278E06 2.6852E07 6.2308E07 9.5497E07 6.3057E06 6.4151E03 2.7874E06 0.4679 0.1895 0.2143
Rank 5 8 9 10 7 4 6 3 2 1
F6 Avg. 1.5875E04 1.3089E05 4.4758E05 5.0564E05 9.4243E03 1.1154E03 2.4963E04 29.2361 11.4810 45.7007
Stddev. 2.1562E03 8.8120E03 8.4940E03 9.4677E03 3.6849E03 78.9891 4.3514E03 1.5876 4.0354 0.4879
Rank 6 8 9 10 5 4 7 2 1 3
F7 Avg. 12.8865 931.9836 4.7249E03 6.0920E03 46.7428 0.6002 139.1196 0.0184 0.0031 1.3296E–05
Stddev. 2.6560 82.1684 196.0919 409.1337 39.9407 0.0962 25.6236 0.0073 0.0029 1.3816E–05
Rank 5 8 9 10 6 4 7 3 2 1
Average rank 4.71 8.00 8.14 9.43 6.29 4.00 7.00 2.57 3.57 1.29
Overall rank 5 8 9 10 6 4 7 2 3 1
Table 7 Results for all algorithms on multi-modal benchmark functions for D = 200
Function PSO HS SFLA ABC ICA BBO FA GWO WOA HMS
F8 Avg. –2.2858E04 –5.3458E04 –1.1670E04 –1.8499E04 –1.25E04 –4.1833E04 –5.3993E04 –2.9357E04 –7.0312E04 –2.952E04
Stddev. 2.4079E03 1.2999E03 918.5048 795.8096 2.2361E03 1.8554E03 1.7927E03 2.4428E03 1.1908E04 2.883E03
Rank 7 3 10 8 9 4 2 6 1 5
F9 Avg. 1.4365E03 1.2649E03 3.0613E03 2.6686E03 629.8474 871.7112 1.2319E03 22.6712 7.5791E–15 0
Stddev. 70.7638 47.0148 36.9157 70.9750 48.5355 51.0479 82.1383 14.6199 4.1513E–14 0
Rank 8 7 9 10 4 5 6 3 2 1
F10 Avg. 11.3312 17.7619 20.7198 20.1281 17.8343 4.5262 14.5738 2.2673E–05 4.3225E–15 8.8818E–16
Stddev. 0.5800 0.1685 0.0362 0.0464 0.9127 0.1128 0.6897 6.2867E–06 2.3756E–15 0
Rank 5 7 10 9 8 4 6 3 2 1
F11 Avg. 148.8701 1.1755E03 4.0241E03 4.5481E03 94.7528 11.1871 206.8672 0.0028 0 0
Stddev. 20.2986 49.5853 73.8511 112.0118 32.2501 0.8330 30.1673 0.0110 0 0
Rank 6 8 9 10 5 4 7 3 1.5 1.5
F12 Avg. 4.5756E04 5.0453E08 2.8806E09 4.4458E09 1.6951E07 19.8444 1.4453E05 0.5522 0.0628 1.0309
Stddev. 5.1036E04 5.8615E07 1.3324E08 2.6288E08 3.7016E07 3.7585 1.8727E05 0.0494 0.0317 0.0285
Rank 5 8 9 10 7 4 6 2 1 3
F13 Avg. 2.3014E06 1.1824E09 6.1629E09 8.5301E09 2.6582E07 335.7491 7.7316E06 16.9810 6.8141 19.4815
Stddev. 1.0705E06 1.1889E08 2.8682E08 4.9642E08 2.8040E07 110.9103 4.3347E06 0.6140 2.4383 0.1009
Rank 5 8 9 10 7 4 6 2 1 3
Average rank 6.00 6.83 9.33 9.50 6.67 4.17 5.50 3.17 1.42 2.42
Overall rank 6 8 9 10 7 4 5 3 1 2
F25 Avg. 9.5614E05 4.0175E05 7.4508E05 1.4557E04 3.6401E04 3.4787E04 2.9238E04 1.0303E06 7.4994E05 3.3920E04
Std. 2.3017E05 8.5910E04 5.6959E05 6.1302E03 1.8570E04 1.8215E04 1.9832E04 4.0719E05 2.2676E05 1.0668E04
Rank 9 6 7 1 3 5 2 10 8 4
F26 Avg. –111.3386 –119.1150 –123.3829 –41.1588 –123.9024 –122.8261 –13.2207 –95.7207 –99.9633 –124.6913
Std. 3.3220 2.0039 1.3633 36.3903 0.6872 1.7160 8.8421 7.5058 6.2787 1.6775
Rank 6 5 3 9 2 4 10 8 6 1
F27 Avg. –286.7639 –286.1862 –286.5418 –286.4730 –286.3202 –285.5187 –286.5392 –286.8498 –286.3400 –286.9038
Std. 0.3497 0.1630 0.3045 0.2351 0.3085 0.0653 0.3948 0.4003 0.3474 0.3453
Rank 3 9 4 6 8 10 5 2 7 1
F28 Avg. 788.0670 516.8825 531.6254 930.9383 447.9970 552.0418 843.3938 654.4284 909.9902 423.4599
Std. 101.0248 34.1908 73.26 137.3253 134.6112 73.0665 166.0494 77.3970 196.0838 80.6935
Rank 7 3 4 10 2 5 8 6 9 1
Average rank 5.1 5 6.2 7.1 6.5 3.57 7.13 5.4 7.13 1.87
Overall rank 4 3 6 8 7 2 9.5 5 9.5 1
The results of the Friedman test are shown in Table 11, which gives the average Friedman rank for each algorithm as well as the resulting p-value. HMS obtains the best rank, while the p-value is negligible, indicating a significant difference among the algorithms.
5 Conclusions
References
1. Amirsadri, S., Mousavirad, S.J., Ebrahimpour-Komleh, H.: A levy flight-based grey wolf opti-
mizer combined with back-propagation algorithm for neural network training. Neural Comput.
Appl. 30(12), 3707–3720 (2018)
2. Atashpaz-Gargari, E., Lucas, C.: Imperialist competitive algorithm: an algorithm for optimiza-
tion inspired by imperialistic competition. In: IEEE Congress on Evolutionary Computation,
pp. 4661–4667 (2007)
3. Brooks, S.P., Morgan, B.J.: Optimization using simulated annealing. J. R. Stat. Soc. Ser. D
(The Statistician) 44(2), 241–257 (1995)
4. Carvalho, A.R., Ramos, F.M., Chaves, A.A.: Metaheuristics for the feedforward artificial neural
network (ANN) architecture optimization problem. Neural Comput. Appl. 20(8), 1273–1284
(2011)
5. Das, S., Abraham, A., Konar, A.: Automatic clustering using an improved differential evolution
algorithm. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 38(1), 218–237 (2008)
6. Derrac, J., García, S., Molina, D., Herrera, F.: A practical tutorial on the use of nonpara-
metric statistical tests as a methodology for comparing evolutionary and swarm intelligence
algorithms. Swarm Evolut. Comput. 1(1), 3–18 (2011)
7. Espinoza-Pérez, S., Rojas-Domınguez, A., Valdez-Pena, S.I., Mancilla-Espinoza, L.E.: Evo-
lutionary training of deep belief networks for handwritten digit recognition. Res. Comput. Sci.
148, 115–131 (2019)
8. Eusuff, M., Lansey, K., Pasha, F.: Shuffled frog-leaping algorithm: a memetic meta-heuristic
for discrete optimization. Eng. Optim. 38(2), 129–154 (2006)
9. Geem, Z.W., Kim, J.H., Loganathan, G.V.: A new heuristic optimization algorithm: harmony
search. Simulation 76(2), 60–68 (2001)
10. Hancer, E., Xue, B., Zhang, M.: Differential evolution for filter feature selection based on
information theory and feature ranking. Knowl.-Based Syst. 140, 103–119 (2018)
11. Kapanova, K., Dimov, I., Sellier, J.: A genetic approach to automatic neural network architecture
optimization. Neural Comput. Appl. 29(5), 1481–1492 (2018)
12. Karaboga, D., Basturk, B.: A powerful and efficient algorithm for numerical function opti-
mization: artificial bee colony (ABC) algorithm. J. Glob. Optim. 39(3), 459–471 (2007)
13. Kennedy, J., Eberhart, R.: Particle swarm optimization (PSO). In: IEEE International Confer-
ence on Neural Networks, pp. 1942–1948 (1995)
14. Mahdavi, S., Shiri, M.E., Rahnamayan, S.: Metaheuristics in large-scale global continues opti-
mization: a survey. Inform. Sci. 295, 407–428 (2015)
15. Minija, S.J., Emmanuel, W.S.: Imperialist competitive algorithm-based deep belief network
for food recognition and calorie estimation. Evolut. Intell., 1–16 (2019)
16. Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016)
17. Mirjalili, S., Mirjalili, S.M., Lewis, A.: Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61 (2014)
18. Mladenović, N., Hansen, P.: Variable neighborhood search. Comput. Oper. Res. 24(11), 1097–
1100 (1997)
19. Mousavirad, S., Ebrahimpour-Komleh, H.: Feature selection using modified imperialist com-
petitive algorithm. In: ICCKE 2013, pp. 400–405. IEEE (2013)
20. Mousavirad, S.J., Bidgoli, A.A., Ebrahimpour-Komleh, H., Schaefer, G.: A memetic imperialist
competitive algorithm with chaotic maps for multi-layer neural network training. Int. J. Bio-
Inspir. Comput. 14(4), 227–236 (2019)
21. Mousavirad, S.J., Bidgoli, A.A., Ebrahimpour-Komleh, H., Schaefer, G., Korovin, I.: An effec-
tive hybrid approach for optimising the learning process of multi-layer neural networks. In:
International Symposium on Neural Networks, pp. 309–317 (2019)
22. Mousavirad, S.J., Ebrahimpour-Komleh, H., Schaefer, G.: Effective image clustering based on
human mental search. Appl. Soft Comput. 78, 209–220 (2019)
23. Radicchi, F., Baronchelli, A.: Evolution of optimal lévy-flight strategies in human mental
searches. Phys. Rev. E 85(6), 061121 (2012)
24. Radicchi, F., Baronchelli, A., Amaral, L.A.: Rationality, irrationality and escalating behavior
in lowest unique bid auctions. PloS One 7(1) (2012)
25. Shi, Y., Eberhart, R.: A modified particle swarm optimizer. In: IEEE International Conference
on Evolutionary Computation, pp. 69–73 (1998)
26. Simon, D.: Biogeography-based optimization. IEEE Trans. Evolut. Comput. 12(6), 702–713
(2008)
27. Stützle, T.: Local search algorithms for combinatorial problems. Darmstadt University of Tech-
nology PhD Thesis, vol. 20 (1998)
28. Suganthan, P.N., Hansen, N., Liang, J.J., Deb, K., Chen, Y.P., Auger, A., Tiwari, S.: Prob-
lem definitions and evaluation criteria for the CEC 2005 special session on real-parameter
optimization. Nanyang Technological University Singapore, Technical report (2005)
29. Sun, R.: Optimization for deep learning: theory and algorithms (2019). arXiv:1912.08957
30. Tran, T.H., Nguyen, H., Nhat-Duc, H., et al.: A success history-based adaptive differential
evolution optimized support vector regression for estimating plastic viscosity of fresh concrete.
Eng. Comput., 1–14 (2019)
31. Yang, X.S.: Firefly algorithm, stochastic test functions and design optimization (2010).
arXiv:1003.1409
Reducing Redundant Association Rules
Using Type-2 Fuzzy Logic
1 Introduction
© The Editor(s) (if applicable) and The Author(s), under exclusive license 49
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_3
50 E. A. E. Reheem et al.
popular algorithm for ARs. It derives all frequent itemsets from the database and produces the ARs for knowledge discovery using the pre-defined threshold measures (minsup and minconf) [7].
In general, ARM comprises two principal parts. First, all frequent itemsets in a database are found by the apriori algorithm, which takes the itemsets, generates candidate k-itemsets, and checks them against minsup: a candidate k-itemset with support below minsup is pruned, while one with support at or above minsup is a frequent itemset. Second, from these frequent itemsets, the ARs that meet both the minsup and minconf thresholds are produced [8].
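The two steps just described, candidate generation and minsup pruning, can be sketched as follows; this is a minimal illustration with made-up transactions, not a full apriori implementation.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Level-wise candidate generation and minsup pruning (apriori-style)."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    candidates = [frozenset([i]) for i in items]
    while candidates:
        # support = fraction of transactions containing the candidate
        support = {c: sum(c <= t for t in transactions) / n for c in candidates}
        survivors = {c: s for c, s in support.items() if s >= minsup}  # prune
        frequent.update(survivors)
        # join surviving k-itemsets into candidate (k+1)-itemsets
        candidates = list({a | b for a, b in combinations(survivors, 2)
                           if len(a | b) == len(a) + 1})
    return frequent

tx = [frozenset(t) for t in ({"bread", "milk"}, {"bread", "butter"},
                             {"bread", "milk", "butter"}, {"milk"})]
freq = frequent_itemsets(tx, minsup=0.5)
```

With minsup = 0.5, the pair {milk, butter} (support 0.25) is pruned in the second level, so no larger itemset containing it is ever generated; this anti-monotone pruning is what keeps apriori tractable.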
In the literature, a frequent itemset is interesting if its support is greater than or equal to minsup, and an AR is strong if its confidence is greater than or equal to minconf. In apriori-based mining, users must specify the minsup and minconf for their databases, so the performance of the algorithm depends heavily on these user-specified thresholds. For instance, if the user sets minsup very high, nothing may be found in the database and the rules are misleading, whereas an exceedingly low minsup may lead to poor mining performance and many uninteresting ARs. Users are thus unreasonably required to know the details of the database to be mined in order to specify suitable thresholds. Moreover, obtaining every frequent itemset in a database is challenging, since it involves searching all possible itemsets (item combinations). The problem can therefore be described as how to obtain every AR that meets the user-specified minsup and minconf values [7].
A fundamental problem in ARM is the tremendous number of obtained rules, some of which may be redundant and present no new information, severely hampering the effective use of the discovered knowledge. Moreover, several extracted rules provide no benefit to the user or are subsumed by other rules, and are thus deemed redundant [9]. To handle these problems in ARM, some researchers have applied fuzzy set theory to ARM (yielding so-called FARs).
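As an illustration of redundancy pruning, one common criterion, used here as an assumption rather than the chapter's exact definition, treats X → Y as redundant when a more general rule X′ → Y, with X′ a proper subset of X, has at least the same confidence:

```python
def prune_redundant(rules):
    """Drop rule (X, Y, conf) if some (X', Y, conf') with X' a proper subset
    of X has conf' >= conf: the shorter rule says at least as much."""
    kept = []
    for x, y, conf in rules:
        redundant = any(x2 < x and y2 == y and c2 >= conf
                        for x2, y2, c2 in rules if (x2, y2) != (x, y))
        if not redundant:
            kept.append((x, y, conf))
    return kept

rules = [
    (frozenset({"milk"}), frozenset({"bread"}), 0.8),
    (frozenset({"milk", "butter"}), frozenset({"bread"}), 0.8),   # redundant
    (frozenset({"milk", "eggs"}), frozenset({"bread"}), 0.95),    # kept: higher conf
]
kept = prune_redundant(rules)
```

The second rule adds the condition "butter" without gaining any confidence over the simpler rule, so it carries no new information and is pruned.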
Fuzzy systems, built on fuzzy set theory, can help minimise this key disadvantage of current ARM. FARs extend classical ARM by defining the support and confidence of fuzzy rules and are used to transform quantitative data into fuzzy data [9, 10]. FARs thus quantise numerical attributes in a database more naturally than Boolean-quantised generalised ARM. Moreover, the mining results of FARs are linguistic terms rather than intervals, which are easier to understand and closer to the user's intuition [11].
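The transformation of quantitative data into linguistic terms that FARs rely on can be sketched with triangular membership functions; the attribute, term names, and breakpoints below are our own illustration.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, peaking at 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# hypothetical linguistic terms for a quantitative "age" attribute
terms = {"young": (0, 20, 40), "middle": (30, 45, 60), "old": (50, 70, 100)}

def fuzzify(age):
    """Map a crisp value to a membership degree for each linguistic term."""
    return {t: round(triangular(age, *p), 2) for t, p in terms.items()}

print(fuzzify(35))  # {'young': 0.25, 'middle': 0.33, 'old': 0.0}
```

A crisp value can thus belong partially to several terms at once; the fuzzy support of an itemset is then computed from these membership degrees rather than from binary containment.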
The contribution of this article is to prune the redundant rules resulting from the apriori algorithm in an actual DM application and to find frequent itemsets by generating minsup and minconf values suitable for the database to be mined. The proposed system relies on T2FARM to reduce the redundant extracted rules and to find all ARs that satisfy the minsup and minconf values. A type-2 fuzzy logic system (T2FLS) is beneficial in cases where an exact certainty cannot be determined and measurements are uncertain; it is recognised that type-2 fuzzy sets permit modelling and reducing the uncertainties in rule-based FLSs. The results demonstrate improved pruning of redundant rules compared to traditional FARs.
This paper is organised as follows: Sect. 2 briefly discusses research related to ARs. The proposed model is introduced in Sect. 3. The experimental results and evaluation of the proposed system are presented in Sect. 4. Finally, conclusions and future work are given in Sect. 5.
2 Literature Review
DM has become an essential research area, and much work has been done in academia and industry on developing new DM approaches over the last decade. Research in DM remains active because it represents a problem-rich area, offers a wealth of practical applications for deriving useful hidden information from data, and enables finding associations among many domains in immense relational databases. For instance, in [12], DM techniques find valuable information for educational systems to improve learning; adopted in formative evaluation, they help educators create a foundation for decisions when adjusting an environment or teaching approach.
The authors in [13] compared the performance of numerous DM techniques in e-learning systems to predict the marks that university students will obtain in the final exam of a course. Many classification methods were used, such as decision trees, fuzzy rules, and neural networks. The mining process has three steps: first, pre-processing transforms the valuable data into KEEL data files; then, DM algorithms are executed to find hidden knowledge of interest to the instructor inside the data; finally, the obtained post-processing models are saved into result files that must be interpreted by the teacher to make decisions regarding the students. Their tests show no improvement in classification accuracy, and no single algorithm obtains the best classification accuracy on every dataset.
ARM is a commonly researched technique for DM. It helps detect associations between items and identify strong rules discovered from databases, benefiting users by supporting product purchases based on their preferences. The academics in [14] applied a generalised ARM algorithm that employs pruning methods for generalising rules. This algorithm produces fewer candidate itemsets, and a vast number of rules are pruned by minimal confidence. The study has the limitation of reusing the original frequent itemsets and ARs instead of rescanning the database; however, it outperforms other algorithms in that it can prune a tremendous number of rules.
In another related work [15], a DM algorithm is presented to find positive and negative ARs. It studies the methods of support and confidence levels, explains the positive and negative ARs, and addresses the conflicting-rule problems that arise in positive and negative ARM together with solutions to these problems. This approach can be applied in various applications to find robust patterns and generates all varieties of limited rules.
Some works have employed FLSs to process uncertain information in ARs. For illustration, in [16] a fuzzy rule algorithm based on a fuzzy decision tree is provided for DM, which combines the comprehensibility of rules produced by decision trees with fuzzy sets. First, histogram analysis is applied to specify membership functions for each data attribute; then a fuzzy decision tree is built. The authors also apply a genetic algorithm to improve the initial membership functions. Their approach is effective in terms of the performance and comprehensibility of rules compared with other methods.
In [3], the fuzzy transaction is introduced as a fuzzy subset of items, in order to
discover FARs in relational databases containing quantitative data. This model
supports mining distinct varieties of patterns, from ordinary ARs to fuzzy and
approximate functional dependencies and gradual rules. The academics in [11]
presented FARs that derive rules from the database and prune the redundant rules
extracted. They define redundancy of FARs and prove theorems concerning it. However,
their algorithm is limited in terms of computational time, and non-redundant rules
are sometimes unexpectedly deleted, so the authors conclude that the algorithm must
be enhanced with another method.
The academics in [17] generated rare ARs from an educational database by applying a
fuzzy-based apriori algorithm. The method targets less frequent itemsets: a
'maximum sup' measure provides the rare items, and a 'Rank' measure prunes
particular outliers from the rare items created. The ARs can then be produced from
the rare items.
Aim of the Work A significant challenge for ARM algorithms is the enormous number of
extracted rules, many of which can be redundant, and the traditional fuzzy algorithm
meets difficulties here. We therefore aim to design an efficient ARs approach that
prunes the redundant rules extracted by the apriori algorithm in a DM application
and detects frequent itemsets by applying the T2FLS. The results show that such a
technique works effectively in DM, minimizing a typical drawback of the type-1 FLS,
namely its weaker ability to prune redundant rules.
3 Proposed System
The proposed model focuses on the previously mentioned problems. It aims to reduce
the number of redundant rules, and thereby maximize the accuracy of the results, by
combining the apriori algorithm with a T2FLS to prune redundant rules for a real DM
application. The schema of the proposed system is presented in Fig. 1.
Reducing Redundant Association Rules Using Type-2 Fuzzy Logic 53
We use the dataset popularly known as "Adult" from the UCI machine learning
repository. The Adult data set contains 32,659 records; 60% of the data are used for
training and 40% for testing.
The T2FLS model minimizes the effects of uncertainties. A T2FLS is characterized by
a fuzzy membership function, i.e., the membership value for each element of the set
is itself a fuzzy set in [0, 1]. The membership functions (MF) of a T2FLS are
three-dimensional: upper membership functions (UMF), lower membership functions
(LMF), and the area between these two functions, the footprint of uncertainty (FOU),
which gives additional degrees of freedom that make it feasible to handle
uncertainties (see Fig. 2) [18, 19].
Type-2 fuzzy sets are concerned with quantifying natural language, in which words
can have vague meanings. We transform the quantitative value v_ij of each
transaction T_i (i = 1, 2, ..., n) for each item I_j into fuzzy values: the UMF
value f_{ijl}^{upper} and the LMF value f_{ijl}^{lower}, using a type-2 MF for each
R_jl, where R_jl is the l-th fuzzy region of item I_j. These items are fuzzified
using a type-2 Triangular Fuzzy Number (TFN), which decides the degree of membership
of items in the apriori algorithm, as shown in Fig. 3. For example, if the attribute
age in the database takes values in [10, 70], it could be partitioned into four new
attributes: young [10, 30], youth [20, 40], middle age [30, 50], and old [50, 70].
In this work, we use a type-2 triangular MF to describe all variables (age, income,
and education degree), defined as [20]:
LMF(x) = h(x + a)/a,  if −a ≤ x ≤ 0
         h(a − x)/a,  if 0 ≤ x ≤ a        (1)
         0,           otherwise

UMF(x) = h(x + b)/b,  if −b ≤ x ≤ 0
         h(b − x)/b,  if 0 ≤ x ≤ b        (2)
         0,           otherwise
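The two piecewise-triangular functions of Eqs. (1)-(2) can be sketched directly in code. The function names, the half-widths a and b (with b > a so the UMF dominates the LMF), and the height h are illustrative assumptions, not values from the chapter:

```python
# Illustrative sketch of Eqs. (1)-(2): type-2 triangular membership
# functions. Parameter values are assumptions, not from the chapter.

def lmf(x, a, h=1.0):
    """Lower membership function: triangle of half-width a, height h."""
    if -a <= x <= 0:
        return h * (x + a) / a
    if 0 <= x <= a:
        return h * (a - x) / a
    return 0.0

def umf(x, b, h=1.0):
    """Upper membership function: wider triangle of half-width b > a."""
    if -b <= x <= 0:
        return h * (x + b) / b
    if 0 <= x <= b:
        return h * (b - x) / b
    return 0.0

# The band between the two curves is the footprint of uncertainty (FOU).
print(lmf(5.0, a=10), umf(5.0, b=20))  # 0.5 0.75
```

For every x, lmf(x) ≤ umf(x); the gap between them is the FOU discussed above.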
3.3 Type-Reducer
The type-reducer produces a type-1 fuzzy set output by center-of-sets (COS) type
reduction. To get the type-reduced set, it suffices to reduce the UMF value
f_{ijl}^{upper} and the LMF value f_{ijl}^{lower} to a single fuzzy value f_{ijl}.
In this stage, after the type-2 fuzzy values are reduced to type-1 fuzzy values by
the center-of-sets type-reduction method, apriori DM is applied to find frequent
itemsets and discover interesting ARs. The apriori algorithm requires two
thresholds, minsup and minconf, which specify the association strength that must
hold before a rule is mined. The steps of the apriori algorithm are as follows.
The objective of this stage is to find all itemsets that meet the minsup threshold
(called frequent itemsets): an itemset with support higher than or equal to minsup
is a frequent itemset.
The frequent itemsets are produced as follows:
1. Compute the support of each fuzzy region R_jl over the transactions; supp_jl is
   the sum of f_ijl over the transactions.
2. Check whether supp_jl ≥ minsup. If the value of supp_jl is equal to or larger
   than minsup, the region is frequent; put it in the large 1-itemsets (L_1). If
   L_1 is not null, continue to the next step.
3. Set r = 1, where r represents the number of items in the current itemsets to be
   processed.
4. Join the large r-itemsets L_r to generate the candidate (r + 1)-itemsets.
5. Do the following substeps for each newly formed (r + 1)-itemset:
   a. Compute f_{ijl}^{upper} and f_{ijl}^{lower} using the type-2 MF.
   b. Reduce each f_{ijl}^{upper} and f_{ijl}^{lower} of each item I_j to a single
      fuzzy value.
   c. Compute the fuzzy value of each itemset by using the minimum operator for the
      intersection.
   d. Compute the support of the candidate and check whether supp_jl ≥ minsup; if
      so, put it in L_{r+1}. If L_{r+1} is null, proceed to the next stage;
      otherwise, set r = r + 1 and repeat steps 4-5.
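Steps 1-2 above (support counting over fuzzy membership values, then filtering by minsup) can be sketched as follows. The transactions, the fuzzy-region names, and the minsup value are made-up illustrations:

```python
# Sketch of steps 1-2: summing fuzzy membership values per region and
# keeping the regions that reach minsup. All values here are invented.

minsup = 1.5

transactions = [
    {"age.young": 0.8, "income.low": 0.6},
    {"age.young": 0.9, "income.low": 0.4},
    {"age.old": 0.7, "income.low": 0.5},
]

# Step 1: support of each fuzzy region = sum of its membership values.
support = {}
for t in transactions:
    for region, f in t.items():
        support[region] = support.get(region, 0.0) + f

# Step 2: large 1-itemsets L1 = regions whose support reaches minsup.
L1 = {r: s for r, s in support.items() if s >= minsup}
print(sorted(L1))  # ['age.young', 'income.low']
```

Step 4 would then join surviving regions into candidate 2-itemsets, scoring each with the minimum operator as in substep 5c.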
After obtaining the fuzzy frequent itemsets, we can generate FARs. For instance,
some FARs are expressed in the following form:
• IF age is young, THEN income is high.
• IF age is youth, THEN income is low.
• IF income is high, THEN age is young.
• IF age is youth and income is low, THEN education degree is high school.
Here age, income, and education degree are attributes, and young, youth, low, high
school, and high are linguistic labels.
Conf(A → B) = Supp_fuzzy(A ∪ B) / Supp_fuzzy(A)    (3)
where Conf(A → B) is the ratio of the fuzzy number of transactions that contain
A ∪ B to the total fuzzy number of transactions that contain A.
Then, we compare the conf of a rule with the minconf. The rule is satisfied when
its conf is higher than or equals the minconf.
ARM finds associations among items based on the support and confidence of a rule.
However, these measures have limitations as the database grows larger: the number
of mined rules increases quickly, many redundant rules are extracted, and it becomes
impossible for humans to find the interesting ones. For this reason, in this paper
we use the certainty factor (CF) to measure the redundancy of extracted FARs. The
CF was developed to measure uncertainty in ARs; its value is a number from −1 to 1.
The CF of a rule X → Y is defined as follows:
CF(X → Y) = (Conf(X → Y) − supp(Y)) / (1 − supp(Y)),  if Conf(X → Y) > supp(Y)    (4)

CF(X → Y) = (Conf(X → Y) − supp(Y)) / supp(Y),  if Conf(X → Y) ≤ supp(Y)    (5)
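A direct transcription of Eq. (3) together with Eqs. (4)-(5) might look like this. The numeric support and confidence inputs are illustrative assumptions:

```python
# Sketch of Eq. (3) (confidence) and Eqs. (4)-(5) (certainty factor).
# The numeric inputs below are invented for illustration.

def confidence(supp_ab, supp_a):
    """Conf(A -> B) = Supp_fuzzy(A u B) / Supp_fuzzy(A), Eq. (3)."""
    return supp_ab / supp_a

def certainty_factor(conf_xy, supp_y):
    """CF(X -> Y) per Eqs. (4)-(5); the result lies in [-1, 1]."""
    if conf_xy > supp_y:
        return (conf_xy - supp_y) / (1.0 - supp_y)  # Eq. (4)
    return (conf_xy - supp_y) / supp_y              # Eq. (5)

conf = confidence(supp_ab=0.3, supp_a=0.5)  # 0.6
print(round(certainty_factor(conf, supp_y=0.4), 3))  # 0.333
```

Note that a rule whose confidence equals the marginal support of its consequent gets CF = 0, i.e., the antecedent adds no information.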
We can now remove redundant FARs using theorems that assume the consequent part of
the compared rules is identical. The theorems are as follows:
• Theorem 1: Let A1 → B and A2 → B be two FARs:
IF Conf(A1 → B) ≥ Conf(A2 → B), THEN CF(A1 → B) ≥ CF(A2 → B)
– Ex: IF Conf (age is young → income is low) ≥ Conf (education-degree is high
school → income is low)
– THEN CF (age is young → income is low) ≥ CF (education-degree is high
school → income is low)
• Theorem 2: Combining FARs: consider the FARs A → C, B → C and A, B → C, where A,
B, and C are fuzzy itemsets.
IF max(Conf(A → C), Conf(B → C)) ≥ Conf(A, B → C)
THEN max(CF(A → C), CF(B → C)) ≥ CF(A, B → C)
– Ex: IF max (Conf (age is young → income is low), Conf (education degree
is high school → income is low)) ≥ Conf (age is young & education degree
is high school → income is low)
– THEN max (Cf (age is young → income is low), Cf (education degree is
high school → income is low)) ≥ Cf (age is young & education degree is high
school → income is low)
• Theorem 3: Consider a FAR A → B, where A and B are fuzzy itemsets, and let Q be a
family of fuzzy itemsets:
IF max_{Y∈Q} Conf(Y → B) ≥ Conf(A → B),
THEN max_{Y∈Q} CF(Y → B) ≥ CF(A → B)
– Ex: IF max (Conf (age is young & education degree is high school → income is
low), Conf (age is youth & education degree is master → income is low)) ≥ Conf
(age is young → income is low)
– THEN max (Cf (age is young & education degree is high school → income is low),
Cf (age is youth & education degree is master → income is low)) ≥ Cf (age is young
→ income is low)
From these theorems, we are now able to separate redundant from non-redundant FARs
by combining FARs: consider a FAR A → B, where A and B are fuzzy itemsets, and take
Q = 2^X − {X} − {∅}:
• IF max_{Y∈Q} Conf(Y → B) ≥ Conf(A → B), THEN the rule A → B is a redundant rule.
• IF max_{Y∈Q} Conf(Y → B) < Conf(A → B), THEN the rule A → B is a non-redundant
rule.
Finally, the CF value of a non-redundant rule is larger than the CF of the
corresponding redundant rules, so the association between the antecedent and
consequent of a non-redundant rule is stronger than in any corresponding redundant
rule.
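Under one reading of this criterion, a rule is redundant when some other rule with the same consequent achieves at least its confidence. The lookup-table representation and all rule/confidence values below are invented for illustration:

```python
# Sketch of the redundancy test above: A -> B is redundant if some other
# rule Y -> B with the same consequent has confidence >= Conf(A -> B).
# The rule/confidence table is invented for illustration.

def is_redundant(antecedent, consequent, conf):
    """conf maps (frozenset antecedent, consequent) -> confidence."""
    a = frozenset(antecedent)
    rule_conf = conf[(a, consequent)]
    return any(c >= rule_conf
               for (y, b), c in conf.items()
               if b == consequent and y != a)

conf = {
    (frozenset({"age=young", "edu=hs"}), "income=low"): 0.70,
    (frozenset({"age=young"}), "income=low"): 0.80,
    (frozenset({"edu=hs"}), "income=low"): 0.60,
}
print(is_redundant({"age=young", "edu=hs"}, "income=low", conf))  # True
print(is_redundant({"age=young"}, "income=low", conf))            # False
```

Consistent with the claim above, the surviving rule (age=young → income=low) is the one with the highest confidence for that consequent.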
4 Experimental Results
In this section, we report experiments carried out to check the performance of the
proposed system and to confirm its improvements over the traditional approach. A
program was implemented in MATLAB to assess the performance of the proposed system,
using the "Adult" database from the UCI machine learning repository.
In the first experiment, we show the capacity of the proposed technique for
eliminating redundant rules, compared with traditional FARs, by setting minsup to
0.5 and minconf to 0.2, 0.7, and 0.9. The number of extracted rules depends on
minconf: the number of association rules increases as minconf decreases (see
Table 4). The results show that the proposed technique deletes redundant rules
better than the traditional FARs approach.
Figure 4 and Table 5 present all the rules extracted by the proposed system and the
number of redundant rules it deleted. Table 6 compares the proposed system with the
traditional fuzzy approach in terms of pruning redundant ARs. The results
demonstrate that the proposed T2FARM performs well and prunes redundant rules more
accurately than traditional FARs (see Fig. 5).
Fig. 4 Number of extracted rules and of non-redundant rules at each minconf value
(0.2, 0.7, 0.9)
Table 6 No. of non-redundant rules

Minconf   Fuzzy-approach   Proposed system
0.2       240              180
0.7       225              130
0.9       190              100

Fig. 5 Number of non-redundant ARs for the proposed system versus the fuzzy
approach at each minconf value (0.2, 0.7, 0.9)
In the second experiment, the execution time is measured for the proposed system and
for the traditional FAR approach, as displayed in Table 7. The proposed system takes
the most time, because the computations of the T2FLS are highly complicated.
5 Conclusion
In this paper, we proposed a novel approach for extracting hidden information from
items. The approach is based on hybridizing ARs with a T2FLS to recognize frequent
itemsets that fulfill the minsup and minconf values, and to minimize the redundant
rules mined by the apriori algorithm.
The experimental results demonstrate that the proposed approach performs better than
traditional FAR across all the sensitive parameters. Our results on a real-world
data set confirm that the proposed approach has excellent potential in terms of
accuracy. Our future work includes integrating a genetic algorithm with the T2FLS
to improve redundant-rule pruning.
References
1. Shu, J., Tsang, E.: Mining fuzzy association rules with weighted items. In: 2000 IEEE
International Conference on Systems, Man, and Cybernetics, pp. 1906–1911 (2000)
2. Deepashri, K.S., Kamath, A.: Survey on techniques of data mining and its applications. Int. J.
Emerg. Res. Manag. Technol. 6(2), 198–201 (2017)
3. Delgado, M., Marín, N., Sánchez, D., Vila, M.: Fuzzy association rules: general model and
applications. IEEE Trans. Fuzzy Syst. 11–2 (2003)
4. Zhao, Y., Zhang, T.: Discovery of temporal association rules in multivariate time series. In: Inter-
national Conference on Mathematics, Modeling and Simulation Technologies and Applications
(MMSTA 2017), pp. 294–300 (2017)
5. Helm, B.L.: Fuzzy Association Rules an Implementation in R, Master’s thesis, Vienna
University of Economics and Business Administration (2007)
6. Darwish, S.M., Amer, A.A., Taktak, S.G.: A novel approach for discovery quantitative fuzzy
multi-level association rules mining using genetic algorithm. Int. J. Adv. Res. Artif. Intell.
(IJARAI) 5(6), 35–44 (2016)
7. Sowan, B., Dahal, K., Hossain, M.A., Zhang, L., Spencer, L.: Fuzzy association rule mining
approaches for enhancing prediction performance. Expert Syst. Appl. 40(17), 6928–6937
(2013)
8. Suganya, G., Paulraj Ananth, K.J.: Analysis of association rule mining algorithms to generate
frequent itemset. Int. J. Innov. Res. Sci. Eng. Technol. 6(8), 15571–15577 (2017)
9. Xu, Y., Li, Y., Shaw, G.: Concise representations for approximate association rules. In: IEEE
International Conference on Systems, Man and Cybernetics, Singapore, pp. 94–101 (2008)
10. Watanabe, T.: Mining fuzzy association rules of specified output field. In: IEEE International
Conference on Systems, Man and Cybernetics, pp. 5754–5759 (2004)
11. Watanabe, T.: Fuzzy association rules mining algorithm based on output specification and
redundancy of rules. In: IEEE International Conference on Systems, Man and Cybernetics,
pp. 283–289 (2011)
12. Romero, C., Ventura, S.: Educational data mining: a survey from 1995 to 2005. Expert Syst.
Appl. 33(1), 135–146 (2007)
13. Romero, C., Espejo, P.G., Zafra, A., Romero, J., Ventura, S.: Web usage mining for predicting
final marks of students that use moodle courses. J. Comput. Appl. Eng. Educ. 27(3), 135–146
(2013)
14. Huang, Y., Wu, C.H.: Mining generalized association rules using pruning techniques. In: IEEE
International Conference on Data Mining, pp. 227–234 (2002)
15. Antonie, L., Zaïane, O.R.: Mining positive and negative association rules: an approach for
confined rules. In: 8th European Conference on Principles and Practice of Knowledge Discovery
in Databases, pp. 1–13 (2004)
16. Kim, M.W., Lee, J.G., Min, C.H.: Efficient fuzzy rule generation based on fuzzy decision tree
for data mining. In: IEEE Conference on International Fuzzy Systems, South Korea (1999)
17. Rajeswari, A.M., Sridevi, M., Chelliah, C.D.: Outliers detection on educational data using fuzzy
association rule mining. In: Proceedings of the International Conference on Advances in Computer
Communication and Information Science (ACCIS-14), pp. 1–9 (2014)
18. Chen, Y., Yao, L.: Robust type-2 fuzzy control of an automatic guided vehicle for wall-
following. In: International Conference of Soft Computing and Pattern Recognition, pp. 172–
177 (2009)
19. Darwish, S.M., Mabrouk, T.F., Mokhtar, Y.F.: Enriching vague queries by type-2 fuzzy
orderings. Lecture Notes on Information Theory, vol. 2(2), pp. 177–185 (2014)
20. Jerry, M.: Type-2 fuzzy sets and systems: an overview. IEEE Comput. Intell. Mag. 2(1), 20–29
(2007)
Identifiability of Discrete Concentration
Graphical Models with a Latent Variable
Mohamed Sharaf
1 Introduction
This work is inspired by Stanghellini and Vantaggi's criteria for identifiability in
undirected graphical models, presented in [20]. However, our paper presents a
detailed algorithmic perspective along with an implementation of the algorithm in
MATLAB; see [19] for more details. Our paper is also a self-contained presentation
of the algorithm upon which we built iugm. Latent models [5] have been used in
several fields, e.g., econometrics, statistics, epidemiology, psychology, and
sociology. In these models, there exist hidden variables which are not observed and
could have an effect (sometimes causal) on the observed variables. We consider
models in which there is a single hidden variable with no parents, the hidden
variable is the parent of all the other variables, and all variables are binary,
i.e., have exactly two states or levels. While the probability distribution of the
observed variables is known, it is also important to assess (learn) the probability
distribution of all variables, including the hidden one. Within the scope of this
paper, we call this problem the identifiability problem. (Note that this is
different from the problem of identifiability of causal effects, which is the focus
of, for example, [9, 17].) Identifiability comes in different forms: global, local,
and generic. Quoting from [4]:
Definition 1 A parameterization is called identifiable if it is one-to-one. That is,
let ξ1 and ξ2 be two parameter values with their corresponding distributions Pξ1 and
Pξ2; then ξ1 ≠ ξ2 implies Pξ1 ≠ Pξ2.
In a globally identifiable model [7], for every pair of instantiations of the parameter
vector θ , θa and θb , the distributions of the observable variables, P(θa ) and P(θb )
M. Sharaf (B)
Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
Computer and Systems Dept, Al-Azhar University, Cairo, Egypt
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license 63
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_4
64 M. Sharaf
are the same iff θa = θb (i.e., no two distinct sets of parameter values yield the same
distribution of the data). This implies that, once the parameter space is defined, we
cannot find two different vectors of parameter values that lead to the same moments
of the distribution of the observed variables, i.e., the parameter values are uniquely
identified by the distribution of the observed variables. This one-to-one mapping
holds in the whole domain.
In contrast, local identification is a weaker form of identification than global
identification. To define local identifiability, the concept of a neighborhood of
parameter vectors is introduced: in locally identifiable models, the uniqueness
property is preserved only in a neighborhood of parameter vectors. Given the
parameter vector θ, the model is locally identifiable if for a vector of values θa
there is no vector θb in the neighborhood of θa such that P(θa) = P(θb) unless
θa = θb. Therefore, global identifiability implies local identifiability, but not
vice versa. Generic identifiability is weaker than local identifiability, because
there may exist a null-measure set of non-identifiable parameters. Generically
identifiable models are the weakest form of identifiable models and were introduced
in [2].
A ⊥ B | C,    (2)

X_A ⊥ X_B | X_C    (3)

It follows from the definition that the pairwise Markov property (PM) is weaker than
the local Markov property (LM), which is weaker than the global Markov property
(GM). Syntactically, GM ⟹ LM ⟹ PM.
Decomposable graph
A graph G = (V, E) is decomposable if either:
1. G is complete, or
2. V can be expressed as V = A ∪ B ∪ C where
a. A, B and C are disjoint,
b. A and C are non-empty,
c. B is complete,
d. B separates A and C in G, and
e. A ∪ B and B ∪ C are decomposable.
So, graph G is said to be decomposable if it is complete or admits a proper
decomposition into its decomposable subgraphs. Any decomposable graph can be
recursively decomposed into its maximal prime subgraphs. In other words, a graph
is said to be prime if no proper decomposition exists. Subgraphs that are not decom-
posable anymore are called maximal prime subgraphs [12].
Lauritzen [12] shows that a graph G = (V, E) is decomposable if there exists a
binary rooted tree, called a decomposition tree, whose vertices are labelled by some
non empty subsets of V with the following properties:
Theorem 1 According to Lauritzen and Jensen [10, 12], the following properties
of G are equivalent.
1. G is chordal.
2. G is decomposable.
3. G is recursively simplicial.
4. G has a junction tree.
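The "recursively simplicial" characterization in the theorem gives a direct, if naive, decomposability test: repeatedly delete a vertex whose neighbourhood forms a clique until the graph is empty. The dict-of-neighbour-sets graph encoding below is an assumption for illustration:

```python
# Naive decomposability test via the "recursively simplicial" property:
# a graph is decomposable iff simplicial vertices (neighbourhood forms a
# clique) can be removed one by one until nothing is left. The
# dict-of-sets graph encoding is an assumption for illustration.

def is_decomposable(adj):
    adj = {v: set(nb) for v, nb in adj.items()}  # mutable local copy
    while adj:
        simplicial = next(
            (v for v, nb in adj.items()
             if all(b in adj[a] for a in nb for b in nb if a != b)),
            None)
        if simplicial is None:
            return False  # no simplicial vertex left: not chordal
        for u in adj[simplicial]:
            adj[u].discard(simplicial)
        del adj[simplicial]
    return True

c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}             # chordless 4-cycle
chorded = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}  # add chord 1-3
print(is_decomposable(c4), is_decomposable(chorded))  # False True
```

The chordless 4-cycle has no simplicial vertex, matching the equivalence between chordality and decomposability stated above.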
Factorization and decomposable graph
The unique prime components resulting from recursive decomposition of a graph
G yield a probability distribution that factors with respect to G. This probability
distribution is written as the product of factors, one for each prime component, i.e.,
maximal clique C ∈ G:
P_X = P(X_1, X_2, ..., X_N) = K ∏_{C∈G} φ_C(X_C),    (5)
The log-linear model is a special case of the generalized linear models (GLM). For
variables that have a Poisson distribution, the analysis of contingency tables [1] is one
of the well-known applications of log-linear models. The conditional relationship
between two or more discrete categorical variables is analyzed by taking the natural
logarithm of the cell frequencies within a contingency table. For a two-variable
model, this is shown in the following equation:
If only the terms λ_i^{X1} + λ_j^{X2} + λ_k^{X3} (called the main effects of the
model, or zero-order terms) are nonzero, this model represents complete independence
among X_1, X_2 and X_3.
In this section, we described the relation between the factorization of the undi-
rected graph according to its prime, undecomposable components and the distribu-
tion. This relationship is important since we will search for a generalized identifying
sequence and calculate the rank deficiency of the Jacobian matrix D(β) if the model is
unidentifiable. In the next section, we will describe model formulation of undirected
graphs with the purpose of studying their identifiability property.
Identifiability of Discrete Concentration Graphical Models … 69
In this section, we give the model on which iugm is built. The model is presented
concisely in [20]. We assume that the undirected graphical model satisfies the global
Markov property and the joint probability distribution is factorized according to the
undirected graph G.
Assume that we have a sample of v observations x1 , x2 , ..., xv which can be per-
ceived as independent Poisson random variables, with interactions described by the
undirected graph G. For example, the saturated model in Eq. 7 would be described
by a clique on X_1, X_2, X_3 in a graphical model, G.
Since we require positive means, the general form of such models (called log-
linear models) is:
log(μ_X) = Zβ    (8)

μ_Y = L μ_X    (9)

μ_Y = L exp(Zβ)    (10)
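Equations (8)-(10) compose into a single map from the parameter vector β to the observed means. A pure-Python sketch, where the toy design matrix Z, coefficients β, and marginalization matrix L are invented shapes, not the real design of the model:

```python
# Sketch of Eqs. (8)-(10): mu_Y = L exp(Z beta). Z, beta, and L below
# are toy values, not the real design of the model.

import math

def observed_means(Z, beta, L):
    # Eq. (8): log(mu_X) = Z beta, so mu_X = exp(Z beta) elementwise.
    mu_X = [math.exp(sum(z * b for z, b in zip(row, beta))) for row in Z]
    # Eqs. (9)-(10): L marginalizes out the hidden variable.
    return [sum(l * m for l, m in zip(row, mu_X)) for row in L]

Z = [[1, 0], [1, 1]]           # toy design matrix
beta = [0.0, math.log(2.0)]    # toy interaction coefficients
L = [[1, 1]]                   # sums cells over the hidden variable
print(observed_means(Z, beta, L))  # approximately [3.0]
```

Identifiability asks whether this β → μ_Y map is (locally) one-to-one, which is what the Jacobian rank test in the next section probes.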
4 Algorithm
In this section, we introduce the algorithm on which iugm is based. The algorithm
decides whether an undirected graph G is identifiable or not. Moreover, if the model
is unidentifiable, the algorithm decides where the identifiability breaks and computes
the rank deficiency of the matrix D(β).
The input to the main program consists of three arguments: A, the representation of
the undirected graph; type, the type of representation of the aforementioned graph,
i.e., 'incidence' for an incidence matrix, 'sparse' for a sparse representation, and
the empty string for the default, an adjacency matrix; and v, verbose mode, which is
true or false. For graph visualization (the drawGraph function), iugm uses the
Fruchterman-Reingold force-directed layout algorithm [8]: each iteration computes
O(|E|) attractive forces and O(|V|^2) repulsive forces.
If the observed graph is composed of two complete components, this graphical
model is unidentifiable, because clumping each component will result in a graph
consisting of the hidden variable and two observable nodes, with no edge between
the observable variables. It is well known and easy to show, by a parameter counting
argument, that a graphical model with two observable nodes and one hidden node is
not identifiable from the observed distribution alone. Hence, the resulting reduced
graph is not identifiable. The function maximalCliques finds the maximal cliques of
the complement graph, compGraph, using the Bron-Kerbosch algorithm [6], the fastest
known, with worst-case running time O(3^{n/3}).
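The basic (pivotless) Bron-Kerbosch recursion behind such a maximalCliques routine can be sketched as follows; the small example graph is invented:

```python
# Sketch of the basic Bron-Kerbosch recursion (no pivoting) for
# enumerating maximal cliques. The example graph is invented.

def bron_kerbosch(R, P, X, adj, out):
    """R: current clique, P: candidate vertices, X: already processed."""
    if not P and not X:
        out.append(set(R))  # R cannot be extended: it is maximal
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.discard(v)
        X.add(v)

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: set()}  # triangle plus isolated 4
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(sorted(c) for c in cliques))  # [[1, 2, 3], [4]]
```

Running it on the complement graph, as iugm does, yields the complement's maximal cliques, whose sizes drive condition 1 above.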
Even if condition 1 is not satisfied, i.e., there is no clique of size 3 or more in the
complement graph, we calculate the rank of the Jacobian matrix D(β). Hence, we
loop on the cliques of the graph, and calculate the boundary nodes in the complement
graph Ĝ for that clique. Then we check for condition 2, which states that ∀i ∈ Ss , ∃ j ∈
5 Experimental Results
In this section, we give two graphical models. We represent the graph as an adjacency
matrix (A). This matrix is passed as an argument to the main program, mainProg.
For the cases of unidentifiability, the main program returns the constraints and prints
the deficiency in the Jacobian matrix. For the identifiable models, the main program
returns the generalized identifying sequence (GIS).
5.1 Example 1
In Fig. 1a, we show the adjacency matrix representation for the graphical model. The
program draws the graphical model G as shown in Fig. 1b and its complement Ĝ as
shown in Fig. 1c (cf. lines 2 and 3 in Algorithm 1).
The main program follows the execution path described in Algorithm 1. Since the
model is unidentifiable, we look for rank deficiency and for where identifiability
breaks down. The program computes the Jacobian matrix and the constraints, as shown
in Fig. 2. For the graphical model in Fig. 1b there is no GIS, because the boundary
of C_0 = {1, 4, 5} in the complement graph of the observed variables is
bd_{Ĝ_O}(C_0) = {2, 3, 6}, which is complete in Ĝ_O, as shown in Fig. 1c. The
relevant output of iugm on this graphical model is shown in Fig. 2. The set of
constraints is stored in the MATLAB cell array variable constraints. The constraints
can be represented in terms of the interaction coefficients in β; in this case,
identifiability is lost when the interaction terms satisfy the following system of
linear equations:
Fig. 1 A simple graphical model as a case study: a adjacency matrix A, b graph G and c complement
graph Ĝ
β_{0,6} + β_{0,1,6} = 0
β_{0,3} + β_{0,3,4} = 0
β_{0,2} + β_{0,2,5} = 0
6 Complexity Analysis
Let N be the number of cliques in the graphical model. The dimension of the Z
matrix is 2^{l+1} × K, where l is the number of observed random variables and K
is computed by the following Eq. (12):
K = 2 ∑_{j=1}^{N} (2^{S_j} − 1 − 2^{P_j}) + 1,    (12)
where S j is the size (number of nodes) of clique j and P j is the number of nodes in
clique j that are also in cliques numbered less than j.
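Reading Eq. (12) as K = 2·Σ_j (2^{S_j} − 1 − 2^{P_j}) + 1, K can be computed directly from the clique sizes. The clique configuration below (two cliques sharing one node) is an invented example:

```python
# Sketch of Eq. (12): K = 2 * sum_j (2**S_j - 1 - 2**P_j) + 1, where S_j
# is the size of clique j and P_j counts its nodes shared with earlier
# cliques. The S and P values are invented.

def design_columns(S, P):
    return 2 * sum(2 ** s - 1 - 2 ** p for s, p in zip(S, P)) + 1

# Two cliques of sizes 3 and 2 that share one node: S = [3, 2], P = [0, 1].
print(design_columns([3, 2], [0, 1]))  # 15
```

Each summand counts the interaction terms a clique contributes beyond those already contributed by earlier, overlapping cliques.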
The hierarchical log-linear model is used to build the design matrix. We provide a
formula that helps in listing all the interactions:
Interactions = ⋃_{i=1}^{N} [ (⋃_{j=1}^{Cl_i} C_j^{Cl_i}) ∪ (⋃_{j=1}^{Cl_i} (C_j^{Cl_i} ∪ {H})) ],    (13)
Fig. 3 A dense graphical model tested for identifiability: a adjacency matrix A, b graph G and c
complement graph Ĝ
6.1 Example 2
In the graphical model shown in Fig. 3, both condition 1 and condition 2 are
satisfied. For the GIS, C_0 = {1, 4, 6, 8}, C_1 = {3, 7}, C_2 = {1, 6} or {1, 8},
etc.; the cardinality of C_2 equals that of C_1. As claimed in Proposition 1,
bd(3) ∩ bd(7) = ∅. The generically identifiable sequences are computed by iugm as
shown in Fig. 4. In conformance with the existence of a GIS, the rank of the
Jacobian matrix is full everywhere, and iugm verifies that.
Fig. 4 The GIS and the rank deficiency for graph G at Fig. 3b
program explores when looking for a GIS. The GIS for any graphical model starts
with an S0 of size n elements, and continues looking for an identifiability sequence if
it exists.
7 Conclusion
and fair judgement on the algorithm that is designated to decide the identifiability of
undirected graphical models with binary variables and exactly one hidden variable
connected to all other (observed) variables.
References
1. Agresti, A.: An Introduction to Categorical Data Analysis, 2nd edn. Wiley-Interscience (2007)
2. Allman, E.S., Matias, C., Rhodes, J.: Identifiability of parameters in latent structure models
with many observed variables. Ann. Stat. 37(6A), 3099–3132 (2009)
3. Besag, J.: Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. Ser.
B (Methodological) 36(2), 192–236 (1974)
4. Bickel, P.J., Doksum, K.A.: Mathematical Statistics: Basic Ideas and Selected Topics, vol. 1,
2 edn. Prentice-Hall (2001)
5. Bollen, K.A.: Structural Equations with Latent Variables. Wiley-Interscience (1989). ISBN
0471011711
6. Bron, C., Kerbosch, J.: Algorithm 457: finding all cliques of an undirected graph. Commun.
ACM 16, 575–577 (1973)
7. Drton, M., Foygel, R., Sullivant, S.: Global identifiability of linear structural equation models.
The Annals of Statistics, 1003.1146 (2011). http://www.imstat.org/aos/future_papers.html
8. Fruchterman, T., Reingold, E.: Graph drawing by force-directed placement. Softw.-Pract. Exp.
21(11), 1129–1164 (1991)
9. Huang, Y., Valtorta, M.: Pearl’s Calculus of intervention is complete. In: Proceedings of the
Twenty-second Conference on Uncertainty in Artificial Intelligence (UAI-06), pp. 217–224
(2006)
10. Jensen, F.V., Nielsen, T.D.: Bayesian networks and decision graphs, 2nd edn. Springer Pub-
lishing Company, Incorporated (2007). ISBN 9780387682815
11. Jordan, M., Bishop, C.: An Introduction to Graphical Models. MIT Press (2002)
12. Lauritzen, S.L.: Graphical Models. Oxford University Press (1996). ISBN 0-19-852219-3
13. Meek, C.: Strong completeness and faithfulness in Bayesian networks. In: Proceedings of
Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 411–418. Morgan Kaufmann
(1995)
14. Moon, J., Moser, L.: On cliques in graphs. Israel J. Math. 3(1), 23–28 (1965). https://doi.org/
10.1007/BF02760024
15. Pearl, J., Paz, A.: Graphoids: a graph-based logic for reasoning about relevance relations.
Technical Report (R–53–L), Los Angeles (1985)
16. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1988). ISBN 0-934613-73-7
17. Pearl, J.: Causality: Models, Reasoning, and Inference, 2 edn. Cambridge University Press
(2009)
18. Peña, J.M.: Faithfulness in Chain graphs: the discrete case. Int. J. Approx. Reason. 50(8),
1306–1313 (2009)
19. Sharaf, M.: Identifiability of discrete concentration graphical models with a latent variable.
Ph.D. thesis, University of South Carolina, South Carolina (2014)
20. Stanghellini, E., Vantaggi, B.: Identification of Discrete Concentration Graph Models with One
Hidden Binary Variable. Bernoulli (to appear) (2013)
21. Stromberg, K.R.: An Introduction to Classical Real Analysis. Wadsworth International, Bel-
mont, CA (1981)
An Automatic Classification of Genetic
Mutations by Exploring Different
Classifiers
1 Introduction
Diseases existed long before humans even learned of their presence in the body. As people became smarter and more civilized, they learned the causes of most of their deaths and, ever since, have taken precautions in whatever way they could. Over time this pursuit became a profession, and researchers have kept searching for the treatment of diseases. Cancer is one such disease that dominates a vast number of other diseases.
Scientists have been researching cancer treatment for decades, but there is no hundred percent success yet. Being in the 21st century and still unable to solve a problem only increases the gravity of that problem. A complete check-up of the body is often done nowadays by many individuals as a precaution against life-threatening diseases; in the case of cancer, scientists find it easier to treat the disease in its early stage. With the advent of technology, mere observation has given way to a deep exploration of every problem. Technology has opened new doors in every field, and when it comes to cancer treatment it has helped remarkably by decreasing the number of deaths.
Accurate prediction of the type of cancer from early symptoms has proved to be the most efficient approach and is hence focused upon, so that treatment can be given in the early stage itself. This introduces the concept of training Machine Learning models.
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_5
78 B. Soni et al.
2 Literature Survey
In this section, we discuss several related papers along with their advantages and limitations. Asri et al. [1] assess the correctness of classifying data with respect to the efficiency and effectiveness of each algorithm in terms of accuracy, precision, sensitivity and specificity. Experimental results show that SVM gives the highest accuracy (97.13%) with the lowest error rate. Kourou et al. [2] used classification algorithms and feature-selection techniques, outlining three integral case studies that cover prediction of susceptibility to cancer, recurrence of cancer, and survival, using apt Machine Learning concepts.
Sheryl et al. [3] developed a comprehensive pipeline that classifies genetic mutations based on clinical literature. The text features in their solution were mined from three views, namely the entity name, entity text and original document views. Different ML algorithms were used to extract text features at the document, knowledge, sentence and word levels. The results obtained convey that a multi-view ensemble classification framework gives good accuracy.
Bruno et al. [4] present the possibility of using the KNN algorithm with the TF-IDF method as a framework for text classification. Testing revealed the good and bad features of the algorithm, providing guidance for the further development of similar frameworks. Jiang et al. [5] succeeded in reducing the computation of text similarity and improving KNN, thereby outperforming popular classifiers such as Naive Bayes, SVM and plain KNN. Their work was based on clustering algorithms.
Wan et al. [6] use an SVM classifier at the training stage of KNN. The approach depends little on the choice of the K parameter, so the accuracy of the KNN classification is retained; however, it has a higher time complexity.
Zhang et al. [7] separated a class of document vectors from its complement with the help of hyperplanes. Naive Bayes performed worst, the performance of LLSF was close to the state of the art, and LR and SVM performed roughly equally.
Leopold et al. [8] found that the TF-IDF weighting scheme has a greater impact on the performance of SVM than kernel functions alone, and that feature selection and preprocessing are not required for SVM classification. Aurangzeb et al. [9] report that, compared with ANN, SVM captures the inherent characteristics of the data better. Bekir et al. [10] present the detection of drug therapy for leukemia cancer using a Naive Bayes classifier. The proposed study supports the use of personalized drug therapy in clinical practice, and the tool can be applied to a variety of diseases with similar characteristics.
Watters et al. [11], after performing their own experiments, concluded that the SVM classifier performed far better than an Artificial Neural Network for both IQ57 and IQ87. Moreover, SVM is less expensive in terms of computation. For datasets with few categories and documents, these authors therefore suggested the use of SVM rather than ANN. Mertsalov et al. [12] note that automatic document classification is very handy for larger enterprises; machines built for document classification can now perform even better than humans, and their use in broader domains will keep expanding with time. Image and speech processing based applications are discussed in [13–15].
Li et al. [16] concluded that all four classifiers, namely Decision Tree, Naive Bayes (NB), the subspace method and the nearest-neighbour classifier, performed quite well on the Yahoo dataset, with NB giving the maximum accuracy. Moreover, they observed that combining various classifiers does not lead to a significant improvement over using a single classifier. Karel et al. [17] found that reducing the number of features is better than not reducing them, and that their feature-extraction algorithms lead to better performance than other feature-selection methods. Quan et al. [18] showed that smoothing methods enhance the accuracy of NB for the classification of shorter texts; Two-Stage (TS) and Absolute Discounting (AD) proved to be the best smoothing methods. The N-gram frequency method of William et al. [19] gives a less expensive and more effective way of classifying documents, achieved by using samples of the required categories instead of costly and complicated methods such as assembling detailed lexicons or parsing natural language.
Liu et al. [20] proposed an SVM-based classification model and reviewed different classification techniques, including KNN, decision trees, NB and SVM. In the SVM, a non-linear mapping transforms inseparable samples from a low-dimensional sample space into a high-dimensional feature space, where they become linearly separable. Thorsten et al. [21] highlighted the merits of the Support Vector Machine (SVM) classifier for text categorization: on the Reuters dataset, KNN had proved the best among the conventional methods, but SVM gave the best classification results, leaving all conventional methods behind by good margins.
3 Datasets Description
There are nine different classes into which a genetic mutation can be classified. This is not a trivial task, since interpreting clinical evidence is very challenging even for human specialists; modeling the clinical evidence (text) is therefore critical to the success of any approach. The nine mutation classes are as follows:
• Gain of function
• Likely gain of function
• Loss of function
• Likely loss of function
• Neutral
• Likely neutral
• Switch of function
• Likely switch of function
• Inconclusive.
The dataset is provided in two separate files for training and testing. The first file (training/test_variants) provides information on the genetic mutations, while the second file (training/test_text) provides the clinical literature (text) that pathologists use for the classification of mutations.
The data was collected from the Kaggle competition Personalized Medicine: Redefining Cancer Treatment [22] and was prepared by the Memorial Sloan Kettering Cancer Center (MSKCC). The size of the merged dataset is 528 MB.
Training_variants: A CSV file containing the genetic mutation descriptions used in training. Its fields are ID (the id of each row, linking a mutation to the clinical literature), Gene (the gene in which the mutation is present), Variation (the change in the amino-acid sequence for the mutation) and Class (one of the 9 classes of genetic mutations) (Fig. 1 and Table 1).
Training_text: A double-pipe (||) delimited file containing the clinical literature (in text format) that classifies the genetic mutations. Its fields are ID (the id of each row, linking a mutation to the clinical literature) and Text (the clinical evidence contributing to the classification of the genetic mutation) (Fig. 2 and Tables 2, 3).
[Fig. 1 Bar chart of the number of training samples in each of the nine classes; per-class counts range from 15 to 454]
Test_variants: An unlabelled data file containing all the fields of Training_variants except Class. Its fields are ID (the id of the row, used to link the mutation to the clinical evidence), Gene (the gene in which the genetic mutation is located) and Variation (the amino-acid change for the mutation).
Test_text: A data file similar to Training_text. Its fields are ID (the id of each row, linking a mutation to the clinical literature) and Text (the clinical evidence contributing to the classification of the genetic mutation).
4 Proposed Methodology
Machine Learning algorithms cannot reasonably be applied without preprocessing the raw data, unless one does not care about model accuracy, which is definitely our prime concern. Working on raw data is often as good as not working at all. Hence, preprocessing is one of the most integral initial steps in supporting the different algorithms. In our model, the accuracy would have been significantly lower had we not done any preprocessing. One example of preprocessing is reducing the dimension of the dataset, i.e. selecting those features that lead to better accuracy. Preprocessing of raw data comprises a few steps, which are explored as follows.
Data Cleaning is the very first step of preprocessing. Sometimes the dataset has missing values; at other times the data is noisy. Moreover, most classes contain outliers, which should preferably be detected and removed, and duplication of the same data should be minimized. Data Cleaning is used to tackle all of these problems.
Data is often collected from various sources, since an entire dataset is rarely available in one place. After collecting data scattered across different places, it has to be made consistent so that, once cleaned, it is useful for analysis.
The machine does not understand words or strings. Hence, in order to apply the various ML algorithms to our dataset, every word must first be converted to numerical form so that the machine can process it. This conversion of text to numerical form is done by encoding and embedding; the different techniques we used are discussed below.
Term Frequency (TF): This gives more significance to the most frequently appearing words in a document. For example, if a word appears once in a document and the total number of words in that document is five, then the TF value of that word is 1/5.
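As a minimal sketch of these weights (the five-word toy document and two-document corpus below are invented, not drawn from the dataset), TF and the companion IDF term can be computed as:

```python
import math
from collections import Counter

def term_frequencies(doc_tokens):
    """TF of each word = its count / total number of words in the document."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {word: c / total for word, c in counts.items()}

def idf(word, corpus):
    """IDF = log(N / number of documents containing the word)."""
    n_containing = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / n_containing)

doc = ["mutation", "in", "the", "gene", "sequence"]
corpus = [doc, ["the", "gene", "was", "sequenced"]]

tf = term_frequencies(doc)  # each word appears once in a 5-word document -> 1/5
```

A TF-IDF weight is then simply the product tf[word] * idf(word, corpus), giving frequent-but-distinctive words the largest scores.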
Using a neural-network language model, Word2Vec computes a vector representation for every word present in a document. Every word is trained in the model to maximize the log probability of its neighbouring words. From the embedded words we thus obtain vector representations that help us find the similarities among them.
Word2Vec is not a text-classification algorithm but an algorithm for computing vector representations of words from very large datasets. A characteristic of Word2Vec is that words with similar contexts have their vectors in the same region of the space; for example, country names would lie close to each other in the vector space. Another property of the algorithm is that some concepts are encoded as vectors. Using the word representations provided by Word2Vec we can apply mathematical operations to words, and so can use algorithms such as Support Vector Machines (SVM) or deep learning algorithms (Tables 5, 6, 7, 8, 9 and Fig. 3).
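The vector arithmetic described above can be illustrated with a small sketch. The three-dimensional "word vectors" below are hand-made stand-ins for trained Word2Vec vectors (real embeddings typically have hundreds of dimensions); averaging them is one simple way to turn a document into a single vector:

```python
import math

# Toy 3-dimensional word vectors, invented for illustration; in practice
# these would come from a Word2Vec model trained on a large corpus.
vectors = {
    "tumor":  [0.9, 0.1, 0.0],
    "cancer": [0.8, 0.2, 0.1],
    "guitar": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity: words with similar contexts score close to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def document_vector(tokens):
    """Average the vectors of the words in a document."""
    dims = len(next(iter(vectors.values())))
    return [sum(vectors[t][d] for t in tokens) / len(tokens) for d in range(dims)]

sim_related = cosine(vectors["tumor"], vectors["cancer"])
sim_unrelated = cosine(vectors["tumor"], vectors["guitar"])
doc_vec = document_vector(["tumor", "cancer"])
```

The averaged document vectors are what a downstream classifier such as SVM would consume.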
5 Classification Models
Estimating the probability of the classes can be done in several different ways. When training the algorithms, one can use different optimization methods to fit the data to a model.
In the Naive Bayes classifier, the probability model is simplified by the assumption of class-conditional independence among the features and words of a given class. The classifier works according to Bayes' theorem, where the predictors are assumed to be independent; in a nutshell, each feature is considered independent of the other features. As a simple example, an animal may be classified as carnivorous merely because it eats meat, regardless of its other attributes.
The Naive Bayes model is quite easy to implement and is very useful for sufficiently large datasets. Despite its simplicity, it is renowned for outperforming newer and more sophisticated classifiers.
In Naive Bayes, the discriminant function is given by the posterior probability, which is obtained from Bayes' theorem (Fig. 5). The logistic function used by the Logistic Regression classifier, and the Naive Bayes posterior, are:

P = e^(a+bX) / (1 + e^(a+bX))

P(c/x) = P(x/c) P(c) / P(x)
Here,
– P(c/x) denotes posterior probability
– P(x/c) denotes Class Conditional Probability
– P(c) denotes Class Prior Probability
– P(x) denotes predictor prior probability.
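A minimal sketch of this posterior computation on two toy classes ("driver" and "passenger"); all priors and word likelihoods below are invented for illustration, not estimated from the dataset:

```python
# Hand-computed Naive Bayes posterior P(c/x) = P(x/c) P(c) / P(x)
# with class-conditional independence: P(x/c) = product of P(word/c).

priors = {"driver": 0.6, "passenger": 0.4}      # P(c), invented
likelihoods = {                                  # P(word/c), invented
    "driver":    {"activating": 0.30, "benign": 0.05},
    "passenger": {"activating": 0.05, "benign": 0.40},
}

def unnormalized_posterior(words, cls):
    """P(c) * product of P(word/c) over the observed words."""
    p = priors[cls]
    for w in words:
        p *= likelihoods[cls][w]
    return p

def classify(words):
    scores = {c: unnormalized_posterior(words, c) for c in priors}
    evidence = sum(scores.values())              # P(x), the normalizer
    return {c: s / evidence for c, s in scores.items()}

probs = classify(["activating"])
```

For the word "activating", the driver score is 0.6 x 0.30 = 0.18 against 0.4 x 0.05 = 0.02, so the normalized posterior strongly favours "driver".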
This is one of the most used classification algorithms. The working principle of the Support Vector Machine is based mainly on margin calculations: a hyperplane divides the data points into different subclasses. For a linear SVM, the mathematical expression of the hyperplane is (Fig. 6):

x_i · w + b = 0. (3)

For minimal classification error, the distance between the hyperplane and the classes should be maximal. The hyperplane should have the maximum margin, i.e. it should maximize the separation between the hyperplane and the closest points of the two classes.
The mechanism of the optimal hyperplane for dividing the training data without error can be extended using a soft margin, which permits a learning technique that tolerates errors. If there are n training patterns, say (x_1, y_1), ..., (x_n, y_n), then the optimal hyperplane of Eq. (4) should divide the training data with maximal margins:

w_0 · x + b_0 = 0. (6)
The Support Vector Machine mechanism was originally developed for separating training data with minimal or no error. The support vector approach was later extended to cover the case where error-free division of the training vectors is not achievable. With this extension, support vector machines can be considered a new learning machine, as robust and groundbreaking as neural networks.
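Equation (3) translates directly into a decision rule: a point is assigned to one class or the other according to the sign of x · w + b. A minimal sketch with illustrative, untrained weights (a real SVM would learn w and b by margin maximization):

```python
# Illustrative hyperplane weights; in a trained SVM these would be the
# solution of the margin-maximization problem, not hand-picked values.
w = [2.0, -1.0]
b = -1.0

def decision(x):
    """Signed distance-like score x . w + b from Eq. (3)."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b

def predict(x):
    """+1 on one side of the hyperplane x . w + b = 0, -1 on the other."""
    return 1 if decision(x) >= 0 else -1

side_a = predict([2.0, 0.0])
side_b = predict([0.0, 2.0])
```

Points with decision(x) far from zero lie deep inside a class region; points near zero lie close to the separating hyperplane.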
The Random Forest algorithm can be used for both classification and regression problems. RF is basically a supervised learning algorithm that generates a forest for the evaluation of the results. It constructs many decision trees by choosing K data points from the dataset and then merges them all to obtain more stable and more accurate predictions. From the decision trees we get many predictions, and we then take the average of all the predictions.
RF is an ensemble learning algorithm, because many models are merged together to predict a single result. RF generally uses the ID3 algorithm for decision-tree training. Three principal decisions must be made when constructing a random tree: the strategy for splitting the leaves, the kind of predictor to use in each leaf, and the technique for injecting randomness into the trees. The Gini measure is used to evaluate the utility of a split feature, and can be expressed as (Fig. 7):
Gini(P_m) = Σ_n Q_mn (1 − Q_mn). (7)
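A minimal sketch of the Gini measure in Eq. (7), taking Q_mn as the fraction of a node's samples that belong to class n; the label lists are invented for illustration:

```python
def gini(labels):
    """Gini impurity of a node: sum over classes of q * (1 - q),
    where q is the fraction of the node's samples in that class."""
    n = len(labels)
    fractions = [labels.count(c) / n for c in set(labels)]
    return sum(q * (1 - q) for q in fractions)

impure = gini([1, 1, 0, 0])   # evenly mixed node
pure = gini([1, 1, 1, 1])     # single-class node
```

A split feature is scored by how much it lowers the impurity: a 50/50 node has Gini 0.5, while a pure node has Gini 0, so candidate splits that push child nodes toward purity are preferred.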
[Fig. 8 Proposed architecture. Input Data: training and testing variants and text. Feature Generation: encoding (BOW, TF-IDF) and word embedding. Training and Prediction: Logistic Regression, Random Forest and Support Vector Machine each produce a prediction, and the final output is their mean]
Our proposed architecture consists of three phases: Input Data, Feature Generation, and Training and Prediction. Each is explained as follows (Fig. 8).
Input Data: The dataset collected from the Kaggle competition was split into training and testing data, which served as input for encoding and embedding. The input data is a set of three features: Gene, Variation and Text. Each of these features is a string, so they must be converted to numerical form before the classification algorithms can be applied.
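The final stage of this architecture, averaging the predictions of the three classifiers, can be sketched as a mean of per-model class-probability vectors. The probability vectors below are invented for illustration and use three classes for brevity (the real task has nine):

```python
# Sketch of the "Mean" stage of the architecture: average the class-probability
# vectors predicted by Logistic Regression, Random Forest and SVM.
# All probability values below are invented.

def mean_prediction(per_model_probs):
    """Element-wise mean of equally-weighted per-model probability vectors."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[k] for p in per_model_probs) / n_models
            for k in range(n_classes)]

lr_probs  = [0.7, 0.2, 0.1]
rf_probs  = [0.6, 0.3, 0.1]
svm_probs = [0.5, 0.3, 0.2]

avg = mean_prediction([lr_probs, rf_probs, svm_probs])
predicted_class = avg.index(max(avg)) + 1   # classes are numbered from 1
```

Averaging calibrated probabilities rather than hard labels lets a confident model outvote two uncertain ones, which is one common motivation for this design.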
[Figure: Score of each classification model under the BOW, TF-IDF, Word2Vec and Doc2Vec representations; scores range roughly between 10 and 70]
classes. Hence, the Word2Vec and Doc2Vec embeddings performed more poorly on our dataset (Fig. 11).
The next issue with our dataset is that it contains myriad words that are totally insignificant for classification; for example, the Text feature contains a lot of bibliography content. The dataset thus carries noisy information, leading to unrepresentative document vectors. As the vectors drift from what they should be, the probability of misclassification increases. This is why the Word2Vec and Doc2Vec embeddings did not work well on our dataset.
• False negative (FN): the algorithm predicts class 0 where the correct class is 1 (Figs. 12, 13, 14 and 15).
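The confusion-matrix counts underlying these figures can be sketched as follows, treating 1 as the positive class; the label vectors are invented for illustration:

```python
# Count the four confusion-matrix entries for a binary (one-vs-rest) view.
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # FN: predicted 0, true 1
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0, 1]   # invented ground-truth labels
y_pred = [1, 0, 0, 1, 1]   # invented model predictions
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```

Accuracy, precision and sensitivity, the metrics quoted in the literature survey, all derive from these four counts.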
Fig. 14 WORD2VEC embedding with RF
Fig. 15 DOC2VEC embedding with RF
Our work has contributed a comprehensive pipeline for classifying genetic mutations based on clinical evidence. Different ML algorithms were explored to extract features from the text, and different combinations of classifiers, encoding and embedding techniques were tried to obtain the best possible score in automatically classifying a test mutation as a driver or a passenger.
Logistic Regression with TF-IDF encoding gave the best score, despite our use of Word2Vec and Doc2Vec embeddings to capture semantics. The dataset issues discussed above could be removed with a better dataset, so our model should achieve much higher accuracy once the potential of Word2Vec and Doc2Vec can be exploited in full. Future work is discussed below.
• Millions of gigabytes of data are produced every day. In the detection and diagnosis of cancer too, new data and better observations are continually being generated, so the clinical literature will eventually improve, enabling proper analysis and classification of the test mutations. As soon as another dataset becomes available, we can use all our encoding and embedding techniques to their full potential, thereby boosting accuracy.
• Use of XGBoost and ensemble methods such as multiple Gradient Boosted Decision Tree classifiers.
• The entity-text view for genes and variations can be utilized to distinguish the different processes, since similar text entries occur in different data points. This could improve the Word2Vec and Doc2Vec embedding techniques.
Acknowledgments This work was supported by the Multimedia and Image Processing Laboratory, Department of Computer Science and Engineering, National Institute of Technology Silchar, India.
References
1. Asri, H., Mousannif, H., Al Moatassime, H., Noel, T.: Using machine learning algorithms for breast cancer risk prediction and diagnosis. Procedia Comput. Sci. 83, 1064–1069 (2016)
2. Kourou, K., Exarchos, T.P., Exarchos, K.P., Karamouzis, M.V., Fotiadis, D.I.: Machine learning
applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17 (2015)
3. Zhang, X.S., Chen, D., Zhu, Y., Che, C., Su, C., Zhao, S., Min, X., Wang, F.: A multi-view ensemble classification model for clinically actionable genetic mutations (2018). arXiv:1806.09737
4. Trstenjak, B., Mikac, S., Donko, D.: KNN with TF-IDF based framework for text categorization. Procedia Eng. 69, 1356–1364 (2014)
5. Jiang, S., Pang, G., Wu, M., Kuang, L.: An improved k-nearest-neighbor algorithm for text
categorization. Expert Syst. Appl. 39(1), 1503–1509 (2012)
6. Wan, C.H., Lee, L.H., Rajkumar, R., Isa, D.: A hybrid text classification approach with low
dependency on parameter by integrating k-nearest neighbor and support vector machine. Expert
Syst. Appl. 39(15), 11880–11888 (2012)
7. Zhang, T., Oles, F.J.: Text categorization based on regularized linear classification methods.
Inf. Retr. 4(1), 5–31 (2001)
8. Leopold, E., Kindermann, J.: Text categorization with support vector machines. How to represent texts in input space? Mach. Learn. 46(1–3), 423–444 (2002)
9. Khan, A., Baharudin, B., Lee, L.H., Khan, K.: A review of machine learning algorithms for
text-documents classification. J. Adv. Inf. Technol. 1(1), 4–20 (2010)
10. Karlik, B., Öztoprak, E.: Personalized cancer treatment by using naive bayes classifier. Int. J.
Mach. Learn. Comput. 2(3), 339 (2012)
11. Basu, A., Walters, C., Shepherd, M.: Support vector machines for text categorization. In:
Proceedings of the 36th Annual Hawaii International Conference on System Sciences, pp. 7–
pp. IEEE (2003)
12. Mertsalov, K., McCreary, M.: Document classification with support vector machines. ACM
Computing Surveys (CSUR), pp. 1–47 (2009)
13. Soni, B., Das, P.K., Thounaojam, D.M.: Keypoints based enhanced multiple copy-move forgeries detection system using density-based spatial clustering of application with noise clustering algorithm. IET Image Process. 12(11), 2092–2099 (2018)
14. Soni, B., Das, P.K., Thounaojam, D.M.: Improved block-based technique using SURF and FAST keypoints matching for copy-move attack detection. In: 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 197–202. IEEE (2018)
15. Soni, B., Debnath, S., Das, P.K.: Text-dependent speaker verification using classical LBG, adaptive LBG and FCM vector quantization. Int. J. Speech Technol. 19(3), 525–536 (2016)
16. Li, Y.H., Jain, A.K.: Classification of text documents. Comput. J. 41(8), 537–546 (1998)
17. Fuka, K., Hanka, R.: Feature set reduction for document classification problems. In: IJCAI-01
Workshop: Text Learning: Beyond Supervision (2001)
18. Yuan, Q., Cong, G., Thalmann, N.M.: Enhancing Naive Bayes with various smoothing methods
for short text classification. In: Proceedings of the 21st International Conference on World Wide
Web, pp. 645–646 ACM (2012)
19. Cavnar, W.B., Trenkle, J.M. et al.: N-gram-based text categorization. In: Proceedings of
SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval,
vol. 161175. Citeseer (1994)
20. Liu, Z., Lv, X., Liu, K., Shi, S.: Study on SVM compared with the other text classification
methods. In: 2010 Second International Workshop on Education Technology and Computer
Science, vol. 1, pp. 219–222. IEEE (2010)
21. Joachims, T.: Text categorization with support vector machines: learning with many relevant
features. In: European Conference on Machine Learning, pp. 137–142. Springer (1998)
22. https://www.kaggle.com/c/msk-redefining-cancer-treatment/data (2017)
Towards Artificial Intelligence: Concepts,
Applications, and Innovations
Abbreviations
AADRP Agency for Advanced Defense Research Projects
AAOD American Air Operations Division
AI Artificial Intelligence
AIVA Artificial Intelligence Virtual Artist
ATC Air-Traffic-Control
DAAI Design Assisted by Artificial Intelligence
HVAC Heating, Ventilation and Air Conditioning
ITS Intelligent Tutoring Systems
KBS Knowledge Based Systems
KE Knowledge Engineering
MEC Mobile Edge Computing
ML Machine Learning
MOOC Massive Open Online Courses
ZAML Zest Automated Machine Learning
1 Introduction
AI appeared right after the first uses of computers, more precisely at the Dartmouth conference in 1956, following the contribution of Alan Mathison Turing [1]. The four participants in this conference concluded that AI is "the possibility of designing an intelligent machine". More generally, the science of "making machines work intelligently" is called AI; indeed, AI has no single common definition. Marvin Lee Minsky was the first to define AI, presenting it as "the construction of computer programs which engage in tasks which are, for the moment, accomplished in a way which can equalize human intelligence" [2]. Then, Entwistle defined AI as the intelligence presented by machines [3]. Furthermore, one of the first textbooks defined it as "the study of ideas that allow computers to be intelligent". This definition was updated in the following years, and AI was then seen as an attempt to "make the computer do things for which people are better at the moment" [4]. It is also important to note that the term AI remains controversial, leaving one question unanswered: can a machine ever be intelligent? [5]
The first use of this innovative technology (AI) goes back to John McCarthy [2], for carrying out rather complex calculations. Subsequently, AI has been used to develop and advance many areas, including finance, health, education and transportation. Among the most important applications of AI are the automaton and the robot. These two applications have the same function, but their designs are different: the automaton is a mechanical concept, while the robot is much more electronic, perfected in particular with the aid of computers [6]. Some robots are automata and others are not. An automatic robot is a system that executes its assigned tasks automatically; even after finishing its tasks, it will continue without stopping until a human being intervenes. Some machines that might be considered intelligent are not so in reality. For example, calculators are not AI, because they perform the requested calculations without being aware of them; they are simply equipped with programs that allow them to solve the requested calculations quickly.
At the beginning of the 21st century, the concept of AI developed further and found more applicability in the military and civil fields. It is now considered a simulation of human intelligence processes by machines, in particular computer systems. These processes include learning (acquiring information and the rules for using it), reasoning (using rules to reach approximate or definitive conclusions) and self-correction. Specific AI applications include expert systems, voice recognition and machine vision [1]. Hence, AI refers to any computer system that uses a logical process to learn and improve, taking into account the system environment and previous errors. One could therefore argue that AI is a generic term encompassing intelligent robotics, ambient intelligence, machine automation, autonomous agents, reactive and hybrid behavioral systems, and big and small data [7, 8]. It can be said that developments in robotic systems have occurred in parallel with developments in the field of AI. During the initial stages of AI-based technological development, robotics was seen mostly as a technology of automatic systems in which the machine performs a preprogrammed activity; current robotics is known for its decision-making capabilities. For example, speech recognition systems first interact with the user to gather information about the characteristics of their voice.
For the design and implementation of projects based on AI technology, several approaches are necessary, such as domain ontologies, which serve as a representative form of knowledge about a world or a certain part of it [9, 10]. Ontologies also describe [10]: individuals (basic objects), classes (sets, collections or types of objects), attributes (properties, functionalities, characteristics or parameters that objects can possess and share), relationships (the links that objects can have between them), events (changes to attributes or relationships), and meta-classes (collections of classes that share certain characteristics) [11]. The use of ontologies for knowledge management has proven advantageous in AI research, where knowledge management relies on the representation of knowledge to simulate human reasoning and to model knowledge in a machine-usable way [12]. Ontologies allow the representation of knowledge and the modeling of reasoning, which are fundamental characteristics of KBS (Knowledge-Based Systems) [12]. The idea of distributed problem solving dates back to the mid-1970s, with actor languages and the blackboard architecture model, initially proposed for automatic understanding [13]. A distributed AI system thus includes a set of autonomous entities, called agents, that interact with one another to complete a task that helps solve a complex problem [14]. The concept of an agent has been studied not only in AI but also in philosophy and psychology [15]. Then, with the development of the Internet, several names appeared: resource agent, broker agent, personal assistant, mobile agent, web agent, interface agent, avatar, etc. In general, an agent is a computational entity, located in an environment, that acts autonomously to achieve the objectives for which it was designed. Agents can also be physical entities (machines, manipulator robots, etc.): the domain is then that of multi-robot systems.
The remainder of this paper is organized as follows. Section 2 presents intelligence and AI. Section 3 explains the evolution of Big Data and Data Mining. Section 4 explains the research paradigms of AI. Section 5 details example applications of AI. Section 6 clarifies the advantages and disadvantages of AI. Section 7 provides a discussion of our research. Finally, Sect. 8 concludes the paper.
2 Intelligence and AI
The word “intelligence” is derived from the Latin intellegere or intelligere (“to choose between”) [16]. It should be noted that the Internet, by amplifying the mechanisms for reinforcing information between human beings, has greatly contributed to establishing the concept of collective intelligence. Intelligence is the dynamic ability to make inferences between stimuli, to deduce abstractions, and to create a language that allows naming, exchanging and connecting these abstractions. Intelligence has made it possible to define the concept of context, and thus to explain that links are not necessarily repetitive. It is in all these capacities that humans are distinguished from other mammals.
106 D. Saba et al.
Not only can a dog not say “tomorrow”, but it is likely that the concept of “tomorrow” is not developed in its cognitive capacities. We now have tools to understand the brain, and we know that the development of the cerebral cortex has allowed humans to define abstractions. Language functions, integral to intelligence, have been located in certain areas of the brain, the Broca and Wernicke areas. Some animals have these areas, but they are highly developed in humans. This development of the brain explains an interesting phenomenon: most mammals are born almost adult, except humans. A foal can walk after a few hours, for example. Humans, on the other hand, are born “too early” because the development of the brain cannot be completed in utero (the head would be too large to cross the cervix). In doing so, they are quickly subjected to all the stimuli of an environment far more varied than the mother's womb. The impact on the development of intelligence may be better understood with the discovery of the fundamental role of epigenetics, the science studying the relationships between genes and the environment.
It is therefore not easy to define intelligence with language as the only tool. A dictionary has the particularity of being self-referencing, since words are defined using words. One should instead understand artificial intelligence as an attempt to understand intelligence using computer programs.
2.1 AI in History
levels, such as Richard Greenblatt’s MacHack [1]. Research in this field in the 1970s
was distinguished by the idea of providing the machine with the ability to implement
advanced strategies that develop dynamically through play (as in the work of Hans
Berliner). However, it was primarily raw computing power, capable of exploring
gigantic combinatorial spaces, that defeated the discipline’s world champion (the
Deep Blue computer’s victory over Garry Kasparov in 1997) [1]. The 1970s also saw
the first AI experiments on mobile robots (such as the Shakey robot from SRI at
Menlo Park, California), which jointly posed problems of computer vision, knowledge
representation, and planning [22]. A decade or so later, Rodney Brooks of the
Massachusetts Institute of Technology became interested in societies of robots that
respond directly to their environment [23].
As this brief historical outline illustrates, AI was first developed on a large scale
in the United States before the mid-1970s; researchers from Europe and Asia
followed.
a rather inductive approach based on human models, and “hard sciences”, which,
conversely, started from the data and refused all anthropomorphism.
• Imitation of nature: the third school wanted to model nature and the human
brain. This was the start of neural networks, a category of algorithms that use the
brain as a paradigm to solve problems. Neural networks currently dominate
all applications related to pattern recognition: language comprehension, machine
translation, image analysis, etc. They are, for example, at the basis of autonomous
cars.
Big data is the great disruption of our time; it has allowed artificial intelligence to
make huge leaps in a world of mass data production. Some data are public, others
private. On February 20, 2020, one could observe, in a single second: 8,856 messages
sent on Twitter, 972 photos uploaded to Instagram, 1,658 messages posted on Tumblr,
4,437 calls on Skype, 87,429 GB of Internet traffic, 80,772 searches on Google,
82,767 videos viewed on YouTube, and 2,884,128 emails sent [25, 26] (Table 1).
The possession of this data gives enormous power to GAFA [27] and their Chinese
equivalents. Nevertheless, Google offers programming interfaces allowing us to
query its own data. Google Maps, for example, allows access to maps of the entire
planet. Google also offers a site where one can compare trends on several keywords,
with the possibility of zooming in on geographic areas or time scales, or filtering by
category. As an example, one can compare the evolution of interest in “Macron”
versus “Trump” searches between February 2019 and February 2020 (Fig. 1).
Google, Apple, Facebook, and Amazon could not help but invest heavily in AI,
given the huge amounts of data they hold. Here are a few examples.
• Google: very involved in AI, Google usually proceeds by acquisition. In 2014,
Google bought the British company DeepMind, which had developed neural
networks for playing video games [28]. DeepMind’s stated goal today is to “understand
what intelligence is.” DeepMind is famous for its AlphaGo program, which
beat the world champion of go. In October 2017, the program took an additional
step: by playing against itself, not only was its learning shorter, but above all it
became stronger than the previous version [29]. This was a striking example of
learning by self-play, facilitated by the fact that the context, namely the rules of the
game of go, is perfectly formalizable. Google also has its own voice assistant,
Google Home, a smart speaker available in three different versions.
• Amazon: uses AI in its recommendation engine and in its Echo smart speakers,
which are built on its voice recognition assistant, Alexa, and available in seven
different versions [30]. Through its cloud service offering, Amazon also provides
AI-based services, such as speech recognition and chatbots.
Towards Artificial Intelligence: Concepts, Applications … 109
Table 1 Rate of use of informatics tools linked with big data and data mining (observed January 20, 2020) [25, 26]

Tool | Task | Rate (per second) | Since opening the web page
Twitter | Send messages | 8,856 | 1,132,273 tweets in 0:02:08
Instagram | Upload photos | 972 | 436,478 new photos in 0:05:30
Tumblr | Send messages (posts) | 1,658 | 1,797,367 posts in 0:08:44
Skype | Calls (voice communications) | 4,437 | 3,112,981 calls in 0:11:42
Internet | Internet traffic | 87,429 GB | 76,145,595 GB in 0:14:31
Google | Searches | 80,772 | 101,686,086 searches in 0:20:59
YouTube | Videos viewed | 82,767 | 92,237,136 videos in 0:18:34
Internet | Send emails | 2,884,128 | 4,209,690,894 emails in 0:23:24 (≈67% of email is spam)
• Facebook: is a huge user of AI. It chooses the messages it displays using a
recommendation engine. Facebook recently implemented an artificial intelligence
engine to detect suicidal tendencies. As Joaquin Candela, director of the
applied AI department, says, “Facebook would not exist without AI”. According
to Terena Bell [31], Facebook uses AI in several ways.
Fig. 1 Evolution, across all countries, of interest in “Macron” versus “Trump” searches between
February 2019 and February 2020
4 AI Research Paradigms
To attack a problem, one can decompose it into sub-problems and then analyze them,
and so on. A set of possible decompositions can be represented by a “sub-problem
graph”. Replacing a problem with an equivalent problem can be seen as a particular
form of decomposition. In this case, one looks for a sequence of equivalence operations
leading to an already resolved problem. By representing problems as vertices and
equivalence operations as edges, one comes back to searching for a path, possibly
optimal or close to optimal, in a “state graph”. In general, there are so many possible
routes that one cannot examine all the alternatives. On the other hand, the search can
sometimes be guided by knowledge of more or less analytical, experimental, or
empirical origin, called “heuristics”: this is “Heuristically Ordered Resolution”. Thus,
to build an itinerary, trying to reduce the straight-line distance to the objective is a
familiar and often effective heuristic. Heuristics, and the various algorithms capable
of exploiting them, are interesting if they favor the discovery of satisfactory, even
optimal, solutions relative to the difficulty of the problem and the resources mobilized.
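The heuristically ordered resolution described above can be made concrete in a few lines of code. The following Python sketch performs a greedy best-first search over a toy state graph, always expanding the vertex whose heuristic value is smallest; the graph, the vertex names, and the heuristic values h (meant to suggest straight-line distances to the goal) are invented for illustration:

```python
import heapq

def heuristic_search(graph, h, start, goal):
    """Greedy best-first search: always expand the frontier node
    whose heuristic value h (estimated distance to goal) is smallest."""
    frontier = [(h[start], start, [start])]
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(frontier, (h[nxt], nxt, path + [nxt]))
    return None  # no route exists

# A toy "state graph": vertices are states, edges are operations;
# h gives an assumed straight-line estimate of the distance to goal D.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["B"]}
h = {"A": 3, "B": 1, "C": 2, "D": 0}
print(heuristic_search(graph, h, "A", "D"))  # ['A', 'B', 'D']
```

A* search refines this idea by adding the cost already incurred to the heuristic estimate, which restores optimality guarantees under an admissible heuristic.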
After the success of programs playing backgammon, the 1997 victory of the Deep Blue
chess system against reigning world champion Garry Kasparov was a spectacular
testament to the progress made over the past half-century in exploring vast
sub-problem graphs [37]. Heuristically Ordered Resolution and Resolution by
Satisfaction of Constraints can be used, possibly together, in various contexts:
identification, diagnosis, decision, planning, demonstration, understanding, learning…
These approaches can today tackle larger or more complex problems, possibly in
dynamic environments with incomplete or uncertain information. Their development
goes hand in hand with advances in the power of computer systems.
4.3 Collective AI
Since its inception, AI has mainly focused on the theories and techniques allowing
the realization of individual intelligence. There is another form of intelligence, the
collective one, as in simple multi-cellular beings, colonies of social insects, and
human societies. These sources of inspiration show that a higher form of intelligence
can result from the correlated activity of simpler entities. In artificial systems, this
field is called Distributed AI or Multi-Agent Systems, which fall under the term
Collective Artificial Intelligence (CAI) [15]. In the mid-1970s, part of AI explored
the potential of knowledge distribution, leading to distributed problem-solving. The
distribution of control is the main characteristic of the second generation of such
systems, in which no single entity controls the others or has a global view of the
system [38]. However, a collection of agents is not enough to obtain a multi-agent
system, just as a pile of bricks is not a house [39]. Thus, the field of study currently
concerns the theories and techniques allowing the realization of a coherent collective
activity by agents who are by nature autonomous, pursue individual objectives, and
have only a partial perception of their environment [7].
These agents have the ability to interact with other agents and with their environment.
They are social in the sense that they can communicate with others [11]. When
they react to perceived events, they are called reactive. They are proactive if they
have the capacity to define objectives and take initiatives. Today, the concepts of
autonomy, interaction, dynamics, and emergence are increasingly taken into account
in the specification of these systems, leading to specific design methodologies. In
addition, any agent that requires “intelligence” can benefit from the contribution of
the other AI techniques. However, the specificity of the field relates on the one hand
to organizational theories allowing the realization of a “collective intelligence” under
the previous constraints, and on the other hand to techniques of interaction
between agents to manage the unforeseen (conflict, concurrence) in an open system
or a dynamic environment.
CAI has developed in particular to facilitate the simulation of applications
distributed geographically, logically, semantically, or temporally [12]. Multi-agent
platforms are privileged tools for simulating societies of individuals immersed in a
common environment, such as a team of robot footballers. This is particularly the case
with environmental simulations, where one wishes to observe the influence of individual
behavior on global phenomena (pollution, flood forecasting, etc.). In addition,
CAI is aimed at applications for which it is difficult to design and validate large-scale
software with conventional development techniques. Thus, CAI is also
concerned with the design of reusable software components. Software
“agentification” should make it possible, in the long term, to assemble components
from a library without knowing their specifications in detail: the agents are
responsible for adjusting their individual behaviors within the framework of the
task defined by the application. Agent and multi-agent technology aims to become
a new programming paradigm in the years to come. This field is also motivated by
Common sense reasoning allows humans to relate situations, transpose solutions,
and take advantage of both general facts and examples, or even to capture
quantitative information [41]. Such reasoning processes exceed the representation
and inference capacities of classical logic (developed especially in the 20th century
in relation to the question of the foundations of mathematics), and of probability
theory with regard to certain aspects of uncertainty. This does not mean, however,
that the reasoning humans perform outside of mathematical deduction is completely
devoid of rigor or practical value. Thus AI is interested in formalizing different
forms of reasoning studied for a long time by philosophers: deductive
better than another. In addition, these models often assume knowledge of numerical
functions to evaluate the choices, and of probability distributions to describe the
uncertainty about the outcome of actions. However, the preferences of the agents are
not always known in a complete (and coherent!) manner in the form of evaluation
functions.
The problem of decision-making was not one of AI’s central concerns until the
early 1990s: AI, being turned towards symbolic and logical modeling of reasoning, was
indeed quite far from the numerical trade-offs used in decision theory
[44]. However, it has become apparent in recent years that AI can bring to the
decision-making problem tools allowing a more flexible and more qualitative
representation of information, preferences, and goals, and can propose formulations
that lend themselves more readily to explaining the proposed decisions. Such
approaches are useful in guiding a user in his choices, by offering him solutions that
meet his preferences, which he can change if necessary. A decision support system
can also be based on the known results of decisions made previously in situations
similar to the one where a decision must be proposed. Finally, planning (which includes
monitoring the execution of tasks) is a decision problem that has been studied in AI
for thirty years. It consists of determining a chain of actions (the simplest possible)
that makes it possible to achieve a goal from a given situation. The planning of
elementary operations for a robot (or, more generally, a set of robots) in order to
achieve a more global task is the classic AI example of this problem, which is
also encountered more generally in the organization of complex activities. One of
the difficulties lies in the fact that a series of actions leading to partial satisfaction
of the desired goal may not constitute part of a solution that achieves the goal
completely. Since the early 1990s, the use of probabilities and other
representation models has made it possible to extend this problem to uncertain
environments (where, for example, an action can fail), taking costs into account and
allowing informative actions (which provide information on the
state of the world) at each decision stage.
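The classical planning problem described above, determining a chain of actions that achieves a goal from a given situation, can be sketched as a breadth-first search over states. The toy domain below (fact names, action names, and their preconditions and effects) is purely illustrative:

```python
from collections import deque

# A hypothetical toy domain: each action has preconditions, facts it
# adds, and facts it deletes; a state is a frozenset of facts.
ACTIONS = {
    "pick_up":    ({"at_table", "hand_empty"}, {"holding"}, {"hand_empty"}),
    "put_in_box": ({"holding", "box_open"}, {"in_box", "hand_empty"}, {"holding"}),
    "open_box":   ({"box_closed"}, {"box_open"}, {"box_closed"}),
}

def plan(initial, goal):
    """Breadth-first search over states: returns a shortest action
    sequence reaching a state that satisfies the goal facts."""
    start = frozenset(initial)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if goal <= state:          # every goal fact holds
            return actions
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= state:       # action is applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, actions + [name]))
    return None  # goal unreachable

print(plan({"at_table", "hand_empty", "box_closed"}, {"in_box"}))
# ['pick_up', 'open_box', 'put_in_box']
```

Because breadth-first search considers plans in order of length, the first plan found is among the shortest; real planners replace this blind enumeration with heuristics derived from the domain.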
The previous situations concerned only an isolated agent (faced with a possibly
uncertain environment). For the past ten years, AI researchers have also been
concerned with decision-making for a group of individuals. Note that this is
a centralized decision (as opposed to the “Collective Artificial Intelligence” section,
where the decision is distributed): a group of agents, each with their own preferences,
must reach a joint decision [45]. The quality criterion of this joint decision depends
on the type of application considered: for voting problems, or more generally
the search for compromise, it is important to obtain a fair solution; whereas for
so-called combinatorial auction problems (increasingly used in electronic commerce),
the aim is to maximize the total gain of a set of sellers by determining
an optimal allocation of goods to the agents (buyers), knowing that each agent has
previously expressed the amount of money he is willing to pay for each possible
combination of goods.
An intelligent system has two basic capabilities: the possibility of perceiving
the outside world, and the ability to act on it. Perception is done through
sensors, while action is accomplished through motor control (the effectors of a
robot) [46]. In the language of computing and control theory, the sensors provide
the inputs, while the effectors provide the outputs. One instance of this
perception/action scheme is communication between agents. From this analysis,
one obtains the basic architecture of an intelligent system by adding, as a third
component, the internal state in which the agent is at a given instant. The actions
taken by the agent are then a function of this internal state, which is modified
according to perceptions. To be considered intelligent, the system’s internal state
must allow it to reason. This is why, in the standard approach, this internal state
is based on a symbolic representation of the external world, consisting of “facts”,
that is to say, assertions corresponding to what the agent believes to be true. In
the field, these internal facts are often called the agent’s beliefs [10]. Of course,
these beliefs are generally imperfect (incomplete, mistaken), and new perceptions
frequently lead to a change, or update, of beliefs. At an elementary level, an agent
performs an action, most often within a sequence of actions also called a plan. At
a higher level, the agent who performs an action seeks to achieve a goal, which he
believes he can achieve by executing a previously generated plan. It is, therefore,
his beliefs that come into play when he plans, that is to say, when he reasons to
generate a plan of elementary actions. Note that a plan can also call on perceptual
capacities through communicative actions or test actions. In addition, the formalisms
proposed in the literature to take into account the evolution of knowledge bases
(formalisms for updating and for reasoning about actions) come up against several
difficulties. The best known is the famous problem of the persistence of information
during a change (the “frame problem”): how can one avoid describing the non-effects
of a change? As surprising as it may seem, many formalisms force us to state
explicitly that turning on my computer will not change my neighbor’s state. It is
generally considered today that the frame problem has been solved in its simple
version, namely when actions have no “side effects”: turning on the central unit of
my PC will not turn on its screen. It is, therefore, the interaction of this last problem,
called “ramification”, with the frame problem in all its generality that constitutes a
formidable challenge not yet resolved to date.
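The perceive/internal-state/act architecture discussed above can be summarized in a minimal sketch (the class, fact names, and action effects are all illustrative). The last lines echo the frame assumption: only the listed effects of an action change the beliefs, while every other belief, such as the neighbor's state, persists unchanged:

```python
class Agent:
    """Minimal sketch of the perception/internal-state/action scheme."""
    def __init__(self, beliefs):
        self.beliefs = set(beliefs)   # internal state: facts held true

    def perceive(self, percept):
        # New perceptions update (and may contradict) old beliefs.
        fact, holds = percept
        if holds:
            self.beliefs.add(fact)
        else:
            self.beliefs.discard(fact)

    def act(self, action, effects):
        # Frame assumption: only the declared effects change;
        # every belief not mentioned persists as-is.
        adds, deletes = effects
        self.beliefs = (self.beliefs - deletes) | adds

agent = Agent({"pc_off", "neighbor_ok"})
agent.perceive(("pc_off", True))                   # a confirming percept
agent.act("turn_on_pc", ({"pc_on"}, {"pc_off"}))
print(sorted(agent.beliefs))  # ['neighbor_ok', 'pc_on']
```

Note how the belief about the neighbor survives the action without ever being described as a "non-effect", which is precisely what the frame problem asks of a formalism.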
The evolution of an agent is punctuated by the perception of objects, events, and
facts, and by the execution of actions. Generating a plan means determining the
actions to be carried out as well as the sequence they constitute, that is to say, their
temporal scheduling. Time, a fundamental parameter in the reasoning of agents,
must therefore be represented. As with so many fields, the ontological nature of
time is not unquestionably imposed on us, and different approaches to the
representation of time have been proposed in the literature [13]. One factor that
has motivated many theoretical developments concerns the notion of the duration
of temporal entities. One either assumes that the various events (actions and perceptions)
are instantaneous (or that their temporal component can be expressed in terms of
instants), or that their temporal extent is non-zero and irreducible. This
second approach is the only valid one in general if the agent does not have an internal
clock determining an “absolute” time and can only situate events relative
to each other, such as the opening of the tap, the flowing water, the filling of the glass,
and the drowning of the fly. Studies have shown that temporal reasoning systems
based on “interval calculus” (although the term “interval” in mathematics denotes
convex sets of instants, here intervals are extended primitive temporal entities) are
unfortunately more complex than those based on instants. The scheduling problems
expressed in terms of intervals that are tractable have not yet all been identified
and classified.
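This interval-based view of time can be made concrete with a fragment of Allen's interval calculus. The sketch below implements only a few of the thirteen relations, and the numeric endpoints are invented merely to order the events relative to one another:

```python
def relation(a, b):
    """A few of Allen's thirteen interval relations; intervals are
    (start, end) pairs, used only to order events relative to each other."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    if b1 <= a1 and a2 <= b2:
        return "during-or-equal"
    return "other"

# The tap opens, water flows, the glass fills: extended events
tap   = (0, 5)
water = (1, 6)
glass = (2, 4)
print(relation(tap, water))    # overlaps
print(relation(glass, water))  # during-or-equal
```

Reasoning in the full calculus means propagating such relations through a constraint network, which is where the complexity results mentioned above come into play.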
A distinction is made between epistemic actions (with an effect on knowledge) and
ontic actions (with an effect on the physical world): the former affect only internal
states, while the latter affect the state of the world and therefore the space
surrounding the system [47]. Being able to reason about the latter actions
therefore requires modeling space. As with time, one is then faced with choices
of representation. Although it is possible to resort to classical mathematical analysis
(Euclidean space, Cartesian coordinates), many approaches have preferred modeling
that is not only closer to human cognition but also more robust in the presence
of incomplete or imprecise information. This is why so-called “qualitative” spatial
reasoning is most often based on a space built from objects, bodies, or volumes
having a perceptible spatial extension, rather than on the abstract primitive entities
of classical geometry, points and figures. While work has advanced considerably
on the modeling of mereotopological concepts (inclusion, contact) on objects or
bodies, reasoning qualitatively about geometric concepts (distance, alignment,
shape…) with extended entities currently remains a difficult problem [48].
distinguishes what can be computed from what cannot! One then realizes
that for a concept to be “learnable”, it must be possible to summarize it (the computer
scientist says “compress” it), but this is clearly not enough [49]. To “learn” is not just
to “summarize”; it is more than that! Indeed, what has been learned should also allow
us to provide answers to questions other than those used in the learning process: in
other words, “learning” is also “generalization”.
It is therefore possible to search for algorithms, and thus to build programs, that
“learn”… Their execution on a machine then gives the impression that the machine is
at school. A new question then arises: learn, yes, but from what? What information is
available? In general, the learning process involves at least two agents: an apprentice
and a teacher. Again, we are inspired by the human model, and a question-and-answer
process begins. Simplifying a bit, one learns from examples and counterexamples. A
wide range of techniques has been developed to learn from examples. Statistical
methods, historically the oldest, are very effective [50]. But techniques specific to
computing were developed from the 1980s: neural networks, which simulate the
architecture of the human brain; genetic algorithms, which simulate the natural
selection of individuals; inductive logic programming, which runs the usual process
of deduction backward; and Bayesian networks, which rely on probability theory to
choose, among several hypotheses, the most satisfactory one. There is no connection
between designing machines that learn and using computers to teach (the latter is part
of Computer-Aided Instruction, or CAI) [51]. Today, machines are starting to learn,
but the results, although often remarkable, are still far from the performance of an
elementary school child, who learns a great deal without understanding how. This
means that there is still much work to be done before a machine can match a human.
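Learning from examples and counterexamples, and the fact that the result generalizes beyond them, can be illustrated with the classic perceptron update rule, an early ancestor of today's neural networks. The target concept (logical OR), the training set, and the hyperparameters below are all chosen for illustration:

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights from labeled examples (labels +1 / -1) with the
    mistake-driven perceptron rule."""
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:                       # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Examples and one counterexample of the concept "logical OR";
# the input (1, 1) is deliberately held out of training.
train = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1)]
w, b = train_perceptron(train)
print(predict(w, b, (1, 1)))  # 1: the learned rule generalizes
```

The learned weights are not a mere summary of the three training points: they answer a question never posed during learning, which is exactly the "generalization" discussed above.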
Human beings are able to understand and speak one or more languages. This language
ability is inseparable from human intelligence [52]. It is therefore legitimate to ask the
question of the relationships that unite AI and language. The texts emphasize that
a central task of AI is to formalize human knowledge and reasoning. Language is the
main means at our disposal for expressing our reasoning and conveying our
knowledge. Messages and exchanges in natural language therefore represent, for some
AI researchers, a privileged field of observation. Thus, for example, the study of the
expression of space in language (through prepositions, verbs, localization names,…)
helps to advance studies on spatial reasoning, just as the systematic study of a
corpus of specialized texts makes it possible to acquire domain-specific knowledge
through the organization of that domain’s specialized lexicon. On
the other hand, if one seeks to endow the computer with capacities resembling those
of humans, it is natural to consider making the computer capable of understanding,
or at least of automatically processing, information in natural language, and of
generating natural language messages appropriately. This is where AI and linguistic
computing come together. To illustrate the mutual fertilization of AI and linguistic
computing, let us take the fields of semantics and pragmatics. The aim is to build
a machine-manipulable representation of messages in natural language. This implies
modeling a very large number of semantic phenomena, such as reference mechanisms
(pronominal, spatial, temporal: from there, I saw you with his binoculars),
quantification, complex operators (negation, modalities, epistemic operators, the
argumentative because and because of), and the generative dynamics of language
(production of metaphors, metonymies, shifts or fluctuations in established meanings:
rusty ideas, selling a Picasso, etc.). In addition to phenomena related to meaning and
to the relationships that statements establish with the world, one must also be
interested in phenomena concerning the contextualization of a statement in a
communication situation. Thus, for human-machine dialogue, one seeks to formalize
speech acts, to take into account a model of the user (speaker or listener), to represent
the evolution of knowledge and beliefs during the dialogue, and to be able to generate
responses and to model how to present, argue, and be cooperative with a user [53].
A part of linguistic computing is becoming a technological discipline, with major
challenges for industry and society. It imposes requirements of efficiency,
simplicity, and expressiveness on the mechanisms and on the representation of
considerable volumes of data. From this point of view, techniques for automatic
knowledge acquisition and modeling are precious. While the applications that revolve
around automatic indexing, information retrieval, and even automatic translation
still make moderate use of AI techniques, other fields such as dialogue, automatic
summarization, and cooperative human-machine interfaces must call heavily on the
work of AI in formalizing reasoning [54].
From a fundamental point of view, language analysis and linguistic computing
have repositioned themselves [55]. Starting from idealized representations and
reasoning, in the Cartesian tradition of the mind as pure logic, numerous
works, set in a double cognitive and anthropological perspective, have shown
that our concepts, even the most abstract, are largely metaphorical, that is to say,
constructed by abstraction from concrete representations. Other work, in particular
in the generativist tradition, has shown that language appears more like an instinctual
component than an acquired aptitude. The relationships between AI and linguistic
computing are therefore evolving. An axis encompassing the paradigms proposed by,
among others, Darwin, Turing, and Chomsky has taken shape and is singularly
highlighted: the evolution and creative dynamics of language, and its computation
and modeling by principles and parameters postulated to be instinctual and
implemented by constraints, at the expense of heavy systems of rules postulated to
be acquired [56].
Multimedia resources available today on the computer are numerous, but they often
remain difficult to access due to the absence of a systematic index listing relevant
information about their content. It is not always possible to manually associate a text
Virtual Reality offers new forms of interaction between humans and systems. The
advent of networked workstations with very strong 3D graphics capabilities, coupled
with new visualization and interaction devices whose use is intuitive (headsets,
gloves…), makes it possible to provide several users with the sensory information
necessary to convince them of their presence in a synthetic world [60]. In addition, the
possibility of manipulating certain aspects of these virtual worlds almost as in real
life offers participants the possibility of using their experience and natural capacities
to work in a cooperative manner. The synthetic world created and managed by a
Distributed Virtual Reality system can therefore be virtually populated by numerous
users who interact, through specialized peripherals, with these virtual worlds and in
particular with autonomous, animated entities endowed with complex behaviors:
collaborative behaviors, but also adaptive and not merely reactive ones, that is to say,
entities with the ability to reason and learn, to anticipate the dynamics of the world in
which they find themselves, or even to exhibit emergent behavior [60]. Work in
virtual reality thus lies at the border of many fields: distributed systems (and
applications), networks, geometric modeling and image synthesis, and human-system
interaction for the systems themselves, but also techniques from classical artificial
intelligence or artificial life for the management of autonomous or semi-autonomous
entities (agents) endowed with behaviors.
An important aspect of work in image synthesis, in CAD, and therefore in virtual
reality concerns the modeling of the virtual universe. The goal of this first step, crucial
for the creation of complex 3D worlds, is to provide a 3D geometric description of the
universe [61]. To facilitate and automate this complex task, and also to control it, it is
necessary to manage constraints or check properties on objects or on the scene. Thus,
declarative modeling and constraint-based modeling draw on issues of Artificial
Intelligence, such as natural language and the semantics of spatial properties, and on
their translation into large sets of complex constraints feeding solvers that explore
the solution space either exhaustively or stochastically.
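This exhaustive exploration of a constrained solution space can be caricatured with a toy declarative scene: objects on a shelf and spatial constraints stated as predicates over their positions. The object names, slots, and constraints below are all invented for illustration:

```python
from itertools import permutations

# A hypothetical declarative scene description: three objects to place
# in three shelf slots, with spatial constraints given as predicates.
objects = ["lamp", "book", "vase"]
constraints = [
    lambda p: p["lamp"] < p["book"],            # "the lamp is left of the book"
    lambda p: abs(p["book"] - p["vase"]) == 1,  # "the vase is next to the book"
]

def solve():
    """Exhaustive exploration: try every assignment of slots to
    objects and yield the placements meeting all constraints."""
    for slots in permutations(range(len(objects))):
        placement = dict(zip(objects, slots))
        if all(c(placement) for c in constraints):
            yield placement

print(list(solve()))  # the two placements satisfying both constraints
```

Real declarative modelers face vastly larger search spaces, which is why stochastic exploration and constraint propagation replace this brute-force enumeration.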
One of the most promising motivations for the modeling of these synthetic actors
is to artificially reproduce living properties linked to adaptation. The first form of
adaptation takes a point of view centered on the individual and on the way in which
he can learn during his life, according to his different environmental conditions. The
second is more focused on the species and its genetic characteristics, and concentrates
on modeling theories of evolution and how they can improve the performance of
individuals by making them evolve. The last emphasizes the collective and social
phenomena that can appear in groups of individuals, such as processes of
communication or cooperation. Some of these concerns are also at the heart of current
research in Artificial Intelligence. Finally, studies and developments around Virtual
Reality therefore often make significant and original contributions to the various
fields mentioned, through these many interactions.
5 AI Applications
a. Smart home
The concept of a smart home is not new and existed long before the birth of the
IoT [62]. The idea of the smart home is to monitor, manage, optimize, remotely
access, and fully automate the home environment while minimizing human effort.
The IoT provides the underlying ecosystem that supports the easy realization of
a smart home application.
The IoT-based smart home uses both local storage and processing units (such as a
gateway or hub) and cloud infrastructure [63]. As the size of the environment
under study increases, computing performance must improve significantly.
The smart home is sometimes seen as an extension of the concept of the smart
grid [64]. From this point of view, the main objective of the smart home is to optimize
energy consumption by taking into account various inputs, such as the mode of
use and the real-time presence of residents, and the external environment (for weather
conditions) [9].
The main elements of the smart home application are the surveillance systems,
the intelligent HVAC (heating, ventilation and air conditioning) system, the intelligent
personalization of the environment according to the user profile, and the intelligent
management of the environment and objects (Fig. 3). The key factors, or relevant
challenges, for the smart home are high security, high reliability, high interoperability,
and strong adaptation to a wireless environment with multiple risks.
b. Aviation
Aircraft simulators use AI to process data from flights. In aerial warfare, computers
can find the best success scenarios [65]. Computers can also create strategies based
on the location, size, speed, and strength of forces and counter-forces. During combat,
pilots can receive computer assistance in the air. AI programs can sort information
and provide the pilot with the best possible maneuvers, while discarding maneuvers
that would be impossible for a human being to perform. Computer-simulated pilots
are used to collect data, since several aircraft are necessary to obtain good
approximations of certain calculations [66]. Computer-simulated pilots are also used
to train future air traffic controllers [67].
c. Smart city
Many countries have already started their smart city project plans, including
Germany, the United States, Belgium, Brazil, Italy, Saudi Arabia, Spain, Serbia,
the United Kingdom, Finland, Sweden, China, and India [68]. The main elements of
an IoT-based smart city are smart hygiene, smart traffic management, smart
governance, smart switching, smart monitoring, and smart management of public
services (Fig. 4). In addition, the main requirements in the development of an
IoT-based smart city are high scalability, high reliability, high availability and high
security. A range of technologies such as RFID, ZigBee, Bluetooth, Wi-Fi and 5G
can also be used in the future.
d. Computer science
AI researchers have created many tools to solve the most difficult computer problems.
Many of their inventions have been adopted by mainstream computing and are no
longer considered part of AI. According to Ligęza [69], all of the following were
developed in AI research: rapid development environments, symbolic programming,
time-sharing, interactive interpreters, the list data structure, functional programming,
and object-oriented programming.
AI can be used to create another AI. For example, in November 2017, Google’s
AutoML project developed new neural network topologies, including NASNet, a
system optimized for ImageNet [70]. According to Google, NASNet’s performance
exceeded all previously published ImageNet results.
e. Education
There are several companies that create robots to teach children, although these tools
are not yet widely used. There has also been an increase in intelligent tutoring systems
(ITS) in higher education. For example, an ITS called SHERLOCK teaches air
force technicians to diagnose electrical system problems in aircraft [71]. Another
example is DARPA, the Defense Advanced Research Projects Agency, which used
AI to develop a digital tutor to train Navy recruits in technical skills in a shorter
time. Data sets collected from e-learning systems have enabled learning analytics,
which can be used to improve the quality of large-scale learning [72].
f. Smart energy management
The tasks involved in the energy sector are the production of electricity, the trans-
mission of electricity, distribution to end consumers, monitoring and maintenance
including detection and rectification of faults [73]. The early use of IoT is already
fairly visible, since multi-purpose smart meters and smart thermostats are already
deployed (Fig. 5). Continuing this momentum, the IoT has an important role to play
for both the public service provider and the consumer.
g. Finance
develop consumer profiles and associate them with the wealth management prod-
ucts they would like [76]. Goldman Sachs uses Kensho, a market analysis platform
that combines statistical computing with the processing of big data and natural
language. Its machine learning systems extract data accumulated on the Internet
and assess the correlations between world events and their impact on asset prices
[77]. Information retrieval, which is part of AI, is used to extract information from
the live news feed and to facilitate investment decisions.
• Personal finance: several emerging products use AI to help people with their
personal finances. For example, DIGIT is an AI-based application which
automatically helps consumers optimize their spending and savings according to
their own habits and personal goals [78]. The application can analyze factors
such as monthly income, current balance, and spending habits, then make its own
decisions and transfer money to the savings account.
• Portfolio management: Robo-advisers provide financial advice and portfolio
management with minimal intervention from people [79]. This category of finan-
cial adviser works on the basis of algorithms designed to automatically develop
a financial portfolio according to the investment objectives and the risk tolerance
of the clients. It can adapt to real-time market changes and calibrate the portfolio
accordingly.
• Underwriting: an online lender, Upstart, analyzes vast amounts of consumer data
and uses machine learning algorithms to develop credit risk models that predict
the likelihood of consumer default. This technology is offered to banks so that
they can improve their underwriting processes [80]. In addition, ZestFinance
provides the ZAML platform (Zest Automated Machine Learning), specifically
intended for credit underwriting [81]. This platform uses to
128 D. Saba et al.
h. Heavy industry
Robots have become common in many industries. Robots are effective in highly
repetitive tasks, where poor concentration can lead to errors or accidents, and in
other jobs. In 2014, China, Japan, the United States, the Republic of Korea and
Germany together accounted for 70% of the total volume of robot sales [82]. In the
automotive industry, a particularly automated sector, Japan has the highest density
of industrial robots in the world: 1,414 per 10,000 employees [82].
such as smartwatches, shoes, clothing and bracelets can remotely detect the vital
signs of a patient.
Artificial neural networks and clinical decision support systems are used for
medical diagnosis, as is the concept processing technology in EMR software [85].
Other tasks in medicine that can potentially be performed by AI include:
• Computer-aided interpretation of medical images. Such systems help to analyze
digital images.
• Analysis of heart sounds.
• Robots for the care of the elderly.
• Extraction of medical records to provide more useful information.
• Design of treatment plans.
• Help with repetitive tasks, including medication management.
• Provision of consultations.
• Creation of medicines.
• Use of avatars instead of patients for clinical training.
• Prediction of the probability of death from surgical procedures.
• Prediction of the progression of HIV (human immunodeficiency virus).
Some police forces have been developing AI-based solutions in recent years in the
field of crime prevention [86]. AI is considered a public safety resource in many
ways. A specific AI-based application can be found in identifying people. Intelligence
analysts, for example, rely on portraits of the face to help determine the identity and
whereabouts of an individual. Examining the possibly relevant images and videos in
a timely manner is a time-consuming, painstaking task, with the potential for human
error due to fatigue and other factors. Unlike humans, machines do not tire. Through
initiatives such as the Intelligence Advanced Research Projects Activity’s Janus
computer-vision project, analysts are performing trials on the use of algorithms that
can learn how to distinguish one person from another using facial features in the
same manner as a human analyst [87]. Moreover, video analytics can enable the
detection of individuals in multiple locations across multiple cameras, the detection
of crimes through motion and pattern analysis, the identification of crimes in
progress, and assistance to investigators in identifying suspects. With technologies
like cameras, video and social media generating massive amounts of data, AI can
reveal crimes that cannot be discovered by traditional methods, which will help
ensure greater public safety and criminal justice. AI also has the ability to assist
crime labs in areas such as complex DNA analysis [88].
k. IoT Automotive
By 2040, it is estimated that 90% of global vehicle sales will be either highly auto-
mated vehicles or fully automated vehicles [89]. Many companies, for example,
Tesla, Google, BMW, Ford, Uber, etc. work in this direction [90, 91]. The IoT, in
collaboration with cloud computing, will play an important role in making remote,
connected and autonomous vehicles (also known as driverless vehicles) a significant
development in the near future. Today, there are
more than 200 sensors integrated into a single vehicle [92]. This trend corresponds
to the fundamental prerequisite of IoT compatible vehicles. Various underlying tech-
nologies for vehicle communications are Bluetooth, ZigBee, dedicated short range
communication (DSRC), Wi-Fi and 4G cellular technology [92].
l. Law
The issue of “the search for the law” has both conceptual and practical importance.
Theoretically, what distinguishes legal thinking from other forms of thinking (such
as practical or moral thinking) is precisely its dependence on legal materials [93].
Although AI research tools may not be quite suitable for answering basic questions in
jurisprudence, they can be used in various ways. In practice, the ability to recognize
and search for relevant legal authorities is one of the most fundamental skills for
legal professionals [94]. If this skill is easily accomplished by a machine, this may
significantly reduce the cost of legal services and democratize access to the law.
The application of AI to the legal field promises considerable efficiency gains
in the legal and judicial spheres. Two technological approaches contribute to the
computerization of law. First, legal knowledge can be modeled with rules-based
systems, or expert systems, within a deterministic algorithmic framework, to account
for the logical articulation of certain rules of law [95]. In addition, other technologies,
which rely on statistical natural language processing tools, make it possible to explore
large quantities of documents, such as digitized court decisions, in order to identify,
more or less automatically, rules of law or the answers generally given.
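The first, rules-based approach can be sketched as a miniature forward-chaining expert system. The legal "rules" and facts below are invented illustrations, not actual rules of law:

```python
# Minimal forward-chaining expert system: a rule fires when all of its
# premises are established facts, adding its conclusion as a new fact.
# The legal rules themselves are invented for illustration only.
RULES = [
    ({"signed_by_both_parties", "consideration_present"}, "contract_formed"),
    ({"contract_formed", "obligation_unperformed"}, "breach_of_contract"),
    ({"breach_of_contract", "damages_proven"}, "compensation_owed"),
]

def infer(facts):
    """Apply the rules repeatedly until no new conclusion can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

case = {"signed_by_both_parties", "consideration_present",
        "obligation_unperformed", "damages_proven"}
print(sorted(infer(case)))
```

The deterministic chaining makes every conclusion traceable back to explicit premises, which is precisely why such systems can "render account" for the articulation of legal rules.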
m. Human resources and recruitment
AI is used in three ways by human resources (HR) and recruiting professionals. It
is used to filter CVs and rank candidates according to their level of qualification
[96]. AI is also used to predict the success of candidates in given roles through job
search platforms, and it is now deployed in chatbots that can automate repetitive
communication tasks [97]. Typically, CV selection involves a recruiter or other HR
professional analyzing a CV database. Now, startups are creating machine learning
algorithms to automate CV selection processes.
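Such automated CV screening can be sketched, in a deliberately simplified form, as keyword scoring; the job keywords and candidate CVs below are invented examples, not any real product's method:

```python
def score_cv(cv_text, keywords):
    """Score a CV by the fraction of job keywords it mentions."""
    words = set(cv_text.lower().split())
    hits = sum(1 for kw in keywords if kw in words)
    return hits / len(keywords)

def rank_candidates(cvs, keywords):
    """Return candidate names ordered by descending keyword score."""
    return sorted(cvs, key=lambda name: score_cv(cvs[name], keywords),
                  reverse=True)

# Invented job profile and CVs, for illustration only.
keywords = ["python", "sql", "statistics"]
cvs = {
    "Ana": "Data analyst with Python and SQL plus statistics coursework",
    "Ben": "Project manager experienced in scheduling and budgets",
}
print(rank_candidates(cvs, keywords))  # Ana ranks above Ben
```

Real systems replace the keyword score with a learned model, but the pipeline (score each CV, rank, select) has the same shape — and inherits whatever biases the scoring data contains, as the bullet points below note.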
In the area of recruitment and talent development through assessment, AI has five
key advantages [98, 99]:
• Accuracy: the power of today’s computers makes it possible to precisely evaluate
very large quantities of candidate data to make better choices during selection.
• Efficiency: AI for recruitment allows human resources teams to assess job-specific
data in a consistent, objective manner, very early in the process.
• Fight against bias: biases are often the source of poor selection choices. In theory,
AI helps to eliminate the conscious and unconscious biases of recruiters. Great
care is needed when programming the system, however, because the quality of
the algorithm depends on the data that feeds it. To guarantee a non-discriminatory
process, several evaluators, not just one, should be included when defining the
model.
• Transparency: Allowing candidate assessment models to learn for themselves
by following the best practices of human recruiters remains the best guarantee for
a fair and objective process.
• Commitment to the process: the AI allows recruiters to offer immediate help, for
example via interactive chatbots which can answer candidates’ questions about
the selection process or certain tests. AI also makes it possible to make faster
decisions and therefore to be more reactive as well as limiting biases.
Beyond technological considerations, four key points must be taken into account
[98, 99]:
• Legitimacy: standardized plug-and-play AI models will not allow you to stand
out as an employer. If your competitors use the same solutions, you will all target
the same talents, and it becomes difficult to justify the selection or rejection of a
candidate. Only custom AI models allow you to make transparent and defensible
choices.
• Deadline for implementation: custom AI systems replicate the behavior and best
practices of your recruiters. It is therefore necessary to pre-populate the tool with
relevant data.
• Ethics: defining the scope of actions of an AI model raises ethical questions.
• Data processing: If AI excels at analyzing large volumes of data, the results can
be misinterpreted or misused. Good data processing practices are essential for
privacy issues, but also for preserving your company’s reputation.
n. Media
o. Smart Agriculture
Various tasks involved in smart agriculture based on IoT are monitoring and acquiring
ambient data, which are of great variety and enormous volume, followed by their
aggregation and exchange on the network, taking short- and long-term decisions
based on data analysis and AI and, sometimes, remote actuation of decisions using
field robots. It can also help predict yield to ensure the economic value to be gained,
as well as the early detection of diseases spread in crops, to enable timely preventive
measures [101]. Various technologies used are Bluetooth, RFID, Zigbee, GPS as
well as other technologies which are gaining popularity in this application, namely
SigFox, LoRa, NB-IoT, edge computing and cloud computing [102].
Some of the main components of IoT-based smart agriculture are smart plowing,
smart irrigation, smart fertilization, smart harvesting, smart stock maintenance, and
smart livestock management, which deals with smart tracking of animals, intelligent
health monitoring, and intelligent feeding and fodder management. An IoT solution
for smart farming has different requirements and therefore takes into account the
following key factors: low cost, low energy consumption, high reusability,
interoperability, high efficiency, and gradually scalable solutions.
p. Music
The evolution of music has always been affected by technology, and AI has made it
possible, through scientific progress, to imitate human composition to a certain extent
[103]. Among the first remarkable efforts, David Cope created an AI called Emily
Howell which managed to make itself known in the field of algorithmic music [104].
Other projects, such as AIVA (Artificial Intelligence Virtual Artist), focus on the
composition of symphonic music, mainly classical music for films [105]. It achieved
a world first by becoming the first virtual composer to be recognized by a professional
musical association. Smaill et al. have used AI to produce computer-generated music
for different purposes [106]. Initiatives such as Google Magenta, led by the Google
Brain team, seek to determine whether AI can create compelling art [107].
AI is used in automated online assistants that can be seen as avatars on web pages
[114]. It can benefit companies by reducing their operating and training costs. A major
underlying technology in such systems is natural language processing. Pypestream
uses automated customer service for its mobile application, designed to streamline
communication with customers. In addition, large companies investing in AI to
manage customer relations have worked on different aspects of improving their
services. Among the companies operating in this field, Digital Genius, an AI start-up,
searches the information database more efficiently (based on past conversations and
frequently asked questions) and provides instructions to agents to help them resolve
queries more efficiently [115]. IPSoft creates technology with emotional intelligence
to adapt the interaction to the client, characterized by adaptation to different languages
[116]. Inbenta focuses on natural language understanding; in other words, it uses
context and natural language processing to understand the meaning behind what
someone is asking, rather than just looking at the words used [117]. One element of
customer service that Inbenta has already achieved is its ability to respond en masse
to e-mail queries.
s. Telecommunications maintenance
The name “smart grid” was formulated in the 1980s, with real foresight [118].
Today, this concept encompasses technological trends such as AI (with machine
learning) and predictive solutions. “Intelligence” in communication and information
systems once meant expert systems; it now means AI, with its machine learning
approaches and its deep learning subset (neural networks).
According to the Gartner study “The road to enterprise AI, 2017”, 10% of
emergency field interventions will be triggered and scheduled by AI by 2020 [119].
According to Kambatla et al., combined with analytics and big data, these new
technologies will have a considerable impact on whole sections of the industry:
monitoring of production machines, prevention of equipment failure, and predictive
maintenance [120].
t. Games
In the game field, AI is used to provide a quality gaming experience, for example by
giving non-player characters behavior similar to that of humans, or by limiting the
skills of the program to give the player a sense of fairness.
There has long been a gap between classical AI research and the AI implemented by
developers in video games [121]. Indeed, the two differed greatly in their bodies of
knowledge, in the problems encountered, and in the ways of solving those problems.
Classical AI research generally aspires to improve or create new algorithms to
advance the state of the art. On the other hand, the development of an AI in a video
game aims to create a coherent system which integrates as well as possible into the
design of the game, in order to be fun for the player. Thus, a high-performance AI
that is not well integrated into the gameplay can harm the game more than improve it.
Developing AI for a video game therefore often requires finding engineering
solutions to problems little or not at all addressed by conventional AI research [122].
For example, an AI algorithm in a game is tightly constrained in terms of computing
power, memory and execution time. Indeed, the game must run on a console or an
“ordinary” computer and must not be slowed down by the AI. This is why some
compute-intensive state-of-the-art AI solutions could not be implemented in video
games until several years after their use in classic AI.
American researchers have recently developed an AI capable of solving the famous
Rubik’s Cube in just 1.2 s, a time inaccessible to humans. Just recently, Facebook
and Carnegie Mellon University announced that a joint team of researchers had
successfully developed AI software capable of defeating some of the best professional
poker players. AI continues to progress and to beat, one after the other, all the games
that humans have sometimes taken years to master. However, the link between AI
and gaming is not new: already in 1979, the robot “Gammonoid” beat Luigi Villa,
a backgammon champion, in Monte-Carlo [123]. Today, it is mainly thanks to the
impressive progress of deep learning that many records are broken, as recently in
poker.
u. Transports
Fuzzy logic controllers have been developed for automatic transmissions in
automobiles. For example, the 2006 Audi TT, VW Touareg, and VW Caravelle have
the DSG transmission, which uses fuzzy logic. Several Škoda variants (e.g., the
Škoda Fabia) also include a fuzzy logic controller [124]. Today’s cars have AI-based
driver assistance functions such as automatic parking and advanced cruise control.
AI has been used to optimize traffic management applications, reducing wait times,
energy consumption and emissions by up to 25% [125]. In the future,
fully autonomous cars will be developed. Transport AI should provide safe, efficient
and reliable transport while minimizing the impact on the environment and commu-
nities. The main challenge to develop this AI is the fact that transport systems are
intrinsically complex systems involving a very large number of different components
and parts, each with different and often contradictory objectives.
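The fuzzy-logic control idea behind such transmissions can be illustrated with a minimal sketch; the membership functions and gear rules below are illustrative assumptions, not the actual controller of any vehicle:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def gear_recommendation(speed_kmh, throttle_pct):
    """Fuzzy gear choice: fuzzify inputs, fire rules, defuzzify (weighted mean)."""
    # Fuzzification: each input belongs partially to linguistic categories.
    slow = tri(speed_kmh, -1, 0, 60)
    fast = tri(speed_kmh, 40, 120, 200)
    gentle = tri(throttle_pct, -1, 0, 60)
    hard = tri(throttle_pct, 40, 100, 161)
    # Rule strengths (min acts as fuzzy AND), each paired with a crisp gear.
    rules = [
        (min(slow, gentle), 2),   # slow and gentle throttle -> low gear
        (min(slow, hard), 1),     # slow but hard throttle   -> downshift
        (min(fast, gentle), 5),   # fast and gentle          -> high gear
        (min(fast, hard), 4),     # fast and hard            -> hold a gear down
    ]
    total = sum(w for w, _ in rules)
    return sum(w * g for w, g in rules) / total if total else 3

print(gear_recommendation(100, 10))
```

The appeal for control applications is that intermediate inputs fire several rules at once, so the recommended gear varies smoothly instead of switching abruptly at hard thresholds.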
6 Advantages and Disadvantages of AI
This part examines the advantages and disadvantages of AI: the positive and negative
points of AI in the important areas of application seen previously, as well as in other
areas.
6.1 Advantages of AI
• Save (or kill): in the military field, robots have the advantage of being able to
save people at risk without the rescuers putting themselves at risk, or to kill if
necessary during war.
• Games: AI in video games allows players to face increasingly strong opponents.
The game looks more realistic, and the player really feels like they are playing
against a human. It also makes it possible to play against the computer when there
is no opponent to play with.
• In a company: a business that works with AI-based systems has several
advantages:
– Better use of human resources (employees become less and less productive when
faced with routine tasks; with machines equipped with AI, the work carried out
is better suited to the skills of the employees)
– Reduction of employees (less staff for routine tasks like making photocopies)
– Reduction of personnel costs (with several machines replacing several
employees, the company makes large savings; it only needs to employ a qualified
maintenance worker)
– Learning (the AI is able to learn automatically and therefore progress quickly,
so it can do the work of several people and maintain contact with the client).
• New humanoid form: robots are now represented in human form, as humanoids.
The advantage of this new form is that the person facing the robot does not have
the impression of dealing with a machine, which is more pleasant, especially when
the machine is permanently part of daily life.
6.2 Disadvantages of AI
All the advantages mentioned above also have drawbacks or limits to their operation.
The most conceivable drawback is the presence of an error in the programming of
a robot, whatever it is, which would be fatal to the proper functioning of that robot.
This drawback is present in all areas without exception. It has been seen previously
that a computer (or any robot) does not know how to detect an error made in its own
program. The consequences of such an error could be catastrophic on a large scale.
In addition, AI research is limited by its very high cost. To make robots capable of
being autonomous in space, an astronomical sum would have to be spent, which
limits the progress of research for the moment. In medicine, patient robots are an
advantage for students, but they are very expensive to manufacture, which is why
not all schools can be equipped with them. This is also the case for prostheses, which
are expensive to develop (they also have the other disadvantage, seen previously,
that a prosthesis is sensitive to interference, and if it breaks down the patient would
suddenly be unable to move the hand in which he was holding something).
In companies, in particular, AI and new mechanized robots are causing job cuts.
Human beings are gradually being replaced by robots, thereby increasing the already
high unemployment rate.
Can we go to war with robots? What if a robot suddenly turns against its own camp?
Is it humane to send machines to kill men? Is it right to use this science for this
purpose? Since the military field funds AI research the most, it has the “power” to
develop what it wants. Now imagine if robots took control of the world. This is an
idea that many filmmakers have taken up: in “I, Robot” or “Terminator”, human-made
robots take over and destroy humanity. All of these fictional assumptions have been
the subject of science fiction novels, such as those of Isaac Asimov, who wrote three
laws in 1947 on the behavior a robot must have:
• A robot may not injure a human being or, by remaining passive, allow a human
being to be exposed to danger.
• A robot must obey the orders given to it by human beings, unless such orders
conflict with the first law.
• A robot must protect its own existence as long as this protection does not conflict
with the first or second law.
7 Discussion
Given what was previously shown, the benefits of AI are great. We therefore find
it worthwhile to use this form of intelligence as long as it is beneficial to humans.
Indeed, the robot can replace humans in difficult tasks (because it has no physical
constraints), in tedious tasks, or even in places where humans cannot go. The robot
will then be faster, more precise and more efficient. However, it should not be
misused, because a robot cannot replace a human. On the other hand, the use of this
technology by companies keeps increasing, which raises the unemployment rate. In
addition, AI must be used for justified and morally correct purposes (for example,
developing a robot that can help people with disabilities in everyday life, or help
students pursue their lessons and perform their scientific research), and not to develop
robots that cause harm to individuals. Finally, no one has succeeded in developing
an artificial person with a brain that evolves like a human being’s: scientists have
been trying for decades to understand the workings of the human brain in order to
reproduce it and implant it into a robot, but without result.
At the same time, it should not be thought that a robot is equal to humans; AI is
only a mechanical reproduction of the human brain. In fact, what is missing in AI,
robots and androids is consciousness, intuition and the expression of feelings, so
how could these be present in a robot? Especially since there are many forms of
intelligence, and our minds use different areas to solve problems. AI cannot replace
humans to the point of producing truly original thinking. This is also what the saying
attributed to Albert Einstein defends: “Machines will one day be able to solve all
problems, but none of them will ever be able to pose one!” Robots exist only to
improve human activity and to solve problems. Finally, AI will never replace human
intelligence. Indeed, a person’s intelligence, their brain, is much more capable than
that of an intelligent device. The speed of reflection of human intelligence is greater
than that of a machine, and the human brain has many more neurons than an artificial
machine. In addition, it should not be forgotten that man is distinguished from
machines by the fact that man thinks for himself, whereas the machine uses only the
data transmitted to it by humans. AI will therefore not be able to replace humans in
the future. Nevertheless, technology is progressing more and more and will occupy
a place almost everywhere. Unlike humans, AI does not need to eat, sleep or rest,
because it is a machine. This may be very beneficial for humans, but at the same
time it may also be an obstacle. Take the example of factory robots: the number of
robots collecting objects is increasing, and they are very useful for routine and
repetitive work.
8 Conclusion
Artificial humans can not only be at our service; they can also speak to us, and we
can have a real conversation with them. These robots may in the future be able to
feel and love. In the age of robots, prostheses become robots themselves and
incorporate AI to aid their owners’ movements. Indeed, a prosthesis adapts perfectly
to the human condition, whether the wearer is climbing upstairs or going down to
the basement. The prostheses contain sensors, motors and an integrated computer.
We can say that AI, by modeling and simulating cognitive functions, entrusts tasks
deemed intelligent to machines that are generally both more reliable and faster than
humans, usually by exploiting large quantities of information. This gives rise to
achievements which can be spectacular, but which can also disturb the idea we have
of our own intelligence.
Great progress still remains to be made in AI to create systems with wider ranges
of cognitive capacities that are also more user-friendly. Without a doubt, it is
perfectly conceivable that AI will produce systems capable of a certain introspection
(i.e., able to observe themselves in reasoning tasks and thus acquire meta-knowledge),
of analyzing or simulating emotions, or even of writing poems or creating graphic
works obeying certain constraints or principles. But all of this will remain quite far
from an autonomous thought, conscious of itself, capable of juggling its
representations of the world, of behaving in a playful rather than purely reactive
way, of creating in a directed way, of dreaming.
Appendix
This section is dedicated to presenting the terms which are mainly related to the
technology of AI (see Table 2).
Table 2 (continued)
Term Description
Evolutionary algorithm Evolutionary algorithms are algorithms inspired by the
theory of evolution to solve various problems. They record
previously found results, thus constituting a database, and
develop a set of solutions to a given problem with a view to
finding the best results
Catheter A catheter is a medical device consisting of a tube, of variable
width and flexibility, and manufactured in different materials
according to the models or the uses for which they are intended.
The catheter is intended to be inserted into the lumen of a body
cavity or blood vessel and allows drainage or infusion of fluids,
or even access for other medical devices
Deep Blue Deep Blue is a chess computer with dedicated circuits,
developed by IBM in the early 1990s. It lost a match (2–4)
against the world chess champion Garry Kasparov in 1996,
then defeated the world champion (3.5–2.5) in the rematch
in 1997
Dofus and Sims Character simulation video games
Drone A drone is an aircraft with no human on board, which carries
a payload, and is intended for surveillance, intelligence or
combat missions.
ELIZA ELIZA is a computer program written by Joseph Weizenbaum in
1966, which simulated a Rogerian psychotherapist by rephrasing
most of the “patient’s” statements as questions and posing them back
Ethics Ethics is the science of morals and mores. It is a philosophical
discipline which reflects on the aims and values of existence,
on the conditions of a happy life, on the concept of “good” and
on questions of morality
GPS The Global Positioning System (GPS) is a geolocation system
operating on a global level
IBM International Business Machines Corporation, known by the
abbreviation IBM, is an
American multinational company active in the fields of
computer hardware, software and IT services
John McCarthy (1927–2011) John McCarthy is, with Marvin Minsky, the main pioneer of AI;
he embodies the school of thought emphasizing symbolic logic
LISP (List Processor) Functional programming language designed for list processing,
used in the fields of AI and expert systems
Louise Bérubé Quebec psychologist, specialized in behavioral neurology
Marvin Minsky Marvin Lee Minsky is an American scientist. He works in the
field of cognitive sciences and AI. He is also a co-founder, with
computer scientist John McCarthy of the AI Group of the
Massachusetts Institute of Technology (MIT) and author of
numerous publications in both AI and philosophy such as, for
example, The Society of Mind (1986)
Microswitch Electrical component used to cut the current or to reverse it
Towards Artificial Intelligence: Concepts, Applications … 141
MIT The Massachusetts Institute of Technology (MIT) is a famous
American university in the state of Massachusetts, in the
immediate vicinity of Boston. Founded in the 19th century to
meet a growing demand for engineers, the institute became
multidisciplinary while keeping a science curriculum that gives
a large place to experimentation and to technological and
industrial applications
Nanorobot Microscopic robot
Pathology Diseases, sets of symptoms and their origins
Rotor Movable part of an engine
Haptic Adjective describing interfaces which give sensations through
touch (pressure, movements…)
Stator Fixed part of an engine
Helical Adjective designating the form of a helix
References
1. Flasiński, M.: History of artificial intelligence. In: Introduction to Artificial
Intelligence (2016)
2. O’Regan, G.: Marvin Minsky. In: Giants of Computing (2013)
3. Entwistle, A.: What is artificial intelligence? Eng. Mater. Des. (1988). https://doi.org/10.1007/
978-1-4842-3799-1_1
4. Copeland, B.J.: Artificial intelligence | Definition, Examples, and Applications | Britannica
(2020). https://www.britannica.com/technology/artificial-intelligence. Accessed 26 Apr 2020
5. Murphy, R.R.: Introduction to AI robotics. BJU Int. (2000). https://doi.org/10.1111/j.1464-
410X.2011.10513.x
6. McConaghy, E.: Automaton. West. Hum. Rev. (2012)
7. Saba, D., Berbaoui, B., Degha, H.E., Laallam, F.Z.: A generic optimization solution for
hybrid energy systems based on agent coordination. In: Hassanien, A.E., Shaalan, K., Gaber,
T., Tolba, M.F. (eds.) Advances in Intelligent Systems and Computing, pp. 527–536. Springer,
Cham, Cairo—Egypte (2018)
8. Saba, D., Degha, H.E., Berbaoui, B., et al.: Contribution to the modeling and simulation
of multi-agent systems for energy saving in the habitat. In: Djarfour, N. (ed.) International
Conference on Mathematics and Information Technology, p. 1. IEEE, Adrar-Algeria (2017)
9. Saba, D., Sahli, Y., Abanda, F.H., et al.: Development of new ontological solution for an
energy intelligent management in Adrar city. Sustain. Comput. Inform. Syst. 21, 189–203
(2019). https://doi.org/10.1016/J.SUSCOM.2019.01.009
10. Saba, D., Laallam, F.Z., Degha, H.E., et al.: Design and development of an intelligent ontology-
based solution for energy management in the home. In: Hassanien, A.E. (ed.) Studies in
Computational Intelligence, 801st edn, pp. 135–167. Springer, Cham, Switzerland (2019)
11. Saba, D., Maouedj, R., Berbaoui, B.: Contribution to the development of an energy manage-
ment solution in a green smart home (EMSGSH). In: Proceedings of the 7th International
Conference on Software Engineering and New Technologies—ICSENT 2018, pp. 1–7. ACM
Press, New York, NY, USA (2018)
12. Saba, D., Zohra Laallam, F., Belmili, H. et al.: Development of an ontology-based generic
optimisation tool for the design of hybrid energy systems. Int. J. Comput. Appl. Technol. 55,
232–243 (2017). https://doi.org/10.1504/IJCAT.2017.084773
142 D. Saba et al.
13. Degha, H.E., Laallam, F.Z., Said, B., Saba, D.: Onto-SB: Human profile ontology for energy
efficiency in smart building. In: Larbi Tebessi university A (eds.) 2018 3rd International
Conference on Pattern Analysis and Intelligent Systems (PAIS). IEEE, Tebessa, Algeria
(2018)
14. Saba, D., Laallam, F.Z., Berbaoui, B., Fonbeyin, H.A.: An energy management
approach in hybrid energy system based on agent’s coordination. In: The 2nd Interna-
tional Conference on Advanced Intelligent Systems and Informatics (AISI’16). Advances
in Intelligent Systems and Computing, Cairo, Egypt (2016)
15. Saba, D., Laallam, F.Z., Hadidi, A.E., Berbaoui, B.: Contribution to the management of energy
in the systems multi renewable sources with energy by the application of the multi agents
systems “MAS”. Energy Procedia 74, 616–623 (2015). https://doi.org/10.1016/J.EGYPRO.
2015.07.792
16. Cockcroft, K.: Book review: international handbook of intelligence. South African J. Psychol.
(2005). https://doi.org/10.1177/008124630503500111
17. Mcculloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous
activity. Bull. Math. Biol. (1990).
https://doi.org/10.1007/BF02478259
18. Wiener, N.: Norbert Wiener, 1894–1964. IEEE Trans. Inf. Theory (1974). https://doi.org/10.
1109/TIT.1974.1055201
19. Chiu, E., Lin, J., Mcferron, B., et al.: Mathematical Theory of Claude Shannon. Work Pap
(2001)
20. Gass, S.I.: John von Neumann. In: International Series in Operations Research and
Management Science (2011)
21. Newell, A., Shaw, J.C., Simon, H.A.: Elements of a theory of human problem solving. Psychol.
Rev. (1958). https://doi.org/10.1037/h0048495
22. Nilsson, N.J.: Shakey The Robot (1984)
23. Brooks, R.A.: New approaches to robotics. Science (80) (1991). https://doi.org/10.1126/sci
ence.253.5025.1227
24. Li, B.H., Hou, B.C., Yu, W.T., et al.: Applications of artificial intelligence in intelligent
manufacturing: a review. Front. Inf. Technol. Electron. Eng. (2017)
25. Internetlivestats: Internet Live Stats—Internet Usage & Social Media Statistics (2020). https://
www.internetlivestats.com/. Accessed 20 Feb 2020
26. Internetlivestats: 1 Second—Internet Live Stats (2020). https://www.internetlivestats.com/
one-second/#tweets-band. Accessed 20 Feb 2020
27. Trends.google.com: Macron, Trump—Découvrir - Google Trends (2020). https://trends.goo
gle.com/trends/explore?q=Macron,Trump. Accessed 20 Feb 2020
28. Powles, J., Hodson, H.: Google DeepMind and healthcare in an age of algorithms. Health
Technol (Berl) (2017). https://doi.org/10.1007/s12553-017-0179-1
29. DeepMind: AlphaStar: mastering the real-time strategy game StarCraft II. DeepMind (2019)
30. Lau, J., Zimmerman, B., Schaub, F.: Alexa, are you listening? Proc. ACM Hum.-Comput.
Interact (2018). https://doi.org/10.1145/3274371
31. Bell, T.: 6 ways Facebook uses AI | CIO (2018). https://www.cio.com/article/3280266/6-
ways-facebook-uses-artificial-intelligence.html. Accessed 26 Apr 2020
32. Hoy, M.B.: Alexa, Siri, Cortana, and more: an introduction to voice assistants. Med. Ref.
Serv. Q. (2018). https://doi.org/10.1080/02763869.2018.1404391
33. Apple: Optimizing Siri on HomePod in Far-Field Settings—Apple, vol. 1, Issue 12
34. Reddy, R.: Foundations and grand challenges of artificial intelligence. AI Mag. (1988)
35. Van Remoortere, P.: Computer-based medical consultations: MYCIN. Math Comput. Simul.
(1979). https://doi.org/10.1016/0378-4754(79)90016-8
36. Saba, D., Laallam, F.Z., Hadidi, A.E., Berbaoui, B.: Optimization of a multi-source system
with renewable energy based on ontology. Energy Procedia 74, 608–615 (2015). https://doi.
org/10.1016/J.EGYPRO.2015.07.787
37. Campbell, M., Hoane, A.J., Hsu, F.H.: Deep blue. Artif. Intell. (2002). https://doi.org/10.
1016/S0004-3702(01)00129-1
38. Saba, D., Laallam, F.Z., Berbaoui, B., Abanda, F.H.: An energy management approach in
hybrid energy system based on agent’s coordination. In: Hassanien, A., Shaalan, K., Gaber, T.,
Azar, A.T.M. (eds.) Advances in Intelligent Systems and Computing, 533rd edn, pp. 299–309.
Springer, Cham, Cairo, Egypte (2017)
39. Saba, D., Degha, H.E., Berbaoui, B., et al.: Contribution to the modeling and simulation of
multiagent systems for energy saving in the habitat. International Conference on Mathematics
and Information Technology (ICMIT 2017), pp. 204–208. IEEE, Adrar, Algeria (2018)
40. Saba, D., Degha, H.E., Berbaoui, B., Maouedj, R.: Development of an ontology based solution
for energy saving through a smart home in the city of Adrar in Algeria, pp. 531–541. Springer,
Cham (2018)
41. Kerber, M., Lange, C., Rowat, C.: An introduction to mechanized reasoning. J. Math. Econ.
66, 26–39 (2016). https://doi.org/10.1016/J.JMATECO.2016.06.005
42. Siekmann, J.: Computational Logic, pp. 15–30 (2014)
43. Peng, H.G., Wang, J.Q.: Hesitant uncertain linguistic Z-Numbers and their application in
multi-criteria group decision-making problems. Int. J. Fuzzy Syst. (2017). https://doi.org/10.
1007/s40815-016-0257-y
44. Huitt, W.G.: Problem solving and decision making: consideration of individual differences
using the myers-briggs type indicator. J. Psychol. Type (1992). https://doi.org/10.1017/CBO
9781107415324.004
45. Wilson, D.R.: Hand book of collective intelligence. Soc. Sci. J. (2017). https://doi.org/10.
1016/j.soscij.2017.10.004
46. Upadhyay, S.K., Chavda, V.N.: Intelligent system based on speech recognition with capability
of self-learning. Int. J. Technol. Res. Eng. ISSN (2014)
47. Herzig, A., Lang, J., Marquis, P.: Action representation and partially observable planning using
epistemic logic. In: IJCAI International Joint Conference on Artificial Intelligence (2003)
48. Mezzadra, S., Neilson, B.: Between inclusion and exclusion: on the topology of global space
and borders. Theory Cult. Soc. (2012). https://doi.org/10.1177/0263276412443569
49. Copeland, B.J., Proudfoot, D.: Alan Turing’s forgotten ideas in computer science. Sci. Am.
(1999). https://doi.org/10.1038/scientificamerican0499-98
50. Berbaoui, B., Saba, D., Dehini, R., et al.: Optimal control of shunt active filter based on
Permanent Magnet Synchronous Generator (PMSG) using ant colony optimization algo-
rithm. In: Proceedings of the 7th International Conference on Software Engineering and New
Technologies—ICSENT 2018. ACM Press, New York, NY, USA, pp. 1–8 (2018)
51. Barrow, L., Markman, L., Rouse, C.E.: Technology’s edge: the educational benefits of
computer-aided instruction. Am. Econ. J. Econ. Policy (2009). https://doi.org/10.1257/pol.1.
1.52
52. Gibson, K.R.: Evolution of human intelligence: the roles of brain size and mental construction.
In: Brain, Behavior and Evolution (2002)
53. Minker, W., Bennacef, S.: Speech and human—machine dialog. Comput. Linguist (2005).
https://doi.org/10.1162/0891201053630309
54. Bengler, K., Zimmermann, M., Bortot, D., et al.: Interaction principles for cooperative human-
machine systems. It—Inf. Technol. https://doi.org/10.1524/itit.2012.0680
55. Rodríguez, R.M., Martínez, L.: An analysis of symbolic linguistic computing models in
decision making. Int. J. General Syst. (2013)
56. Chomsky, N.: Language and Mind, 3rd edn.
57. Mantiri, F.: Multimedia and technology in learning. Univers. J. Educ. Res. (2014). https://doi.
org/10.13189/ujer.2014.020901
58. Miranda, S., Ritrovato, P.: Automatic extraction of metadata from learning objects. In:
Proceedings—2014 International Conference on Intelligent Networking and Collaborative
Systems, IEEE INCoS 2014 (2014)
59. O’Leary, D.E.: Artificial intelligence and big data. IEEE Intell. Syst. (2013). https://doi.org/
10.1109/MIS.2013.39
60. Gutiérrez-Maldonado, J., Alsina-Jurnet, I., Rangel-Gómez, M.V., et al.: Virtual intelligent
agents to train abilities of diagnosis in psychology and psychiatry. Stud. Comput. Intell.
(2008). https://doi.org/10.1007/978-3-540-68127-4_51
61. Appan, K.P., Sivaswamy, J.: Retinal image synthesis for CAD development. In: Lecture Notes
in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics) (2018)
62. Saba, D., Sahli, Y., Berbaoui, B., Maouedj, R.: Towards smart cities: challenges, compo-
nents, and architectures. In: Hassanien, A.E., Bhatnagar, R., Khalifa, N.E.M., Taha, M.H.N.
(eds.) Studies in Computational Intelligence: Toward Social Internet of Things (SIoT):
Enabling Technologies, Architectures and Applications, pp. 249–286. Springer, Cham (2020)
63. Cyril Jose, A., Malekian, R.: Smart home automation security: a literature review. Smart
Comput. Rev. (2015). https://doi.org/10.6029/smartcr.2015.04.004
64. Alam, M.R., Reaz, M.B.I., Ali, M.A.M.: A review of smart homes—past, present, and future.
IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. (2012). https://doi.org/10.1109/TSMCC.
2012.2189204
65. Jones, R.M., Laird, J.E., Nielsen, P.E., et al.: Pilots for Combat Flight Simulation. AI Mag
(1999). https://doi.org/10.1609/aimag.v20i1.1438
66. Gallagher, S.: AI bests Air Force combat tactics experts in simulated dogfights | Ars Tech-
nica (2016). https://arstechnica.com/information-technology/2016/06/ai-bests-air-force-com
bat-tactics-experts-in-simulated-dogfights/. Accessed 13 Jan 2020
67. Jones, R.M., Laird, J.E., Nielsen, P.E., et al.: Automated intelligent pilots for combat flight
simulation. AI Mag. (1999)
68. Adapa, S.: Indian smart cities and cleaner production initiatives—integrated framework and
recommendations. J. Clean. Prod. (2018). https://doi.org/10.1016/j.jclepro.2017.11.250
69. Ligeza, A.: Artificial intelligence: a modern approach. Neurocomputing (1995). https://doi.
org/10.1016/0925-2312(95)90020-9
70. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: NASNet. Proc. IEEE Comput. Soc. Conf.
Comput. Vis. Pattern Recogn. (2018). https://doi.org/10.1109/CVPR.2018.00907
71. Farr, M.J., Psotka, J.: Intelligent Instruction by Computer : Theory and Practice
72. Horvitz, E.: One Hundred Year Study on Artificial Intelligence. Stanford University (2016)
73. Treleaven, P., Galas, M., Lalchand, V.: Algorithmic trading review. Commun. ACM (2013)
74. Greenwood, J.: Why BlackRock is investing in digital—the platforum. Corp Advis (Online
Ed) (2016)
75. Crosman, P.: Beyond robo-advisers: how AI could rewire wealth management | American
Banker. In: American Banker (2017). https://www.americanbanker.com/news/beyond-robo-
advisers-how-ai-could-rewire-wealth-management. Accessed 14 Jan 2020
76. Antoine, G.: Kensho’s AI for investors just got valued at over $500 million in funding round
from wall street. In: Forbes.com (2017). https://www.forbes.com/sites/antoinegara/2017/02/
28/kensho-sp-500-million-valuation-jpmorgan-morgan-stanley/#2598a9305cbf. Accessed
14 Jan 2020
77. ERIC, R.: The 8 best AI Chatbot apps of 2020. In: Thebalancesmb (2019). https://www.the
balancesmb.com/best-ai-chatbot-apps-4583959. Accessed 14 Jan 2020
78. Gofer, E.: Machine Learning Algorithms with Applications in Finance. Thesis (2014)
79. Obermeyer, Z., Emanuel, E.J.: Predicting the future-big data, machine learning, and clinical
medicine. New Engl. J. Med. (2016)
80. AM, E.: ZestFinance introduces machine learning platform to underwrite millennials
and other consumers with limited credit history | Business wire. In: Business wire
(2017). https://www.businesswire.com/news/home/20170214005357/en/ZestFinance-Introd
uces-Machine-Learning-Platform-Underwrite-Millennials. Accessed 14 Jan 2020
81. World Robotics Organization: Executive Summary—World Robotics (Industrial {&} Service
Robots) 2014. World Robot Rep (2014)
82. Adhikary, T., Jana, A.D., Chakrabarty, A., Jana, S.K.: The Internet of Things (IoT) Augmen-
tation in healthcare: an application analytics. In: ICICCT 2019—System Reliability, Quality
Control, Safety, Maintenance and Management (2020)
83. Yin, Y., Zeng, Y., Chen, X., Fan, Y.: The internet of things in healthcare: an overview. J. Ind.
Inf. Integr. (2016)
84. Kiah, M.L.M., Haiqi, A., Zaidan, B.B., Zaidan, A.A.: Open source EMR software: profiling,
insights and hands-on analysis. Comput. Methods Programs Biomed. (2014). https://doi.org/
10.1016/j.cmpb.2014.07.002
85. Sukhodolov, A.P., Bychkova, A.M.: Artificial intelligence in crime counteraction, predic-
tion, prevention and evolution. Russ. J. Criminol. (2018). https://doi.org/10.17150/2500-4255.
2018.12(6).753-766
86. Rigano, C.: Using Artificial Intelligence to Address Criminal Justice Needs (NIJ Journal 280)
(2019)
87. Škrlec, B.: Eurojust and External Dimension of EU Judicial Cooperation. Eucrim—Eur Crim
Law Assoc Forum (2019). https://doi.org/10.30709/eucrim-2019-018
88. Milakis, D., Snelder, M., Van Arem, B., et al.: Development and transport implications of
automated vehicles in the Netherlands: scenarios for 2030 and 2050. Eur. J. Transp. Infrastruct.
Res. (2017). https://doi.org/10.18757/ejtir.2017.17.1.3180
89. Andrea, M.: Some of the companies that are working on driverless car technology—
ABC News (2018). https://abcnews.go.com/US/companies-working-driverless-car-techno
logy/story?id=53872985
90. Richtel, M., Dougherty, C.: Google’s Driverless Cars Run Into Problem: Cars With Drivers—
The New York Times. New York Times (2015)
91. Guerrero-Ibáñez, J., Zeadally, S., Contreras-Castillo, J.: Sensor technologies for intelligent
transportation systems. Sensors (Basel) 18 (2018). https://doi.org/10.3390/s18041212
92. Dadgosari, F., Guim, M., Beling, P.A., et al.: Modeling law search as prediction. Artif.
Intell. Law 1–32 (2020). https://doi.org/10.1007/s10506-020-09261-5
93. Walker-Osborn, C.: Artificial intelligence automation and the law. ITNOW (2018). https://
doi.org/10.1093/itnow/bwy020
94. Alarie, B., Niblett, A., Yoon, A.H.: How artificial intelligence will affect the practice of law.
Univ. Tor. Law J. (2018)
95. Tambe, P., Cappelli, P., Yakubovich, V.: Artificial intelligence in human resources manage-
ment: challenges and a path forward. Calif. Manag. Rev. (2019). https://doi.org/10.1177/000
8125619867910
96. Radevski, V., Trichet, F.: Ontology-based systems dedicated to human resources management:
an application in e-recruitment. In: Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006)
97. Upadhyay, A.K., Khandelwal, K.: Applying artificial intelligence: implications for recruit-
ment. Strateg. HR Rev. (2018). https://doi.org/10.1108/shr-07-2018-0051
98. Raviprolu, A.: Role of artificial intelligence in recruitment. Int. J. Eng. Technol. (2017)
99. Sophie, C.: Intelligence artificielle (IA) dans les médias: beaucoup de fantasmes (2019).
https://www.samsa.fr/2019/12/02/intelligence-artificielle-ia-dans-les-medias-beaucoup-de-
fantasmes-quelques-realites-et-pas-mal-de-questions/. Accessed 7 Feb 2020
100. Muangprathub, J., Boonnam, N., Kajornkasirat, S., et al.: IoT and agriculture data analysis for
smart farm. Comput. Electron. Agric. 156, 467–474 (2019). https://doi.org/10.1016/J.COM
PAG.2018.12.011
101. FT: Smart agriculture based on cloud computing and IOT. J. Converg. Inf. Technol. (2013).
https://doi.org/10.4156/jcit.vol8.issue2.26
102. Lopez-Rincon, O., Starostenko, O., Martin, G.A.S.: Algoritmic music composition based
on artificial intelligence: A survey. In: 2018 28th International Conference on Electronics,
Communications and Computers, CONIELECOMP 2018 (2018)
103. Cope, D.: Algorithmic music composition. In: Patterns of Intuition: Musical Creativity in the
Light of Algorithmic Composition (2015)
104. Norton, D., Heath, D., Ventura, D.: Finding creativity in an artificial artist. J. Creat. Behav.
(2013). https://doi.org/10.1002/jocb.27
105. Smaill, A.: Music and Artificial Intelligence (2002)
106. Kamhi, G., Novakovsky, A., Tiemeyer, A., Wolffberg, A.: Magenta (2009)
107. Brian, S.: Narrative science, the automated journalism startup—technology and operations
management. In: HBS Digital Initiaitve (2018). https://digital.hbs.edu/platform-rctom/submis
sion/narrative-science-the-automated-journalism-startup/. Accessed 23 Jan 2020
108. Brian, S.: Automated Insights: Natural Language Generation (2020). https://automatedins
ights.com/. Accessed 23 Jan 2020
109. Spreitzer, G.M., Garrett, L.E., Bacevice, P.: Should your company embrace coworking? MIT
Sloan Manag. Rev. (2015)
110. Echobox: Echobox—Social Media for Publishers (2020). www.echobox.com. https://www.
echobox.com/. Accessed 23 Jan 2020
111. Yseop: Advanced Natural Language Generation (NLG) AI automation | Yseop (2020). www.
yseop.com. https://www.yseop.com/. Accessed 23 Jan 2020
112. Boomtrain Software: Boomtrain Software—2020 reviews, pricing & demo. In: Boomtrain
Software (2020). https://www.softwareadvice.com/marketing/boomtrain-profile/. Accessed
23 Jan 2020
113. D’Alfonso, S., Santesteban-Echarri, O., Rice, S., et al.: Artificial intelligence-assisted online
social therapy for youth mental health. Front Psychol. (2017). https://doi.org/10.3389/fpsyg.
2017.00796
114. Digitalgenius: DigitalGenius | Customer Service Automation Platform (2020). www.digita
lgenius.com, https://www.digitalgenius.com/. Accessed 23 Jan 2020
115. Ipsoft: IPsoft Inc., Global Leader in AI and Cognitive Tech Systems (2020). https://www.ips
oft.com/. https://www.ipsoft.com/. Accessed 23 Jan 2020
116. Bloomberg: Inbenta Technologies Inc.: Private Company Information—Bloomberg. In:
Bloomberg (2019)
117. Raza, M.Q., Khosravi, A.: A review on artificial intelligence based load demand forecasting
techniques for smart grid and buildings. Renew. Sustain. Energy Rev. (2015)
118. Gartner: The Road to Enterprise AI (2017)
119. Kambatla, K., Kollias, G., Kumar, V., Grama, A.: Trends in big data analytics. J. Parallel
Distrib. Comput. (2014). https://doi.org/10.1016/j.jpdc.2014.01.003
120. Safadi, F., Fonteneau, R., Ernst, D.: Artificial intelligence in video games: towards a unified
framework. Int. J. Comput. Games Technol. (2015). https://doi.org/10.1155/2015/271296
121. Frutos-Pascual, M., Zapirain, B.G.: Review of the use of AI techniques in serious games:
decision making and machine learning. IEEE Trans. Comput. Intell. AI Games (2017)
122. Frutos-Pascual, M.: Les robots deviennent-ils plus intelligents que les humains ?—Maddy-
ness—Le Magazine des Startups Françaises (2019). https://www.maddyness.com/2019/10/
18/maddyfeed-robots-plus-intelligents-humains/. Accessed 7 Feb 2020
123. Anderson, J.R., Law, E.H.: Fuzzy logic approach to vehicle stability control of oversteer. SAE
Int. J. Passeng. Cars—Mech. Syst. (2011). https://doi.org/10.4271/2011-01-0268
124. Abduljabbar, R., Dia, H., Liyanage, S., Bagloee, S.A.: Applications of artificial intelligence
in transport: an overview. Sustain (2019)
125. Saba, D., Laallam, F.Z., Belmili, H., Berbaoui, B.: Contribution of renewable energy hybrid
system control based of multi agent system coordination. In: Souk Ahres University (ed.)
Symposium on Complex Systems and Intelligent Computing (CompSIC). Souk Ahres
University, Souk Ahres (2015)
Big Data and Artificial Intelligence
Applications
In Depth Analysis, Applications
and Future Issues of Artificial Neural
Network
1 Introduction
The brain is a highly complex, non-linear, distributed and parallel processing system.
It is known that the conventional digital computers existing today are far from
achieving the human brain’s capability for motor control, pattern recognition,
perception and language processing. This can be due to several reasons. Firstly,
despite having a memory unit and several processing units which help to perform
complex operations rapidly, the computer lacks the capability to adapt. Secondly, it
lacks the capability to learn. Thirdly, we do not understand how to simulate the number
of neurons and their interconnections as they exist in biological systems. A comparison
of brain and computer indicates that the brain has about 10^11 neurons with a switching
time of 10^−3 s, while a computer has about 10^9 transistors with a switching time of
only 10^−9 s. Though this may seem a flawed comparison, since response time and
quantity alone do not measure the performance of a system, it indicates the advantage
of parallelism in terms of processing time. The largest part of the brain is continuously
working, while the largest part of the computer is only passive data storage. Thus the
brain performs parallel computations up to its theoretical maximum, while the computer
is orders of magnitude away. Moreover, the computer is static and lacks the capability
to adapt. On
the other hand, the brain possesses a biological neural network that can reorganize
and adapt itself to learn from the environment. Also, we are posed with the challenge
of performing the operations in the natural asynchronous mode.
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_7
150 B. Soni et al.
The study of ANN is motivated by its similarity to successfully working biological
systems. The main characteristics that computer systems try to adopt are the learning
capability, which can be supervised or unsupervised, generalization to find results for
similar problems, and fault tolerance.
Artificial Neural Network (ANN) is an information-processing parallel processor
developed as a generalization of the mathematical models of neuro-biology. An ANN
is made up of the following units:
1. Information processing units called nodes
2. Interconnection links between nodes
3. Connection strengths called weights, which are adjusted by learning from training
patterns
4. An activation function in each neuron to produce an output signal
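The four units above can be combined into a minimal artificial neuron. The following Python sketch is our own illustration, not the chapter's; the logistic activation is one common choice:

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Weighted sum over the connection links, then a logistic activation."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# Two input signals arriving over links with weights 0.5 each:
y = neuron([1.0, 0.0], [0.5, 0.5])
```

With zero net input the logistic activation returns 0.5; training would adjust the weights from training patterns.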
Mathematically, a neural network is a sorted triple (N, V, w), where N is a set of
neurons or nodes, V is the set of connections between them, given as

V = {(i, j) | i, j ∈ N} (1)

and

w : V → R (2)

defines the synapse weights, where w_ij represents the weight of the connection
between node i and node j.
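The triple (N, V, w) maps directly onto plain data structures. A small sketch (the node labels and weight values are our own illustration):

```python
# N: nodes, V: connections as ordered pairs, w: V -> R as a dict.
N = {1, 2, 3}
V = {(1, 3), (2, 3)}                 # links feeding node 3
w = {(1, 3): 0.8, (2, 3): -0.4}      # synapse weights w_ij

def net_input(j, activation):
    """Weighted sum of the activations arriving at node j over links in V."""
    return sum(w[(i, k)] * activation[i] for (i, k) in V if k == j)

z = net_input(3, {1: 1.0, 2: 1.0})   # 0.8 - 0.4 = 0.4
```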
The paper begins with a brief history and timeline of the development and research
of ANN. It then gives the various levels of structural description, followed by an
overview of the classification of ANN in terms of ANN models, based on the
connectivity among the neurons and their corresponding architectures. Various neuron
models such as the Perceptron, Multilayer Perceptron, Hopfield Networks, Adaptive
Resonance Theory (ART) Networks, Kohonen Networks, Radial Basis Function (RBF)
Networks, Self-Organizing Maps and the Boltzmann Machine are discussed. Since ANN
is a major constituent of Artificial Intelligence and has its roots in learning and
acquiring knowledge from the environment to mimic the human brain, training
algorithms for the various neuron models are presented. The later part of the paper
covers several real-time applications in which ANN finds extensive use, such as pattern
recognition, clustering, fuzzy logic, soft computing, forecasting and neuro-science.
The paper also discusses the various issues that need to be considered before ANN can
be used for any application. Towards the end, a brief analysis of ANN as a development
project is presented, covering the steps of the ANN development cycle from problem
definition, design, realization and verification to implementation and maintenance. It
also briefly touches on the issues in ANN development and on how they can be
mitigated. To conclude, the paper discusses the future scope of development in the
field of ANN and its prominence as a technology in this era of technological
advancement.
2 Related Work
This section is a literature survey of the development of ANN from the year 1943 to
2013. It discusses the ANN timeline from the time the first step to ANN was taken
to the recent domains where ANN is being currently used.
• 1943–1947: The first step towards the development of ANN began with the article
by Warren McCulloch and Walter Pitts, which introduced a neural network modeled
using electronic circuits that could compute arithmetic and logical operations [31].
Later, they also presented spatial pattern recognition by ANN. During the same
period, Norbert Wiener and John von Neumann also suggested that the design of
brain-inspired computers might be fascinating.
• 1949: A book titled The Organization of Behavior [16] was written by Hebb, which
emphasized learning by the neuron synapses. Hebb proposed a learning law which
said that a neural connection is strengthened each time the connected neurons are
simultaneously activated, the change in strength being proportional to the product
of the activities of the neurons.
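Hebb's law as stated, a weight change proportional to the product of the two activities, fits in one line of Python (the learning-rate value is our own illustrative choice):

```python
def hebb_update(w, x, y, eta=0.1):
    """Hebb's rule: delta_w = eta * x * y, i.e. the change in strength is
    proportional to the product of the two neurons' activities."""
    return w + eta * x * y

w = 0.0
for _ in range(5):                  # repeated simultaneous activation...
    w = hebb_update(w, 1.0, 1.0)    # ...strengthens the connection
```

Note that if either neuron is inactive (x or y equal to zero), the weight is left unchanged.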
Due to an absence of funds and a lack of conferences and events there were not enough
publications. But research was continued by independent researchers, which led to
the development of various paradigms, though it did not achieve much recognition.
• 1972: Teuvo Kohonen laid the foundation of the linear associator, a model of an
associative memory [25]. It follows rules of linearity: (i) if the net gives a pattern X
of outputs for a pattern P of inputs, it will give kX for kP; and (ii) if for a pattern Q
of inputs the net gives pattern X of outputs, and for pattern R it gives pattern Y,
then for inputs Q + R we get outputs X + Y. In the same year, a similar model was
presented independently by James A. Anderson.
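The two linearity rules are exactly the properties of a matrix-vector product X = W·P, which can be checked in a few lines of Python (the weight values below are an arbitrary illustration):

```python
def associate(W, p):
    """Linear associator: output pattern X = W * P (matrix-vector product)."""
    return [sum(wij * pj for wij, pj in zip(row, p)) for row in W]

W = [[1.0, 2.0],
     [0.0, 1.0]]
P, Q = [1.0, 0.0], [0.0, 1.0]

X = associate(W, P)
Y = associate(W, Q)
kX = associate(W, [3 * p for p in P])                 # rule (i): scaling
XY = associate(W, [p + q for p, q in zip(P, Q)])      # rule (ii): additivity
```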
• 1973–1974: A non-linear model of a neuron was proposed by Christoph von der
Malsburg. Paul Werbos developed the back-propagation method.
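Werbos's back-propagation idea, propagating the error gradient backwards through the chain rule, can be illustrated on a single sigmoid neuron with squared error (a sketch in our own notation, not Werbos's original formulation):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def backprop_step(w, b, x, target, eta=0.5):
    """One gradient step for a one-input sigmoid neuron, E = (y - t)^2 / 2."""
    y = sigmoid(w * x + b)
    delta = (y - target) * y * (1.0 - y)   # dE/ds by the chain rule
    return w - eta * delta * x, b - eta * delta

w, b = 0.0, 0.0
for _ in range(2000):                      # learn to map input 1 to output 1
    w, b = backprop_step(w, b, 1.0, 1.0)
```

In a multilayer network the same local delta terms are propagated backwards layer by layer, which is what gives the method its name.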
• 1976–1980 and thereafter: Stephen Grossberg developed various neuron models
commonly known as Additive and Shunting Models. He discovered equations
for Short Term Memory, Medium Term Memory and Long Term Memory. He
partnered with Gail Carpenter for the formulation of models of adaptive resonance
theory (ART). These derivations showed, for the first time, that brain mechanisms
could be derived by analyzing how behavior adapts autonomously in real time to
a changing world [13].
• 1982: The self-organizing feature maps (SOM), also known as Kohonen networks,
were developed by Teuvo Kohonen in [26]. The SOM is a non-linear map based on
competitive learning, and it tries to mimic the self-organization mechanisms in the
brain. During the same year, the Hopfield network came into the picture, developed
by John Hopfield. It is a recurrent network containing loops; it brings
together ideas from neurobiology and psychology and presents a model of human
memory, known as an associative memory.
• 1983: Fukushima, Miyake and Ito presented the Neocognitron neural model, an
extension of the cognitron developed in 1975 [11], which recognized handwritten
characters.
2.4 Rejuvenation
tem called a reservoir and the dynamics of the reservoir map the input to a higher
dimension. Then a simple readout mechanism is trained to read the state of the
reservoir and map it to the desired output.
• 2010–12: Deep learning, an extension of ANN, became prominent. Dan Claudiu
Ciresan and Jurgen Schmidhuber pioneered the very first GPU implementation of
ANNs: both the forward and backward passes of a nine-layer network were implemented
on an NVIDIA GTX 280 processor. Alex Krizhevsky later extended GPU ANNs with an
architecture that was itself an extension of LeNet.
The paper [30] describes ANN structure or architecture at three different levels:
• Micro-Structure: the lowest level, dealing with the node characteristics of the ANN.
• Meso-Structure: deals with the organization of nodes within the network; the form
is related to the function.
• Macro-Structure: deals with the interconnection of networks to accomplish
complex tasks.
3.1 Micro-Structure
A neuron accepts a set of input signals from other neurons, sums the received
signals, adds a threshold value, and passes the summed value to a transfer function;
the value obtained may depend on the previous value. Neurons differ from each other
mainly in two ways: by changing the transfer function and by adding new
parameters such as a bias, gain or threshold.
The threshold term that the ANN adds to the summed input signal is known as the
bias; it acts like a constant output independent of the other input signals. The bias
takes the burden off the training of weights from the lower to the upper layer by
providing an appropriate scaling value. In some ANNs a gain term is multiplied by
the summed output. The gain can be fixed or adaptive; if it is adaptive, the neuron
does not apply a constant scaling but may produce some signals with stronger
strength than others.
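The micro-structure above can be sketched in a few lines: a weighted sum of inputs plus a bias, scaled by an optional gain, then passed through a transfer function. The sigmoid transfer function and all numeric values below are illustrative assumptions, not taken from the chapter.

```python
import math

def neuron_output(inputs, weights, bias=0.0, gain=1.0):
    # Weighted sum of inputs plus the bias (the summed signal)
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Optional gain term multiplying the summed value
    s *= gain
    # Sigmoid transfer function (one common choice among many)
    return 1.0 / (1.0 + math.exp(-s))

y = neuron_output([0.5, -1.0], [0.8, 0.2], bias=0.1, gain=2.0)
```

Changing the transfer function or the bias/gain parameters is exactly how neurons differ from one another at this level.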
3.2 Meso-Structure
The main network topologies are the single-layer feedforward network, the multilayer
feedforward network, the single-layer recurrent network, the bilayer
feedforward/feedback network, the topologically organized network, and the hybrid
network.
A multilayer network consists of one input layer, one or more hidden layers (not
visible) and one output layer. Nodes of one layer have directed connections to the
nodes of the next layer only (Figs. 2 and 3).
Fig. 2 a Single layer feed forward network. b Multilayer feed forward network
Recurrent Network
The feature of recurrent network is that it consists of at least one loop. It may or may
not be a self-feedback loop. It impacts the learning capability and the performance
of the network significantly and is responsible for the nonlinear dynamic behavior
of neural network. Assuming that the neural network consists of the nonlinear units.
Bilayer Feedforward/Feedback Network
It is a two-layered network which passes information using both feed-forward
connections from the first to the second layer and feedback connections from the
second to the first layer. This type of network is used for pattern hetero-association,
where a pattern in one layer is associated with a pattern in the other layer. It is
commonly used in Bidirectional Associative Memory (BAM) and Adaptive Resonance
Theory (ART) (Fig. 4).
3.3 Macro-Structure
Often a single type of neural network is not adequate to solve complex tasks, so
interacting ANNs need to be developed. True macro-structures usually contain two or
more interacting networks; they aim to handle complex tasks in a modular way so
that the interactions are isolated. Macro-structures are of two main types: strongly
coupled and loosely coupled networks.
• Strongly coupled networks fuse two or more networks into a single structure,
trying to eliminate the weaknesses of the interacting systems.
• Loosely coupled networks retain their structural properties and distinctness.
An ANN learns when its components or free parameters adapt to changes in the
environment. Altering the synaptic weights is the most common procedure to train
the network. Furthermore, deletion and creation of interconnections can be realized
by no longer training a connection weight once it is set to zero, and by setting a
non-existent connection weight to a value other than zero, respectively. Threshold
values can be modified by adjusting synaptic weights, while changing the network
architecture itself is difficult to perform. An ANN learns by modifying weights
according to algorithmic rules. The training can be done in the following ways.
In supervised learning the ANN is trained by providing it with inputs as well as the
desired output patterns for the corresponding inputs, so that the network can compute
the error vector precisely. It can be self-supervised or supervised by a teacher, and
it is the easiest and most practical form of training the ANN. The algorithm is shown
in Fig. 8.
Here the training set consists of input patterns only. The learning machine performs
some action and gets feedback on its performance from the environment. The
returned value indicates how good or bad the result was, and based on this the
network parameters are adjusted (Jha [22]).
160 B. Soni et al.
5 Neuron Models
The most commonly used neuron models for solving real-life problems are:
5.1 Perceptron
and

y_k = φ(u_k + b_k) (7)

where w_k1, w_k2, ..., w_km are the synaptic weights of neuron k, x_1, x_2, ..., x_m
are the inputs, u_k is the linear combiner output due to the inputs, b_k is the bias,
φ(·) is the activation function, and
y_k is the output signal of the neuron. The perceptron has the disadvantage that it
does not converge if the patterns are not linearly separable [15, 53] (Fig. 5).
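A minimal sketch of the perceptron learning rule, with the caveat just stated: on the linearly separable toy data below it converges, while on non-separable data the loop would simply exhaust its epochs. The dataset, learning rate and epoch cap are illustrative assumptions.

```python
def train_perceptron(samples, labels, lr=1.0, epochs=100):
    # Labels are in {-1, +1}; update w and b on every misclassified sample
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:          # converged: every sample classified correctly
            break
    return w, b

# AND-style linearly separable data
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = train_perceptron(X, y)
```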
One of the first-generation ANN models, Adaline, was proposed by Widrow and Hoff
in 1959, mainly to recognize binary patterns. Adaline is a single-layer
neuron which uses bipolar inputs and outputs. The training of Adaline is based
on the Delta Rule, or least mean squares (LMS) [53] (Fig. 6).
Adaline's output is a linear combination of the inputs, i.e., the weighted sum
of the inputs. The weights are essentially continuous and may have negative or positive
values. During the training process, patterns with the corresponding desired outputs are
presented, and an adaptive algorithm automatically adjusts the weights to minimize the
squared error. The flowchart of the training process is shown in Fig. 7.
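The delta/LMS rule can be sketched as follows: during training the output stays linear, and each weight moves proportionally to the error times its input. The bipolar OR targets, learning rate and epoch count below are illustrative assumptions.

```python
def train_adaline(samples, targets, lr=0.1, epochs=50):
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            out = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear output
            err = t - out                                   # delta-rule error
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Bipolar OR: inputs and targets in {-1, +1}
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
t = [-1, 1, 1, 1]
w, b = train_adaline(X, t)
```

After training, thresholding the linear output at zero classifies all four patterns correctly.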
Perceptron and Adaline have several similarities, which make Adaline an extension
of the perceptron, as well as differences which distinguish it from the perceptron.
These are stated as follows:
Similarity
• Architecture is the same
• Both are binary pattern classifier
and

w_0 + w_1 x_{i1} + w_2 x_{i2} + · · · + w_n x_{in} < 0, where y_i = −1 (9)
where x is the input vector and w is an adjustable weight vector. The separation
between the hyperplane and the closest data point, i.e., d1 for H1 and d2 for H2, is
called the margin of separation. The goal of the SVM is to find the hyperplane for
which the margin of separation is maximized, i.e., d1 = d2, as shown in Fig. 8; such a
hyperplane is called the optimal hyperplane. The main aim of the SVM is to maximize
the distance given by
(w^T x + b) / ||w|| (11)
Support vectors are those data points of the input vector that lie closest to the
decision boundary and are the most difficult to classify.
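The distance expression of Eq. (11) can be sketched directly; the hyperplane and the sample points below are hypothetical, and the point with the smallest geometric distance plays the role of a support vector.

```python
import math

def distance_to_hyperplane(w, b, x):
    # Signed geometric distance (w^T x + b) / ||w||, as in Eq. (11)
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return (dot + b) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical separating hyperplane x1 + x2 - 1 = 0 and sample points
w, b = [1.0, 1.0], -1.0
points = [(0.0, 0.0), (2.0, 2.0), (0.5, 1.0), (1.5, 0.5)]
dists = [abs(distance_to_hyperplane(w, b, p)) for p in points]
closest = min(range(len(points)), key=lambda i: dists[i])
```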
RBF was first introduced by Powell in 1985 as a solution to the real multivariate
interpolation problem, and was later developed further by Broomhead and Lowe in 1988.
RBF finds its utility in the classification of complex patterns [15]. RBF is based on
Cover's theorem, which states that a complex pattern-classification problem cast
nonlinearly into a high-dimensional space is more likely to be linearly separable
than in a low-dimensional space. An RBF network has three layers with different functions:
• Input Layer—It is made up of sensory nodes which connect the network to the external
environment by taking the input.
• Hidden Layer—Its units are also called RBF neurons. They perform a nonlinear trans-
formation from the input space to a feature space that is generally of high
dimension.
• Output Layer—The output neurons are linear, with the activation function's response
and a single weighted sum as the propagation function.
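Cover's theorem can be illustrated with the Gaussian hidden layer of an RBF network: the XOR inputs, not linearly separable in the plane, become separable after the transform. The centers at (0,0) and (1,1) and the unit width are the classic illustrative choice, not values from the chapter.

```python
import math

def rbf_features(x, centers, sigma=1.0):
    # Gaussian hidden-layer activations of an RBF network
    feats = []
    for c in centers:
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    return feats

# XOR inputs, mapped through RBF units centered at (0,0) and (1,1)
centers = [(0.0, 0.0), (1.0, 1.0)]
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
phi = [rbf_features(x, centers) for x in inputs]
```

In feature space the XOR-positive points (0,1) and (1,0) coincide, while (0,0) and (1,1) land elsewhere, so a simple threshold on φ1 + φ2 now separates the two classes linearly.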
The Hopfield network was first introduced in 1982. It is a form of associative-memory
neural network, with the distinction of being a recurrent network. The network is
fully connected, with symmetric synaptic weights and no self-feedback; the feedback
enables the network to hold memories. In the Hopfield network only one activation
unit is updated at a time, when a signal is received from the other units. A neuron
in the Hopfield network is governed by
du_i/dt = ∑_{j≠i} T_{i,j} V_j + I_i (12)
and

V_i = g(u_i) (13)
where T_{i,j} is the weighted connection between neurons i and j, forming a symmetric
matrix with zero diagonal (since there is no self-feedback), I_i is the input to the
single neuron i, and g is a monotonically non-decreasing activation function. The
neurons of the Hopfield network tend towards a collective state which minimizes the
energy function
E = −∑_{i=1}^{n} I_i V_i − ∑_{i=1}^{n} ∑_{j>i} T_{i,j} V_i V_j (14)
The Hopfield network is used for optimization by constructing the T_{i,j} and I_i
connections such that the minimum points of the energy function correspond to the
optimal solutions of the problem. The training algorithm is given in Fig. 9.
The network is guaranteed to converge to a local minimum, but convergence to a
specific pattern is not guaranteed, due to oscillations [34].
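The energy function of Eq. (14) and the convergence property can be sketched with the common discrete (±1-state) Hopfield variant; the 3-unit symmetric weight matrix with zero diagonal below is an arbitrary illustrative choice. Each asynchronous update never increases the energy.

```python
def energy(T, I, V):
    # E = -sum_i I_i V_i - sum_i sum_{j>i} T_ij V_i V_j, as in Eq. (14)
    n = len(V)
    e = -sum(I[i] * V[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= T[i][j] * V[i] * V[j]
    return e

def update_unit(T, I, V, i):
    # Asynchronous update of one unit from the signals of the others
    s = sum(T[i][j] * V[j] for j in range(len(V)) if j != i) + I[i]
    V[i] = 1 if s >= 0 else -1

# Symmetric weights, zero diagonal, no external input
T = [[0, 1, -1], [1, 0, 1], [-1, 1, 0]]
I = [0, 0, 0]
V = [1, -1, 1]
e_before = energy(T, I, V)
for i in range(3):
    update_unit(T, I, V, i)
e_after = energy(T, I, V)
```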
where w_{kj} is the weight connecting neuron j to neuron k and x_j is the state of
neuron j. The network does not contain any self-loops. At a given step and temperature
T, a random neuron is selected during the learning process and its state is flipped
with probability:
P(x_k → −x_k) = 1 / (1 + exp(−E/T)) (16)
Initially, the temperature is set to a relatively high value and, over time, it is
gradually decreased according to some annealing schedule. The annealing schedule and
the method of determining the activation of units are central to performance [15].
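The flip probability of Eq. (16) and a geometric annealing schedule can be sketched as follows; writing the energy gap as delta_e, and the schedule constants, are illustrative assumptions. At high temperature flips are near-random; at low temperature the network settles into low-energy states.

```python
import math

def flip_probability(delta_e, temperature):
    # Probability of flipping a unit's state, per Eq. (16)
    return 1.0 / (1.0 + math.exp(-delta_e / temperature))

def annealing_schedule(t0=10.0, alpha=0.9, steps=5):
    # Geometric schedule: start high, decrease gradually
    return [t0 * alpha ** k for k in range(steps)]

p_hot = flip_probability(1.0, 100.0)   # close to 0.5: near-random
p_cold = flip_probability(1.0, 0.1)    # close to 1: nearly deterministic
```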
automatic cartography are some of the applications [6]. Adaline has been used to
determine the type of classroom, regular or special, based on reading-test and
written-test scores that are combined into a final score. Calculating the final score
and determining the class are still done manually, which can cause errors, so Adaline
is used to predict the correct classroom for each student based on the final score [14].
SVM is often evaluated by predictive accuracy, which serves as an evaluation criterion
for the predictive performance of classification or data mining algorithms; the
recently developed SVM performs better in accuracy than traditional learning
algorithms [20]. Li et al. [29] applied
SVMs by taking DWFT as input for classifying texture, using translation-invariant
texture features. They used a fusion scheme based on simple voting among mul-
tiple SVMs, each with a different setting of the kernel parameter, to alleviate the
problem of selecting a proper value for the kernel parameter in SVM training
and performed the experiments on a subset of natural textures from the Brodatz
album. They claim that, compared to the traditional Bayes classifier and LVQ,
SVMs in general produced more accurate classification results. Bioinformatics
uses the SVM approach as a data-driven method for solving classification
tasks; it has been shown to produce lower prediction error than classifiers based
on other methods. Small organic molecules that modulate the function
of protein coupled receptors are identified using the SVM. The SVM classifies
approximately 90% of the compounds. This classifier can be used for fast filtering
of compound libraries in virtual screening applications. Some of the applications
are given in [23, 45].
• Biological Modeling
ANN models are brain-inspired paradigms that emulate networks of neurons. Recent
studies have shown that astrocytes (the most common glial cells) contribute
significantly to brain information processing. Astrocytes can signal to other
astrocytes and can communicate reciprocally with neurons, which suggests a more
active role of astrocytes in nervous-system physiology. The circuit is designed as a
fully connected, feed-forward multilayer network without back-propagation or lateral
connections. The first phase deals with training using non-supervised learning,
followed by training each pattern with supervised learning; the second phase uses the
evolutionary technique of a GA to perform cross-over and mutation. Multiple
iterations of this process are performed to minimize the error. Thus, the ANN is
used to mimic the neurobiology of the brain in order to understand its response
mechanism [36].
The perceptron algorithm has been used to find a weighting function that distinguishes
E. coli translational initiation sites from all other sites in a library of over 78,000
nucleotides of mRNA sequence. The weighting function can then be used to find
translational initiation sites within sequences that were not included in the training
set [10].
• Medicine
A medical decision-making system based on the Least Squares Support Vector Machine
(LSSVM) was developed in [35] for the task of breast cancer diagnosis using the fully
automatic LSSVM. The results on the WBCD dataset strongly suggest that LSSVM can aid
in the diagnosis of breast cancer.
• Time Series Forecasting
ANN is commonly used for time-series prediction: based on past values, it can predict
future values with some error. The non-linearity and noise of real time series cannot
be sufficiently captured by traditional techniques such as the Box-Jenkins approach,
so ANN is used extensively and several works have been done in this domain. However,
the search for the ideal network structure remains a complex and crucial task. In 2006,
a hybrid technique combining an autoregressive model and an ANN model with a single
hidden layer was developed, which allows models to be specified parsimoniously at a
low computational cost. Hassan, Nath and Kirley proposed a hybrid model of HMM and GA
to predict financial market behavior. Several other models, such as the AR* model,
the generalized regression model, and a hybrid fuzzy-ANN model, were also proposed.
In 2013, a hybrid based on the ARIMA model was proposed in order to integrate the
advantages of both models. In 2017, a hybrid model based on Elman recurrent neural
networks (ERNN) with a stochastic time effective function (STNN) was produced, which
displayed the best results compared to linear regression, complexity-invariant
distance (CID), and multi-scale CID (MCID) analysis methods. Thus, ANN plays a
significant role in time-series forecasting [48].
• Data Mining
Data mining is aimed at gaining insight into large collections of data. Neural-
network-based data mining is composed of three phases, namely data preparation, rule
extraction, and rule assessment. ANN's ability to classify patterns and estimate
functions makes its use in data mining significant: it is used to map complex
input-output relationships and performs manipulation and cross-fertilization to find
patterns in data, which helps users make more informed decisions. ANNs are trained to
store, recognize, and retrieve patterns, to filter noisy data, for combinatorial
optimization problems, and to control ill-defined problems [12, 41].
• Servo Control
ANN has emerged as a tool for controlling parameters that vary greatly, outperforming
the PID controller thanks to its dynamic and robust control nature. Since ANN needs
only inputs and outputs, it is easier to predict the parameters more accurately. This
is used extensively in robotic actuators for motion control [3].
• Cryptography
The perceptron can be used in cryptographic security. A chaotic image encryption
system has been proposed, based on the high-dimensional Lorenz chaotic system and a
perceptron model within a neural network. Experimental results show that this
algorithm has high security and strong resistance to existing attack methods [50].
• Civil Engineering
ANN is commonly used in various domains of civil engineering, including structural
engineering, construction engineering and management, and environmental and water
resources engineering [1].
• Compiler Design
An incremental parsing approach has been developed in which the parameters are
estimated using a variant of the perceptron algorithm [9].
• Finance
ANNs can be applied in several areas of finance too, including stock-price prediction,
security trading systems, bond-rating prediction, modeling foreign exchange markets,
evaluating risk for loan, mortgage and credit-card applications, and even prediction
of financial distress.
Due to their capability to deal with unstable relations, ANNs can be designed and
trained to identify complex patterns and predict stock-price trends; they can
recognize stocks that outperform the market more accurately than regression models.
ANN-based trading systems are commonly used to manage fidelity funds, pension funds
and market returns. Bond-rating prediction using limited data sets is commonly done,
and fee-based ANN applications for currency trading are not uncommon, providing
services to a range of customers [36].
1. Hardware Implementation
Neural nets are inspired by neurological architectures and are increasingly used to
mimic their biological counterparts. It is imperative to ask whether hardware and
electronic equipment can be developed to ensure high-speed implementation of ANNs in
digital computers; implementing a neural network in VLSI for real-time systems can be
quite challenging.
• A typical transistor receives input current from, and feeds output current to,
only 2-3 other transistors, whereas a neuron typically connects to on the order of
3000 others, a considerable quantitative difference.
• Electronic circuits have time delays, and their structure is fixed. This is unlike
the brain, where time delays are negligible due to the densely connected network.
• The circuitry of a biological network changes through learning, but the electronic
circuit structure remains fixed.
• Electronic circuits are mostly feed-forward networks, while the flow in a
biological network largely contains feedback loops and is very much two-way.
• VLSI circuits are highly synchronized, as they are governed by clock cycles; this
is not the case in neurobiology, which is a highly dynamic system.
• Since a CMOS ANN has its own activation function, different from any activation
function used in ANN theory, an approximation function must be developed.
• Training the network with this activation function is another challenge for the
designers.
• The discrete values of the channel length (L) and width (W) of the MOS transistor
architecture cause a quantization effect.
• The power consumption of a circuit that mimics a biological network is high [48, 54].
2. Verification, Validation and Testing
The absence of a standard verification and validation procedure causes disparity
[47]. Different development stages have varied verification and validation
procedures, which leads to a lack of trust in, and of consistent assessment of, the
ANNs used in different domains. The issues include:
• Has the system learned the correct data, and not merely something closely related
to it?
• How many training iterations should be performed on the network?
• Has the network converged to the global minimum or to a local minimum?
• How will the ANN handle the test data?
• Does the network show the same result each time the same set is given as input?
• Is there a quantifiable technique to describe the network's memory or data
retention?
• Are the correct input variables chosen for the network as per the problem domain?
3. Over-Learning, Generalizing and Training Set
The aim of training an ANN is to minimize the error generated in categorizing the
sample data set. Failing to gather or identify the right training data (data that
correlates with the outcomes to be predicted) in sufficient quantity can hamper
training. This may lead to excessive learning or over-adaptation, i.e., the ANN
performs well on the sample data but not on test data, owing to noise contained in
the training data. For example, if the data is intended for human face recognition
but all photos predominantly contain cat faces, the system learns to identify cat
faces; such a system is definitely not reliable. It is important to ensure that
performance on the sample data and on the selection (test) data is nearly similar.
To avoid over-learning, the number of hidden neurons can be reduced; if the learning
is weak, new training data can be constantly added to contribute significantly. Thus,
the goal of learning is to find an equilibrium between over-learning and
generalization ability, in which the training set plays a key role, and this
is challenging [49].
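One common way of seeking the equilibrium described above is early stopping against a held-out validation set; the patience criterion and the toy error curve below are illustrative assumptions, not a procedure from the chapter.

```python
def early_stopping(val_errors, patience=3):
    # Stop once the validation error fails to improve for `patience`
    # consecutive epochs; return the epoch whose weights should be kept
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Validation error falls, then rises as the network over-learns
errors = [0.9, 0.6, 0.4, 0.35, 0.37, 0.41, 0.48, 0.55]
stop_at = early_stopping(errors)
```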
6. Parameter Optimization
Parameters such as the learning rate, momentum coefficient, number of epochs, and
the training and test data sizes should be carefully selected, since they strongly
impact the training of the network. Poor choices can cause slow training, an
increased risk of local minima, and reduced generalization capability.
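Selecting such parameters is often done by a simple validation-driven grid search, sketched below; the toy error surface stands in for an actual training run and is purely illustrative.

```python
from itertools import product

def grid_search(eval_fn, grid):
    # Try every (learning rate, momentum) pair; keep the one with the
    # lowest validation error reported by eval_fn
    best_cfg, best_err = None, float("inf")
    for lr, momentum in product(grid["lr"], grid["momentum"]):
        err = eval_fn(lr, momentum)
        if err < best_err:
            best_cfg, best_err = (lr, momentum), err
    return best_cfg, best_err

# Toy stand-in for "train the network and report validation error"
toy_error = lambda lr, m: (lr - 0.1) ** 2 + (m - 0.9) ** 2
cfg, err = grid_search(toy_error, {"lr": [0.01, 0.1, 1.0],
                                   "momentum": [0.0, 0.5, 0.9]})
```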
ANN has spread so widely that it is now part and parcel of machine learning. Through
continual refinement, the method has improved to the extent that one can hardly think
of training machine-learning, especially deep-learning, models without artificial
neural networks.
A main drawback has been the limited calculation ability of computers, but this
technology has developed greatly: the capability of electronic components is
increasing drastically while their size shrinks day by day. However, the growing
performance of a basic hardware component like the transistor will reach its limit
because of the limitation of its size; transistor sizes cannot be reduced below a
certain level. To solve this issue, optical transistors are being developed. Optical
logic gates are much faster than conventional electronic gates, and once fully
developed they will boost hardware processing power by a huge margin.
In recent times cloud computing has grown enormously. Google has developed the
Tensor Processing Unit (TPU) [51], which boosts hardware computation power by a very
large factor, and provides the TPU as a service to its clients for performing
tremendous amounts of calculation in the cloud, so hardware is unlikely to be the
bottleneck for huge calculation tasks in the near future. With each TPU version, the
speed increases tremendously; the peak performance of TPU v3 is up to 420 TFLOPS.
For comparison, the US Department of Energy's Oak Ridge National Laboratory announced
the top speed of its Summit supercomputer, which nearly laps the previous
record-holder, China's Sunway TaihuLight [55]: Summit's theoretical peak is 200
petaflops, or 200,000 teraflops. A pod of TPU chips can thus approach the scale of
the world's most powerful supercomputers.
Table 3 compares the performance of GPU and TPU relative to CPU. Various workloads
were run on each and their performance was measured: two Multi-Layer Perceptron
(MLP), two Long Short-Term Memory (LSTM) and two Convolutional Neural Network (CNN)
models, with the ratio showing how the same code performs on the TPU relative to the
GPU. The ratios imply a significant increase in the processing power of the newly
developed hardware.
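As a sanity check, the GM column of Table 3 can be reproduced from the Ratio row, since the geometric mean is the exponential of the mean log ratio:

```python
import math

ratios = [16.7, 60.0, 8.0, 1.0, 25.4, 26.3]   # TPU/GPU Ratio row of Table 3
gm = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
# gm comes out close to the 13.2 reported in the table
```

The weighted mean (WM) additionally needs the per-workload weights, which Table 3 does not list, so it cannot be recomputed here.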
With the recent development of web and cloud services, the amount of data created
each day is increasing enormously. This large amount of data is
Table 3 K80 GPU die and TPU die performance relative to CPU for the NN workload. GM and
WM are geometric and weighted mean [24]
Type MLP0 MLP1 LSTM0 LSTM1 CNN0 CNN1 GM WM
GPU 2.5 0.3 0.4 1.2 1.6 2.7 1.1 1.9
TPU 41.0 18.5 3.5 1.2 40.3 71.0 14.5 29.2
Ratio 16.7 60.0 8.0 1.0 25.4 26.3 13.2 15.3
what enables the ANN to perform significantly better than in the early days.
Nowadays, collecting data for a dataset is not a big task: millions of images can be
collected and used to train an ANN, after which thousands of other images can be used
to validate, verify and test it.
While training a model, the parameters that are chosen determine the quality of the
output the model will provide [8]. New implementation methods are being used today
to obtain better output through precise parameters. Recently, a model has been
developed to tune the hyperparameters of a neural network model [4], a giant leap in
the design of neural networks: with the help of a recurrent neural network it
performs well and updates itself after each iteration. Hyperparameter selection is
thus not such an issue anymore, though this technology is not available to the public
at this time.
Other models are being developed that can program their own child neural-network
models: they gradually learn from experience and apply it in the design of the child
network. These are the giant leaps around us today. Image classifier models are
getting more accurate day by day, and there are now numerous models for various
computer vision tasks like pose estimation, text recognition, etc. Technology is not
going to be a barrier in the near future.
The design methods of ANN are developing day by day. Neural networks were made to
mimic human brain functions; despite the recent exponential growth, the hardware is
still not ready to take the load that a normal human brain handles throughout the
day. Billions of neurons fire simultaneously, making the brain a large
parallel-processing device, and parallel processing on this scale is not possible
today despite the improvements in technology. Artificial Narrow Intelligence has been
growing rapidly, but the construction of Artificial General Intelligence is not
possible in the near future; a drastic boost is needed for this to happen.
Researchers are doing their best each day to modify and update the computation
procedures. Artificial General Intelligence would help humanity reduce the effort
spent on general day-to-day tasks and focus on the tasks where resources should be
concentrated. Artificial Super Intelligence might be the last invention that humanity
ever makes: a super-intelligent AI would outperform every existing computer and
human, even combined, by a huge margin. People worry about a singularity phase in
which Artificial Super Intelligence becomes something no one can understand; this
scenario does not seem possible in the near future, although developments in
technology might lead humanity to a super-intelligent AI one day [40]. This is
becoming the world's most exciting technology: almost every industry has already
started investing in it, various startups are actively associating themselves with
the industry, and the statistics show very good results.
As recent studies show, annual investment in the field of Artificial Intelligence has
started to grow exponentially; the venture capital invested in this field now well
exceeds 3 billion dollars. Robotic technology has also started evolving with very
good momentum: companies like Boston Dynamics are producing both private and
commercial robots with the help of artificial intelligence. This is why the field has
become a golden opportunity for start-ups, which have begun to focus on this
technology to invent something new or to improve existing systems. The job count in
this field has recently started to increase significantly, with opportunities in data
science, NLP, computer vision, etc. growing day by day. This is a very good sign for
the AI industry.
11 Conclusions
Acknowledgments This work was supported by the Multimedia and Image Processing Laboratory,
Department of Computer Science and Engineering, National Institute of Technology Silchar, India.
References
1. Adeli, H.: Neural networks in civil engineering: 1989–2000. Comput. Aided Civil Infrastruct.
Eng. 16(2), 126–142 (2001)
2. Ai, Q., Zhou, Y., Xu, W.: Adaline and its application in power quality disturbances detection
and frequency tracking. Electric Power Syst. Res. 77(5–6), 462–469 (2007)
3. Anderson, D., McNeill, G.: Artificial neural networks technology. Kaman Sci. Corporation
258(6), 1–83 (1992)
4. Bardenet, R., Brendel, M., Kégl, B., Sebag, M.: Collaborative hyperparameter tuning. In: Das-
gupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine
Learning, PMLR, Atlanta, Georgia, USA, Proceedings of Machine Learning Research, vol. 28,
pp. 199–207 (2013)
5. Basheer, I.A., Hajmeer, M.: Artificial neural networks: fundamentals, computing, design, and
application. J. Microbiol. Methods 43(1), 3–31 (2000)
6. Basu, J.K., Bhattacharyya, D., Kim, T.H.: Use of artificial neural network in pattern recognition.
Int. J. Softw. Eng. Appl. 4(2) (2010)
7. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In:
Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152.
ACM (1992)
8. Caruana, R., Lawrence, S., Giles, C.L.: Overfitting in neural nets: backpropagation, conjugate
gradient, and early stopping. In: Advances in Neural Information Processing Systems, pp.
402–408 (2001)
9. Collins, M., Roark, B.: Incremental parsing with the perceptron algorithm. In: Proceedings
of the 42nd Annual Meeting on Association for Computational Linguistics, Association for
Computational Linguistics, p. 111 (2004)
10. Connally, P., Li, K., Irwin, G.W.: Prediction-and simulation-error based perceptron training:
Solution space analysis and a novel combined training scheme. Neurocomputing 70(4–6),
819–827 (2007)
11. Fukushima, K., Miyake, S., Ito, T.: Neocognitron: a neural network model for a mechanism of
visual pattern recognition. IEEE Trans. Syst. Man Cybern. 5, 826–834 (1983)
12. Gaur, P.: Neural networks in data mining. Int. J. Electron. Comput. Sci. Eng. (IJECSE, ISSN:
2277-1956) 1(03), 1449–1453 (2012)
13. Grossberg, S.: Adaptive pattern classification and universal recoding: I. Parallel development
and coding of neural feature detectors. Biol. Cybern. 23(3), 121–134 (1976)
14. Handayani, N., Aindra, D., A, Wahyulis, D.F., Pathmantara, S, Asmara, R.A.: Application
of adaline artificial neural network for classroom determination in elementary school. IOP
Conference Series: Materials Science and Engineering 434, 012030 (2018). https://doi.org/10.
1088/1757-899X/434/1/012030
15. Haykin, S.: Neural Networks: A Comprehensive Foundation. International edition, Prentice
Hall, URL https://books.google.co.in/books?id=M5abQgAACAAJ (1999)
16. Hebb, D.O.: The Organization of Behavior: A Neuropsychological Theory. Psychology Press
(2005)
17. Hoi, S.C., Jin, R., Zhu, J., Lyu, M.R.: Semisupervised svm batch mode active learning with
applications to image retrieval. ACM Trans. Inf. Syst. (TOIS) 27(3), 16 (2009)
18. Hopfield, J.J.: Artificial neural networks. IEEE Circuits Dev. Mag. 4(5), 3–10 (1988a)
19. Hopfield, J.J.: Artificial neural networks. IEEE Circuits Dev. Mag. 4(5), 3–10 (1988b)
20. Huang, J., Lu, J., Ling, C.X.: Comparing naive bayes, decision trees, and svm with auc and
accuracy. In: Third IEEE International Conference on Data Mining, IEEE, pp 553–556 (2003)
21. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning, vol.
112. Springer (2013)
22. Jha, G.K.: Artificial neural networks and its applications. IARI, New Delhi, girish_iasri@
rediffmail com (2007)
23. Jha, R.K., Soni, B., Aizawa, K.: Logo extraction from audio signals by utilization of internal
noise. IETE J. Res. 59(3), 270–279 (2013)
24. Jouppi, N.P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia,
S., Boden, N., Borchers, A., et al.: In-datacenter performance analysis of a tensor processing
unit. In: 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture
(ISCA), pp. 1–12. IEEE (2017)
25. Kohonen, T.: Correlation matrix memories. IEEE Trans. Comput. 100(4), 353–359 (1972)
26. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern.
43(1), 59–69 (1982)
27. Kriesel, D.: A Brief Introduction to Neural Networks. URL available at http://www.dkriesel.
com (2007)
28. Kuschewski, J.G,. Engelbrecht, R., Hui, S., Zak, S.H.: Application of adaline to the synthesis
of adaptive controllers for dynamical systems. In: 1991 American Control Conference, pp.
1273–1278. IEEE (1991)
29. Li, S., Kwok, J.T., Zhu, H., Wang, Y.: Texture classification using the support vector machines.
Pattern Recog. 36(12), 2883–2893 (2003)
182 B. Soni et al.
30. Maren, A.J., Harston, C.T., Pap, R.M.: Handbook of Neural Computing Applications. Aca-
demic Press (2014)
31. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull.
Math. Biophys. 5(4), 115–133 (1943)
32. Minsky, M., Papert, S.: An Introduction to Computational Geometry. Cambridge tiass, HIT
(1969)
33. Nilsson, N.J.: Learning Machines (1965)
34. Padhy, N., Simon, S.: Soft Computing: With MATLAB Programming. Oxford higher education,
Oxford University Press, URL https://books.google.co.in/books?id=lKgdswEACAAJ (2015)
35. Polat, K., Güneş, S.: Breast cancer diagnosis using least square support vector machine. Digital
Signal Process. 17(4), 694–701 (2007)
36. Rabuñal, J.R.: Artificial neural networks in real-life applications. IGI Global (2005)
37. Rojas, R.: Neural Networks: a Systematic Introduction. Springer Science & Business Media
(2013)
38. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization
in the brain. Psychol. Rev. 65(6), 386 (1958)
39. Schrauwen, B., Verstraeten, D., Van Campenhout, J.: An overview of reservoir computing:
theory, applications and implementations. In: Proceedings of the 15th European Symposium
on Artificial Neural Networks, pp. 471–482, pp. 471–482 (2007)
40. Shabbir, J., Anwer, T.: Artificial intelligence and its role in near future. CoRR abs/1804.01396
(2018)
41. Singh, Y., Chauhan, A.S.: Neural networks in data mining. J. Theor. Appl. Inf. Technol. 5(1)
(2009)
42. Soni, B., Debnath, S., Das, P.K.: Text-dependent speaker verification using classical lbg, adap-
tive lbg and fcm vector quantization. Int. J. Speech Technol. 19(3), 525–536 (2016)
43. Soni, B., Das, P.K., Thounaojam, D.M.: Improved block-based technique using surf and fast
keypoints matching for copy-move attack detection. In: 2018 5th International Conference on
Signal Processing and Integrated Networks (SPIN), pp. 197–202. IEEE (2018a)
44. Soni, B., Das, P.K., Thounaojam, D.M.: Keypoints based enhanced multiple copy-move forg-
eries detection system using density-based spatial clustering of application with noise clustering
algorithm. IET Image Process. 12(11), 2092–2099 (2018b)
45. Soni, B., Das, P.K., Thounaojam, D.M.: Multicmfd: fast and efficient system for multiple copy-
move forgeries detection in image. In: proceedings of the 2018 International Conference on
Image and Graphics Processing, pp. 53–58 (2018c)
46. Steinbuch, K.: Die lernmatrix. Biol. Cybern. 1(1), 36–45 (1961)
47. Taylor, B.J., Darrah, M.A., Moats, C.D.: Verification and validation of neural networks: a
sampling of research in progress. Intell. Comput.: Theory Appl. Int. Soc. Optics Photon. 5103,
8–17 (2003)
48. Tealab, A.: Time series forecasting using artificial neural networks methodologies: a systematic
review. Future Comput. Inf. J. (2018)
49. Vemuri, V.R.: Main problems and issues in neural networks application. In: Proceedings 1993
The First New Zealand International Two-Stream Conference on Artificial Neural Networks
and Expert Systems, p. 226. IEEE (1993)
50. Wang, X.Y., Yang, L., Liu, R., Kadir, A.: A chaotic image encryption algorithm based on
perceptron model. Nonlinear Dyn. 62(3), 615–621 (2010)
51. Wei, G.Y., Brooks, D., et al.: Benchmarking tpu, gpu, and cpu platforms for deep learning.
arXiv preprint arXiv:190710701 (2019)
52. Widrow, B., Hoff, M.E.: Adaptive Switching Circuits. Stanford Univ Ca Stanford Electronics
Labs, Tech. rep. (1960)
53. Widrow, B., Lehr, M.A.: 30 years of adaptive neural networks: perceptron, madaline, and
backpropagation. Proc. IEEE 78(9), 1415–1442 (1990)
54. Wilamowski, B.M., Binfet, J., Kaynak, M.O.: Vlsi implementation of neural networks. Int. J.
Neural Syst. 10(03), 191–197 (2000)
In Depth Analysis, Applications and Future Issues of Artificial Neural Network 183
55. Wolfson, E.: The US passed China with a supercomputer capable of as many calcula-
tions per second as 6.3 billion humans. https://qz.com/1301510/the-us-has-the-worlds-fastest-
supercomputer-again-the-200-petaflop-summit/. Accessed 01 Dec 2019 (2018)
56. Wooldridge, M., Jennings, N.R.: Intelligent agents: theory and practice. Knowl. Eng. Rev.
10(2), 115–152 (1995)
57. Yadav, N., Yadav, A., Kumar, M., et al.: An Introduction to Neural Network Methods for
Differential Equations. Springer (2015)
Big Data and Deep Learning in Plant Leaf Diseases Classification for Agriculture
Mohamed Loey
1 Introduction
M. Loey (B)
Faculty of Computers and Artificial Intelligence, Department of Computer Science, Benha
University, Benha 13511, Egypt
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_8
helped humanity to advance and develop. Today, agriculture and the food industry are among the most critical activities worldwide, owing to the growing population and its ever-increasing need for food. Agriculture and farming not only provide food and raw materials but also provide employment opportunities to a very large percentage of the population [8–11]. Increasingly popular on-farm sensing mechanisms include RGB, spectral, thermal, and near-infrared imaging, which can be ground-based or carried on airborne drones to capture big-data image collections [12].
Food losses and waste in middle-income countries occur mainly at the early stages of the food value chain and can be traced to financial, managerial, and technical constraints in harvesting techniques as well as in storage and refrigeration facilities [13, 14]. The global food supply is reduced every year, demonstrating that our collective battle against plant leaf diseases and pests is not yet won. Plant leaf diseases can be caused by different types of bacteria, viruses, fungi, pests, and other agents. Symptoms of diseased plants include leaf spots, blights, fruit spots, fruit and root rots, wilt, dieback, and decline. The major impact of plant leaf disease is a reduction in the food available to people, which can result in inadequate nutrition or lead to hunger in some areas [15, 16].
Traditionally, detecting and treating the various bacteria, viruses, fungi, and pests was done by the naked eye of the farmer manually examining plant leaves on-site, a process that is slow and costly. The need for partially or fully automated plant disease detection frameworks has therefore become a major and growing research area. Plant leaf disease detection is of extreme importance in recommending and choosing the proper treatment for diseased plants and in preventing the infection of healthy ones. The plant leaf is the most common basis for detecting plant disease, as it shows different symptoms for different diseases.
Fig. 3 Layers of AlexNet (Image → Conv-Layer1–4 with Max-Pool1–3 → FC1–FC3)
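The layer stack of Fig. 3 can be sketched in PyTorch as follows. This is a minimal illustrative sketch, not the exact network used in any surveyed paper: the channel widths follow the common AlexNet configuration, and the 38-class output is chosen to match the PlantVillage label count used in [43].

```python
import torch
import torch.nn as nn

class SmallAlexNet(nn.Module):
    """Image -> four conv layers with three max-pools -> FC1-FC3 (cf. Fig. 3)."""

    def __init__(self, num_classes: int = 38):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),  # Conv-Layer1
            nn.MaxPool2d(3, stride=2),                                         # Max-Pool1
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),           # Conv-Layer2
            nn.MaxPool2d(3, stride=2),                                         # Max-Pool2
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),          # Conv-Layer3
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),          # Conv-Layer4
            nn.MaxPool2d(3, stride=2),                                         # Max-Pool3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),  # FC1
            nn.Linear(4096, 4096), nn.ReLU(),         # FC2
            nn.Linear(4096, num_classes),             # FC3: one logit per disease label
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = SmallAlexNet()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 38])
```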
Fungi cause fungal leaf diseases and are responsible for most biotic leaf diseases of plants. Pathogenic fungi produce diseases such as anthracnose, leaf spot, gall, canker, blight, rust, wilt, scab, coils, root rot, damping-off, mildew, and dieback. Fungal spores travel through the air and are carried by wind, water, insects, soil, and other invertebrates to infect other plants [37].
Viruses cause viral leaf diseases, the rarest type of plant leaf disease. Moreover, once a plant is infected, there are no chemical treatments that can cure some viral leaf diseases.
Bacteria cause many serious plant diseases throughout the farms; more than 190 types of bacteria are known to produce such diseases. Bacteria can be dispersed through splashing water, insects, and contaminated tools.
This part summarizes the latest papers on plant leaf disease detection that apply deep learning in agriculture; Table 1 sums up these studies.
Table 1 Survey of research papers based on deep learning in leaf plant diseases detection

Year  Refs  Crop type  Source                   Classes  Images  Error rate (%)
2019  [62]  5 crops    Plant Village            9        15,210  14
2019  [63]  2 crops    Own                      2        3663    13
2019  [64]  19 crops   Plant Village            38       54,306  6.18
2018  [54]  Maize      Plant Village, websites  9        500     1.10
2018  [55]  14 crops   Plant Village            38       54,300  0.25
2018  [56]  Tomato     Plant Village            7        13,262  2.51
2018  [59]  Cucumber   Plant Village, websites  4        1184    6
2018  [60]  Wheat      Own                      4        8178    13
2018  [57]  19 crops   Plant Village            38       56 k    8
2018  [58]  Tomato     Plant Village            5        500     14
2017  [38]  Tomato     Own                      9        5 k     14
2017  [45]  Banana     Plant Village            3        3700    0.28
2017  [46]  Apple      Plant Village            4        2086    9.60
2017  [44]  Wheat      Own                      7        9320    2.05
2017  [49]  Tomato     Plant Village            10       18 k    4.35
2017  [50]  Tomato     Plant Village            10       14,828  0.82
2017  [47]  Rice       Own                      10       500     4.52
2017  [48]  Apple      Own                      4        1053    2.38
2017  [51]  Maize      Own                      2        1796    3.30
2017  [52]  Cassava    Own                      6        2756    7
2017  [53]  Olive      Plant Village, Own       3        299     1.40
2016  [41]  Cucumber   Own                      7        7250    16.80
2016  [39]  5 crops    Internet                 15       3 k     3.70
2016  [43]  14 crops   Plant Village            38       54,300  0.66
2015  [40]  Cucumber   Own                      3        800     5.10
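For quick comparison, a few of Table 1's rows can be loaded into pandas and queried. The rows below are a hand-copied subset of the table, and the column names are my own shorthand for the table's headers.

```python
import pandas as pd

# A machine-readable subset of Table 1: (year, ref, crop, source, classes, images, error %).
rows = [
    (2019, 62, "5 crops",  "Plant Village", 9,  15210, 14.0),
    (2019, 63, "2 crops",  "Own",           2,  3663,  13.0),
    (2019, 64, "19 crops", "Plant Village", 38, 54306, 6.18),
    (2018, 55, "14 crops", "Plant Village", 38, 54300, 0.25),
    (2017, 45, "Banana",   "Plant Village", 3,  3700,  0.28),
    (2017, 50, "Tomato",   "Plant Village", 10, 14828, 0.82),
    (2016, 43, "14 crops", "Plant Village", 38, 54300, 0.66),
    (2015, 40, "Cucumber", "Own",           3,  800,   5.10),
]
df = pd.DataFrame(
    rows, columns=["year", "ref", "crop", "source", "classes", "images", "error_pct"]
)

# Best (lowest) reported error rate per year within this subset.
print(df.groupby("year")["error_pct"].min())
```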
A deep transfer learning (DTL) detector for tomato diseases and pests was introduced in [38], using a database of tomato leaf disease pictures containing 5 k images from field cameras, as shown in Fig. 4. The proposed detection framework combined ResNet-50 with a region-based fully convolutional network (R-FCN) and achieved a minimum error rate of 14%. In [39], the authors used a ConvNet to distinguish different types of leaf disease from healthy leaves. The database, downloaded from the Internet, comprised more than 3 k original images representing infected leaves in different fields, plus two additional classes for healthy leaves and background images. Using data augmentation, the database was enriched to more than 30 k images. The authors fine-tuned a pre-trained CaffeNet model, attaining precision between 91 and 98% in per-class tests and an overall error rate of 3.7%.
In [40], the authors introduced a classification framework for viral leaf diseases. They used their own convolutional neural network and enlarged the database by rotating each picture in ten-degree steps, producing more than 30 variants per image. The database contained 8 × 100 cucumber leaf pictures representing two different diseases as well as healthy leaves. The proposed model achieved a minimum detection error rate of 5.1% under a 4-fold cross-validation strategy. In [41], the same authors presented another study classifying several viral cucumber diseases. They used a database of more than 7 k pictures covering viral diseases and healthy leaves, further divided into two databases of healthy and infected pictures. Using data augmentation and a 4-fold cross-validation strategy, with one ConvNet trained per database, the classifiers achieved an average error rate of 17.7%.
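The rotation-based augmentation described for [40] — each picture rotated in ten-degree steps — can be sketched with Pillow. The synthetic 64 × 64 image below is a stand-in for a real leaf photograph, which would be loaded with `Image.open(path)`.

```python
from PIL import Image

def rotations(img: Image.Image, step: int = 10) -> list:
    """Return copies of `img` rotated by 0, step, 2*step, ... degrees."""
    return [img.rotate(angle) for angle in range(0, 360, step)]

# Stand-in for a real leaf picture from the cucumber database.
leaf = Image.new("RGB", (64, 64), color=(40, 120, 40))
augmented = rotations(leaf)
print(len(augmented))  # 36 variants per source picture
```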
PlantVillage [42] is a large public database, with samples shown in Fig. 5, of roughly 54 k pictures. It was used in [43], where the authors trained ConvNets to identify more than ten crop species and 26 diseases, for a total of 38 labels. The authors used AlexNet and GoogLeNet, with and without DTL, on grayscale and RGB pictures segmented with different train/test split ratios. Using GoogLeNet with RGB pictures, the minimum error achieved was 0.7%; however, when the framework was tested on pictures downloaded from the Internet, the error rose to 68.6%.
In [44], the authors used their own wheat disease database, containing more than 9 k pictures annotated at the image level by agricultural experts and split into seven wheat disease labels, including a healthy label. They introduced a new model in which a fully deep connected network (FDCN) produces spatial scores that are aggregated by a multiple instance learning (MIL) algorithm to recognize the infection and then localize the infected area of the leaf. Of the four model variants introduced, the best achieved a minimum error rate of 2.05%.
A LeNet architecture was used in [45] to classify two types of disease against healthy banana leaf pictures collected from the PlantVillage database. The database consists of 3700 pictures, and the model was trained with different train/test splits on grayscale and RGB pictures. The introduced framework achieved a minimum error of 0.28% with a 50% train/test split. The authors of [46] also used the PlantVillage database, applying deep learning to classify the severity of apple black rot infection. The portion of the database they used contained nearly 160 pictures at four severity stages. Four ConvNet models (VGG-16, VGG-19, Inception-V3, and ResNet-50) were evaluated, achieving a minimum error rate of 9.6%.
Classification of ten common rice diseases was introduced in [47], where the authors created a custom deep ConvNet with different convolution filter sizes and pooling strategies. The proposed framework used a database of 500 pictures of diseased and healthy rice leaves and stems captured from an experimental field. Other machine learning methods, such as standard backpropagation, particle swarm optimization, and support vector machines, were compared with the proposed framework. With stochastic pooling, the framework achieved an error rate of 4.52% under a 10-fold cross-validation strategy. The authors of [48] also used a deep ConvNet, in their case to detect four common types of apple leaf disease. They captured 1053 pictures from two apple experiment stations in China and enriched the database using digital image processing techniques to produce more than 13 k pictures. The introduced ConvNet consisted of an adjusted AlexNet followed by two max-pooling and Inception layers. Results were compared with transfer-learned ConvNets such as AlexNet, GoogLeNet, VGGNet-16, and ResNet-20; the best recognition error, nearly 2.38%, was achieved by the adjusted model.
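The k-fold evaluation protocol used by [40, 41, 47] can be sketched with scikit-learn. The features and labels below are synthetic stand-ins for real leaf-image data, so the resulting error value is meaningless — only the fold-averaging protocol is illustrated. The SVM classifier echoes the baselines compared against in [47].

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))     # stand-in feature vectors for 500 pictures
y = rng.integers(0, 10, size=500)  # ten labels, as in the rice study [47]

errors = []
for train_idx, test_idx in StratifiedKFold(
    n_splits=10, shuffle=True, random_state=0
).split(X, y):
    clf = SVC().fit(X[train_idx], y[train_idx])
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))  # per-fold error rate

print(f"mean 10-fold error rate: {np.mean(errors):.2%}")
```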
In [49], the authors introduced a deep learning framework for disease classification on tomato leaves. Tomato pictures from the PlantVillage database were used, comprising about 18 k pictures across ten labels, including a healthy one. The authors used AlexNet and SqueezeNet, achieving error rates of 4.35% and 5.70%, respectively. A comparison between deep learning models and traditional machine learning techniques was carried out in [50], where the authors used both AlexNet and GoogLeNet to identify tomato diseases on leaf pictures from the PlantVillage database. The results showed that pre-trained deep models significantly outperform shallow models, with a best error rate of 0.82% using GoogLeNet.
The ConvNet frameworks proposed in [51] classify the northern leaf blight (NLB) disease in maize, labeling whether or not an entire picture contains NLB lesions. The database comprised 1796 pictures, divided into more than 1 k diseased and 750 healthy maize leaf pictures. The model achieved a minimum error rate of 3.3% on the suggested test set. A DTL model was used in [52] to detect five different diseases of cassava, a strategic crop considered the world's third-largest source of carbohydrates for human nutrition. The database, taken from experimental fields, consisted of 2756 pictures that were manually cropped into individual leaflets, producing 15 k pictures of cassava leaflets. The authors used a pre-trained model.
In [63], a ConvNet was proposed for disease classification using apple and tomato leaf pictures of healthy and infected plants. The model consists of four convolutional layers, each followed by a pooling layer. The database contains 3663 apple and tomato leaf pictures, and training the ConvNet achieved a misclassification rate of 13%. Finally, a new model for plant disease recognition based on NASNet was presented in [64]. NASNet was trained and tested on the publicly available PlantVillage project database, which contains varied pictures of plant leaves; using this model, a misclassification rate of 6.18% was achieved.
5 Conclusion
Convolutional neural networks (CNNs) and deep learning have revolutionized image processing and artificial intelligence research, producing state-of-the-art results, especially in image detection. After surveying the studies that have applied DTL in precision agriculture, especially for detecting plant leaf diseases, it is clear that DTL has brought a massive advancement. It was also shown that DTL enables the growth of precision agriculture and autonomous farming robots at the cutting edge of agriculture and food production. Smartphone frameworks are also emerging as a new tool for bringing DTL to farmers.
References
1. O’Leary, D.E.: Artificial intelligence and big data. IEEE Intell. Syst. 28, 96–99 (2013). https://
doi.org/10.1109/MIS.2013.39
2. Kibria, M.G., Nguyen, K., Villardi, G.P., et al.: Big data analytics, machine learning, and arti-
ficial intelligence in next-generation wireless networks. IEEE Access 6, 32328–32338 (2018).
https://doi.org/10.1109/ACCESS.2018.2837692
3. Allam, Z., Dhunny, Z.A.: On big data, artificial intelligence and smart cities. Cities 89, 80–91
(2019). https://doi.org/10.1016/j.cities.2019.01.032
4. Shang, C., You, F.: Data analytics and machine learning for smart process manufacturing:
recent advances and perspectives in the big data era. Engineering (2019). https://doi.org/10.
1016/j.eng.2019.01.019
5. Ngiam, K.Y., Khor, I.W.: Big data and machine learning algorithms for health-care delivery.
Lancet Oncol. 20, e262–e273 (2019). https://doi.org/10.1016/S1470-2045(19)30149-4
6. Elijah, O., Rahman, T.A., Orikumhi, I., et al.: An overview of Internet of Things (IoT) and data
analytics in agriculture: benefits and challenges. IEEE Internet Things J. 5, 3758–3773 (2018).
https://doi.org/10.1109/JIOT.2018.2844296
7. Aznar-Sánchez, J.A., Piquer-Rodríguez, M., Velasco-Muñoz, J.F., Manzano-Agugliaro, F.:
Worldwide research trends on sustainable land use in agriculture. Land Use Policy 87, 104069
(2019). https://doi.org/10.1016/j.landusepol.2019.104069
8. Ncube, B., Mupangwa, W., French, A.: Precision agriculture and food security in Africa BT.
In: Katerere, D., Hachigonta, S., Roodt, A., Mensah, P. (eds.) Systems analysis approach for
complex global challenges, pp. 159–178. Springer International Publishing, Cham (2018)
9. Gebbers, R., Adamchuk, V.I.: Precision agriculture and food security. Science (80-) 327, 828–
831 (2010). https://doi.org/10.1126/science.1183899
10. Erbaugh, J., Bierbaum, R., Castilleja, G., et al. Toward sustainable agriculture in the tropics.
World Dev. 121, 158–162 (2019). https://doi.org/10.1016/j.worlddev.2019.05.002
11. Liu, S., Guo, L., Webb, H., et al.: Internet of Things monitoring system of modern eco-
agriculture based on cloud computing. IEEE Access 7, 37050–37058 (2019). https://doi.org/
10.1109/ACCESS.2019.2903720
12. Ip, R.H.L., Ang, L.-M., Seng, K.P., et al.: Big data and machine learning for crop protection.
Comput. Electron. Agric. 151, 376–383 (2018). https://doi.org/10.1016/j.compag.2018.06.008
13. Abu Hatab, A., Cavinato, M.E.R., Lindemer, A., Lagerkvist, C.-J.: Urban sprawl, food security
and agricultural systems in developing countries: A systematic review of the literature. Cities
94, 129–142 (2019). https://doi.org/10.1016/j.cities.2019.06.001
14. Pretty, J.N., Morison, J.I.L., Hine, R.E.: Reducing food poverty by increasing agricultural
sustainability in developing countries. Agric. Ecosyst. Environ. 95, 217–234 (2003). https://
doi.org/10.1016/S0167-8809(02)00087-7
15. Strange, R.N., Scott, P.R.: Plant disease: a threat to global food security. Annu. Rev.
Phytopathol. 43, 83–116 (2005). https://doi.org/10.1146/annurev.phyto.43.113004.133839
16. Loey, M., ElSawy, A., Afify, M.: Deep learning in plant diseases detection for agricultural
crops: a survey. Int. J. Serv. Sci. Manag. Eng. Technol. (2020)
17. Rong, D., Xie, L., Ying, Y.: Computer vision detection of foreign objects in walnuts using deep
learning. Comput. Electron. Agric. 162, 1001–1010 (2019). https://doi.org/10.1016/j.compag.
2019.05.019
18. Brunetti, A., Buongiorno, D., Trotta, G.F., Bevilacqua, V.: Computer vision and deep learning
techniques for pedestrian detection and tracking: a survey. Neurocomputing 300, 17–33 (2018).
https://doi.org/10.1016/j.neucom.2018.01.092
19. Maitre, J., Bouchard, K., Bédard, L.P.: Mineral grains recognition using computer vision and
machine learning. Comput. Geosci. (2019). https://doi.org/10.1016/j.cageo.2019.05.009
20. Gogul, I., Kumar, V.S.: Flower species recognition system using convolution neural networks
and transfer learning. In: 2017 Fourth International Conference on Signal Processing,
Communication and Networking (ICSCN), pp. 1–6 (2017)
21. Hedjazi, M.A., Kourbane, I., Genc, Y.: On identifying leaves: A comparison of CNN with
classical ML methods. In: 2017 25th Signal Processing and Communications Applications
Conference (SIU), pp. 1–4 (2017)
22. Dias, R.O.Q., Borges, D.L.: Recognizing plant species in the wild: deep learning results and
a new database. In: 2016 IEEE International Symposium on Multimedia (ISM), pp. 197–202
(2016)
23. Abdullahi, H.S., Sheriff, R.E., Mahieddine, F.: Convolution neural network in precision agricul-
ture for plant image recognition and classification. In: 2017 Seventh International Conference
on Innovative Computing Technology (INTECH), pp. 1–3 (2017)
24. Gao, M., Lin, L., Sinnott, R.O.: A mobile application for plant recognition through deep
learning. In: 2017 IEEE 13th International Conference on e-Science (e-Science), pp. 29–38
(2017)
25. Ciregan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classifi-
cation. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3642–3649
(2012)
26. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional
neural networks. Adv. Neural Inf. Process. Syst., pp. 1097–1105 (2012)
27. Cireşan, D.C., Giusti, A., Gambardella, L.M., Schmidhuber, J.: Mitosis detection in breast
cancer histology images with deep neural networks. In: Mori, K., Sakuma, I., Sato, Y.,
et al. (eds.) Medical Image Computing and Computer-Assisted Intervention–MICCAI 2013,
pp. 411–418. Springer, Berlin Heidelberg (2013)
28. El-Sawy, A., EL-Bakry, H., Loey, M.: CNN for handwritten arabic digits recognition based
on LeNet-5 BT. In: Hassanien, A.E., Shaalan, K., Gaber, T., et al. (eds.) Proceedings of the
International Conference on Advanced Intelligent Systems and Informatics 2016, pp. 566–575.
Springer International Publishing, Cham (2017)
29. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using
small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition
(ACPR), pp. 730–734 (2015)
30. Szegedy, C., Wei, L., Yangqing, J., et al.: Going deeper with convolutions. In: 2015 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015)
31. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
32. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807 (2017)
33. Szegedy, C., Vanhoucke, V., Ioffe, S., et al.: Rethinking the inception architecture for computer
vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 2818–2826 (2016)
34. Huang, G., Liu, Z., Maaten, L.V.D., Weinberger, K.Q.: Densely connected convolutional
networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
pp. 2261–2269 (2017)
35. Sankaran, S., Mishra, A., Ehsani, R., Davis, C.: A review of advanced techniques for detecting
plant diseases. Comput. Electron. Agric. 72, 1–13 (2010). https://doi.org/10.1016/j.compag.
2010.02.007
36. Saleem, H.M., Potgieter, J., Arif, M.K.: Plant disease detection and classification by deep
learning. Plants 8 (2019)
37. Jain, A., Sarsaiya, S., Wu, Q., et al.: A review of plant leaf fungal diseases and its environ-
ment speciation. Bioengineered 10, 409–424 (2019). https://doi.org/10.1080/21655979.2019.
1649520
38. Fuentes, A., Yoon, S., Kim, S.C., Park, D.S.: A robust deep-learning-based detector for real-
time tomato plant diseases and pests recognition. Sensors (Switzerland) 17 (2017). https://doi.
org/10.3390/s17092022
39. Sladojevic, S., Arsenovic, M., Anderla, A., et al.: Deep neural networks based recognition
of plant diseases by leaf image classification. Comput. Intell. Neurosci. (2016). https://doi.
org/10.1155/2016/3289801
40. Kawasaki, Y., Uga, H., Kagiwada, S., Iyatomi, H.: Basic study of automated diagnosis of viral
plant diseases using convolutional neural networks. In: International Symposium on Visual
Computing, pp. 638–645. Springer (2015)
41. Fujita, E., Kawasaki, Y., Uga, H., et al.: Basic investigation on a robust and practical plant
diagnostic system. In: Proceedings of 2016 15th IEEE International Conference on Machine
Learning and Applications ICMLA 2016, pp. 989–992 (2017). https://doi.org/10.1109/ICMLA.
2016.56
42. Hughes, D.P., Salathe, M.: An open access repository of images on plant health to enable the
development of mobile disease diagnostics (2015). https://doi.org/10.1111/1755-0998.12237
43. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease
detection. Front. Plant Sci. 7 (2016). https://doi.org/10.3389/fpls.2016.01419
44. Lu, J., Hu, J., Zhao, G., et al.: An in-field automatic wheat disease diagnosis system. Comput.
Electron. Agric. 142, 369–379 (2017). https://doi.org/10.1016/j.compag.2017.09.012
45. Amara, J., Bouaziz, B., Algergawy, A.: A deep learning-based approach for banana leaf diseases
classification. BTW, pp. 79–88 (2017)
46. Wang, G., Sun, Y., Wang, J.: Automatic image-based plant disease severity estimation using
deep learning. Comput. Intell. Neurosci. (2017). https://doi.org/10.1155/2017/2917536
47. Lu, Y., Yi, S., Zeng, N., et al.: Identification of rice diseases using deep convolutional neural
networks. Neurocomputing 267, 378–384 (2017)
48. Liu, B., Zhang, Y., He, D., Li, Y.: Identification of apple leaf diseases based on deep
convolutional neural networks. Symmetry (Basel) 10, 11 (2017)
49. Durmus, H., Gunes, E.O., Kirci, M.: Disease detection on the leaves of the tomato plants by
using deep learning. In: 2017 6th International Conference on Agro-Geoinformatics, Agro-
Geoinformatics (2017). https://doi.org/10.1109/Agro-Geoinformatics.2017.8047016
50. Brahimi, M., Boukhalfa, K., Moussaoui, A.: Deep learning for tomato diseases: classification
and symptoms visualization. Appl. Artif. Intell. 31, 299–315 (2017)
51. DeChant, C., Wiesner-Hanks, T., Chen, S., et al.: Automated identification of northern leaf
blight-infected maize plants from field imagery using deep learning. Phytopathology 107,
1426–1432 (2017). https://doi.org/10.1094/PHYTO-11-16-0417-R
52. Ramcharan, A., Baranowski, K., McCloskey, P., et al.: Using transfer learning for image-based
cassava disease detection. Front. Plant Sci. 8, 1–7 (2017). https://doi.org/10.3389/fpls.2017.01852
53. Cruz, A.C., Luvisi, A., De Bellis, L., Ampatzidis, Y.: Vision-based plant disease detection
system using transfer and deep learning, pp. 1–9 (2017). https://doi.org/10.13031/aim.201
700241
200 M. Loey
54. Zhang, X., Qiao, Y., Meng, F., et al.: Identification of maize leaf diseases using improved deep convolutional neural networks. IEEE Access 6, 30370–30377 (2018). https://doi.org/10.1109/ACCESS.2018.2844405
55. Too, E.C., Yujian, L., Njuki, S., Yingchun, L.: A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. (2018). https://doi.org/10.1016/j.compag.2018.03.032
56. Rangarajan, A.K., Purushothaman, R., Ramesh, A.: Tomato crop disease classification using pre-trained deep learning algorithm. Proc. Comput. Sci. 133, 1040–1047 (2018). https://doi.org/10.1016/j.procs.2018.07.070
57. Gandhi, R., Nimbalkar, S., Yelamanchili, N., Ponkshe, S.: Plant disease detection using CNNs and GANs as an augmentative approach. In: 2018 IEEE International Conference on Recent Research Development (ICIRD 2018), pp. 1–5 (2018). https://doi.org/10.1109/ICIRD.2018.8376321
58. Sardogan, M., Tuncer, A., Ozen, Y.: Plant leaf disease detection and classification based on CNN with LVQ algorithm. In: 2018 3rd International Conference on Computer Science and Engineering (UBMK), pp. 382–385 (2018)
59. Ma, J., Du, K., Zheng, F., et al.: A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network. Comput. Electron. Agric. 154, 18–24 (2018). https://doi.org/10.1016/j.compag.2018.08.048
60. Picon, A., Alvarez-Gila, A., Seitz, M., et al.: Deep convolutional neural networks for mobile capture device-based crop disease classification in the wild. Comput. Electron. Agric. (2018). https://doi.org/10.1016/j.compag.2018.04.002
61. Johannes, A., Picon, A., Alvarez-Gila, A., et al.: Automatic plant disease diagnosis using mobile capture devices, applied on a wheat use case. Comput. Electron. Agric. 138, 200–209 (2017). https://doi.org/10.1016/j.compag.2017.04.013
62. Hari, S.S., Sivakumar, M., Renuga, P., et al.: Detection of plant disease by leaf image using convolutional neural network. In: 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), pp. 1–5 (2019)
63. Francis, M., Deisy, C.: Disease detection and classification in agricultural plants using convolutional neural networks—a visual understanding. In: 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 1063–1068 (2019)
64. Adedoja, A., Owolawi, P.A., Mapayi, T.: Deep learning based on NASNet for plant disease recognition using leave images. In: 2019 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD), pp. 1–5 (2019)
65. Nagasubramanian, K., Jones, S., Singh, A.K., et al.: Plant disease identification using explainable 3D deep learning on hyperspectral images. Plant Methods 15, 98 (2019). https://doi.org/10.1186/s13007-019-0479-8
66. Paoletti, M.E., Haut, J.M., Plaza, J., Plaza, A.: Deep learning classifiers for hyperspectral imaging: a review. ISPRS J. Photogramm. Remote Sens. 158, 279–317 (2019). https://doi.org/10.1016/j.isprsjprs.2019.09.006
67. Venkatesan, R., Prabu, S.: Hyperspectral image features classification using deep learning recurrent neural networks. J. Med. Syst. 43, 216 (2019). https://doi.org/10.1007/s10916-019-1347-9
68. Moghadam, P., Ward, D., Goan, E., et al.: Plant disease detection using hyperspectral imaging. In: 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1–8 (2017)
Machine Learning Cancer Diagnosis Based on Medical Image Size and Modalities
1 Introduction
Medical imaging is a valuable tool for diagnosing various diseases and for studying their effects [1]. Large-scale medical images can supply experts in the medical fields with more detail and thus increase diagnostic precision in pathological research [2–4]. Enhancing medical images is therefore becoming very important. In addition, large-scale medical images can significantly help computer-aided automatic detection [5]. For instance, many Computed Tomography (CT) [6] scanners and Magnetic Resonance Imaging (MRI) [7] systems create medical images that make non-invasive examination practical.
Biomedical imaging is one of the foundations of intensive treatment for cancer, diabetes, bone disorders, and other conditions. It has many benefits, including access without destruction of tissue, real-time monitoring, and minimal or no invasiveness. In addition, it can operate across the wide range of size and time scales involved in pathological and biological processes. Time scales differ from one disease process to another: chemical reactions and protein binding take milliseconds, while cancer develops over years [8].
Early diagnosis is the most important factor in reducing mortality and the costs of managing diseases such as cancer. Biomedical imaging [9] plays an increasingly meaningful role across the stages of cancer treatment [10], including screening [11], prediction [12], biopsy guidance [13], staging [14], therapy planning [15], and other diagnostic tasks.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_9
202 W. Al-Dhabyani and A. Fahmy
This chapter focuses on the type and size of medical images, in particular cancer images used with machine learning. It is organized as follows: Sect. 2 introduces machine learning and transfer learning. Section 3 illustrates imaging and radiology. Section 4 explains medical image modalities, sizes, and formats. Section 5 discusses techniques for manipulating medical images. Finally, Sect. 6 presents a discussion and conclusion.
2 Machine Learning
Once medical images could be loaded on a computer, many applications were built for automated analysis. At first, from the 1970s to the 1990s, medical image analysis was performed with sequential low-level pixel processing and mathematical modeling to build rule-based systems that solved particular tasks. There is a parallel with the expert systems built from many if-then statements that were common in artificial intelligence in the same period. GOFAI (good old-fashioned artificial intelligence) [16] expert systems of this kind were usually brittle, much like rule-based image processing systems. In the late 1990s, supervised techniques, in which training data are used to fit a system, became increasingly common in medical image analysis. Examples include active shape models (for segmentation), atlas methods (in which new data are matched to training data called atlases), and the combination of feature extraction with statistical classifiers (for computer-aided detection and diagnosis). This machine learning (or pattern recognition) approach is still popular and is the basis of several commercially successful medical image analysis systems. We have thus seen a shift from fully human-designed systems to systems trained by computers on example data from which feature vectors are extracted. Computer algorithms determine the optimal decision boundary in a high-dimensional feature space. Extracting discriminative image features is a significant part of developing such systems; human researchers still carry out this step, and such systems are therefore said to use handcrafted features.
The next logical step is to let computers learn the features that optimally represent the data for the problem at hand. This idea underlies many deep learning algorithms: models (networks) composed of many layers that transform input data (e.g., images) into outputs (e.g., disease present/absent), learning increasingly higher-level features along the way. Convolutional Neural Networks (CNNs) are the most powerful type of model for image analysis to date. CNNs contain many layers that transform their input with convolution filters of small extent. Work on CNNs began in the late 1970s [17], and they were already applied to medical image analysis in 1995 by Lo et al. [18]. Their first successful real-world application was LeNet [19] for hand-written digit recognition. Notwithstanding these early achievements, CNNs did not gain traction until new techniques were developed to efficiently train deep
Transfer learning (TL) [26, 27] is a process that applies knowledge acquired from prior tasks to a new task domain that is somehow related to the previous one. It is inspired by the observation that humans intuitively use past knowledge to identify and solve new problems. CNN models pretrained on ImageNet are commonly used. ImageNet [28], currently the largest dataset for visual recognition and image classification, contains more than fourteen million images classified into a thousand object categories, organized according to the WordNet hierarchy.
TL follows this idea: a CNN may be trained on data from a particular domain and later reused to extract image features in another domain, or used as an initial network that is fine-tuned on new data. Transfer learning shows that a process suited to one task can be applied to another; to solve similar problems, it copies information from an already-trained network into a new network. The TL approach has recently been widely used in biomedical research, where it achieves strong results.
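As a concrete illustration of this workflow, the sketch below mimics transfer learning in plain NumPy: a frozen, "pretrained" feature extractor is reused unchanged, and only a small logistic-regression head is trained on data from the new domain. The frozen backbone is simulated here by a fixed random projection, and all dimensions, data, and learning rates are illustrative assumptions, not the chapter's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a CNN backbone pretrained on a source domain such as
# ImageNet: a FROZEN mapping from inputs to feature vectors.
# (Simulated by a fixed random projection; its weights are never updated.)
FROZEN_W = rng.normal(size=(16, 64)) / 4.0  # 16-dim input -> 64-dim features

def extract_features(x):
    """Frozen 'pretrained' feature extractor (simulated)."""
    return np.maximum(x @ FROZEN_W, 0.0)  # ReLU features

# New-domain task: 200 synthetic samples with binary labels.
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(float)

# Transfer learning: only this small logistic-regression head is trained.
feats = extract_features(X)
w, b = np.zeros(64), 0.0
for _ in range(2000):  # plain gradient descent on the head only
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    grad = p - y
    w -= 0.5 * feats.T @ grad / len(y)
    b -= 0.5 * grad.mean()

acc = (((feats @ w + b) > 0) == (y > 0.5)).mean()
print(f"accuracy of the retrained head on the new domain: {acc:.2f}")
```

In a real pipeline the frozen part would be a pretrained CNN (e.g., an ImageNet backbone) and the head a new classification layer fine-tuned on the biomedical dataset; the structure of the computation is the same.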
In modern hospitals, a vast amount of medical imaging data is acquired for diagnosis, treatment planning, and assessment of patient response. Such vast image collections, combined with other data sources, create new opportunities to use large-scale image data to build computerized resources for image-based diagnosis, teaching, and biomedical research [29]. These applications focus on identifying, retrieving, and classifying patient data that reflect similar clinical outcomes [30], for example, images representing the same diagnosis.
tributed in curing cancers, uterine fibroids, back pain, blockages in the veins and arteries, and kidney and liver problems. Doctors make no incision, or sometimes only a very small one, and patients are not required to remain in the hospital after the procedure; instead, most patients need only moderate sedation (relaxing medication). Examples of interventional radiology procedures are angiography, cancer treatments, cryoablation, tumor ablation, venous catheter placement, vertebroplasty, breast biopsy, needle biopsies, uterine artery embolization, kyphoplasty, and microwave ablation.
4 Medical Images
A medical image is a depiction of the inner structure and function of an anatomical region in the form of a matrix of image elements known as pixels or voxels. It is a discrete representation, produced by a sampling/reconstruction process, that assigns numerical values to spatial locations. The number of pixels used to describe the field of view of a given acquisition modality is a measure of the precision with which the anatomy or feature can be displayed. The numerical value of a pixel depends on the imaging modality, the acquisition protocol, the reconstruction, and ultimately the subsequent processing. This section explains the modalities of medical imaging and histopathology; the size and type of medical images are then discussed.
Medical imaging comprises processes that present visual information about the human body [32]. Its goal is to help clinicians and radiologists carry out the diagnostic and treatment process effectively. Medical imaging is a major part of the diagnosis and treatment of illness. It covers a variety of imaging modalities, for example CT, ultrasound (US), MRI, X-rays, and hybrid modalities [33]. Examples of medical image modalities are shown in Fig. 2. They play an essential role in detecting anatomical and functional information for the examination and treatment of the various organs of the body [34]. Figure 1 shows a typology of medical imaging modalities. Medical imaging is a fundamental tool in modern medical care and computer-aided diagnosis (CAD). ML performs an essential role in CAD, with applications in the detection and classification of tissues or cancer, medical image retrieval, medical image analysis, image-guided therapy, and medical image annotation [35–39]. The properties of medical imaging modalities and histopathology are explained in Sect. 4.4.
There are several types of medical image modalities. Each is associated with particular organs and has its own properties [40]. Modalities serve an essential function in discovering functional and anatomical knowledge about the organs of the body for research and diagnosis [34]. Some examples of medical images are displayed in Fig. 2. Medical imaging is an important input to machine learning algorithms and modern health care.
Fig. 2 Examples of medical image modality types, with their labels under each image
The digital image size of a radiograph depends on the dimension of the detector element (from 0.1 to 0.2 mm), a field of view from 18 × 24 cm to 35 × 43 cm, and a bit depth of 10 to 14 bits per pixel. This results in typical image sizes of 8 to 32 MB per projection image. CAD algorithms, which are routinely used with digital radiology images, need to process the raw image data. Communication protocols used in medicine, such as Digital Imaging and Communications in Medicine (DICOM) [41] and third-party interface boxes [41], are required to link legacy equipment and to capture and encapsulate images for archiving.
Imaging modalities draw on two kinds of physical source: ionizing radiation and non-radiation. Radiation is used in X-rays, Positron Emission Tomography (PET), and CT: X-rays use ionizing radiation directly, CT likewise always uses ionizing radiation, and PET uses radiotracers to image the body. Non-radiation sources are used in MRI and ultrasound: US uses sound waves, and MRI uses magnetic fields. Each modality is suited to particular kinds of scan. Some common types of medical image modalities and their properties are as follows:
• X-ray. This is the most common scan. The most common radiography subtypes are plain radiography, mammography, upper GI series, fluoroscopy, DEXA scan, discography, arthrography, and contrast radiography. The average size is 5.1 MB [42]. Radiography monitor screens are generally sized for 35 cm × 43 cm radiographs. A display resolution of 2,048 × 2,560 pixels (5 megapixels, MP) is typical in picture mode, and 3 MP displays are also commonly used. An X-ray produces images that are useful for looking at bones and foreign objects in tissue. Conditions diagnosed with X-rays include general infections or injuries, fractured bones, breast tumors, osteoporosis, gastrointestinal problems, bone cancer, and arthritis. The most common X-ray types are:
– Mammography scan. Digital mammography (DM) is an X-ray projection imaging technique used primarily for breast imaging. The average size is 83.5 MB [42]. DM image size varies with a detector element of 0.01 to 0.05 mm and 12 to 16 bits per pixel, producing images of 8–50 megabytes (MB) for fields of view of 18 × 24 cm and 24 × 30 cm.
– Fluoroscopy scan. This is a real-time X-ray projection acquisition arrangement used for dynamic assessment in many diagnostic and interventional radiology procedures. Video sequences of X-ray images at 1 to 60 frames per second are obtained to view and evaluate the patient's anatomy. Standard image sizes vary from 512 × 512 × 8 bit to 1024 × 1024 × 12 bit, and 2048 × 2048 × 12 bit arrays are used for spot-image applications.
• Computed Tomography (CT). The organs this kind of radiology is most commonly used for are the brain, breast, chest, cervix, lungs, kidney, pancreas, abdomen, appendix, bladder, and esophagus. As a general rule, a CT image should have about as many pixels in each dimension as there are detector channels providing data for a view. The average size is 153.4 MB [42]. For example, an array of 1024-
Image-intensive specialties include pathology (with complex workflows and the largest image sizes), cardiology, dermatology, ophthalmology, and many others. For dental imaging, the Information Object Definition (IOD) of intraoral X-ray images reflects the
Pathology review of biopsied tissue is often considered the gold standard. However, reviewing pathology slides is difficult even for professional pathologists [45], as shown in Fig. 3. A digital pathology slide at 40× magnification typically has millions or billions of pixels and can occupy several gigabytes of disk space. Pathologists are often forced to search these large images for micrometastases: small groups of tumor cells, less than a thousand pixels in diameter, that constitute an early sign of cancer (Fig. 3). This makes reviewing pathology slides without missing one of these tiny but clinically actionable findings very hard and time-consuming. In histopathology, whole-slide images (WSI) are very large: a WSI commonly has a full 80,000 × 80,000 pixels of spatial resolution and requires about twenty gigabytes of storage at 40× magnification [46].
Due to the high spatial resolution used in digital pathology, the dimensions of a WSI are very large, generally exceeding nine hundred million pixels. In addition, there are several types of histopathology images (biopsy types), shown in Fig. 4. The properties of histopathology images are explained in Sect. 4.4.
These days, medical images are becoming larger and sharper. However, they require substantial storage and more computing resources when used in machine learning. The resolution and size of medical images are explained in this subsection.
This is necessary because AI applications in radiology may be sensitive to changes in equipment (e.g., many manufacturers and several scanner models), study population characteristics (e.g., the distribution and prevalence of disease in the chosen population), the choice of reference standard (e.g., pathology, radiologist interpretation, clinical outcomes), and imaging protocols with varying image attributes (e.g., spatial resolution, signal-to-noise ratio, contrast, and temporal resolution) [47]. There will thus be a need for consistent, clear communication and transparency about patient cohort requirements, to ensure that training data generalize to target hospital sites and that AI algorithms are implemented safely and responsibly.
Medical image and histopathology sizes differ considerably; histopathology has by far the largest image sizes. The modalities differ in image size, image resolution, and image type. The properties of radiology and histopathology images are as follows:

Medical image sizes are illustrated in Table 1. There are big differences in image size between modalities: average sizes range from 5.1 to 365.9 MB.
4.4.2 Histopathology
Histopathology image sizes are illustrated in Table 2. There are big differences in image size between histopathology magnifications, and images of this size are harder to work with. In general, a histopathology image must be subdivided into tiles (small images) before it can be used with machine learning; this is a powerful way to defeat the curse of high dimensionality in such images.
An important challenge with digital pathology is the average file size of an imaging study. The file size varies from fifty MB on average for a WSI to more than six GB, depending on the slide scan magnification. Furthermore, a pathology imaging study contains a number of slides that can range from two to sixty. Accordingly, unlike other specialties such as radiology, where the average study size is usually tens or hundreds of MB, WSI studies can run to thousands of MB [48].
WSIs are saved in a compressed format for efficient storage. The compression technique usually used is JPEG or JPEG2000 encoding with a high quality factor to preserve the image information. A typical WSI captured with a ×20 objective could theoretically require more than 20 GB of storage uncompressed, but its size decreases to an average range of 200 to 650 MB after compression [48].
Whole-slide images: the modern scanning systems that produce WSIs are built from software-driven robotic hardware platforms that operate on principles similar to those of a compound light microscope. A glass pathology slide is placed into the scanner and moved continuously under an objective microscope lens at constant high speed. Because the histopathological sample is a tissue section of known thickness, algorithm-defined real-time focus settings are used to keep the resulting images in focus. The objective lens determines the output image resolution and the overall recording speed: greater resolution results in smaller fields of view and longer scanning times [48]. The resulting image is usually stored at 24 bits/pixel full color, with dimensions of several thousand pixels in each direction. Because of the data size, the image is generally saved in a pyramid-like structure arranged in layers, each with a different resolution, to allow efficient display through specialized viewing software supplied by the vendor. A typical pathology case can hold several WSI images, each of which can occupy hundreds or thousands of MB [48].
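The layered pyramid just described can be sketched in a few lines; the halving factor and the minimum level size below are illustrative assumptions (real formats such as tiled pyramidal TIFF choose their own level layout), and the 80,000 × 80,000 slide matches the dimensions quoted earlier in this section.

```python
def pyramid_levels(width, height, min_side=256):
    """Dimensions of a WSI pyramid: each level halves the previous one
    until the image reaches a thumbnail-sized level (illustrative rule)."""
    levels = []
    while width >= min_side and height >= min_side:
        levels.append((width, height))
        width //= 2
        height //= 2
    return levels

# A hypothetical 80,000 x 80,000 pixel slide scanned at 40x magnification:
for i, (w, h) in enumerate(pyramid_levels(80_000, 80_000)):
    print(f"level {i}: {w} x {h} ({w * h * 3 / 1e9:.2f} GB uncompressed RGB)")
```

Note that level 0 at 3 bytes/pixel works out to about 19.2 GB uncompressed, consistent with the "about twenty gigabytes" figure quoted above; the lower-resolution levels are what viewers use for fast panning and zooming.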
Image file formats provide a standardized way to save the information specifying an image in a computer file. A medical image dataset usually consists of one or more images describing the projection of an anatomic volume onto an imaging surface (planar/projection images), a series of images describing thin sections through a volume (multislice 2D or tomographic images), data from a whole volume (3D images), or volumes acquired over time as a dynamic series of acquisitions or repeated acquisitions of the same tomographic image (4D images). The file format describes how the image data are arranged inside the file and how the software must interpret the pixel data for correct loading and display. In other words, image file formats are standardized means of organizing and saving digital images, and they can store data in compressed or uncompressed form. They are as follows:
Medical image file formats can be divided into two families [49]. The first consists of formats designed to standardize the images produced by diagnostic modalities, e.g., Digital Imaging and Communications in Medicine (DICOM) [41]. The second consists of formats created to facilitate and strengthen post-processing analysis, e.g., Nifti [50], Analyze [51], and Minc [52]. Medical image files are usually saved in one of two possible arrangements. In the first, both image data and metadata are contained in a single file, with the metadata stored at the beginning of the file; this paradigm is used by the Minc, Nifti, and DICOM formats, although other formats allow it as well. The second arrangement saves the image data in one file and the metadata in another; this two-file model is used by the Analyze format (.img and .hdr).
Most digital image formats are classified by their compression method into two divisions. If no information is lost from the digital image file, even when a compression algorithm is applied, the format is called lossless [53]. This category includes image types such as TIFF, PNG, BMP, and RAW: RGB digital images saved in these formats suffer no information loss. This brings the advantage of high-quality reproduction, though it requires a large amount of storage. The second division is lossy, in which some image information is sacrificed to obtain a smaller file size; it includes GIF and JPEG. The size and resolution of all images vary with respect to dots per inch (dpi) and bit depth.
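The lossless/lossy distinction can be demonstrated with the standard library alone. Below, `zlib` stands in for the entropy-coding stage of a lossless format such as PNG, and quantizing to 16 gray levels is a deliberately crude stand-in for the information a lossy codec such as JPEG discards; the synthetic scanline is an assumption chosen to make the effect measurable.

```python
import random
import zlib

# A simulated 8-bit grayscale scanline (noisy, hence hard to compress).
rnd = random.Random(0)
pixels = bytes(rnd.randrange(256) for _ in range(4096))

# Lossless: decompression restores every byte exactly.
lossless = zlib.compress(pixels, level=9)
assert zlib.decompress(lossless) == pixels  # no information lost

# Crude "lossy" scheme: quantize to 16 gray levels, then compress.
# Fewer distinct symbols means a smaller file, but the exact original
# values can no longer be recovered.
quantized = bytes((p // 16) * 16 for p in pixels)
lossy = zlib.compress(quantized, level=9)
recovered = zlib.decompress(lossy)

print(len(pixels), len(lossless), len(lossy))
print("exact roundtrip after lossy step:", recovered == pixels)
```

The lossy file is markedly smaller than the lossless one, and the quantized detail is gone for good, which is exactly the trade-off the text describes.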
number of bits used to save each sample, which corresponds to the pixel depth. In a color image, the color information is stored with three samples per pixel, and the images are displayed in RGB. The number of representable colors again clearly depends on the number of bits used to save the samples, which corresponds to the pixel depth in this case.
For example, color is used to represent the velocity and direction of blood flow in ultrasound (Doppler US); to display additional details as colored overlays on a grayscale anatomical image, as at the activation sites in fMRI; to concurrently display anatomical and functional images, as in PET/MRI or PET/CT; and occasionally instead of shades of gray to emphasize signal variations.
The use of 2D, 3D, and 4D images in machine learning is as follows. A grayscale image is represented as an array with three dimensions (height, width, and depth), where the depth equals one. A 2D color image uses an array with three dimensions (height, width, and depth), where the depth equals three. A grayscale volumetric image uses an array with four dimensions (height, width, depth, and number of slices), where the depth equals one; a color volumetric image uses the same four dimensions with a depth of three. More information about grayscale and color images follows:
Grayscale (or black & white) images have a single gray channel whose values range from 0 to 255, so the third dimension of the image matrix has size one. Clinical radiological images such as MRI, CT, and X-ray have a grayscale photometric interpretation.
Since color is a primary distinguishing feature of various biological structures, a color (RGB) image stores three color components per pixel; the image effectively contains three images stacked on top of each other. Color maps are typically used to display some medical images, e.g., PET and nuclear medicine images. With a predefined color map, every pixel value of the image is linked to a color, but the color refers only to the display and its associated metadata and is not actually stored in the pixel values; such images still contain one sample per pixel and are usually described as pseudo-color. True color images need multiple samples per pixel to encode the color data, together with a color model that specifies how colors are obtained by combining the samples [54]. Eight bits are usually reserved for each color component (sample), and the pixel depth is obtained by multiplying the number of samples per pixel by the sample depth. Most medical color images are saved using the RGB color model; a pixel is then understood as a mixture of the three primary colors, and three samples are stored per pixel.
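The array layouts described in this section (grayscale vs. color, 2D vs. volumetric) can be written out in NumPy, following the chapter's (height, width, depth, slices) convention; note that many deep learning frameworks instead order volumes as (slices, height, width, channels), and the sizes below are illustrative.

```python
import numpy as np

H, W, S = 256, 256, 40  # height, width, number of slices (illustrative)

gray_2d  = np.zeros((H, W, 1), dtype=np.uint8)     # grayscale image: depth = 1
color_2d = np.zeros((H, W, 3), dtype=np.uint8)     # RGB image: depth = 3
gray_3d  = np.zeros((H, W, 1, S), dtype=np.uint8)  # grayscale volume
color_3d = np.zeros((H, W, 3, S), dtype=np.uint8)  # color volume

for name, arr in [("grayscale 2D", gray_2d), ("color 2D", color_2d),
                  ("grayscale volume", gray_3d), ("color volume", color_3d)]:
    print(f"{name}: shape {arr.shape}, {arr.nbytes} bytes")
```

The byte counts also make the text's point about color: the RGB image occupies exactly three times the memory of the grayscale image of the same dimensions.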
Fig. 5 The relation between the size of the matrix and the size of one pixel. If the field of view is 320 mm, the theoretical size of one pixel in an image matrix is 0.156 mm for 2048 × 2048, 0.313 mm for 1024 × 1024, and 0.625 mm for 512 × 512 [55]
Before an image can be used with machine learning, it usually has to be manipulated. There are many methods for changing the properties of images, such as resizing, augmentation, and image compression, among other techniques. We will focus on resizing and compression.
There are several methods and algorithms for resizing and compressing images. They are as follows:
Common functions and libraries are used to resize medical images, e.g., Python libraries such as OpenCV. This kind of resizing changes the resolution of the image. Consequently, deep learning algorithms can produce less accurate results, because the medical image loses some of its information. The relationship between matrix size and pixel size is explained in Fig. 5.
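The information loss caused by resizing can be seen even with the crudest scheme, plain subsampling; real code would use an interpolating resize such as OpenCV's `cv2.resize`, and the synthetic image below is an assumption chosen so the loss is easy to measure.

```python
import numpy as np

# A synthetic 512 x 512 grayscale "scan" with fine periodic structure.
x = np.arange(512)
img = ((x[:, None] + x[None, :]) % 16).astype(np.uint8)

# Nearest-neighbour-style downsizing: keep every 4th pixel (512 -> 128).
small = img[::4, ::4]

# Upsampling back by pixel repetition cannot restore the fine detail.
restored = np.repeat(np.repeat(small, 4, axis=0), 4, axis=1)
lost = (restored != img).mean()
print(f"fraction of pixels changed by the down/up-sampling round trip: {lost:.2f}")
```

Most pixels differ after the round trip: the fine structure, which in a medical image could be diagnostically relevant detail, is unrecoverable once the image has been shrunk.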
With the revolution of the Generative Adversarial Network (GAN) [56], we can synthesize and produce realistic images. Super-resolution GAN (SRGAN) [57] combines a deep network with an adversarial network to create higher-resolution images.
Table 3 The Royal College of Radiologists’ guidelines on the Lossy compression ratio [74]
Modality Compression Ratio
Radiotherapy CT (No compression)
CT (all fields) 5:1
Digital Angiography 10:1
Ultrasound (US) 10:1
Magnetic Resonance Images (MRI) 5:1
Skeletal Radiography 10:1
Chest Radiography 10:1
Mammography (MG) 20:1
SRGAN output looks more detailed to a human observer than that of comparable designs without a GAN, e.g., SRResNet [57]. A high-resolution (HR) image is downsampled to a low-resolution (LR) image; the SRGAN generator then upsamples LR images to super-resolved (SR) images. A discriminator is used to distinguish HR images from generated ones, and the GAN loss is backpropagated to train both the discriminator and the generator. Other algorithms are also used to enhance images, such as the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) [58]. More on working with super-resolution images can be found in [59].
Modern medical imaging generates vast quantities of data, so storage and transmission systems can be overwhelmed rapidly [60, 61]. Data compression has generally been the key to overcoming this dilemma [62, 63]. In this context, various medical image compression techniques have been developed to decrease image volume, such as genetic algorithms [64], region-of-interest coding techniques [65, 66], low-complexity compression [67], fractal coding methods [68], and lossless dynamic and adaptive compression [69]. However, JPEG2000 [70, 71], JPEG [72], JPEG-LS (JLS) [73], and TIFF remain among the most widespread standard methods because they provide strong compression performance.
These methods are based on algorithms that transform images into multi-resolution domains [75, 76]. In addition, many medical applications permit the use of lossy compression, provided the compressed image does not lead to a wrong diagnosis [77].
Image compression has several advantages, which can be summarized as follows:
• It offers a significant saving of bandwidth, since a smaller amount of data is sent over the internet.
• It offers a significant saving in storage space.
• It offers a higher level of immunity and security against illegal monitoring.
Saving a gray image with a size of 1024 × 1024 requires approximately 1 MB of disk space, and for color images this is multiplied by 3. The process of size reduction is the exclusion of unnecessary data: if a color image is compressed at a ratio of 10:1, the required storage is reduced to about 300 KB [78]. Compression ratios for medical images are shown in Table 3. Thus, facilities to reorganize compressed data (e.g., interactive multi-resolution transmission with JPIP [79], tiled DICOM [80], or tiling in BigTIFF [81]) are predominant in allowing a fine viewing experience. More information about medical image compression can be found in [82].
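As a quick sanity check of the figures above, the storage arithmetic can be sketched in a few lines of Python (the function name and default values are illustrative, not part of any standard):

```python
def storage_kb(width, height, channels=1, bytes_per_pixel=1, compression_ratio=1):
    """Approximate storage, in KB, of a raw image, optionally compressed."""
    raw_bytes = width * height * channels * bytes_per_pixel
    return raw_bytes / compression_ratio / 1024

gray = storage_kb(1024, 1024)                                          # ~1024 KB (~1 MB)
color = storage_kb(1024, 1024, channels=3)                             # ~3072 KB (~3 MB)
compressed = storage_kb(1024, 1024, channels=3, compression_ratio=10)  # ~307 KB
```

The 10:1 figure quoted above corresponds to the color case: roughly 3 MB raw shrinks to about 300 KB.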
When deep learning is used to classify images, e.g., houses or dogs, an image of a small size such as 128 × 128 is frequently applied as input. Large images also need to be resized to a smaller size suitable for adequate discrimination, because an increase in the size of the input image increases the number of parameters to be estimated, the computing power required, and the memory.
In contrast, a WSI [83] contains many cells, and the image is composed of billions of pixels or more, which typically makes it difficult to analyze. Nevertheless, resizing the whole image to a smaller size, e.g., 128 × 128, leads to information loss at the cellular level, resulting in a marked reduction in the accuracy of the classifier. Accordingly, the whole WSI is regularly separated into partial regions of approximately 128 × 128 pixels ("patches"), and each patch is independently evaluated, for example, for RoI detection. Because of advances in memory and computing power, patch size is increasing (e.g., 900 × 900), which is expected to lead to improved accuracy [84]. Incorporating the outcomes from all patches offers a further opportunity for improvement. Because a WSI, for example, can contain a huge number of patches (hundreds of thousands), it is very likely that false positives will occur, even if the individual patches are accurately classified. Regionally averaging the patch decisions is a potential answer to this, such that regions are marked as RoIs even if an RoI is spread over several patches.
However, this methodology can suffer from false negatives, missing small RoIs, for example, isolated tumor cells [85]. In a few applications, such as immunohistochemistry (IHC) scoring, staging lymph node metastasis in specimens or patients, and staging the diagnosis of prostate cancer with a multi-region Gleason score within one slide, more complicated algorithms are incorporated to integrate patch- or object-level decisions [85–89].
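The patch-based workflow described above can be sketched as follows. This is a minimal illustration only (NumPy, non-overlapping patches, incomplete border tiles discarded), not the pipeline of any particular cited paper:

```python
import numpy as np

def extract_patches(image, patch_size=128):
    """Split a large image into non-overlapping patch_size x patch_size tiles.

    Border regions smaller than a full patch are discarded; each returned
    patch can then be classified independently (e.g., for RoI detection).
    """
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return patches

# A 256 x 384 image yields a 2 x 3 grid of 128 x 128 patches.
wsi_like = np.zeros((256, 384), dtype=np.uint8)
patches = extract_patches(wsi_like)
```

Averaging patch-level decisions over a region, as described above, is then a simple reduction over the per-patch scores.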
However, pathological images can be labeled precisely only by pathologists, and it takes a lot of work to label an immense WSI at the regional level. Reusing ready-to-analyze data as training data is conceivable in machine learning, for example, natural images [90] in the macroscopic diagnosis of skin lesions and ImageNet [21] in the International Skin Imaging Collaboration (ISIC).
218 W. Al-Dhabyani and A. Fahmy
There are a number of common algorithms used in machine learning with medical imaging. Most of them are discussed in [91], which explores algorithms such as the Naive Bayes algorithm, support vector machines, decision trees, neural networks, and k-nearest neighbors. Other algorithms are used in deep learning, such as CNN [20], the Deep Convolutional Neural Network (DCNN) [20], U-net [92], Fully Convolutional Networks (FCN) [93], LeNet [94], etc. Deep learning gives great results when working with medical images and image segmentation; [95–97] are examples of deep learning algorithms used with medical images. Many open-source medical image datasets are available nowadays for researchers, such as [98–100].
Medical images play a major role in investigating the human body, and machine learning algorithms produce great results with medical images. Medical image properties such as resolution, size, and type affect machine learning results. PNG is the best image type for all scan types because it preserves all the details of the image; on the other hand, it is time-consuming and takes more storage than other, compressed formats. Furthermore, medical imaging modalities comprise many scanning methods for visualizing the human body for diagnostic and care purposes. Such modalities are also very beneficial for patient follow-up on the severity of an already diagnosed disease and/or a treatment plan underway. These modalities are involved in all aspects of hospital care; the key aim is to reach the right diagnosis.
Acknowledgements The authors would like to express their gratitude to Mohammed Almurisi for
his support.
References
1. Fowler, J.F., Hall, E.J.: Radiobiology for the Radiologist. Radiat. Res. 116, 175 (1988)
2. Mifflin, J.: Visual archives in perspective: enlarging on historical medical photographs. Am.
Arch. 70(1), 32–69 (2007)
3. Cosman, P.C., Gray, R.M., Olshen, R.A.: Evaluating quality of compressed medical images:
SNR, subjective rating, and diagnostic accuracy. Proc. IEEE 82(6), 919–32 (1994)
4. Kayser, K., Görtler, J., Goldmann, T., Vollmer, E., Hufnagl, P., Kayser, G.: Image standards
in tissue-based diagnosis (diagnostic surgical pathology). Diagn. Pathol. 3(1), 17 (2008)
5. Ramakrishna, B., Liu, W., Saiprasad, G., Safdar, N., Chang, C.I., Siddiqui, K., Kim, W.,
Siegel, E., Chai, J.W., Chen, C.C., Lee, S.K.: An automatic computer-aided detection system
for meniscal tears on magnetic resonance images. IEEE Trans. Med. Imaging 28(8), 1308–
1316 (2009)
6. Brenner, D.J., Hall, E.J.: Computed tomography-an increasing source of radiation exposure.
New Engl. J. Med. 357(22), 2277–2284 (2007)
7. Foltz, W.D., Jaffray, D.A.: Principles of magnetic resonance imaging. Radiat. Res. 177(4),
331–348 (2012)
8. Fass, L.: Imaging and cancer: a review. Molecular oncology. 2(2), 115–52 (2008)
9. Ehman, R.L., Hendee, W.R., Welch, M.J., Dunnick, N.R., Bresolin, L.B., Arenson, R.L.,
Baum, S., Hricak, H., Thrall, J.H.: Blueprint for imaging in biomedical research. Radiology
244(1), 12–27 (2007)
10. Hillman, B.J.: Introduction to the special issue on medical imaging in oncology. J. Clin. Oncol.
24(20), 3223–3224 (2006)
11. Lehman, C.D., Isaacs, C., Schnall, M.D., Pisano, E.D., Ascher, S.M., Weatherall, P.T.,
Bluemke, D.A., Bowen, D.J., Marcom, P.K., Armstrong, D.K., Domchek, S.M.: Cancer yield
of mammography, MR, and US in high-risk women: prospective multi-institution breast can-
cer screening study. Radiology 244(2), 381–388 (2007)
12. de Torres, J.P., Bastarrika, G., Wisnivesky, J.P., Alcaide, A.B., Campo, A., Seijo, L.M., Pueyo,
J.C., Villanueva, A., Lozano, M.D., Montes, U., Montuenga, L.: Assessing the relationship
between lung cancer risk and emphysema detected on low-dose CT of the chest. Chest 132(6),
1932–1938 (2007)
13. Nelson, E.D., Slotoroff, C.B., Gomella, L.G., Halpern, E.J.: Targeted biopsy of the prostate: the
impact of color Doppler imaging and elastography on prostate cancer detection and Gleason
score. Urology 70(6), 1136–1140 (2007)
14. Kent, M.S., Port, J.L., Altorki, N.K.: Current state of imaging for lung cancer staging. Thorac.
Surg. Clin. 14(1), 1–3 (2004)
15. Fermé, C., Vanel, D., Ribrag, V., Girinski, T.: Role of imaging to choose treatment. Cancer Imaging 5(Spec No A), S113 (2005)
16. Haugeland, J.: Artificial Intelligence: The Very Idea. MIT Press (1989)
17. Fukushima, K.: Neocognitron: a self-organizing neural network model for a mechanism of
pattern recognition unaffected by shift in position. Biol. Cybern. 36(4), 193–202 (1980)
18. Lo, S.C., Lou, S.L., Lin, J.S., Freedman, M.T., Chien, M.V., Mun, S.K.: Artificial convolution
neural network techniques and applications for lung nodule detection. IEEE Trans. Med.
Imaging 14(4), 711–718 (1995)
19. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document
recognition. Proc. IEEE 86(11), 2278–2324 (1998)
20. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional
neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105
(2012)
21. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A.,
Khosla, A., Bernstein, M., Berg, A.C.: Imagenet large scale visual recognition challenge. Int.
J. Comput. Vis. 115(3), 211–252 (2015)
22. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives.
IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
23. Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.Z.: Deep
learning for health informatics. IEEE J. Biomed. Health Inf. 21(1), 4–21 (2016)
24. Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Ann. Rev. Biomed.
Eng. 21(19), 221–248 (2017)
25. Smola, A., Vishwanathan, S.V.: Introduction to Machine Learning, vol. 32, pp. 34. Cambridge
University, UK (2008)
26. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
27. Thrun, S.: Is learning the n-th thing any easier than learning the first? In: Advances in Neural
Information Processing Systems, pp. 640–646 (1996)
28. House, D., Walker, M.L., Wu, Z., Wong, J.Y., Betke, M.: IEEE Computer Society Conference
on Computer Vision and Pattern Recognition Workshops, 2009. CVPR Workshops 2009, pp.
186–193. IEEE (2009)
29. Kumar, A., Kim, J., Cai, W., Fulham, M., Feng, D.: Content-based medical image retrieval: a
survey of applications to multidimensional and multimodality data. J. Digit. Imaging 26(6),
1025–1039 (2013)
30. Sedghi, S., Sanderson, M., Clough, P.: How do health care professionals select medical images
they need? In: Aslib Proceedings. Emerald Group Publishing Limited (29 Jul 2012)
31. Freeny, P.C., Lawson, T.L.: Radiology of the Pancreas. Springer Science & Business Media
(6 Dec 2012)
32. Anwar, S.M., Majid, M., Qayyum, A., Awais, M., Alnowami, M., Khan, M.K.: Medical image
analysis using convolutional neural networks: a review. J. Med. Syst. 42(11), 226 (2018)
33. Heidenreich, A., Desgrandschamps, F., Terrier, F.: Modern approach of diagnosis and man-
agement of acute flank pain: review of all imaging modalities. Eur. Urol. 41(4), 351–362
(2002)
34. Rahman, M.M., Desai, B.C., Bhattacharya, P.: Medical image retrieval with probabilistic
multi-class support vector machine classifiers and adaptive similarity fusion. Comput. Med.
Imaging Graph. 32(2), 95–108 (2008)
35. Sánchez Monedero, J., Saez Manzano, A., Gutiérrez Peña, P.A., Hervas Martínez, C.: Machine learning methods for binary and multiclass classification of melanoma thickness from dermoscopic images. IEEE Trans. Knowl. Data Eng. (2016, online)
36. Miri, M.S., Lee, K., Niemeijer, M., Abràmoff, M.D., Kwon, Y.H., Garvin, M.K.: Multimodal
segmentation of optic disc and cup from stereo fundus and SD-OCT images. In: Medical
Imaging 2013: Image Processing. International Society for Optics and Photonics, vol. 8669,
p. 86690O (13 Mar 2013)
37. Gao, Y., Zhan, Y., Shen, D.: Incremental learning with selective memory (ILSM): towards
fast prostate localization for image guided radiotherapy. IEEE Trans. Med. Imaging 33(2),
518–534 (2013)
38. Tao, Y., Peng, Z., Krishnan, A., Zhou, X.S.: Robust learning-based parsing and annotation of
medical radiographs. IEEE Trans. Med. Imaging. 30(2), 338–350 (2010)
39. Camlica, Z., Tizhoosh, H.R., Khalvati, F.: Autoencoding the retrieval relevance of medical
images. In: 2015 International Conference on Image Processing Theory, Tools and Applica-
tions (IPTA), pp. 550–555. IEEE (10 Nov 2015)
40. Branstetter, B.F.: Practical Imaging Informatics: Foundations and Applications for PACS
Professionals. Springer, New York (2009)
41. Bidgood Jr., W.D., Horii, S.C., Prior, F.W., Van Syckle, D.E.: Understanding and using
DICOM, the data interchange standard for biomedical imaging. J. Am. Med. Inf. Assoc.
4(3), 199–212 (1997)
42. Lauro, G.R., Cable, W., Lesniak, A., Tseytlin, E., McHugh, J., Parwani, A., Pantanowitz,
L.: Digital pathology consultations-a new era in digital imaging, challenges and practical
applications. J. Digit. Imaging 26(4), 668–677 (2013)
43. Tirado-Ramos, A., Hu, J., Lee, K.P.: Information object definition-based unified modeling lan-
guage representation of DICOM structured Reporting: A Case Study of Transcoding DICOM
to XML. J. Am. Med. Inf. Assoc. 9(1), 63–72 (2002)
44. Seibert, J.A.: Modalities and data acquisition. In: Practical Imaging Informatics, pp. 49–66.
Springer, New York, NY (2009)
45. Li, Y., Ping, W.: Cancer metastasis detection with neural conditional random field (19 Jun
2018). arXiv:1806.07064
46. Cruz-Roa, A., Gilmore, H., Basavanhally, A., Feldman, M., Ganesan, S., Shih, N.,
Tomaszewski, J., Madabhushi, A., González, F.: High-throughput adaptive sampling for
whole-slide histopathology image analysis (HASHI) via convolutional neural networks: appli-
cation to invasive breast cancer detection. PloS one 13(5) (2018)
47. Kamnitsas, K., Baumgartner, C., Ledig, C., Newcombe, V., Simpson, J., Kane, A., Menon, D.,
Nori, A., Criminisi, A., Rueckert, D., Glocker, B.: Unsupervised domain adaptation in brain
lesion segmentation with adversarial networks. In: International Conference on Information
Processing in Medical Imaging, vol. 25, pp. 597–609. Springer, Cham (2017)
48. Park, S., Pantanowitz, L., Parwani, A.V.: Digital imaging in pathology. Clin. Lab. Med. 32(4),
557–584 (2012)
49. Larobina, M., Murino, L.: Medical image file formats. J. Digit. Imaging 27(2), 200–206 (2014)
50. NIFTI documentation, (Available via website, 2018). https://nifti.nimh.nih.gov/nifti-1/
documentation (Cited May 18, 2018)
51. Robb, R.A., Hanson, D.P., Karwoski, R.A., Larson, A.G., Workman, E.L., Stacy, M.C.: Ana-
lyze: a comprehensive, operator-interactive software package for multidimensional medical
image display and analysis. Comput. Med. Imaging Graph. 13(6), 433–454 (1989)
52. MINC software library and tools, (Available via website, 2018). http://www.bic.mni.mcgill.
ca/ServicesSoftware/MINC (Cited May 18, 2018)
53. Ukrit, M.F., Umamageswari, A., Suresh, G.R.: A survey on lossless compression for medical
images. Int. J. Comput. Appl. 31(8), 47–50 (2011)
54. Wikipedia: Encyclopedia of Graphics File Formats, (Available via website, 2019). https://en.
wikipedia.org/wiki/Machine-learning (25 March 2019)
55. Hata, A., Yanagawa, M., Honda, O., Kikuchi, N., Miyata, T., Tsukagoshi, S., Uranishi, A.,
Tomiyama, N.: Effect of matrix size on the image quality of ultra-high-resolution CT of the
lung: comparison of 512x512, 1024x1024, and 2048x2048. Acad. Radiol. 25(7), 869–876
(2018)
56. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing
Systems, pp. 2672–2680 (2014)
57. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani,
A., Totz, J., Wang, Z., Shi, W.: Photo-realistic single image super-resolution using a generative
adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 4681–4690 (2017)
58. Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., Change Loy, C.: Esrgan:
enhanced super-resolution generative adversarial networks. In: Proceedings of the European
Conference on Computer Vision (ECCV), pp. 0–0. 2018
59. Wang, Z., Chen, J., Hoi, S.C.: Deep learning for image super-resolution: a survey (16 Feb
2019). arXiv:1902.06068
60. Xia, Q., Ni, J., Kanpogninge, A.J., Gee, J.C.: Searchable public-key encryption with data
sharing in dynamic groups for mobile cloud storage. J. UCS 21(3), 440–453 (2015)
61. Chaabouni, I., Fourati, W., Bouhlel, M.S.: Using ROI with ISOM compression to medical
image. Int. J. Comput. Vis. Robot. 6(1–2), 65–76 (2016)
62. Suruliandi, A., Raja, S.P.: Empirical evaluation of EZW and other encoding techniques in
the wavelet-based image compression domain. Int. J. Wavelets, Multiresolution Inf. Process.
13(02), 1550012 (2015)
63. Ang, B.H., Sheikh, U.U., Marsono, M.N.: 2-D DWT system architecture for image compression. J. Signal Process. Syst. 78(2), 131–137 (2015)
64. Shih, F.Y., Wu, Y.T.: Robust watermarking and compression for medical images based on
genetic algorithms. Inf. Sci. 175(3), 200–216 (2005)
65. Doukas, C., Maglogiannis, I.: Region of interest coding techniques for medical image com-
pression. IEEE Eng. Med. Biol. Mag. 26(5), 29–35 (2007)
66. Hernandez-Cabronero, M., Blanes, I., Pinho, A.J., Marcellin, M.W., Serra-Sagristà, J.: Pro-
gressive lossy-to-lossless compression of DNA microarray images. IEEE Signal Process. Lett.
23(5), 698–702 (2016)
67. Pizzolante, R., Carpentieri, B., Castiglione, A.: A secure low complexity approach for com-
pression and transmission of 3-D medical images. In: 2013 Eighth International Conference on
Broadband and Wireless Computing, Communication and Applications, pp. 387–392. IEEE
(28 Oct 2013)
68. Bhavani, S., Thanushkodi, K.G.: Comparison of fractal coding methods for medical image
compression. IET Image Process. 7(7), 686–693 (2013)
69. Castiglione, A., Pizzolante, R., De Santis, A., Carpentieri, B., Castiglione, A., Palmieri, F.:
Cloud-based adaptive compression and secure management services for 3D healthcare data.
Future Gener. Comput. Syst. 1(43), 120–134 (2015)
70. Ciznicki, M., Kurowski, K., Plaza, A.J.: Graphics processing unit implementation of
JPEG2000 for hyperspectral image compression. J. Appl. Remote Sens. 6(1), 061507 (2012)
71. Bruylants, T., Munteanu, A., Schelkens, P.: Wavelet based volumetric medical image com-
pression. Signal Process. Image Commun. 1(31), 112–133 (2015)
72. Pu, L., Marcellin, M.W., Bilgin, A., Ashok, A.: Compression based on a joint task-specific
information metric. In: 2015 Data compression conference. IEEE, pp. 467–467 (7 Apr 2015)
73. Starosolski, R.: New simple and efficient color space transformations for lossless image
compression. J. Visual Commun. Image Represent. 25(5), 1056–1063 (2014)
74. The Adoption of Lossy Image Data Compression for the Purpose of Clinical Interpretation,
(Available via website, 2017). https://www.rcr.ac.uk/sites/default/files/docs/radiology/pdf/
IT-guidance-LossyApr08.pdf (Cited 15 October 2017)
75. Wu, X., Li, Y., Liu, K., Wang, K., Wang, L.: Massive parallel implementation of JPEG2000
decoding algorithm with multi-GPUs. In: Satellite Data Compression, Communications, and
Processing X. International Society for Optics and Photonics, vol. 9124, pp. 91240S (22 May
2014)
76. Blinder, D., Bruylants, T., Ottevaere, H., Munteanu, A., Schelkens, P.: JPEG 2000-based
compression of fringe patterns for digital holographic microscopy. Opt. Eng. 53(12), 123102
(2014)
77. Chemak, C., Bouhlel, M.S., Lapayre, J.C.: Neurology diagnostics security and terminal adap-
tation for PocketNeuro project. Telemed. e-Health. 14(7), 671–678 (2008)
78. Dewan, M.A., Islam, R., Sharif, M.A., Islam, M.A.: An Approach to Improve JPEG for Lossy
Still Image Compression. Computer Science & Engineering Discipline, Khulna University,
Khulna 9208
79. Hara, J.: An implementation of JPEG 2000 interactive image communication system. In: 2005
IEEE International Symposium on Circuits and Systems, pp. 5922–5925. IEEE (23 May 2005)
80. Supplement 145: Whole Slide Microscopic Image IOD and SOP Classes, (Available via web-
site, 2019). ftp://medical.nema.org/MEDICAL/Dicom/Final/sup145-ft.pdf (Cited 10 March
2019)
81. BigTIFF: BigTIFF Library, (Available via website, 2019). http://bigtiff.org/ (Cited 12 March
2019)
82. Liu, F., Hernandez-Cabronero, M., Sanchez, V., Marcellin, M.W., Bilgin, A.: The current role
of image compression standards in medical imaging. Information 8(4), 131 (2017)
83. Farahani, N., Parwani, A.V., Pantanowitz, L.: Whole slide imaging in pathology: advantages,
limitations, and emerging perspectives. Pathol. Lab. Med. Int. 7(23–33), 4321 (2015)
84. Komura, D., Ishikawa, S.: Machine learning methods for histopathological image analysis.
Comput. Struct. Biotechnol. J. 1(16), 34–42 (2018)
85. Liu, Y., Gadepalli, K., Norouzi, M., Dahl, G.E., Kohlberger, T., Boyko, A., Venugopalan,
S., Timofeev, A., Nelson, P.Q., Corrado, G.S., Hipp, J.D.: Detecting cancer metastases on
gigapixel pathology images (3 Mar 2017). arXiv:1703.02442
86. Wang, D., Khosla, A., Gargeya, R., Irshad, H., Beck, A.H.: Deep learning for identifying
metastatic breast cancer (18 Jun 2016). arXiv:1606.05718
87. Mungle, T., Tewary, S., Das, D.K., Arun, I., Basak, B., Agarwal, S., Ahmed, R., Chatterjee,
S., Chakraborty, C.: MRF ANN: a machine learning approach for automated ER scoring of
breast cancer immunohistochemical images. J. Microsc. 267(2), 117–129 (2017)
88. Wang, D., Foran, D.J., Ren, J., Zhong, H., Kim, I.Y., Qi, X.: Exploring automatic prostate
histopathology image gleason grading via local structure modeling. In: 2015 37th Annual
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC),
pp. 2649–2652. IEEE (25 Aug 2015)
89. Wollmann, T., Rohr, K.: Automatic breast cancer grading in lymph nodes using a deep neural
network (24 Jul 2017). arXiv:1707.07565
90. Gutman, D., Codella, N.C., Celebi, E., Helba, B., Marchetti, M., Mishra, N., Halpern, A.:
Skin lesion analysis toward melanoma detection: a challenge at the international symposium
on biomedical imaging (ISBI) 2016, hosted by the international skin imaging collaboration
(ISIC) (4 May 2016). arXiv:1605.01397
91. Erickson, B.J., Korfiatis, P., Akkus, Z., Kline, T.L.: Machine learning for medical imaging.
Radiographics 37(2), 505–515 (2017)
92. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image
segmentation. In: International Conference on Medical Image Computing and Computer-
assisted Intervention, vol. 5, pp. 234–241. Springer, Cham (2015)
93. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
3431–3440 (2015)
94. LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. In: The
Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10 (1995)
95. Al-Dhabyani, W., Gomaa, M., Khaled, H., Aly, F.: Deep learning approaches for data aug-
mentation and classification of breast masses using ultrasound images. Int. J. Adv. Comput.
Sci. Appl. 10(5) (2019)
96. Khalifa, N.E., Taha, M.H., Hassanien, A.E., Hemedan, A.A.: Deep bacteria: robust deep
learning data augmentation design for limited bacterial colony dataset. Int. J. Reason. Based
Intell. Syst. 11(3), 256–64 (2019)
97. Khalifa, N.E., Taha, M.H., Hassanien, A.E., Mohamed, H.N.: Deep iris: deep learning for
gender classification through iris patterns. Acta Informatica Medica 27(2), 96 (2019)
98. Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A.: Dataset of breast ultrasound images.
Data Brief 1(28), 104863 (2020)
99. Cancer Imaging Archive, (Available via website, 2018). http://www.cancerimagingarchive.
net (Cited 20 October 2018)
100. National Cancer Institute, Genomic data commons data portal (legacy archive), (Available
via website, 2018). https://portal.gdc.cancer.gov/legacy-archive/ (Cited 18 October 2018)
Edge Detector-Based Hybrid Artificial
Neural Network Models for Urinary
Bladder Cancer Diagnosis
Ivan Lorencin, Nikola Anđelić, Sandi Baressi Šegota, Jelena Musulin,
Daniel Štifanić, Vedran Mrzljak, Josip Španjol, and Zlatan Car
Bladder cancer is one of the most common cancers of the urinary tract and is a result of uncontrolled growth of bladder mucous cells. Some of the potential risk factors for urinary bladder cancer are [1]: smoking, previous cases of the disease in family members, exposure to chemicals, and prior radiation therapy. The symptoms that most commonly suggest bladder cancer are lower back pain, painful urination, and blood in the urine. When some or all of these symptoms are present, the diagnostic procedure for urinary bladder cancer must be performed. As one of the key procedures, cystoscopy is nowadays a frequently performed examination in the process of urinary bladder cancer diagnosis. It can be defined as an endoscopic method for visual evaluation of the urethra and bladder via insertion of a probe called a cystoscope
I. Lorencin · N. Anđelić (B) · S. Baressi Šegota · J. Musulin · D. Štifanić · V. Mrzljak · Z. Car
Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000 Rijeka, Croatia
e-mail: [email protected]
I. Lorencin
e-mail: [email protected]
S. Baressi Šegota
e-mail: [email protected]
J. Musulin
e-mail: [email protected]
D. Štifanić
e-mail: [email protected]
V. Mrzljak
e-mail: [email protected]
Z. Car
e-mail: [email protected]
J. Španjol
Faculty of Medicine, University of Rijeka, Braće Branchetta 20/1, 51000 Rijeka, Croatia
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license 225
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_10
[2]. In clinical practice, there are two basic types of cystoscopy: flexible cystoscopy and rigid cystoscopy [3]. The difference between the two aforementioned methods is that flexible cystoscopy is performed using a flexible fiber-optic probe, while rigid cystoscopy is performed using a rigid one. Modern cystoscopes are equipped with a digital camera that broadcasts the cystoscopy images to a computer screen. Such an approach offers the possibility of optical evaluation of lesions on the bladder mucosa, which is performed by a specialist urologist in vivo. With this method, papillary lesions of the transitional epithelium are detected in a high number of cases. The negative side of this method is that it is hard to distinguish inflammatory and scarring changes in the bladder mucosa from carcinoma in situ (CIS), and its accuracy for bladder cancer does not exceed 75% [4, 5]. For these reasons, confocal laser endomicroscopy (CLE) is utilized for lesion observation. Such an approach offers higher accuracy of urinary bladder cancer diagnosis, and it can be called optical biopsy. By utilizing this procedure, malignant changes in the urinary bladder mucosa can be detected according to micro-structural changes.
Images obtained with CLE can roughly be divided in four classes, and these are:
• High-grade carcinoma,
• Low-grade carcinoma,
• Carcinoma in-situ (CIS) and
• Healthy tissue.
CIS represents a group of abnormal cells that may become cancer, while low-grade cancer cells are slower growing and spreading cancer cells. On the other hand, high-grade carcinoma is a term describing faster growing and spreading cancer cells. A correct diagnosis with regard to cancer grade can be crucial in treatment planning [6]. An example of all four classes is given in Fig. 1.
The aforementioned micro-structural changes for all three carcinoma classes are detected with higher accuracy if AI methods, mainly ANNs, are utilized for the classification of images collected by CLE. Such an approach can offer a significant increase in bladder cancer diagnosis accuracy, especially in the case of CIS. The principle of ANN utilization in optical biopsy is presented with a dataflow diagram in Fig. 2.
A standard procedure of ANN utilization in tasks of image recognition and classification includes the use of convolutional neural networks (CNNs). This represents a standard approach with applications in various fields of science and technology [7, 8], a trend that can also be noticed in medical applications [9–11]. However, such an approach can be costly from the standpoint of computational resources [12]. For this reason, the utilization of simpler algorithms can be of significant importance, particularly in clinical practice, where fewer computing resources are available. This is the motivation for the utilization of hybrid ANN models. In this chapter, the idea of utilizing edge detector-based ANN models in the diagnosis of urinary bladder cancer is presented.
Fig. 2 Dataflow diagram of proposed ANN utilization for optical biopsy of urinary bladder
The idea of edge detector-based hybrid ANN model utilization in urinary bladder cancer diagnosis is presented in [13], where a Laplacian edge detector is utilized for designing the hybrid ANN model. The main principle of an edge detector-based hybrid ANN model is the application of an edge detector to all images contained in the dataset. Such an operation can be defined as the convolution of an image $A$ with an edge detector kernel $K$, whose elements are defined depending on the edge detector operator:

$$B = A * K. \qquad (1)$$

The convolution of a digital image with an edge detector kernel is defined by its convolutional sum:

$$B_{m,n} = \sum_{i=0}^{m}\sum_{j=0}^{n} A_{m-i,\,n-j}\, K_{i+1,\,j+1}, \qquad (3)$$

where $m$ represents the image height and $n$ the image width in number of pixels.
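A direct NumPy sketch of the convolutional sum in Eq. (3), with zero padding assumed outside the image (the function name is illustrative; production code would use an optimized library routine):

```python
import numpy as np

def convolve2d_full(A, K):
    """Full 2-D convolution: B[m, n] = sum_{i,j} A[m-i, n-j] * K[i, j]."""
    mA, nA = A.shape
    mK, nK = K.shape
    B = np.zeros((mA + mK - 1, nA + nK - 1))
    for m in range(B.shape[0]):
        for n in range(B.shape[1]):
            total = 0.0
            for i in range(mK):
                for j in range(nK):
                    a, b = m - i, n - j
                    if 0 <= a < mA and 0 <= b < nA:  # zero padding outside A
                        total += A[a, b] * K[i, j]
            B[m, n] = total
    return B
```

Convolving with the 1 × 1 identity kernel returns the image unchanged, which is a convenient correctness check.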
According to the ANN type, hybrid models can be divided into:
• hybrid models designed with an MLP and
• hybrid models designed with a CNN.
A dataflow diagram of MLP-based hybrid models is shown in Fig. 3.
The idea behind MLP-based hybrid models is to utilize edge detectors for feature extraction. Such a procedure is equivalent to the convolutional layers of a CNN. This step is combined with image scaling, which is equivalent to the pooling layers of a CNN. The detailed procedure of image downsizing is presented in [13]. Such an approach allows for a significant reduction in computational complexity.
As the name suggests, gradient-based hybrid models are designed using gradient-based edge detectors. The idea behind gradient-based edge detectors is the utilization of the gradient operator on an image. A digital image can be represented as a discrete function of two variables:

$$A = f(x, y), \qquad (4)$$

whose gradient is defined as:

$$\nabla A = \left(\frac{\partial A}{\partial x},\ \frac{\partial A}{\partial y}\right). \qquad (5)$$
In practice, the partial derivatives are approximated by convolving the image with edge detector kernels $K_x$ and $K_y$:

$$\frac{\partial A}{\partial x} = A * K_x \qquad (8)$$

and:

$$\frac{\partial A}{\partial y} = A * K_y. \qquad (9)$$

From these two components, the gradient magnitude:

$$|\nabla A| = \sqrt{\left(\frac{\partial A}{\partial x}\right)^2 + \left(\frac{\partial A}{\partial y}\right)^2} \qquad (10)$$

and phase:

$$\theta = \arctan\left(\frac{\partial A/\partial y}{\partial A/\partial x}\right) \qquad (11)$$

can be calculated.
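Given the two directional responses, the magnitude and phase are computed elementwise. A NumPy sketch (function name illustrative; `np.arctan2` is used instead of a bare arctangent so the quadrant is handled correctly):

```python
import numpy as np

def gradient_magnitude_phase(gx, gy):
    """Elementwise gradient magnitude and phase from directional responses."""
    magnitude = np.hypot(gx, gy)    # sqrt(gx**2 + gy**2), overflow-safe
    phase = np.arctan2(gy, gx)      # arctan(gy / gx), quadrant-aware
    return magnitude, phase
```

For example, responses of 3 and 4 give a magnitude of 5 and a phase of arctan(4/3).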
The Roberts hybrid model is based on the utilization of the Roberts edge detector, or Roberts cross. This approach represents a simple method for approximating the partial derivatives of an image [14]. Using numerical formulations, the image gradient can be approximated with:

$$\nabla A \approx \left( (A_{i,j} - A_{i+1,j+1}),\ (A_{i+1,j} - A_{i,j+1}) \right), \qquad (12)$$

where the first difference represents the partial derivative on the $x$ coordinate and the second the partial derivative on the $y$ coordinate. If the partial derivatives are approximated around the interpolated pixel $P(i + 0.5, j + 0.5)$ [14], the kernels:

$$R_x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \qquad (13)$$

and:

$$R_y = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \qquad (14)$$

are constructed. According to the kernels presented in Eqs. (13) and (14), the gradient magnitude can be calculated to produce a new image with extracted edges. Examples of all four class members with the Roberts edge detector applied are presented in Fig. 5.
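A minimal NumPy sketch of the Roberts cross of Eqs. (12)-(14); the diagonal differences are computed directly with array slicing rather than explicit convolution (function name illustrative):

```python
import numpy as np

def roberts_magnitude(A):
    """Gradient magnitude using the Roberts cross kernels Rx and Ry."""
    A = A.astype(float)
    gx = A[:-1, :-1] - A[1:, 1:]    # A[i, j]   - A[i+1, j+1]  (kernel Rx)
    gy = A[1:, :-1] - A[:-1, 1:]    # A[i+1, j] - A[i, j+1]    (kernel Ry)
    return np.hypot(gx, gy)
```

A constant image yields a zero response, while any intensity step produces a nonzero edge magnitude.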
The Prewitt hybrid model is built on the central difference approximation of the first derivative:

$$f'(x) \approx \frac{f(x+1) - f(x-1)}{2}. \qquad (15)$$

In this way, the approximations of the partial derivatives of a 2-D image can be expressed as:
$$\frac{\partial A}{\partial x} \approx \frac{A_{i+1,j} - A_{i-1,j}}{2}, \qquad (16)$$

for the $x$ coordinate, and:

$$\frac{\partial A}{\partial y} \approx \frac{A_{i,j+1} - A_{i,j-1}}{2}, \qquad (17)$$

for the $y$ coordinate. Following the same logic as in the case of the Roberts edge detector, edge detector kernels can be constructed around pixel $A_{i,j}$ as:

$$P_x = \frac{1}{2}\begin{bmatrix} -1 & 0 & 1 \end{bmatrix}, \qquad (18)$$

for the partial derivative on the $x$ coordinate, and:

$$P_y = \frac{1}{2}\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \qquad (19)$$

for the partial derivative on the $y$ coordinate. The results of the gradient magnitude for the case of the Prewitt edge detector are shown in Fig. 6 for all four classes.
As another gradient-based method, the Sobel edge detector is utilized. It is based on the Prewitt edge detector, with a difference in the middle row of the kernel approximating the partial derivative in the x coordinate:
Sx = [[−1, 0, 1], [−2, 0, 2], [−1, 0, 1]],    (22)

and with a difference in the middle column of the kernel approximating the partial derivative in the y coordinate:

Sy = [[−1, −2, −1], [0, 0, 0], [1, 2, 1]].    (23)
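A corresponding sketch for the Sobel kernels of Eqs. (22) and (23) (hypothetical input with a horizontal step edge, which is picked up by the y-derivative kernel):

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels from Eqs. (22) and (23).
SX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
SY = np.array([[-1, -2, -1],
               [ 0,  0,  0],
               [ 1,  2,  1]], dtype=float)

def sobel_edges(image):
    gx = convolve(image, SX)   # response to vertical edges
    gy = convolve(image, SY)   # response to horizontal edges
    return np.hypot(gx, gy)

img = np.zeros((7, 7))
img[3:, :] = 1.0               # horizontal step edge between rows 2 and 3
mag = sobel_edges(img)
```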
The Laplacian edge detector is based on the Laplacian operator, the divergence of the gradient:

ΔA = ∇ · ∇A = ∂²A/∂x² + ∂²A/∂y².    (24)

The second partial derivative on the x coordinate can be approximated with a finite difference of first partial derivatives:

∂²A/∂x² |i,j ≈ ∂A/∂x |i,j − ∂A/∂x |i−1,j.    (25)
The first partial derivatives are, in turn, approximated with finite differences:

∂A/∂x |i,j ≈ Ai+1,j − Ai,j    (26)

and:

∂A/∂x |i−1,j ≈ Ai,j − Ai−1,j.    (27)
By combining Eqs. (25), (26) and (27), the second partial derivative of an image on the x coordinate can be approximated with:
∂²A/∂x² |i,j ≈ (Ai+1,j − Ai,j) − (Ai,j − Ai−1,j),    (28)
which yields:
∂²A/∂x² |i,j ≈ Ai+1,j − 2·Ai,j + Ai−1,j.    (29)
For the second partial derivative on the y coordinate, the procedure is similar. It can be approximated with the finite difference of first partial derivatives as:

∂²A/∂y² |i,j ≈ ∂A/∂y |i,j − ∂A/∂y |i,j−1.    (30)
First partial derivatives can also be approximated with finite differences giving:
∂²A/∂y² |i,j ≈ Ai,j+1 − 2·Ai,j + Ai,j−1.    (31)
An overview of the results achieved with the Laplacian edge detector is given in Fig. 8 for all four classes.
It is important to notice that the Laplacian edge detector is designed with just one kernel, in contrast to gradient-based edge detectors, which are designed with two. This difference comes from the fact that the gradient is a vector while the Laplacian is a scalar, and it results in different execution times for the two types of edge detectors.
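Combining Eqs. (29) and (31) shows that the discrete Laplacian fits into a single 3×3 kernel, which a short sketch (hypothetical input) makes explicit:

```python
import numpy as np
from scipy.ndimage import convolve

# Adding Eqs. (29) and (31) gives the discrete Laplacian, a single
# 3x3 kernel -- unlike the two-kernel gradient-based detectors.
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def laplacian_edges(image):
    return convolve(image, LAPLACIAN)

img = np.zeros((5, 5))
img[2, 2] = 1.0                 # single bright pixel
resp = laplacian_edges(img)     # strong negative response at the pixel,
                                # positive response at its 4 neighbours
```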
5 Model Evaluation
In order to determine the optimal ANN model, it is necessary to define model evaluation criteria. The standard procedure for classifier evaluation is receiver operating characteristic (ROC) analysis, which is used for binary classifier evaluation [17]. ROC
where elements on a main diagonal (M11 , M22 , M33 and M44 ) represent a fraction
of the correct classification in a particular class, while the other elements represent
fractions of incorrect classifications in all other classes. An example of a confusion
matrix for the case of urinary bladder cancer diagnosis is presented in Fig. 9.
where PC represents the number of correct classifications into the positive class and PU represents the number of incorrect classifications into the negative class. On the other hand, FPR can be calculated as:

FPR = NU / (NC + NU),    (36)
where NU represents the number of incorrect classifications into the positive class and NC represents the number of correct classifications into the negative class. Because the standard binary ROC AUC approach cannot evaluate non-binary classifiers, some variations must be applied. For this reason, average AUC values are introduced. For the purposes of this chapter, two types of average AUC value are used:
• micro-average AUC and
• macro-average AUC.
As in the case of a binary ROC curve, the average-value ROC curves are designed by using TPR and FPR. For the case of micro-average, TPR
where tr(M) represents the trace of the confusion matrix, defined as the sum of the main diagonal elements:

tr(M) = Σ_{m=1}^{N} M_mm,    (38)

and G(M) denotes the sum of all elements of the confusion matrix:

G(M) = Σ_{m=1}^{N} Σ_{n=1}^{N} M_mn.    (39)
FPR_micro can then be expressed as the fraction of all false classifications over the sum of all matrix elements:

FPR_micro = (G(M) − tr(M)) / G(M).    (40)
For the case of macro-average, TPR is calculated as the mean of the per-class TPR values:

TPR_macro = (1/N) Σ_{n=1}^{N} TPR_n.    (41)
For example, TPR for the first class can be defined as the ratio between the number of correctly classified images and the total number of images that are members of the first class. Such an expression can be written as:
TPR_1 = M11 / (M11 + M12 + M13 + M14).    (42)
Similarly, FPR for the first class can be defined as the ratio of the number of images incorrectly classified as members of the first class to the total number of images classified as members of the first class. Such an expression can be written as:
Both AUC_micro and AUC_macro will be used for the evaluation of various architectures of hybrid ANN models for urinary bladder cancer diagnosis.
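Eqs. (38)–(42) can be sketched in numpy. The confusion matrix below is hypothetical, with rows as true classes and columns as predictions; since the micro-average TPR equation is not reproduced above, it is taken here as tr(M)/G(M), by symmetry with Eq. (40):

```python
import numpy as np

def micro_rates(M):
    """Micro-averaged TPR and FPR from confusion matrix M
    (rows: true classes, columns: predicted classes)."""
    tr, g = np.trace(M), M.sum()     # tr(M), Eq. (38); G(M), Eq. (39)
    return tr / g, (g - tr) / g      # FPR_micro as in Eq. (40)

def macro_tpr(M):
    """Macro-average TPR, Eq. (41): mean of the per-class TPRs,
    each computed as in Eq. (42)."""
    return (np.diag(M) / M.sum(axis=1)).mean()

# Hypothetical 4-class confusion matrix for illustration.
M = np.array([[8., 1., 1., 0.],
              [0., 9., 1., 0.],
              [1., 0., 9., 0.],
              [0., 0., 2., 8.]])
tpr_micro, fpr_micro = micro_rates(M)
```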
In order to determine the optimal ANN architecture, a grid-search procedure is performed. Grid search takes all possible combinations of hyperparameters and creates a model for each combination, which is then trained and evaluated. The procedure is an exhaustive search through a hyperparameter space defined according to theoretical knowledge of model selection [18, 19]. Constraints are introduced in particular for the numbers of neurons and layers. An overview of possible hyperparameters for the case of CNN-based hybrid models is given in Table 1. Hyperparameters are the values which define the architecture of the ANN/CNN, and their choice has a great influence on the performance of the ANN [18, 20].
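The exhaustive search described above can be sketched as a loop over the Cartesian product of the hyperparameter values; `train_and_evaluate` is a hypothetical stand-in (here returning a dummy score) for building, training, and scoring one model per combination:

```python
from itertools import product

def train_and_evaluate(params):
    """Stand-in for building, training and evaluating one ANN; a real
    implementation would return e.g. the micro-average AUC."""
    return params["neurons"] / 80 + params["epochs"] / 50  # dummy score

grid = {                      # a subset of the Table 1/Table 3 values
    "activation": ["relu", "tanh", "logistic"],
    "neurons": [10, 20, 40, 80],
    "epochs": [1, 10, 20, 50],
    "batch_size": [16, 32, 64],
}

best_score, best_params = float("-inf"), None
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = train_and_evaluate(params)
    if score > best_score:
        best_score, best_params = score, params
```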
By using the above-presented hyperparameters, a grid-search procedure will be performed for all four hybrid CNN models. Alongside that, six different configurations of CNN-based models will be utilized. These configurations are presented in Table 2. From the architectures presented above, the one with the best performance will be determined for each of the four hybrid models. For the case of MLP-based hybrid models, the hyperparameter space is defined in Table 3.
Alongside hyperparameter search, four different sizes of input images will be
used, and these are:
• 30 × 30,
• 50 × 50,
• 100 × 100 and
• 200 × 200.
Table 1 Definition of hyperparameter space for the case of CNN-based hybrid models

Hyperparameter | Values
Activation function per layer | ReLU, Tanh, Logistic Sigmoid
Number of feature maps per layer | 2, 5, 10, 20
Kernel size per layer | [3, 3], [5, 5], [10, 10]
Pooling size per layer | [2, 2], [5, 5]
Number of neurons per layer | 10, 20, 40, 80
Number of epochs | 1, 10, 20, 50
Batch size | 4, 8, 16, 32, 64
Table 2 Used CNN configurations with the numbers of convolutional, pooling and fully connected layers

CNN | Convolutional layers | Pooling layers | Fully connected layers
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 1 | 1 | 3
4 | 2 | 2 | 1
5 | 2 | 2 | 2
6 | 2 | 2 | 3
Table 3 Definition of hyperparameter space for the case of MLP-based hybrid models

Hyperparameter | Values
Activation functions per layer | ReLU, Tanh, Logistic Sigmoid
Number of neurons per layer | 10, 20, 40, 80
Number of hidden layers | 1, 2, 3, 4, 5
Number of epochs | 1, 10, 20, 50
Batch size | 16, 32, 64
As in the case of CNN-based models, the best models for each edge detector will be presented.
Table 4 Overview of hyperparameters used for design of hybrid CNN models for all four edge detectors

Hyperparameter | Roberts | Prewitt | Sobel | Laplacian
Feature maps (first layer) | 5 | 2 | 2 | 2
Activation function (first layer) | Tanh | Tanh | ReLU | ReLU
Kernel size (first layer) | [5, 5] | [3, 3] | [3, 3] | [3, 3]
Pooling size (first layer) | [2, 2] | [2, 2] | [2, 2] | [2, 2]
Feature maps (second layer) | 10 | – | – | 2
Activation function (second layer) | ReLU | – | – | Sigmoid
Kernel size (second layer) | [5, 5] | – | – | [5, 5]
Pooling size (second layer) | [2, 2] | – | – | [2, 2]
Number of neurons (first fully connected layer) | 40 | 10 | 20 | 40
Activation function (first fully connected layer) | Tanh | Tanh | ReLU | Tanh
Number of neurons (second fully connected layer) | – | 10 | 10 | 20
Activation function (second fully connected layer) | – | Sigmoid | Sigmoid | Sigmoid
Number of neurons (third fully connected layer) | – | 10 | – | –
Activation function (third fully connected layer) | – | ReLU | – | –
Solver | Adam | SGD | SGD | Adam
Number of epochs | 10 | 10 | 20 | 10
Batch size | 32 | 16 | 32 | 32
Table 6 Overview of hyperparameters of optimal hybrid MLP models for all four edge detectors

Hyperparameter | Roberts | Prewitt | Sobel | Laplacian
Number of neurons (first hidden layer) | 10 | 80 | 10 | 10
Activation function (first hidden layer) | ReLU | ReLU | ReLU | ReLU
Number of neurons (second hidden layer) | 80 | 80 | 80 | 80
Activation function (second hidden layer) | Sigmoid | Sigmoid | Tanh | Sigmoid
Number of neurons (third hidden layer) | 40 | – | 10 | –
Activation function (third hidden layer) | Sigmoid | – | ReLU | –
Solver | Adam | SGD | Adam | Adam
Number of epochs | 10 | 20 | 10 | 10
Batch size | 16 | 32 | 4 | 4
Image size | 100 × 100 | 50 × 50 | 50 × 50 | 100 × 100
As in the case of CNN-based models, MLP-based models also converge to models of intermediate complexity. Such a conclusion can be drawn from the fact that all obtained models are constructed with two or three hidden layers. Another interesting property is that the algorithm converges to an intermediate image size: lower classification performance is achieved if larger input images are used. By using the above-presented models, the classification performances represented with AUC_micro and AUC_macro are achieved. These values are presented in Table 7.
Fig. 10 Comparison of AUC_micro values achieved with CNN-based and MLP-based hybrid models
When the achieved results are compared, it can be noticed that CNN-based models perform better from the AUC_micro standpoint. The highest AUC_micro value is achieved when the Laplacian-based CNN model is utilized, while Laplacian-based MLP models show the lowest AUC_micro performance. This comparison is presented in Fig. 10. When AUC_macro is used for model comparison, the same results are achieved in both cases of Sobel and Laplacian utilization. It is interesting to notice that, if AUC_macro is used as the classification measure, the Roberts-based MLP model shows higher performance than its CNN version. The gap between the CNN and MLP models is still noticeable for the case of Laplacian-based models. Such a property is shown in Fig. 11.
As the last measure for model comparison, the average edge detector execution time can be used. The Laplacian-based model achieves significantly shorter execution times in comparison to gradient-based edge detectors. Such a property is expected, given the two-kernel nature of gradient detectors and the one-kernel nature of the Laplacian edge detector. It is also interesting to notice that the Roberts edge detector requires a shorter time to execute than the other gradient detectors, as presented in Fig. 12. This characteristic is the result of its simpler kernels.
Fig. 11 Comparison of AUC_macro values achieved with CNN-based and MLP-based hybrid models

Fig. 12 Comparison of average execution times for all four edge detectors
When all presented facts are summed up, it can be concluded that the lowest computing-resource requirements are achieved if Laplacian-based hybrid models are utilized, since this detector requires the shortest execution times. The Laplacian-based CNN model also achieves the highest classification performance. A limiting factor for the Laplacian-based CNN model is its higher model complexity in comparison to other CNN-based models. A simpler CNN model could be used if the Sobel hybrid model is used; such an approach, despite lower model complexity, requires more complex image pre-processing.
Acknowledgments This research has been (partly) supported by the CEEPUS network CIII-HR-
0108, European Regional Development Fund under the grant KK.01.1.1.01.0009 (DATACROSS),
project CEKOM under the grant KK.01.2.2.03.0004 and University of Rijeka scientific grant uniri-
tehnic-18-275-1447.
References
1. Janković, S., Radosavljević, V.: Risk factors for bladder cancer. Tumori J. 93(1), 4–12 (2007)
2. Al Bahili, H.: General Surgery & Urology: key principles and clinical surgery in one book.
Saudi Med. J. 36 (2015)
3. Hashim, H., Abrams, P., Dmochowski, R.: The Handbook of Office Urological Procedures.
Springer (2008)
4. Duty, B.D., Conlin, M.J.: Principles of urologic endoscopy. In: Campbell-Walsh Urology, 11th
edn. Elsevier, Philadelphia, PA (2016)
5. Lerner, S.P., Liu, H., Wu, M.-F., Thomas, Y.K., Witjes, J.A.: Fluorescence and white light
cystoscopy for detection of carcinoma in situ of the urinary bladder. In: Urologic Oncology:
Seminars and Original Investigations, vol. 30, pp. 285–289. Elsevier (2012)
6. Babjuk, M., Böhle, A., Burger, M., Capoun, O., Cohen, D., Compérat, E.M., Hernández, V.,
Kaasinen, E., Palou, J., Rouprêt, M., et al.: Eau guidelines on non–muscle-invasive urothelial
carcinoma of the bladder: update 2016. Eur. Urol. 71(3), 447–461 (2017)
7. Lorencin, I., Anđelić, N., Mrzljak, V., Car, Z.: Marine objects recognition using convolutional neural networks. NAŠE MORE: znanstveno-stručni časopis za more i pomorstvo 66(3), 112–119 (2019)
8. Dong, C., Loy, C.C., Tang, X.: Accelerating the super-resolution convolutional neural network. In: European Conference on Computer Vision, pp. 391–407. Springer, Berlin (2016)
9. Kang, E., Min, J., Ye, J.C.: A deep convolutional neural network using directional wavelets for
low-dose X-ray CT reconstruction. Med. Phys. 44(10), e360–e375 (2017)
10. Han, X.: MR-based synthetic CT generation using a deep convolutional neural network method. Med. Phys. 44(4), 1408–1419 (2017)
11. Anwar, S.M., Majid, M., Qayyum, A., Awais, M., Alnowami, M., Khan, M.K.: Medical image
analysis using convolutional neural networks: a review. J. Med. Syst. 42(11), 226 (2018)
12. Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: an extremely efficient convolutional neural
network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 6848–6856 (2018)
13. Lorencin, I., Anđelić, N., Španjol, J., Car, Z.: Using multi-layer perceptron with Laplacian edge detector for bladder cancer diagnosis. Artif. Intell. Med. 102, 101746 (2020)
14. Muthukrishnan, R., Radha, M.: Edge detection techniques for image segmentation. Int. J.
Comput. Sci. Inf. Technol. 3(6), 259 (2011)
15. Morse, B.S.: Lectures in image processing and computer vision. Department of Computer
Science, Brigham Young University (1995)
16. Gao, W., Zhang, X., Yang, L., Liu, H.: An improved sobel edge detection. In: 2010 3rd Interna-
tional Conference on Computer Science and Information Technology, vol. 5, pp. 67–71. IEEE
(2010)
17. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
18. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining,
Inference, and Prediction. Springer Science & Business Media (2009)
19. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin (2006)
20. Lorencin, I., Anđelić, N., Mrzljak, V., Car, Z.: Genetic algorithm approach to design of multi-layer perceptron for combined cycle power plant electrical power output estimation. Energies 12(22), 4352 (2019)
Predicting Building-Related Carbon Emissions: A Test of Machine Learning Models
1 Introduction
E. B. Boateng
School of Health Sciences, The University of Newcastle, Newcastle, Australia
e-mail: [email protected]
E. A. Twumasi
School of Surveying and Construction Management, College of Engineering and Built
Environment, Technological University Dublin – City Campus, Dublin, Ireland
e-mail: [email protected]
A. Darko (B) · M. O. Tetteh · A. P. C. Chan
Department of Building and Real Estate, The Hong Kong Polytechnic University, Hung Hom,
Kowloon, Hong Kong, China
e-mail: [email protected]
M. O. Tetteh
e-mail: [email protected]
A. P. C. Chan
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license 247
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_11
estimated to reach 35% by 2020 [9]. With a rapidly growing urban population, energy
consumption and CO2 emissions are expected to rise.
Considering these, reducing CO2 emissions in the buildings and construction
sector is key to attaining the Paris Agreement commitment and the United Nations
(UN) Sustainable Development Goals (SDGs) [20]. However, recent reports by the
International Energy Agency and the United Nations Environment Programme [20]
suggest that climate actions by the sector geared towards reducing building emissions
to the desired level are unenviable. As a result, most countries and their institutions
are making efforts to reduce building-related CO2 emissions and develop low-carbon
cities by introducing carbon mitigation schemes such as incentives for adopting
decarbonising measures. Researchers have also made significant efforts; however, according to a recent review [23], much of the investigation has focussed on energy use from buildings, while studies on carbon emissions from buildings are rare.
Accurate estimation of building emissions is of prime importance for the attain-
ment of more sustainable built environment outcomes [26]. A reliable method for
predicting emissions could serve as a tool for international organisations and envi-
ronmental policymakers to design and implement sound climate change mitigation
policies [2]. However, studies have shown that carbon emissions from construction
activities have been underestimated [25]. Likewise, there have been many simulation
studies on buildings [23]. Nevertheless, simulated applications are not effective in
estimating energy consumption from buildings [3]. An alternative to building energy
simulations are statistical and machine learning (ML) algorithms [24].
On the other hand, classical statistical approaches are not appropriate for
modelling the complex (e.g., non-linear, non-parametric, chaotic, etc.) behaviour
of carbon emission variables [2, 15]. ML approaches have been known to yield
promising results in numerous industrial settings. Whereas linear regression and ML
techniques such as artificial neural networks (ANNs) and support vector machines
(SVMs) are popular in estimating building energy consumption, tree-based models
are seldomly used. However, recent advancement in ML algorithms has resulted in
the development of robust ML approaches for modelling and forecasting. As such,
different and novel ML algorithms need to be continuously tested in the quest to attain
high-quality forecasts for providing realistic foresights and making better-informed
decisions concerning sustainability in the built environment.
There is limited research on robust ML algorithms that can accurately predict
building-related CO2 emissions in a timely manner. This chapter evaluates and
compares the accuracy and computational costs in predicting building-related CO2
emissions, using China’s emissions as illustration. Six ML algorithms: decision
tree (DT), random forest (RF), support vector regression (SVR), extreme gradient
boosting (XGBoost), k-nearest neighbour (KNN) and adaptive boosting (AdaBoost)
were evaluated. The rest of the chapter is organised as follows. Section 2 describes the
data and methodology used in this study, Sect. 3 presents the results and discussions.
Concluding remarks are given in Sect. 4.
Time series data spanning over the period 1971–2014 on building-related CO2 emis-
sions (CO2 emissions from residential buildings and commercial services, % of
total fuel combustion), population size (total), R&D (Trademark applications, total),
urbanisation (urban population), GDP (GDP per capita, current US$), and energy
use (kg of oil equivalent per capita) were sourced from [32] and International Energy
Agency [18, 19] repositories. However, to develop accurate models, the study follows Shahbaz et al. [27] and Bannor and Acheampong [5] in using the quadratic-sum method to transform the annual data from low frequency to high frequency. Hence, this
study applied quarterly data between 1971Q1 and 2014Q4, representing 176 quarters.
Literature on fundamental variables that influence building emissions at the macro-
level was used in informing variable selection. See Acheampong and Boateng [2]
for extensive literature on the variables.
Multivariate imputation by chained equations was used to treat missing data [8]. A similar approach was used by Bannor and Acheampong [5], as it is known to give more reasonable replacements than mean, median, and modal imputations. Based on prior experimentation, the input data were standardised while the output data were normalised. This pre-processing eliminates instances of one variable dominating another [6], as the variables used in this study have different units [2]. Moreover, it eases the heavy computations performed by ML models. 80% (140 observations) of the data was used to train each model, while the remaining 20% (36 observations) was used for validation. The same data ratios were applied in previous studies (e.g. [1, 2]). Table 1 presents the descriptive statistics for all variables used in this study.
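A sketch of the described split-and-scale pipeline with scikit-learn, using synthetic stand-in data (the actual series are not reproduced here): the inputs are standardised and the output normalised, with the scalers fitted on the 140 training observations only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(176, 5))     # 176 quarters, 5 predictor variables
y = rng.normal(size=(176, 1))     # stand-in for building-related CO2 emissions

# Chronological 80/20 split: 140 training and 36 validation observations.
X_train, X_test = X[:140], X[140:]
y_train, y_test = y[:140], y[140:]

# Standardise inputs and normalise the output, fitting the scalers on the
# training partition only to avoid information leakage.
x_scaler = StandardScaler().fit(X_train)
y_scaler = MinMaxScaler().fit(y_train)
X_train_s, X_test_s = x_scaler.transform(X_train), x_scaler.transform(X_test)
y_train_n = y_scaler.transform(y_train)
```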
AdaBoost was introduced in the 1990s by Freund and Schapire [14]. The algorithm uses an ensemble learning technique to combine several “weak learners” to form a
“strong learner” in yielding high-quality and accurate predictions. This combination
of weak learners is essential as a single learner is governed by rules that are not strong
enough to make predictions. These weak learners are mostly termed as decision tree
stumps. Larger stumps have much influence in making decisions than smaller ones
based on the “majority rule”. A key advantage of AdaBoost is that it is more resistant
to overfitting than many ML algorithms.
Upon the initial iteration, equal weights are assigned to individual samples by the base learner, or first decision stump. In each subsequent iteration, greater weights are assigned to false predictions for the subsequent learner, while the weights allocated to correct predictions are reduced. Assigning bigger weights places a more significant value on engaging false predictions to improve generalisability. The procedure is repeated sequentially until the AdaBoost algorithm deems that all observations have been rightly predicted. Due to their efficiency, these
algorithms are one of the most powerful ML tools and have been used in diverse
engineering and industrial fields. Despite its accolades in the ML arena, it has often
been used in classification rather than regression problems. The final hypothesis of
the AdaBoost regression model is generated in Eq. (1) as:
h_f(x) = inf{ y ∈ Y : Σ_{t: h_t(x) ≤ y} log(1/β_t) ≥ (1/2) Σ_t log(1/β_t) }    (1)
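Equation (1) amounts to a weighted median of the weak learners' outputs. A minimal sketch with scikit-learn's AdaBoostRegressor on synthetic data (the hyperparameter values here are illustrative, not the tuned values reported later):

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# The default weak learner is a depth-3 regression tree; boosting
# reweights the samples that earlier learners mispredicted.
model = AdaBoostRegressor(n_estimators=100, learning_rate=0.05,
                          loss="linear", random_state=0)
model.fit(X, y)
r2 = model.score(X, y)   # R^2 on the training data
```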
Decision trees (DTs) are non-parametric ML tools that are used for both regression
and classification tasks. DTs are simple and understandable, as the final tree can explain how a prediction was produced. They are supervised learning algorithms and are advantageous in handling smaller data sets than deep learning models.
These algorithms have been applied and attained successes in many fields due to
their efficiency and interpretability [30, 34]. Generally, a decision tree consists of
branches (splits) and nodes, where a node could be a leaf, internal, or root node. The
branches influence the complexity of the model; simpler models are less likely to
face fit problems. Through an iterative process, a tree is built from the root to the
leaves. The root node is the first node, the internal node (non-terminal) relates to
questions asked about features of variable X, and the leaf node also termed as the
terminal node contains the prediction or output.
A cost function is used in splitting a node, where the lowest cost is determined as
the best cost. Specifically, for regression, on each subset of data, the DT algorithm
computes a mean squared error (MSE), and the tree with the least MSE is designated
as the point of split. Whereas, metrics such as cross-entropy or Gini index are used
to evaluate splits for classification problems. For cross-entropy (CE) and Gini index
(GI), partitioning the dataset into subsets continues at each internal node based on
an assessment of the functions [17]:
CE = − Σ_{k=1}^{K} P_m(k) log P_m(k)    (2)

GI = Σ_{k≠k′} P_m(k) P_m(k′) = Σ_{k=1}^{K} P_m(k)(1 − P_m(k))    (3)
where P_m(k) is the ratio of class k observations in node m. The process of splitting continues until each node reaches a specified minimum number of training samples or a maximum tree depth.
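A minimal numpy sketch of the regression split criterion described above (toy data; for each candidate threshold, the cost is the weighted MSE of the two child nodes):

```python
import numpy as np

def best_split(x, y):
    """Return the threshold on one feature that minimises the weighted
    MSE of the two child nodes -- the regression split criterion."""
    best_t, best_cost = None, np.inf
    for t in np.unique(x)[:-1]:          # candidate thresholds
        left, right = y[x <= t], y[x > t]
        cost = (left.var() * left.size + right.var() * right.size) / y.size
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.1, 4.9, 20.0, 20.2, 19.8])
t, cost = best_split(x, y)   # the jump in y lies between x=3 and x=10
```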
The XGBoost prediction is the sum of K regression trees:

ŷ_i = Σ_{k=1}^{K} f_k(x_i),  f_k ∈ F,    (4)
where F is the space of functions containing all possible regression trees, f is a function in that space, x_i is the i-th training sample, and f_k(x_i) is the prediction from the k-th decision tree. The objective function to be optimised is expressed in Eq. (5):
in Eq. (5).
n
K
obj(θ ) = l (yi yi )+ Ω(fk ) (5)
i k=1
where ni l(yi , yi ) is a differential loss function that controls the predictive accu-
k
racy, k=1 Ω(fx ) and penalises the complexity of the model. Striving for simplicity
prevents over-fitting. The function that controls the complexity is formulated in
Eq. (6).
Ω(f) = γT + (1/2) λ Σ_{j=1}^{T} ω_j²,    (6)

where γ and λ are user-configurable parameters, T is the number of leaves, and ω_j is the score on the j-th leaf. The leaf is split and gains a score using Eq. (7):
gain = (1/2) [ (Σ_{i∈I_L} g_i)² / (Σ_{i∈I_L} h_i + λ) + (Σ_{i∈I_R} g_i)² / (Σ_{i∈I_R} h_i + λ) − (Σ_{i∈I} g_i)² / (Σ_{i∈I} h_i + λ) ] − γ,    (7)

where I_L and I_R are the instance sets of the left and right nodes after the split, I = I_L ∪ I_R, and g_i = ∂_{ŷ^(t−1)} l(y_i, ŷ^(t−1)) and h_i = ∂²_{ŷ^(t−1)} l(y_i, ŷ^(t−1)) are the first- and second-order gradient statistics on the loss function.
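To make the split gain of Eq. (7) concrete, a small numpy sketch (hypothetical data; g and h are computed for the squared-error loss l = ½(ŷ − y)², for which g_i = ŷ_i − y_i and h_i = 1):

```python
import numpy as np

def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    """Split gain of Eq. (7) for one candidate split, given per-sample
    first/second-order gradients g, h and a boolean mask of the left node."""
    def term(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    return 0.5 * (term(g[left_mask], h[left_mask])
                  + term(g[~left_mask], h[~left_mask])
                  - term(g, h)) - gamma

# Squared-error loss: g = yhat - y, h = 1 for every sample.
y = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.9])
yhat = np.zeros(6)                  # predictions from the previous round
g, h = yhat - y, np.ones(6)

good = split_gain(g, h, np.array([1, 1, 1, 0, 0, 0], dtype=bool))
bad = split_gain(g, h, np.array([1, 0, 1, 0, 1, 0], dtype=bool))
# separating the two target clusters yields the larger gain
```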
The KNN is a simple and non-parametric tool applied to regression and classifi-
cation problems. The algorithm is often referred to as the lazy learning technique.
The theory of KNN assumes that input observations in local neighbourhoods will
tend to have similar outcomes. The algorithm performs reasonably well with low-
dimensional data [17]. For classification, the intuition behind the algorithm is to compute the distances between the query and all the data points, choose a fixed number (k) of its closest neighbours in the feature space, and then select the most frequent label. On the other hand, KNN regression estimates the value of the target y of x as the average of the labels of its nearest neighbours:
ŷ(x) = (1/k) Σ_{i=1}^{k} y_i(x)    (8)
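Eq. (8) can be sketched directly in numpy (toy one-dimensional data; Euclidean distance is assumed):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Eq. (8): average the targets of the k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
pred = knn_predict(X_train, y_train, np.array([1.2]), k=3)
```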
The random forest prediction is the average of the outputs of K individual regression trees:

f(x) = (1/K) Σ_{k=1}^{K} T_k(x).    (9)
SVR is a kind of support vector machine (SVM) that supports linear and non-linear
regression in a higher-dimensional space. This kernel-based model has advantages in
high dimensional spaces since its optimisation does not depend on the dimensionality
of the input space [12]. Based on Vapnik's [31] concept of support vectors, Drucker et al. [12] developed the SVR technique. The support vector algorithm is contemporarily
used in industrial settings due to its sound inclination towards real-world applications
[28]. Using a supervised learning approach, the SVR trains by employing a symmet-
rical loss function that similarly penalises misestimates [4]. In SVC (support vector
classification), model complexity is minimised with the heuristic that all observa-
tions are correctly classified, and for SVR, minimal deviation of the predicted value
from the actual value is expected. The decision function of an SVR is expressed as:
f(x) = Σ_{i=1}^{n} (α_i − α_i*) K(x_i, x) + ρ,    (10)

where α_i and α_i* are the Lagrange multipliers, ρ is an independent term, and K(x_i, x) is the kernel function. ρ can be estimated by using the Karush-Kuhn-Tucker (KKT) criteria [13, 21, 22, 28]. The kernel function could be linear (⟨x, x′⟩), polynomial ((γ⟨x, x′⟩ + r)^d), sigmoid (tanh(γ⟨x, x′⟩ + r)), or radial basis function (RBF) (exp(−γ‖x − x′‖²)).
data, resulting in 1000 models. The maximum features parameter was maintained
with the default of “auto”, which equals the number of features.
For the SVR algorithm, a too-small C (regularisation parameter) coefficient will
under-fit the training data, while a too-large C coefficient may over-fit the training
data. As such, the value of C should be carefully selected to produce an optimal
model. The gamma determines the influence of a single training sample, as well as
the capacity to capture the shape of the data. The kernel function determines how lower-dimensional data are mapped into a higher-dimensional space. Ten C arguments (0.5, 0.8, 0.9, 1.0, 2.0, 3.0, 4.0, 5.0, 10.0, and 50.0), five gamma values (0.001, 0.01, 0.1, 1.0, and 2.0), and two kernels (RBF and polynomial function) were assessed on 10 folds of training data, yielding 1000 models. For each algorithm, the ideal combination of parameters and associated arguments was selected based on the model with the peak MCV score.
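A sketch of this procedure with scikit-learn's GridSearchCV on synthetic data (the grid below is a subset of the listed C, gamma, and kernel arguments, scored by cross-validated R²):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(150, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=150)

# A subset of the listed C / gamma / kernel arguments, assessed with
# 10-fold cross-validation as described above.
param_grid = {
    "C": [0.5, 1.0, 5.0, 10.0, 50.0],
    "gamma": [0.01, 0.1, 1.0],
    "kernel": ["rbf", "poly"],
}
search = GridSearchCV(SVR(), param_grid, cv=10, scoring="r2")
search.fit(X, y)
best_params = search.best_params_   # combination with peak mean CV score
```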
For the hardware and software environment, we used an Intel i5-2520M [4] CPU at 3.2 GHz with 8 GB of memory, operating on Ubuntu 18.04.2 LTS. Two GPUs were used: an Intel 2nd Generation Core processor graphics unit and an NVIDIA GeForce GTX 1050 Ti. All processing units were involved in performing the heavy parallel computations needed to rapidly generate numerous models. Spyder (Python 3.7) was used to write and execute the programming code.
2.4 Performance Evaluation Metrics
Grid-search with 10-fold cross-validation process suggested that the AdaBoost model
with a learning rate of 0.05, loss function as linear, and the number of estimators as
300 had the highest MCV score of 0.992434. This combination of parameters was
used to configure, train, and validate the AdaBoost model. The correlation between
the three parameters and their arguments can be observed in Fig. 1. Figure 2 shows
that the trained AdaBoost model has its best-fit line pass through the majority of the test data, indicating that the AdaBoost model has developed good generalising capabilities. The variables X0 to X4 within the internal nodes tally with the features of
The KNN model with the parameter combination of 1 leaf size, 1 number of neigh-
bours, and a p of 5 resulted in the highest MCV score of 0.998302. The relationship
among these parameter arguments and the corresponding MCV scores can be seen in Fig. 5. The leaf-size curves are overlaid on each other in the left illustration, with the peak MCV score at a p of 5. The KNN model in Fig. 6 shows promising results, as its best-fit line accurately passes through or near the test data points.
The SVR model with the parameter combination of a C of 1.0, gamma of 0.1, and
an RBF kernel attained the highest MCV score of 0.968532. These parameters were
deemed optimal to develop the SVR model. As shown in Fig. 7, the RBF-SVR models
tend to have superior MCV scores, implying that the poorest performing RBF-SVR
model was better than the highest performing POLY-SVR model. The developed
RBF-SVR model is presented in Fig. 8; however, some of its predictions slightly deviate from the test data points.
The RF model with a maximum tree depth of 9, minimum samples on a leaf node as
1, and the number of trees as 10 attained the highest MCV score of 0.997515. The
relationship among the parameters and their corresponding MCV scores during the
grid-search with 10-fold cross-validation process is shown in Fig. 9. The RF model
configured with the ideal parameter arguments is shown in Fig. 10. For brevity, the 2nd tree at a maximum depth of 2 is presented.
After the grid-search with 10-fold cross-validation procedure, the XGBoost model
with the parameter combination of a “dart” booster, 0.2 learning rate, maximum tree
depth of 3 and 500 trees yielded the highest MCV score of 0.997011. As shown
in Fig. 11, irrespective of the kind of booster, higher MCV scores were positively
associated with increasing learning rates in this study. The ideal parameters were used
to develop the XGBoost model (Fig. 12). The variables X0 to X4 in the AdaBoost,
RF, and DT models correspond to f0 to f4 in the XGBoost model. The scores on the
leaf nodes are also known as weights.
The observed variance that each validated model explained was evaluated as shown
in Fig. 13. The RF model explained almost 100% (99.88%) of the observed variance.
In theory, a model explaining 100% of the variance would always have the actual
values equal to the predicted values. Though the SVR model had the lowest R2 in
this study, its capacity to account for 97.67% of the observed variance leaves
less than 3% unaccounted for.
Predicting Building-Related Carbon Emissions … 263
The actual and predicted building-related CO2 emissions from the six ML models
on the test sample were plotted (Fig. 14). It can be observed that predictions from
the SVR slightly deviate from the actual building-related CO2 emissions, while
those of the RF model lie almost perfectly on the actual emissions. Further
metrics on errors are presented in Table 2, where the RF model shows superior
accuracy, closely followed by the KNN model. Time considerations are essential in
industrial and problem-solving scenarios; as such, the elapsed time taken by the six
algorithms in generating 1000 models each during parameter optimisation is further
evaluated. The DT model outperformed all other models, generating 1000 models
in just 2.2 s, followed by the KNN model at 3.2 s. Based on these results, it can be
concluded that the RF algorithm is the best performing ML algorithm in accurately
predicting building-related CO2 emissions, whereas the best algorithm in terms of
time efficiency is the DT algorithm. The KNN model is highly recommended when
practitioners want to have accurate predictions in a timely manner.
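The error and timing comparison can be reproduced with a few lines. MAE and RMSE are assumed here as representative of the metrics in Table 2 (this excerpt does not list the table's exact columns), and all data values are illustrative:

```python
import time
from math import sqrt

def error_metrics(actual, predicted):
    # mean absolute error and root mean squared error
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    rmse = sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return mae, rmse

actual = [10.0, 12.0, 9.0, 11.0]        # illustrative emissions values
predicted = [9.5, 12.5, 9.0, 10.0]

start = time.perf_counter()             # elapsed-time measurement, as in the text
mae, rmse = error_metrics(actual, predicted)
elapsed = time.perf_counter() - start
```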
4 Conclusions
1 Introduction
Breast cancer is a common and dangerous disease. Survival of patients with this
diagnosis depends on the accuracy of early diagnosis. A combination of methods
for detecting early signs of cancer and artificial intelligence, in particular, statistical
machine learning, provides a solution to this problem. A wide review of works on
this topic is given in [8].
Fractal analysis is now one of the most promising fields in the investigation of cell hetero-
geneity in healthy and tumor tissues. Fractal dimension (FD) is a powerful diagnostic
tool used for defining the heterogeneity of cells in complex endometrial hyperplasia and
well-differentiated endometrioid carcinoma [4]. It is also an independent prognostic
factor for survival in melanoma [3], leukemia [1, 14], and other diseases [12, 17, 18].
The nuclear patterns of human breast cancer cells have attracted special attention [6,
11, 19–21, 26, 29]. These investigations have demonstrated the significant potential
role of fractal analysis in assessing morphological information. Nevertheless, all
these investigations were aimed only at tumor cells, not at cells that are distant from
the tumor.
The effect of malignancy-associated changes (MAC) in cells distant from a tumor
was considered in numerous papers, see for example [2, 5, 31, 32]. The authors of
these papers have demonstrated that the analysis of MAC in buccal epithelium is one
of the promising noninvasive methods for the effective screening of cancer.
© The Editor(s) (if applicable) and The Author(s), under exclusive license 267
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_12
268 A. V. Andreichuk et al.
Such methods can be divided into two groups: methods involving the analysis of MAC in
non-tumor cells located near a tumor [31, 32], and methods involving the analysis of
MAC in non-tumor cells located far from a tumor, in particular in the buccal
epithelium (oral mucosa) [2, 5].
Since there are currently no results on the fractal properties of the nuclear patterns
of buccal epitheliocytes in tumor and healthy conditions, the aim of this paper is to
present an AI system that estimates the differences between the distributions of
the fractal dimension of chromatin in Feulgen-stained nuclei in the buccal epithelium of
patients with breast cancer, patients with fibroadenomatosis, and healthy people, and to
implement a screening test for breast cancer.
2 Input Data
The study was conducted on samples from 130 patients: 68 patients with breast
cancer, 33 patients with fibroadenomatosis and 29 healthy individuals. Every sample
contains on average 52 nuclei of buccal epithelium. The input data set for the study
consists of 20,256 images of interphase nuclei of buccal epithelium (6752 nuclei
were scanned without a filter, through a yellow filter, and through a violet filter). Each
image consists of three channels (red, green, blue) as well as a grayscale version.
In the first step, we obtained scrapes from various depths of the spinous
layer (conventionally denoted as median and deep) after gargling and removal of
the superficial cell layer of the buccal mucosa. The smears were dried
at room temperature, fixed for 30 min in Nikiforov's mixture, and subjected to
the Feulgen reaction with cold hydrolysis in 5 N HCl for 15 min
at a temperature of 21–22 °C. The Feulgen-stained DNA content was
estimated using an Olympus computer analyzer, consisting of an Olympus BX
microscope, a Camedia C-5050 digital zoom camera and a computer. We investigated
52 cells on average in every preparation. The DNA-fuchsine content in the nuclei
of the epitheliocytes was defined as the product of optical density and area. Thus,
for each interphase nucleus we obtained a scanogram of the DNA
distribution, a rectangular table (matrix) of 160 × 160 pixels.
According to the results of the diagnosis, the patient may be assigned to one of
two groups—either breast cancer patients or non-breast cancer patients (healthy and
patients with fibroadenomatosis).
Artificial Intelligence System for Breast Cancer Screening … 269
For each patient, we receive a scrape from the oral mucosa, which yields
between 23 and 81 cells. On the basis of cytospectrophotometry data,
we obtain a DNA scanogram of the interphase nuclei of these cells in two filters
(yellow and violet), as well as without a filter.
For each image, there are three channels—red, green, blue—and a gray version.
We perform pre-processing of images for binarization and denoising. For each
filter/channel pair, we use a special diagnostic test. The blue channel with the yellow
filter gives dark grey verging on black, so such pairs were not considered.
Microscopic images in pure form are usually unacceptable for analysis due to defects,
digital noise caused by the need to increase the photo sensitivity of the material, as
well as foreign objects. For further analysis of the incoming image, pre-processing
and segmentation are required. To reduce the noise level of the microscope images,
a median filter was used, one of the types of digital filters described in [23], which
is widely used in digital signal and image processing.
The next step of processing is separating the background pixels from the pixels of
the objects in the image (cell nuclei or third-party objects). The image is binarized,
that is, the background pixels are set to white and the nucleus pixels to black. To achieve
this goal, the Otsu algorithm [22] is appropriate.
The binarized image is then cleaned of residual noise of the salt-and-pepper type (small
isolated groups of white or black dots, respectively). Binary morphological
operations [28] were used, namely opening to remove black points, and then
closing to fill white points.
Then we calculate the fractal dimension for pre-processed images. Thus, a set of
fractal characteristics will be constructed for each patient. As a measure of fractal
dimension we use the Minkowski dimension computed by the modified box-counting
algorithm [10].
Each patient has a set of fractal characteristics whose size is not constant
(the number of cells differs from patient to patient). To compare the sets of characteristics of different
patients, it is necessary to use the measure of proximity between samples, and for
this purpose p-statistics is used [9].
For the classification of patients, the nearest neighbour method was used: to classify an
element x, we select from the training sample the element closest to x and assign x to
the class of this element.
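A minimal sketch of this nearest-neighbour rule follows. The real system compares sets of fractal dimensions via p-statistics [9]; here a toy absolute-difference distance on scalar features stands in for that measure, and all values are illustrative:

```python
def classify_1nn(x, training, distance):
    # closest-neighbour rule: x receives the class of its nearest training element
    nearest = min(training, key=lambda item: distance(x, item[0]))
    return nearest[1]

# toy training sample: (feature, class) pairs
training = [(1.20, "negative"), (1.35, "negative"),
            (1.60, "positive"), (1.72, "positive")]

label = classify_1nn(1.65, training, lambda a, b: abs(a - b))
```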
Let us consider these stages in detail.
Images obtained directly from the microscope are often subject to noise and
defects, caused by the increased photosensitivity of the photographic materials
used. This means that raw images are not usable and require further processing,
otherwise there is a high risk of incorrect results. Therefore, for further analysis, we
will apply the already processed image sets.
The blurring process can be seen as reducing the sharpness of the image. Blurring
makes the image details less clear and is often used for smoothing.
Images that are perceived as too sharp can be softened by using various blurring
methods and levels of intensity. Often, images are smoothed to remove or reduce
noise. When selecting contours, the best results are often achieved when
noise reduction is performed first.
Consider the effect of different blur filters on the image of the cell nucleus (Fig. 1).
We prefer the median filter [23] because it is widely used in digital signal and image
processing for noise reduction and preserves image boundaries well. The median
filter is classified as a nonlinear filter. Unlike many other image blurring methods, the
implementation of this filter does not involve a convolution or a predefined kernel.
Fig. 1 Initial image, image after Gaussian blurring with a 5 × 5 kernel, image after median filtering
with a 5 × 5 kernel, and image after "moving average blurring" with a 5 × 5 kernel, respectively
Pre-processing algorithm. For each pixel of the input image, do the following:
1. Add to the array all pixels of the image that are in the window with the center in
the current pixel.
2. Sort this array in ascending order.
3. The value in the middle of the sorted array is the median of the window
centered in the current pixel. Replace the value of the current pixel with this
median.
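The three steps can be sketched in NumPy (a toy 5 × 5 image; edge pixels are handled here by reflection padding, a choice the text does not specify):

```python
import numpy as np

def median_filter(img, size=3):
    r = size // 2
    padded = np.pad(img, r, mode="reflect")      # assumption: reflect at the borders
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            window = padded[i:i + size, j:j + size]  # step 1: pixels in the window
            out[i, j] = np.median(window)            # steps 2-3: sort, take middle
    return out

noisy = np.zeros((5, 5), dtype=np.uint8)
noisy[2, 2] = 255                   # a single bright noise pixel
clean = median_filter(noisy)        # the outlier is removed
```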
To separate nucleus from the background on the image we use the Otsu method
[22] that is intended for automatic computation of threshold value and construction
of binary images from grey-level images where the pixel values vary from 0 to 255.
Thus, we have the range of pixels of an image in 256 gray levels. Let us denote the
number of pixels at level l by $n_l$ and the total number of pixels by
$N = \sum_{l=0}^{255} n_l = 160 \times 160 = 25{,}600$. Compute a probability
distribution of pixel values:

$$p_l = \frac{n_l}{N}, \qquad p_l \ge 0, \qquad \sum_{l=0}^{255} p_l = 1. \quad (1)$$
The Otsu method separates pixels into two classes by minimizing the within-class
variance at the threshold T:

$$\sigma_w^2(T) = q_1(T)\,\sigma_1^2(T) + q_2(T)\,\sigma_2^2(T), \quad (2)$$

where the class probabilities are

$$q_1(T) = \sum_{l=0}^{T} p_l, \qquad q_2(T) = \sum_{l=T+1}^{255} p_l, \quad (3)$$

and $\sigma_1^2(T)$, $\sigma_2^2(T)$ are the variances of the pixel values below and
above the threshold, respectively.
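A direct NumPy sketch of the Otsu search: compute the histogram probabilities of Eq. (1), then scan all thresholds and keep the one minimizing the within-class variance (the image here is a synthetic two-population example, not study data):

```python
import numpy as np

def otsu_threshold(img):
    hist = np.bincount(img.ravel(), minlength=256)
    p = hist / hist.sum()                           # Eq. (1): probability of each level
    levels = np.arange(256)
    best_t, best_w = 0, np.inf
    for t in range(255):
        q1, q2 = p[:t + 1].sum(), p[t + 1:].sum()   # class probabilities
        if q1 == 0 or q2 == 0:
            continue
        mu1 = (levels[:t + 1] * p[:t + 1]).sum() / q1
        mu2 = (levels[t + 1:] * p[t + 1:]).sum() / q2
        var1 = (((levels[:t + 1] - mu1) ** 2) * p[:t + 1]).sum() / q1
        var2 = (((levels[t + 1:] - mu2) ** 2) * p[t + 1:]).sum() / q2
        w = q1 * var1 + q2 * var2                   # within-class variance at T = t
        if w < best_w:
            best_t, best_w = t, w
    return best_t

# two well-separated grey-level populations
img = np.array([[10] * 8 + [200] * 8] * 16, dtype=np.uint8)
T = otsu_threshold(img)
binary = (img > T).astype(np.uint8)                 # background vs nucleus pixels
```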
For morphological operations, the translation of a set A by a vector s is defined as

$$A_s = \{a + s \mid a \in A\} \quad \forall s \in \mathbb{R}^2. \quad (4)$$
The translation can be determined by an ordered pair of numbers (x, y), where x is
the offset in pixels along the X axis, and y is the offset along the Y axis.
For two sets A and B, the erosion (narrowing) of the set A by the structural
element B is defined as:

$$A \ominus B = \{z \in \mathbb{Z}^2 \mid B_z \subseteq A\}. \quad (5)$$
In other words, the erosion of the set A by a structural element B is such a geometric
location of points for all such positions of points of the center z, at which the set B
is completely contained in A.
Dilation (expansion) of the set is defined as:

$$A \oplus B = \{z \in \mathbb{Z}^2 \mid (\hat{B})_z \cap A \ne \emptyset\}, \quad (6)$$

where $\hat{B}$ denotes the reflection of B.
In this case, the dilation of a set A by a structural element B is the set of all such
displacements z, in which the sets A and B coincide with at least one element.
Dilation is a commutative operation:

$$A \oplus B = B \oplus A = \bigcup_{a \in A} B_a. \quad (7)$$
The opening of A by B is erosion followed by dilation:

$$A \circ B = (A \ominus B) \oplus B. \quad (8)$$

The closing of A by B is dilation followed by erosion:

$$A \bullet B = (A \oplus B) \ominus B. \quad (9)$$
As a result of the closing operation, the contour segments are smoothed, but, unlike
the opening, in general, small breaks and long valleys of small width are filled, as
well as small holes are opened and the gaps of the contour are filled.
Salt and Pepper Noise Reduction. Noise of the "salt and pepper" type consists of
isolated black dots ("pepper") and small white holes ("salt"). First, open A by
B: this removes all the black points (pepper); then close A by B: this fills all the
holes (salt).
We will use an octagon-type structural element with R = 2:
B = {(–2, –1), (–2, 0), (–2, 1), (–1, –2), (1, –2), (0, –2), (–1, –1), (–1, 0), (–1, 1),
(0, –1), (0, 0), (0, 1), (1, –1), (1, 0), (1, 1), (2, –1), (2, 0), (2, 1), (–1, 2), (1, 2), (0, 2)}
Let’s look at an example of the effect of “salt and pepper” denoising with an
octagon structural element on the image of a cell nucleus (Fig. 3).
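Erosion, dilation, opening and closing (Eqs. (5)–(9)) can be sketched directly over a 0/1 image. In this toy version, 1 marks object pixels, B is a simple 3 × 3 square rather than the chapter's octagon, and out-of-bounds offsets are treated as background (so the image border is also eroded, an artifact of the simplified boundary handling):

```python
import numpy as np

def erode(A, B):
    # Eq. (5): keep z iff the structural element shifted to z fits inside A
    h, w = A.shape
    out = np.zeros_like(A)
    for i in range(h):
        for j in range(w):
            out[i, j] = all(
                0 <= i + di < h and 0 <= j + dj < w and A[i + di, j + dj]
                for di, dj in B
            )
    return out

def dilate(A, B):
    # Eq. (7) view of dilation: union of copies of B centred at points of A
    h, w = A.shape
    out = np.zeros_like(A)
    for i in range(h):
        for j in range(w):
            if A[i, j]:
                for di, dj in B:
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        out[i + di, j + dj] = 1
    return out

def opening(A, B):      # Eq. (8): removes small specks ("pepper")
    return dilate(erode(A, B), B)

def closing(A, B):      # Eq. (9): fills small holes ("salt")
    return erode(dilate(A, B), B)

B = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]

A = np.ones((9, 9), dtype=np.uint8)
A[4, 4] = 0                                # a one-pixel hole ("salt")
cleaned = closing(opening(A, B), B)        # hole filled

speck = np.zeros((9, 9), dtype=np.uint8)
speck[4, 4] = 1                            # an isolated speck ("pepper")
no_pepper = opening(speck, B)              # speck removed
```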
Contour detection. Roberts Operator. For further calculations, it is necessary to
select the contours of the image, because on the basis of them the fractal dimension
will be calculated. To do this, we use the Roberts operator, a discrete differential
operator used in image processing to distinguish boundaries.
Roberts [25] proposed the following equations:

$$y_{ij} = \sqrt{x_{ij}}, \qquad z_{ij} = \sqrt{\left(y_{i,j} - y_{i+1,j+1}\right)^2 + \left(y_{i+1,j} - y_{i,j+1}\right)^2}, \quad (10)$$

where x is the value of the pixel intensity of the image, z is the approximated
derivative, and i, j are the coordinates of the pixel.
The resulting image highlights changes in intensity in the diagonal direction. The
transformation of each pixel by the Roberts cross operator gives a derivative of the image
along a non-zero diagonal, and the combination of these transformed images can also
be considered as a gradient from the top two pixels to the bottom two.
The magnitude of the difference vector can also be calculated in the Manhattan metric.
The Roberts operator uses the magnitude of this total vector, which shows the largest
difference between the four points covered. The direction of this vector corresponds
to the direction of the largest gradient between the points.
Convolution algorithm. We convolve the image with the kernels

$$K_1 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{and} \quad K_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

If $I(x, y)$ is a point of the initial image, $G_x(x, y)$ is the point of the
image convolved with the first kernel, and $G_y(x, y)$ is the point of the image
convolved with the second kernel, then the gradient can be defined as:
$$\nabla I(x, y) = G(x, y) = \sqrt{(G_x(x, y))^2 + (G_y(x, y))^2}. \quad (11)$$
Let’s look at an example of the action of Roberts operator on the image of the cell
nucleus (Fig. 4).
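Because each kernel is 2 × 2, the convolution of Eq. (11) reduces to two diagonal differences; a NumPy sketch on a synthetic step edge (not study data):

```python
import numpy as np

def roberts_gradient(img):
    # diagonal differences = convolution with [[1,0],[0,-1]] and [[0,1],[-1,0]];
    # the output is one pixel smaller in each dimension
    f = img.astype(float)
    gx = f[:-1, :-1] - f[1:, 1:]       # first kernel
    gy = f[:-1, 1:] - f[1:, :-1]       # second kernel
    return np.sqrt(gx ** 2 + gy ** 2)  # Eq. (11): gradient magnitude

# a vertical step edge: the gradient is non-zero only along the boundary
img = np.zeros((4, 4))
img[:, 2:] = 9.0
G = roberts_gradient(img)
```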
Fractal dimension. In a broad sense, a fractal is a figure whose small parts, at
arbitrary magnification, are similar to the whole. Let Ω be a bounded set in a metric space X.
Definition Let ε > 0. An at most countable family of subsets $\{\omega_i\}_{i \in I}$ of
the space X is called an ε-coverage of the set Ω if the following conditions hold:

1. $\Omega \subset \bigcup_{i \in I} \omega_i$;
2. $\forall i \in I: \operatorname{diam} \omega_i < \varepsilon$.
Let α > 0 and $\Theta = \{\omega_i\}_{i \in I}$ be a coverage of the set Ω. Define the following
function, which in some sense determines the size of this coverage:

$$F_\alpha(\Theta) = \sum_{i \in I} (\operatorname{diam} \omega_i)^\alpha. \quad (13)$$
Now let $M_\alpha^\varepsilon(\Omega) = \inf_\Theta F_\alpha(\Theta)$, where the infimum is taken
over all ε-coverages Θ of the set Ω. Obviously, the function $M_\alpha^\varepsilon(\Omega)$ does
not decrease with decreasing ε, since with decreasing ε we also narrow down the set of
possible coverages. Therefore, it has a finite or infinite limit:

$$M_\alpha(\Omega) = \lim_{\varepsilon \to 0} M_\alpha^\varepsilon(\Omega). \quad (14)$$
Definition The value $M_\alpha(\Omega)$ is called the Hausdorff α-measure of the set Ω.
It is known that there exists a value $\alpha_0 \ge 0$ such that

$M_\alpha(\Omega) = 0 \;\; \forall \alpha > \alpha_0$,
$M_\alpha(\Omega) = +\infty \;\; \forall \alpha < \alpha_0$.

This critical value $\alpha_0$ is called the Hausdorff dimension of the set Ω.
Definition A set is called a fractal if its Hausdorff dimension strictly exceeds its
topological dimension.
The Minkowski dimension. Fractal sets usually have a complex geometric structure
whose main property is self-similarity. The fractal dimension is the characteristic
that describes this property [33].
When measuring the fractal dimension of various natural and artificial objects,
we deal with problems associated with the fact that there are several definitions
of fractal dimension. The fundamental concept is the Hausdorff dimension, but its
computation is often a difficult task. Therefore, other dimensions are more commonly
used in practice, such as the Minkowski dimension, which is calculated using the
box-counting algorithm [27].
For an arbitrary positive δ, a function $M_\delta(\Omega)$ is calculated. If $M_\delta(\Omega) \propto \delta^{-D}$,
then the set Ω has fractal dimension D. Thus,

$$\dim_M \Omega = D = \lim_{\delta \to 0} \frac{\ln(M_\delta(\Omega))}{-\ln(\delta)}, \quad (15)$$
where the value Mδ (Ω) is equal to the number of n-dimensional cubes with sides δ
required to cover the set Ω.
Definition The number dim M Ω is called the Minkowski dimension of the set Ω.
Note that it may not always exist. The relation between the Minkowski dimension
and the Hausdorff dimension is expressed by the following theorem: for any bounded
set Ω, $\dim_H \Omega \le \dim_M \Omega$.
Fig. 5 Cell nucleus, its contour and the graph of linear regression for computation of the Minkowski
dimension of the nucleus
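A minimal box-counting sketch of Eq. (15): count the δ-boxes occupied by a 2-D point set for several δ and regress $\ln M_\delta(\Omega)$ against $-\ln \delta$. The sanity check below uses a straight line segment, whose dimension should be close to 1 (in the real system the points would be a nucleus contour as in Fig. 5):

```python
import numpy as np

def box_count(points, delta):
    # number of boxes of side delta occupied by the 2-D point set
    return len({(int(x // delta), int(y // delta)) for x, y in points})

def minkowski_dimension(points, deltas):
    # slope of ln M_delta versus -ln delta approximates D in Eq. (15)
    xs = [-np.log(d) for d in deltas]
    ys = [np.log(box_count(points, d)) for d in deltas]
    return float(np.polyfit(xs, ys, 1)[0])

t = np.linspace(0.0, 1.0, 20000)
line = np.column_stack([t, t])            # a smooth curve: dimension 1
D = minkowski_dimension(line, [0.1, 0.05, 0.025, 0.0125])
```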
5 Similarity Measure
Definition The ends of these intervals are called asymptotic confidence limits.
Definition The value β is called an asymptotic significance level of the sequence
(ak , bk ).
Definition In particular, when all values p1 , p2 , . . . , pk , . . . are the same, i.e. pk =
p ∀k = 1, 2, . . ., the interval (ak , bk ) is called an asymptotic confidence interval for
value p, and the value β is called an asymptotic significance level of the interval
(ak , bk ).
Note If $\beta = \lim_{k \to \infty} \beta_k$, then β can be considered an approximate value of the true
significance level $\beta_k$ of the confidence intervals $(a_k, b_k)$: $\beta_k \approx \beta$.
Definition The value β is called an approximate significance level of the intervals
(ak , bk ).
Consider the following criterion for testing the hypothesis H about the equality of the distri-
bution functions $F_1(u)$ and $F_2(u)$ of the general populations $G_1$ and $G_2$, respectively.
Let $x = (x_1, \ldots, x_n) \in G_1$ and $y = (y_1, \ldots, y_m) \in G_2$, with order statistics
$x_{(1)} \le \ldots \le x_{(n)}$, $y_{(1)} \le \ldots \le y_{(m)}$. Suppose that $G_1 = G_2$. Then [7]
$$p_{ij}^{(n)} = P\{y_k \in (x_{(i)}, x_{(j)})\} = \frac{j - i}{n + 1}. \quad (17)$$
$$p_{ij}^{(1,2)} = \frac{h_{ij}\,m + g^2/2 \mp g\sqrt{h_{ij}(1 - h_{ij})\,m + g^2/4}}{m + g^2}. \quad (18)$$
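Equation (18) translates directly into code. In the sketch below, `h_ij` (the observed fraction of the sample y falling into the interval) and the sample size m are illustrative values, and g = 3 corresponds to the 3σ-rule used in the theorems of this section:

```python
from math import sqrt

def petunin_limits(h_ij, m, g=3.0):
    # Eq. (18): the two confidence limits p^(1) (minus sign) and p^(2) (plus sign)
    centre = h_ij * m + g * g / 2.0
    half = g * sqrt(h_ij * (1.0 - h_ij) * m + g * g / 4.0)
    denom = m + g * g
    return (centre - half) / denom, (centre + half) / denom

p1, p2 = petunin_limits(h_ij=0.5, m=50)   # illustrative values
```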
Theorem [15] If (1) n = m; (2) $0 < \lim_{n \to \infty} p_{ij}^{(n)} = p_0 < 1$; and
(3) $0 < \lim_{n \to \infty} \frac{i}{n+1} = p^* < 1$, then the asymptotic level β of the
sequence of confidence intervals $I_{ij}^{(n)}$ for the probabilities
$p_{ij}^{(n)} = P\{y_k \in (x_{(i)}, x_{(j)})\}$ does not exceed 0.05.
Theorem [15] If in a strong random experiment E the samples $x = (x_1, \ldots, x_n) \in G_1$
and $y = (y_1, \ldots, y_m) \in G_2$ have the same size, then the asymptotic significance
level of the interval $I^{(n)} = (p^{(1)}, p^{(2)})$ constructed by the 3σ-rule (g = 3) with
the help of formula (18) does not exceed 0.05.
6 Results
We call the group of breast cancer patients "positive" and those with fibroadeno-
matosis and healthy individuals "negative". So, the total number of positive examples is 68,
and the number of negative ones is 33 + 29 = 62.
We used a 1NN voting procedure among the filter/channel pairs and among the
filters separately. We also used cross-validation with several sizes of control
samples (5, 10, 20%, and leave-one-out). The data were broken into p parts, so that p
− 1 parts were used for model evaluation and one for testing. This procedure was repeated
p times. The number p depends on the size of the control sample. As a similarity
measure, we used p-statistics.
The results of testing are provided in Tables 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. These
tables contain the sensitivity, specificity and accuracy of the cross-validation with
different channels (red, green, blue, and grey) and filters (violet, yellow, and without
filters), depending on the size of the control set.
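The three quantities reported in these tables can be computed from confusion-matrix counts; the labels below are synthetic (1 = breast cancer, the "positive" group):

```python
def confusion_metrics(y_true, y_pred):
    # sensitivity = recall on positives, specificity = recall on negatives
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec, acc = confusion_metrics(y_true, y_pred)
```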
Table 1 Results of cross-validation on control samples (20% of the all samples) using 1NN voting
among the channels
N Pair filter/channel Sensitivity Specificity Accuracy
1 Blue 0.8889 0.5728 0.7309
2 Grey 0.8955 0.8730 0.8843
3 Green 0.8649 0.9286 0.8967
4 Red 0.6018 1.0000 0.8009
5 (Violet filter) blue 0.6029 0.5645 0.5837
6 (Violet filter) grey 0.9494 0.8310 0.8900
7 (Violet filter) green 0.8272 0.9796 0.9034
8 (Violet filter) red 0.6735 0.9375 0.8055
9 (Yellow filter) grey 0.8333 0.9423 0.8878
10 (Yellow filter) green 0.8250 0.9600 0.8925
11 (Yellow filter) red 0.8462 0.9615 0.9038
Mean 0.8008 0.8683 0.8345
Table 2 Results of cross-validation on control samples (10% of the all samples) using 1NN voting
among the channels
N Pair filter/channel Sensitivity Specificity Accuracy
1 Blue 0.9545 0.5648 0.7597
2 Grey 0.9286 0.9500 0.9393
3 Green 0.9296 0.9661 0.9478
4 Red 0.5913 1.0000 0.7956
5 (Violet filter) blue 0.5645 0.5147 0.5396
6 (Violet filter) grey 0.9688 0.9091 0.9389
7 (Violet filter) green 0.9067 1.0000 0.9533
8 (Violet filter) red 0.6733 1.0000 0.8366
9 (Yellow filter) grey 0.9178 0.9825 0.9501
10 (Yellow filter) green 0.8816 0.9815 0.9315
11 (Yellow filter) red 0.9315 1.0000 0.9658
Mean 0.8407 0.8976 0.8689
From Tables 1, 2, 3 and 4, we can see that the highest average and individual
(i.e. separately for filter/channel pairs) accuracy is achieved using the one-against-all
(leave-one-out) principle, in which each patient in the control sample is compared with
all others. This is natural, because in this way the comparison is made with as many
training samples as possible. The highest accuracy obtained was 99.28% for the grey
channel in the yellow and violet filters. This result was achieved by using the principle
of one against all (Table 4). Specificity is 100% in all acceptable results, meaning that cancer patients
Table 3 Results of cross-validation on control samples (5% of the all samples) using 1NN voting
among the channels
N Pair filter/channel Sensitivity Specificity Accuracy
1 Blue 0.9545 0.5648 0.7597
2 Grey 0.9437 0.9830 0.9634
3 Green 0.9571 0.9833 0.9702
4 Red 0.5965 1.0000 0.7982
5 (Violet filter) blue 0.5500 0.5000 0.5250
6 (Violet filter) grey 0.9851 0.9683 0.9767
7 (Violet filter) green 0.9315 1.0000 0.9658
8 (Violet filter) red 0.6800 1.0000 0.8400
9 (Yellow filter) grey 0.9577 1.0000 0.9789
10 (Yellow filter) green 0.9189 1.0000 0.9595
11 (Yellow filter) red 0.9577 1.0000 0.9789
Mean 0.8575 0.9090 0.8833
Table 4 Results of cross-validation on control samples (leave one out) using 1NN voting among
the channels
N Pair filter/channel Sensitivity Specificity Accuracy
1 Blue 0.9525 0.5596 0.7560
2 Grey 0.9444 1.0000 0.9722
3 Green 0.9710 0.9836 0.9773
4 Red 0.5913 1.0000 0.7957
5 (Violet filter) blue 0.5500 0.5000 0.5250
6 (Violet filter) grey 0.9855 1.0000 0.9928
7 (Violet filter) green 0.9315 1.0000 0.9658
8 (Violet filter) red 0.6476 1.0000 0.8238
9 (Yellow filter) grey 0.9855 1.0000 0.9928
10 (Yellow filter) green 0.9315 1.0000 0.9658
11 (Yellow filter) red 0.9714 1.0000 0.9857
Mean 0.8602 0.9130 0.8866
Table 9 Results of 1NN voting among filters (leave-one-out) after median filtering only
3 best performance Sensitivity Specificity Accuracy
Without filter 0.7303 0.9268 0.8286
Yellow filter 0.7500 0.9524 0.8512
Violet filter 0.6939 1.0000 0.8469
Table 10 Results of 1NN voting among channels (leave-one-out) after median filtering only
N Pair filter/channel Sensitivity Specificity Accuracy
1 Blue 0.6129 0.5051 0.5590
2 Grey 0.8125 0.9400 0.8763
3 Green 0.6355 1.0000 0.8178
4 Red 0.6579 0.6667 0.6623
5 (Violet filter) blue 0.6951 0.7708 0.7330
6 (Violet filter) grey 0.6250 0.5541 0.5895
7 (Violet filter) green 0.8072 0.9787 0.8930
8 (Violet filter) red 0.6569 0.9643 0.8106
9 (Yellow filter) grey 0.7647 0.9333 0.8490
10 (Yellow filter) green 0.7363 0.9744 0.8553
11 (Yellow filter) red 0.7037 0.7755 0.7396
Mean 0.7007 0.8239 0.7623
will not be accidentally classified into a group of non-cancer patients, which is very
important.
If only images taken with a single filter can be used, Tables 5, 6, 7 and 8 show
that it is appropriate to take the violet filter. At the same time, accuracy is
increased compared to the maximum individual accuracy everywhere, except for one
against all.
To discover the effect of using median filtering alone (without morphological
operations), we compared Tables 1 and 10, and Tables 8 and 9, respectively. Tables 9
and 10, which demonstrate the results without morphological operations, show a
significant decrease in accuracy, both on average and individually for all
characteristics. Therefore, the use of morphological operations is highly recommended
for pre-processing the images used in cancer diagnosis. This also means that the noise
on the images that appeared during cytospectrophotometry is of a type similar to that
of salt and pepper.
7 Conclusion
References
1. Adam, R., Silva, R., Pereira, F., et al.: The fractal dimension of nuclear chromatin as a prognostic
factor in acute precursor B lymphoblastic leukemia. Cell Oncol. 28, 55–59 (2006)
2. Andrushkiw, R.I., Boroday, N.V., Klyushin, D.A., Petunin, YuI.: Computer-Aided Cytogenetic
Method of Cancer Diagnosis. Nova Publishers, New York (2007)
3. Bedin, V., et al.: Fractal dimension of chromatin is an independent prognostic factor for survival
in melanoma. BMC Cancer 10, 260 (2010)
4. Bikou, O., et al.: Fractal dimension as a diagnostic tool of complex endometrial hyperplasia
and well-differentiated endometrioid carcinoma. In Vivo 30, 681–690 (2016)
5. Boroday, N., Chekhun, V., Golubeva, E., Klyushin, D.: In vitro and in vivo densitometric
analysis of DNA content and chromatin texture in nuclei of tumor cells under the influence of
a nano composite and magnetic field. Adv. Cancer Res. Treat. 2016, 1–12 (2016)
6. Einstein, A., Wu, H., Sanchez, M., Gil, J.: Fractal characterization of chromatin appearance
for diagnosis in breast cytology. J. Pathol. 185, 366–381 (1998)
7. Hill, B.: Posterior distribution of percentiles: Bayes’ theorem for sampling from a population.
J. Am. Stat. Assoc. 63(322), 677–691 (1968)
8. Huang, S., Yang, J., Fong, S., Zhao, Q.: Artificial intelligence in cancer diagnosis and prognosis: opportunities and challenges. Cancer Lett. 471, 61–71 (2020). https://doi.org/10.1016/j.canlet.2019.12.007
9. Klyushin, D., Petunin, Yu.: A nonparametric test for the equivalence of populations based on
a measure of proximity of samples. Ukrainian Math. J. 55(2), 181–198 (2003)
10. Li, J., Du, Q., Sun, C.: An improved box-counting method for image fractal dimension
estimation. Pattern Recognit. 42(11), 2460–2469 (2009)
11. Losa, G., Castelli, C.: Nuclear patterns of human breast cancer cells during apoptosis: charac-
terization by fractal dimension and (GLCM) co-occurrence matrix statistics. Cell Tissue Res.
322, 257–267 (2005)
12. Losa, G.: Fractals and their contribution to biology and medicine. Medicographia 34, 365–374
(2012)
13. Mandelbrot, B.: The Fractal Geometry of Nature. W. H. Freeman and Co., San Francisco (1982)
14. Mashiah, A., Wolach, O., Sandbank, J., et al.: Lymphoma and leukemia cells possess fractal
dimensions that correlate with their interpretation in terms of fractal biological features. Acta
Haematol. 119, 142–150 (2008)
15. Matveichuk, S., Petunin, Yu.: A generalization of the Bernoulli model occurring in order statistics. I. Ukr. Math. J. 42, 459–466 (1990). https://doi.org/10.1007/bf01071335
16. McKinney, S.M., Sieniek, M., Godbole, V., et al.: International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020). https://doi.org/10.1038/s41586-019-1799-6
17. Metze, K.: Fractal dimension of chromatin and cancer prognosis. Epigenomics 2(5), 601–604
(2010)
18. Metze, K.: Fractal dimension of chromatin: potential molecular diagnostic applications for
cancer prognosis. Expert Rev. Mol. Diagn. 13(7), 719–735 (2013)
19. Muniandy, S., Stanlas, J.: Modelling of chromatin morphologies in breast cancer cells under-
going apoptosis using generalized Cauchy field. Comput. Med. Imaging Graph. 32, 631–637
(2008)
20. Nikolaou, N., Papamarkos, N.: Color image retrieval using a fractal signature extraction
technique. Eng. Appl. Artif. Intell. 15(1), 81–96 (2002)
21. Ohri, S., Dey, P., Nijhawan, R.: Fractal dimension in aspiration cytology smears of breast and
cervical lesions. Anal. Quant. Cytol. Histol. 26, 109–112 (2004)
22. Otsu, N.: A threshold selection method from gray-level histograms. Automatica 11, 23–27
(1975)
23. Perreault, S., Hébert, P.: Median filtering in constant time. IEEE Trans. Image Process. 16(9),
2389–2394 (2007)
24. Pires, A.: Interval estimators for a binomial proportion: comparison of twenty methods.
REVSTAT Stat. J. 6, 165–197 (2008)
25. Roberts, L.: Machine perception of 3-D solids. In: Tippett, J.T., et al. (eds.) Optical and Electro-
optical Information Processing. MIT Press, Cambridge (1965)
26. Russo, J., Linch, H., Russo, J.: Mammary gland architecture as a determining factor in the
susceptibility of the human breast to cancer. Breast J. 7, 278–291 (2001)
27. Sarker, N., Chaudhuri, B.B.: An efficient differential box-counting approach to compute fractal
dimension of image. IEEE Trans. Syst. Man Cybern. 24, 115–120 (1994)
28. Serra, J.: Image Analysis and Mathematical Morphology. Academic Press Inc., London (1982)
29. Sharifi-Salamatian, V., Pesquet-Popescu, B., Simony-Lafontaine, J., Rigaut, J.P.: Index for
spatial heterogeneity in breast cancer. J Microsc. 216(2), 110–122 (2004)
30. Schroeder, M.: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W. H. Freeman,
New York (1991)
31. Susnik, B., Worth, A., LeRiche, J., Palcic, B.: Malignancy-associated changes in the breast:
changes in chromatin distribution in epithelial cells in normal-appearing tissue adjacent to
carcinoma. Anal. Quant. Cytol. Histol. 17(1), 62–68 (1995)
Artificial Intelligence System for Breast Cancer Screening … 285
32. Us-Krasovec, M., Erzen, J., Zganec, M., et al.: Malignancy associated changes in epithelial
cells of buccal mucosa: a potential cancer detection test. Anal. Quant. Cytol. Histol. 27(5),
254–262 (2005)
33. Voss, R.F.: Random fractal forgeries. In: Earnshaw, R. (ed.) Fundamental Algorithms in
Computer Graphics, pp. 805–835. Springer, Berlin (1985)
The Role of Artificial Intelligence
in Company’s Decision Making
1 Introduction
Decision making is an act that is often performed by humans [1]. Indeed, people
are faced with situations in which decision-making is necessary. Often decisions do
not require complex thinking processes because generally, it only takes a heuristic
experience or sometimes even intuition to decide. But in other situations, decision
making is more difficult. This is due to several factors [2]:
• Structural complexity of decisions;
• The impact of the decision taken can be significant, it can be economic, political,
organizational, environmental, etc.;
• The need for speed in decision-making, this is the case of medical or military
emergencies or even the diagnosis of industrial installations.
Computerized Decision Support Systems (CDSS) have been developed to assist
the decision-maker in this task. Emery et al. [3] introduced the notion of Management Decision Systems. CDSS are also called Interactive Decision Support Systems, since interaction with the decision-maker occupies a prominent place. The purpose of a CDSS is to assist the decision-maker rather than to replace them. Several researchers have taken an interest in this field [4–6], which has led to CDSS that are
© The Editor(s) (if applicable) and The Author(s), under exclusive license 287
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_13
288 D. Saba et al.
first interactive and then intelligent. CDSS are dedicated to semi-structured tasks, where interaction is a considerable contribution because it combines human involvement with the computer's processing of information, unlike structured situations where solutions and procedures are fully automatable.
To make a system intelligent, one or more knowledge bases were introduced with
inference engines [7]. Intelligence can relate to reasoning in decision-making, or in
the interaction between man and machine [8]. As a result, the machines became more
efficient and therefore could support cooperation with the decision-maker. A system
is said to be cooperative if it has additional capacities to cooperate with its user [9].
Cooperation, in this case, consists of sharing the tasks to be carried out between the
user and the system [10]. In addition, certain situations require that the decision not
be taken by a single individual, but rather within the framework of consultation and
collective solicitation [11]. When the problem to be solved is divided into sub-tasks
which will be assigned separately to participants, decision-making, in this case, takes
on a so-called man-man cooperation aspect between the participants. However, this
will not prevent each decision-maker from being helped by an individual cooperative
CDSS (human/machine cooperation) in the task assigned to him. When, a priori, there
is no division of tasks between the decision-makers who collectively participate in
each step of the decision-making process, and, when the latter will lead the group
towards a common or consensual decision, the decision, in this case, is taken in
a context of collaboration between decision-makers [12]. The collective decision-
making process involves a group of people who may be in one place or spread
across several places, at the same time or at different times. The current trend is that
decision-makers are geographically distant. Indeed, globalization has brought about
a change in organizational structures as well as in the attitudes of managers who find
them facing new challenges. This new situation is characterized by the following
points:
• The evolution of information technology which has enabled geographically distant
individuals to share data and ideas;
• The distribution of organizations across the planet;
• Stronger competition;
• The opening of the international market.
Consequently, decision-making requires that decision-making tools support
collective decision-making processes where group members will be involved in a
cooperative or collaborative decision.
This work fits into the context of collaborative decision support systems. The latter
is considered in two dimensions: the collective dimension and the individual dimen-
sion. The collective dimension concerns the collaborative aspect because it consists
of providing collective decision support where each participant is involved in each
step of the decision-making process. There is no sharing of tasks between participants
and the group of decision-makers who engage in decision making may be geograph-
ically distant [13]. They follow a collective decision-making process which is guided
by a facilitator. Such a system is characterized by a strong interactional component
that requires rigorous coordination between the facilitator and the decision-makers.
However, the individual dimension concerns the cooperative aspect of the decision
because it consists of providing decision support to a decision-maker who is expert
in a given field and who is proposed to solve a particular problem. Problem-solving
follows a pre-established decision support process which is based on the breakdown
of the problem into tasks and sub-tasks. The cooperation, in this case, is of the
man/machine type. Indeed, the decision-maker has knowledge and skills, but the
machine can also be endowed with knowledge and skills which allow task sharing
between the two actors and thus lead to the implementation of cooperation.
In this work, AI technology is used as a basic tool for the decision process.
Indeed, AI offers many advantages for its design and development: complex processing is distributed over intelligent entities and objects of lower complexity. The problem consists of finding a good breakdown of the processing and then of implementing good coordination which can ensure the cooperation of the entities (objects) whose common objective is to accomplish the execution of the collective decision support process.
The objective of this document is to provide an information platform for the
role of AI in decision support in companies. Hence, the remainder of this paper is
organized as follows. Section 2 presents decision making. Section 3 explains the models and the decision-making process. Section 4 explains interactive decision support systems. Section 5 details cooperative decision support systems and group decision making. Section 6 clarifies the algorithms, software, and data of AI. Section 7 covers sensors, connected objects, and AI hardware. Section 8 presents machine learning as the AI response for assisting decision making. Finally, Sect. 9 concludes the paper.
2 Decision Making
Human beings are often faced with problems for which decision-making is necessary.
The latter is often taken based on our intuitions or according to our past experiences
[14]. However, this decision is not always easy to make because it can be a complex
problem for which the decision sought can have relatively significant consequences.
Therefore, a bad decision can be expensive and sometimes even fatal. For example,
when it comes to managing technological or natural risks, it becomes necessary to
take into account a large amount of data and knowledge of different natures and
qualities, and for this, managers have more and more use of computers to acquire
powerful tools for the decision. However, it becomes necessary to formalize these
kinds of problems to guarantee better decision-making while being effective. Effi-
ciency means making decisions quickly by exploring all the possibilities. In addition,
the criterion of cost of execution of the decision must also be taken into account in
the choice of the decision [15]. Also, it is inappropriate to adopt a trial-and-error
strategy to manage an organization and use decision support systems that assess the
situation and provide the various alternatives and their impacts [16].
decision comes in a more random context in the sense that the manner of achieving
the objective pursued can go through different types of actions. These changes are
understandable because they only underline the changes in the productive system
[20]. The business environment has become more complex, more uncertain, and
decision-making is no longer the responsibility of a single individual but can be
shared among a large number of actors acting within the company. This increase in
the number of decision-makers also reflects the diversity of decisions that must be
taken in a company.
There are several definitions of the decision, the first definition is as follows:
“A decision is an action that is taken to face difficulty or respond to a change in
the environment, that is to say, to solve a problem that arises for the individual (or
organization)” [21]. A second definition is as follows: “A decision is the result of a
mental process which chooses one among several mutually exclusive alternatives”
[22]. A decision is also defined as “a choice between several alternatives”, or by
“the fact that it also concerns the process of selecting goals and alternatives” [23].
A decision is often seen as the fact of an isolated individual (“the decision-maker”)
exercising a choice between several possibilities of actions at a given time. In general,
the decision can be defined as the act of deciding after deliberation, and that the actor
exercises an important role. It is therefore not a simple act, but rather the outcome
of a whole decision process. Then, decision-making can appear in any context of
daily life, whether professional, sentimental, or familiar, etc. The process, in its
essence, addresses the various challenges that must be overcome. When it comes to making a decision, several factors come into play [1]. Ideally, one uses reasoning to stay on the right track and reach a new stage or, at least, to resolve an actual or potential conflict. However, all decision-making requires solid knowledge
of the problem. By analyzing and understanding it, it is then possible to find a
solution. Of course, when faced with simple questions, decision-making takes place
practically on its own and without any complex or profound reasoning. On the other
hand, in the face of decisions that are more transcendent for life, the process must
be thought through and treated. When a young person has to choose which studies
to pursue after high school, he must make a reasoned decision, since this choice will
have important consequences. In the business field, each transcendent decision for
a company involves extensive research and study, as well as collaboration between
multidisciplinary teams. Finally, the decision-making process is a complex process,
the study of which can be facilitated by reference to theoretical models. The model
of limited rationality or IMCC proposed by Herbert Simon has four phases [24]:
intelligence, modeling, choice, and control.
• Intelligence: the decision-maker identifies situations in his environment for which
he will have to make decisions.
• Modeling: the decision-maker identifies the information, the structures to have
possible solutions.
• The choice: from the evaluation of each solution, the decision-maker chooses the
best of them.
• Control: confirms the choice made or calls it into question.
If all the problems that managers had to solve were well defined, they could easily
find a solution by following a simple procedure. However, managers often face more
complex problems since they frequently have to assess several options, each with
advantages and disadvantages. Before even making a final decision, managers must
have a thorough understanding of all the conditions and events related to the problem
to be resolved. The conditions surrounding decision-making show almost unlimited
variations. However, it is generally possible to divide them according to three types
of situations: decisions taken in a state of certainty, those made in a state of risk and
those made in a state of uncertainty.
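The three decision situations above can be contrasted with a small sketch; all payoff values, state probabilities, and alternative names below are hypothetical, invented purely for illustration:

```python
# Hypothetical payoff table: one row per alternative, one column per state of nature.
payoffs = {
    "expand":    [80, 30, -20],
    "maintain":  [40, 35, 10],
    "outsource": [25, 25, 25],
}

def decide_certainty(payoffs, known_state):
    """Certainty: the state of nature is known, so pick the best payoff in it."""
    return max(payoffs, key=lambda a: payoffs[a][known_state])

def decide_risk(payoffs, probs):
    """Risk: state probabilities are known, so maximize expected value."""
    return max(payoffs, key=lambda a: sum(p * v for p, v in zip(probs, payoffs[a])))

def decide_uncertainty(payoffs):
    """Uncertainty: no probabilities, so a cautious maximin picks the best worst case."""
    return max(payoffs, key=lambda a: min(payoffs[a]))
```

Under certainty about a favourable state the aggressive option wins, while the maximin rule under uncertainty retreats to the alternative whose worst case is least bad.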
comprising maximum yield and minimum consumption of resources is not the daily
characteristic of all managers. Hence, the manager must be creative in choosing the
solution to try to obtain the maximum benefits while minimizing the costs related to
his decision. In addition to dealing with different categories of problems, managers
find themselves in different decision-making situations. Decisions can be routine, adaptive, innovative, programmed, and non-programmed (Table 2).
The policy and procedures explained in writing help individuals to make a choice
quickly, as they make detailed analysis unnecessary. However, since most everyday
• Normative models: in this type of model, there are three categories of normative
models, complete enumeration which seeks the best solution among a relatively
small set of alternatives. The main methods are tables, decision trees, and multi-
criteria analysis. Then, the models for optimization via algorithms, which seek
to find the best solution among a large or even infinite set of alternatives, using
a step-by-step improvement process. The main methods are linear programming,
integer linear programming, convex programming, and multi-objective program-
ming which is a variant of linear programming for several functions (criteria) to be
optimized simultaneously. Finally, models for optimization using analytical formulas find the best solution in a single step by evaluating an analytical formula.
• Descriptive models: give a satisfactory solution by exploring part of the solutions.
Among the descriptive models, the simulation concerns a technique that makes
it possible to carry out decision making by observing the characteristics of a
given system under different configurations. This technique makes it possible to decide by choosing the best among the alternatives evaluated. The second model
concerns prediction, which makes it possible to predict the consequences of the
different alternatives according to prediction models. Markov models are among
the best-known methods in this category. Prediction provides a fairly good solution
or a satisfactory solution. Then, heuristic models allow reaching a satisfactory
solution at lower cost by using heuristic programming techniques and knowledge-
based systems. Rather, these methods are used for complex and poorly structured
problems where finding solutions can result in high cost and time.
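As a minimal illustration of the multi-criteria analysis mentioned among the complete-enumeration methods, a weighted-sum model scores each alternative against all criteria and enumerates the small set of alternatives; the criteria, weights, and vendor scores below are invented for the sketch:

```python
def weighted_score(scores, weights):
    """Weighted sum of one alternative's criterion scores."""
    return sum(s * w for s, w in zip(scores, weights))

# Hypothetical criteria: cost (inverted, so higher = cheaper), speed, quality.
weights = [0.5, 0.2, 0.3]
alternatives = {
    "vendor_a": [7, 9, 6],
    "vendor_b": [8, 5, 8],
    "vendor_c": [6, 8, 9],
}

# Complete enumeration: score every alternative and keep the best one.
best = max(alternatives, key=lambda a: weighted_score(alternatives[a], weights))
```

With these invented weights, the heavily weighted cost criterion tips the choice towards `vendor_b` even though it is the weakest on speed.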
The decision process consists of determining the steps to be followed by a
decision-maker to arrive at opting for a decision as a solution to a problem posed. In
1960, Simon proposed the IDC model, breaking down the decision-making process into three stages (intelligence, design, choice). In 1977, the same researcher
revised his model by adding a fourth step (review or evaluation) [27]. The latter
can be seen as a process evaluation step to validate or not the decision to be applied
(Fig. 1). This model remains to this day a reference for decision modeling.
• Study of the existing (discover the problem and collect the data): this is a
phase of identifying the problem. It involves identifying the objectives or goals of
the decision-maker and defining the problem to be solved. For this, it is necessary
to seek the relevant information according to the concerns of the decision-maker.
• Design (formulation and modeling of the problem): this is a modeling phase
proper. The decision-maker builds solutions and imagines scenarios. This phase
leads to different possible paths to solving the problem.
• Calculations (display of results): the models formulated in the previous step are
used to perform the calculations associated with the resolution of the problem
addressed. The display of the results is done through output devices (screens).
• Choice (choose from the alternatives): this is a phase of selecting a particular
mode of action, that is to say making a decision.
• Evaluation (check that the solution complies with expectations): this phase makes it possible to evaluate the chosen solution (the decision taken). It
can lead to backtracking to one of the three previous phases or, on the contrary,
to the validation of the solution.
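The phases above can be sketched as a loop in which a rejected choice sends the decision-maker back to the design phase; the problem data, budget constraint, and helper functions below are hypothetical, chosen only to make the backtracking visible:

```python
def decision_process(problem, design, choose, acceptable, max_rounds=5):
    """Simon's IDC model with the later evaluation step: design alternatives,
    choose one, then either validate it or backtrack and redesign."""
    for _ in range(max_rounds):
        alternatives = design(problem)   # design: build candidate solutions
        decision = choose(alternatives)  # choice: select one alternative
        if acceptable(decision):         # evaluation: validate or backtrack
            return decision
        rejected = problem.get("rejected", []) + [decision]
        problem = {**problem, "rejected": rejected}
    return None

# Options as (name, benefit, cost); the best-looking option exceeds the budget.
problem = {"options": [("A", 120, 110), ("B", 90, 80), ("C", 60, 40)]}
design = lambda p: [o for o in p["options"] if o not in p.get("rejected", [])]
choose = lambda alts: max(alts, key=lambda o: o[1])  # highest benefit
acceptable = lambda o: o[2] <= 100                   # evaluation: within budget
result = decision_process(problem, design, choose, acceptable)  # → ("B", 90, 80)
```

The first round chooses the highest-benefit option, the evaluation rejects it as over budget, and the second design round produces the decision that is finally validated.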
Decisions are not made after posing the problem and collecting all the information,
but gradually during a long process of action and planning [28]. Indeed, for any
decision-making process, we develop a decision model. The latter generally consists
of five elements:
on the central computer. Undoubtedly the challenge for companies, in the context
of their decision-making support projects, is the processing of their data to extract
their value. And since these concepts and systems have been developed for more
than 40 years, the excuse for not carrying out these projects cannot be in technology,
as Michael Scott Morton declares, “The general unresolved issue I see is one of
understanding the management of change. Without a better understanding of this, it
is hard to implement and learn from DSS applications. As an engineer trained in the
technology it took me a while to understand that the hard problems lie in the ‘soft’
domains of management and human behavior, not in the hardware and software”.
In general, an IDSS is a computer system that must help a decision-maker to
follow a decision support process. It is thanks to the interactivity which is the basis
of the cooperation of the two partners (the system and the user) that the IDSS will
fulfill the role for which it was designed. These are systems that use computers to:
• Assist decision-makers during the decision-making process concerning semi-
structured tasks,
• Help rather than replace the judgment of decision-makers,
• Improve the quality of decision-making rather than efficiency.
In addition, various functions are assigned to the IDSS:
• They should mainly help with semi-structured or poorly structured problems by connecting human judgments and calculated information.
• They must have a simple and user-friendly interface to avoid the user being lost
in front of the complexity of the system.
• They must provide help for different categories of users or different groups of
users.
• They must support interdependent or sequential processes.
• They must be adaptive over time. The decision-maker must be able to face rapidly changing conditions and adapt the IDSS to deal with new situations. An
IDSS must be flexible enough for the decision-maker to add, destroy, combine,
change and rearrange the variables in the decision process as well as the various
calculations, thus providing a rapid response to unexpected situations.
• They must leave control of all stages of the decision-making process to the
decision-maker so that the decision-maker can question the recommendations
made by the IDSS at any time. An IDSS should help the decision-maker and not
replace him.
• The most advanced IDSS uses a knowledge-based system that provides effective
and efficient assistance in particular in problems requiring expertise.
• They must allow the heuristic search.
• They must not be black box type tools. The functioning of an IDSS must be done
in such a way that the decision-maker understands and accepts it.
• They must use templates. Modeling makes it possible to experiment with different
strategies under different conditions. These experiences can provide new insights
and learning.
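The flexibility requirement above, letting the decision-maker add, remove, and rearrange the variables of the decision model at any time, can be sketched with a minimal weighted model; the variable names, values, and weights are invented for illustration:

```python
class DecisionModel:
    """A decision model whose variables can be added, changed, or removed
    on the fly, with the score recomputed after every change."""
    def __init__(self):
        self.variables = {}  # name -> (value, weight)

    def set_variable(self, name, value, weight=1.0):
        self.variables[name] = (value, weight)

    def remove_variable(self, name):
        self.variables.pop(name, None)

    def score(self):
        return sum(v * w for v, w in self.variables.values())

model = DecisionModel()
model.set_variable("expected_revenue", 10.0, weight=0.6)
model.set_variable("risk_penalty", -4.0, weight=0.4)
first = model.score()                  # 10*0.6 - 4*0.4 = 4.4
model.remove_variable("risk_penalty")  # conditions changed: drop a variable
second = model.score()                 # 6.0
```

Dropping or reweighting a variable immediately changes the score, giving the rapid response to unexpected situations that the text demands.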
Since there are several definitions for IDSS, there are also several architectures.
Previously, researchers defined IDSS in terms of data and models for solving poorly
structured problems. They proposed the following IDSS architecture (Fig. 2).
This architecture is composed of a Human/Machine interface, a Database Manage-
ment System (DMS) including the Database as well as a Manager System for Model
Bases (MSMB) including a Model Database. In addition, a system based on this
architecture has interactive capabilities that allow the user to be involved in solving non-programmed, poorly structured, or semi-structured problems. In the same context,
a conceptual architecture is proposed, and a prototype system which validates the
technical feasibility of the proposed architecture is presented by Szu-Yuan Wang et al. The proposed approach uses the relational data model and shows how its tabular structure can be exploited to provide a unified logical view of data management and model management [32]. In addition, the concept of the macro is used to extend
the conventional meaning of the model to include the composite requests that often
occur in a DSS environment. A primitive knowledge base of macros, data, and asso-
ciated dictionaries for both is included in the architecture. Organized in a three-level
hierarchy, the knowledge base consists of macros and system-wide, group, and indi-
vidual data. Also included in the architecture are a language system and modules to monitor system performance, maintain data and macro security, and link the DSS to
other parts of the local computing environment.
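The architecture described above, a human/machine interface on top of a database management system (DMS) and a model-base manager (MSMB), can be sketched as follows; the class and method names are hypothetical simplifications, not the cited authors' design:

```python
class DatabaseManager:
    """DMS: stores and serves the data the decision-maker works on."""
    def __init__(self, tables):
        self.tables = tables
    def query(self, name):
        return self.tables[name]

class ModelBaseManager:
    """MSMB: stores the decision models (here, plain functions)."""
    def __init__(self, models):
        self.models = models
    def run(self, name, data):
        return self.models[name](data)

class IDSS:
    """Human/machine interface tying the two managers together interactively."""
    def __init__(self, dms, msmb):
        self.dms, self.msmb = dms, msmb
    def ask(self, model_name, table_name):
        return self.msmb.run(model_name, self.dms.query(table_name))

idss = IDSS(DatabaseManager({"sales": [100, 120, 110]}),
            ModelBaseManager({"mean": lambda xs: sum(xs) / len(xs)}))
answer = idss.ask("mean", "sales")  # → 110.0
```

The interface layer is what keeps the user in the loop: any query pairs a stored model with stored data without the decision-maker touching either manager directly.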
In the development of enterprise applications, a critical success factor is to make
good architectural decisions. There are text templates and support for tools to capture
architectural decisions. One of the inhibitors on large-scale industrial projects is that capturing architectural decisions is considered a retrospective documentation task, and therefore an undesirable one that brings no advantage during the design work. A major problem with such a retrospective approach is that the justification
for the decision is not available to decision-makers when they identify, make, and
apply decisions. Often, a large community of decision-makers, possibly distributed,
is involved in these three stages. Zimmermann et al. proposed a new conceptual
• The strategic IDSS: it presents managers with a periodic time series. It provides
the management committee with a shared and early assessment of the essential
indicators.
Finally, for the classification according to the conceptual level of the system, using the assistance mode as the criterion, there are five generic types of decision support systems:
• A data-centric IDSS highlights access to and manipulation of a time series of data
internal to the organization and sometimes of external data;
• A model-oriented IDSS highlights access to and manipulation of simulation, optimization, financial, and statistical models. A model-oriented IDSS uses data and parameters provided by users to help decision-makers analyze a situation, but is not necessarily data-centric;
• A knowledge-driven IDSS provides specialized problem-solving expertise stored as facts, rules, procedures, or similar structures;
• A document-oriented IDSS manages, retrieves, and manipulates unstructured information stored in a variety of document formats;
• IDSS oriented communication supports more than a person working on a shared
task.
exploit the bases of formal rules and facts. Rule engines are now called BRMS
for Business Rules Management Systems and are often integrated into DMS for
Decision Management Systems. These systems can incorporate fuzzy logic to
handle imprecise facts and rules.
– Machine learning or automatic learning: is used for forecasting, classification, and automatic segmentation by exploiting generally multidimensional data, such as a customer database or an Internet server log. Machine
learning is based on a probabilistic approach. Machine learning tools are used
to exploit large volumes of corporate data, in other words “big data”. Machine
learning can rely on simple neural networks for tasks involving multidimensional data.
– Neural networks: are a sub-domain of machine learning for performing iden-
tical tasks, but when the probabilistic space to manage is more complex. This elementary biomimicry is exploited when the dimension of the problem to
be managed is reasonable. Otherwise, we quickly move on to deep learning,
especially for image and language processing.
– Deep learning: allows you to go further than machine learning to recognize
complex objects such as images, handwriting, speech and language. Deep
learning uses multilayer neural networks, knowing that there are many varia-
tions. However, this is not the solution to all of the problems that AI seeks to
address. Deep learning can also generate content or improve existing content,
such as automatically coloring black-and-white images. "Deep" does not mean that deep learning thinks: it is deep simply because it relies on neural networks with many layers of filters. Deep learning, however, is not exclusively dedicated to image and language processing. It can be used in other
complex environments such as genomics. It is also used in so-called multimodal
approaches which integrate different senses such as vision and language.
– Agent networks: or multi-agent systems are a little-known area that covers the
science of orchestrating the technical building blocks of AI to create complete solutions. A chatbot, like a robot, is always an assembly of such building blocks: rules engines, machine learning, and several deep learning techniques [43]. Agent networks are both conceptual objects and tools for assem-
bling AI software bricks. The principle of an agent is that it is conceptually
autonomous, with inputs and outputs. The assembly of agents in multi-agent
networks is a “macro” version of the creation of AI solutions.
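A minimal sketch of such an assembly, with each agent a conceptually autonomous unit whose output feeds the next agent's input; the agent names and the stubbed intent rule are invented for illustration:

```python
class Agent:
    """A conceptually autonomous unit with an input and an output."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def act(self, message):
        return self.fn(message)

def pipeline(agents, message):
    """'Macro' assembly: chain agents so each output feeds the next input."""
    for agent in agents:
        message = agent.act(message)
    return message

# Hypothetical chatbot assembly: a normalization brick, then a stubbed intent brick.
normalize = Agent("rules", lambda m: m.strip().lower())
intent = Agent("intent", lambda m: "greeting" if "hello" in m else "other")
reply = pipeline([normalize, intent], "  Hello there ")  # → "greeting"
```

Each brick could be swapped for a rules engine or a trained model without changing the assembly, which is exactly the orchestration role the text attributes to agent networks.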
The common point of machine learning and deep learning is to use data for the
training of probabilistic models. Along with algorithms/software and hardware, data
is the third key component of most AI today. A machine learning or deep learning system generally involves several types of data: training data, test data, production data, and reinforcement data.
In supervised machine learning and deep learning, training and test data contain their labels, that is, the information that must be generated by the system being trained. This labelled dataset should cover the possible space of the application with a good distribution. First, it is arbitrarily divided into two subsets, one for training and the other for
qualification tests of the trained neural network which determine a recognition error
rate. In general, the share of the tagged base dedicated to training is greater than the share dedicated to tests, typically in a 3/4 to 1/4 ratio. However, training and test
data is essential for the vast majority of machine learning-based AI systems, whether
for supervised or unsupervised learning. Then, reinforcement learning uses a smaller
amount of data but is generally based on models already trained beforehand, followed
by the addition of reinforcement data which are used for reinforcement learning.
Finally, these are new training datasets that allow one to fine-tune an already trained neural network.
• Training data: these are the data sets that will be used to train a machine learning
or deep learning model to adjust the parameters. In the case of image recognition,
it will be a database of images with their corresponding tags that describe their
content. The larger the base, the better the system will train, but the longer it
will take. The image training bases have a size which depends on the diversity
of the objects to be detected. In addition, in medical imaging, training bases for
specialized pathologies can be satisfied with a few hundred thousand images to
detect a few tens or hundreds of pathologies. At the other extreme of complexity,
Google Search’s image training base relies on hundreds of millions of images
and allows the detection of more than 20,000 different objects. However, training
a 50,000 image system takes about a quarter of an hour in cloud resources, but
it depends on the resources allocated on that side. When you go to hundreds of
millions of images, it will take thousands of servers and up to several weeks for
training. In practice, the training games for deep learning solutions are limited in
size by the computing power required. You can also perform incremental training
as you add data using neural network transfer techniques. It is necessary to have
quality training data, which often requires a great deal of filtering, cleaning, and deduplication before data ingestion, a task already familiar in the context of big
data applications.
• Test data: this is the data, also tagged, which will be used to check the quality
of the training of a system. These data must have a statistical distribution close
to training data, in the sense that they must be well representative of the diversity
of data that we find in the training base and that we will have in production data.
However, test data is a subset of a starter set, part of which is used for training and
another, more limited part, which is used for testing. They will be injected into
the trained system and the resulting tags will be compared with the base tags. This
will identify the system error rate. We will go to the next step when the error rate
is considered acceptable for the production of the solution. Finally, the level of
acceptable error rate depends on the application. Its generally accepted maximum
is the error rate of human recognition. But as we are generally more demanding
with machines, the rate truly accepted is much lower than that of humans.
• Production data: this is untagged data that will feed the system when it is used in
production to forecast missing tags. While training data is normally anonymized
for system training, production data can be nominative as well as the associated
forecast generated by the solution.
• Reinforcement data: this expression describes the data used for reinforcement learning. In a chatbot, this will be, for example, the reactivity
data of users to Chatbots’ responses allowing them to identify which are the most
appropriate. In a way, these are results of A/B testing performed on the behavior of
AI-based agents. Anything that can be captured from the real-world reaction to the
actions of an AI-based agent will potentially adjust behavior through retraining.
However, reinforcement learning is ultimately a kind of incremental supervised learning because it is used to evolve already trained systems in small successive touches.
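The 3/4-1/4 split and the error-rate check described above can be sketched as follows; the toy labelled data and the stub model are invented for illustration:

```python
import random

def split_3_1(labelled, seed=0):
    """Shuffle a labelled dataset and split it 3/4 training, 1/4 test."""
    data = labelled[:]
    random.Random(seed).shuffle(data)
    cut = (3 * len(data)) // 4
    return data[:cut], data[cut:]

def error_rate(model, test_set):
    """Share of test examples whose predicted tag differs from the true tag."""
    wrong = sum(1 for x, tag in test_set if model(x) != tag)
    return wrong / len(test_set)

# Toy labelled data: the tag is simply the parity of the example.
labelled = [(i, i % 2) for i in range(100)]
train, test = split_3_1(labelled)
rate = error_rate(lambda x: x % 2, test)  # perfect stub model → 0.0
```

Only when the error rate measured on the held-out quarter is deemed acceptable does the system move on to production data, as the text describes.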
The data that feeds AI systems comes from inside and/or outside the company. It comes from all kinds of sensors and connected objects, from the simplest
(connected thermometer) to the most sophisticated (machine tool, smartphone,
personal computer). As with traditional big data applications, data sources must
be reliable and data well extracted and prepared before being injected into machine-
based systems like deep learning. However, the most advanced solutions jointly
exploit external open data and cross it with data that only the company can control.
It’s a good way to create differentiated solutions.
Sensors and connected objects play a key role in many artificial intelligence appli-
cations. Microphones and cameras power speech recognition and artificial vision
systems. Smartphones and Internet access tools in general generate torrents of data on
user behavior. The smart city and autonomous vehicles are likewise powered by all kinds
of sensors. One of the ways to approach and even surpass human capabilities
is to multiply sensory sensors. The main difference between humans and machines
is the range of these sensors: for humans, the range is immediate and concerns only
their surroundings; for machines, it may be distant and global. We see around us, we
feel the temperature, we can touch, etc. The machines can capture environmental
data on a very large scale. This is the advantage of large-scale connected object
networks, such as in “smart cities”. And the volumes of data generated by connected
objects are growing, creating both a technological challenge and an opportunity for
their exploitation. In addition, the brain has a little-known characteristic: it contains
no sensory cells. This explains why open brain surgery can be performed on an awake
patient: pain is perceived only at the periphery of the brain. A migraine is generally
linked to pain in the tissues surrounding the brain, not to pain coming from inside it.
The computer is in the same situation: it has no sensory organs of its own and
perceives nothing unless it is connected to the outside world. AI without sensors or data is
useless. Note also that convolutional neural networks use low-resolution source images to
accommodate current hardware constraints, and they rarely operate in 3D with
stereoscopic vision.
310 D. Saba et al.
The sensor market has experienced strong development since the late 2000s thanks
to the emergence of the smartphone market, powered by the iPhone and Android
smartphones. There are currently about 1.5 billion units sold per year, and they are
renewed approximately every two years by consumers. Connected home management
platforms, for example, take advantage of many room sensors to optimize
comfort [7, 44]. They rely on the integration of disparate data: outdoor and indoor
temperature, humidity, brightness, and the movements of users captured with their
smartphones. This makes it possible, for example, to adjust the home's temperature
in anticipation of its occupants' return [8, 13]. This orchestration increasingly
relies on deep learning to identify user behaviors and adapt system responses.
Innovation in photo and video sensors is also ceaseless, if only in the minia-
turization of those that equip smartphones, which now come with 3D vision.
The American company Rambus, for its part, is working on a photo sensor that needs no
optics. Vibration sensors and microphones have unsuspected industrial applications
revealed by AI, such as anomaly detection: sensors placed in industrial vehicles
or machines generate a signal that is analyzed by deep learning systems capable
of identifying and characterizing anomalies.
One of the key tools of AI is the deep learning training server. While deep learning
gives very good results, as in image recognition, it consumes a great deal of resources
during its training phase: it easily takes 1,000–100,000 times more machine power
to train an image recognition model than to run it afterwards. This explains why, for
example, GPUs and other TPUs (Tensor Processing Units) offer computing capacities
of around 100 Tflops, while the neural accelerators of Huawei's latest Kirin 980 and
Apple's A11 Bionic make do with 1 to 4 Tflops. Moreover, training the largest neural
networks requires hundreds or even thousands of servers equipped with these GPUs
and TPUs. GPUs are the most widely deployed hardware solution for accelerating
neural networks and deep learning; they are what made deep learning practical,
especially for image processing, from 2012 onward.
Quantum computers are intended to solve so-called exponential mathematical
problems, whose complexity grows exponentially with problem size. They are based
on qubits, information units that handle 0s and 1s in a superposed state and are
arranged in registers of several qubits. A system of n qubits can simultaneously
represent 2^n states, to which various operations can be applied
simultaneously.
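The exponential capacity mentioned here can be made concrete: classically simulating an n-qubit register requires tracking 2^n complex amplitudes, which is why such simulations quickly become intractable. A stdlib-only sketch:

```python
# A register of n qubits simultaneously represents 2**n basis states, so a
# classical simulation must store 2**n complex amplitudes (~16 bytes each).
for n in (1, 10, 20, 50):
    amplitudes = 2 ** n
    memory_bytes = amplitudes * 16  # two 64-bit floats per amplitude
    print(f"{n:2d} qubits -> {amplitudes} amplitudes, ~{memory_bytes} bytes")
```

At 50 qubits the state vector alone needs on the order of 16 petabytes, which illustrates why exact classical simulation breaks down well before the register sizes quantum hardware aims for.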
Machine Learning is the most promising area of application for Artificial Intelli-
gence. It uses specific algorithms, created jointly by analysts and developers, which
consolidate and categorize a variety of raw data, combining the processing and anal-
ysis of information [45]. Machine Learning continues to develop its understanding
of information to provide an increasingly relevant and useful result to the user. This
learning can take place with or without human intervention. The ability to adapt
automatically ensures that the "machine's" knowledge base is constantly updated and
adjusted to requirements. This allows it to address a request from a holistic point
of view, providing a qualitative and targeted analysis of the need. It is interesting to
observe the synergy between Big Data and Machine Learning, namely a "machine"
that can ingest and process a large amount of raw data. This opens up a wide
field of possibilities since, whatever the field of study, the applications
are numerous. Take, for example, companies' financial and administrative
departments: imagine being able to identify improvements within their services on
subjects such as fraud detection, decision-making, reporting, etc.
Machine Learning should be seen both as a tool that optimizes processes and
improves employee engagement, and as a great lever for customer satisfaction.
The absence of human intervention in the analysis and processing of data makes
Machine Learning the most avant-garde technology for the optimization of
business processes (reducing the workload of third parties and the need for human interaction).
For example, it can process a supplier’s order, taking into account the current stock
level of the warehouse and that of the requesting point of sale. In parallel, it can
deliver a relevant analysis of the most trendy products (based on the level of sales
of a set of stores), to assist the logistics manager when making his replenishment
choices. The added value of Machine Learning lies here in its real-time qualitative
analysis, which the company can use as decision support. Machine Learning also
represents a lever in controlling fraud risk, by creating algorithms capable of:
• Analyzing information on all existing types of fraud;
• Analyzing the information to identify possible new types of fraud;
• Assessing the associated financial impacts.
This helps organizations make their anti-fraud policy more effective.
For a single person, this requires time as well as meticulous research work. If Machine
Learning is capable of carrying out this detection work, organizations only have to
put their efforts into resolving disputes. Organizations can also create new detection
policies based on machine learning analyses.
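As a toy illustration of such detection work (not a production method; the transaction data and the three-sigma threshold are invented for the example), a simple statistical baseline can flag transactions that deviate strongly from historical behavior, routing only those to a human for dispute resolution:

```python
import statistics

# Historical transaction amounts — invented data for illustration.
history = [102.0, 98.5, 101.2, 99.8, 100.5, 97.9, 103.1, 100.0]
mu = statistics.mean(history)
sigma = statistics.stdev(history)

def is_suspicious(amount, threshold=3.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    return abs(amount - mu) / sigma > threshold

# Screen incoming transactions; only flagged ones need human review.
new_transactions = [101.0, 99.0, 540.0]  # the last one is an obvious outlier
flags = [is_suspicious(a) for a in new_transactions]
print(flags)
```

Real anti-fraud systems replace this single-feature z-score with learned models over many behavioral features, but the division of labor is exactly the one described above: the algorithm does the detection, people handle the disputes.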
9 Conclusion
AI today forms a motley ecosystem. The vast majority of commercial AI solutions are
made up of odds and ends, assembled according to specific needs. We are far from
having generic AI solutions, at least in business. Among the general public, AI bricks
are already used in everyday life
without users noticing it. This is, for example, the case of the face-tracking systems
used in photo and video processing on smartphones. The most generic technological
building blocks of AI are software libraries and development tools such as
TensorFlow, PyTorch, scikit-learn or Keras. Since 2015, artificial intelligence (AI)
has become one of the top priorities of major players in the digital economy and
businesses. The wave of AI is starting to match that of the Internet and mobility in
terms of impact. It is rather difficult to escape! It has become a tool for business
competitiveness both in the digital sector and in the rest of the economy.
In this environment, decision makers are faced with difficult decisions. Many of
these decisions are made through intuition, experience, or ineffective traditional
approaches. Making appropriate decisions usually involves controlling and
managing risk. Although decision makers have some control over the levels of risk
they are exposed to, risk reduction must be pursued by experts to reduce costs and
use resources efficiently.
Improving added value is the major challenge for companies seeking to stay competitive.
An Interactive Decision Support System (IDSS) is one that determines good
production practices and the product mix that maximizes added value.
Businesses are increasingly confronted with problems of such complexity and
size that they cannot be solved by a human, no matter how expert. This is the case for
extracting knowledge from a large mass of documents, integrating data from hetero-
geneous sources, detecting breakdowns or anomalies, re-planning in “real time” in
the event of incidents in the fields of transport or production chains, etc. In addition,
the company tends to delegate certain dangerous or tedious activities to robots or
software agents: drones for air combat, demining robots, exploration rovers of the
planet Mars… However, performing these activities requires autonomy, complete or
partial, based on learning and adaptation capacities in the event of an unforeseen
situation. Finally, humans and machines (computers, robots, etc.) are increasingly
called upon to communicate using natural language, speech, or even images. It is
precisely the aim of artificial intelligence and decision support to deal with these
problems which are too complex for humans or to take an interest in autonomy,
learning or even human-machine communication. Demand, in commercial terms,
therefore exists and is likely to increase over the next few years.
References
1. Kvist, T.: Decision making. In: Apical Periodontitis in Root-Filled Teeth: Endodontic
Retreatment and Alternative Approaches (2018)
2. Landrø, M., Pfuhl, G., Engeset, R., et al.: Avalanche decision-making frameworks: classifica-
tion and description of underlying factors. Cold Reg. Sci. Technol. (2020)
3. Emery, J.C., Morton, M.S.S.: Management decision systems: computer-based support for
decision making. Adm. Sci. Q. (1972). https://doi.org/10.2307/2392104
4. Druzdzel, M.J., Flynn, R.R.: Decision support systems. In: Understanding Information
Retrieval Systems: Management, Types, and Standards (2011)
5. Adhikari, N.K.J., Beyene, J., Sam, J., et al.: Effects of computerized clinical decision support
systems on practitioner performance. JAMA J. Am. Med. Assoc. (2005)
6. Turban, E., Watkins, P.R.: Integrating expert systems and decision support systems. MIS Q.
Manag. Inf. Syst. (1986). https://doi.org/10.2307/249031
7. Saba, D., Sahli, Y., Abanda, F.H., et al.: Development of new ontological solution for an energy
intelligent management in Adrar city. Sustain. Comput. Inform. Syst. 21, 189–203 (2019).
https://doi.org/10.1016/J.SUSCOM.2019.01.009
8. Saba, D., Laallam, F.Z., Degha, H.E., et al.: Design and development of an intelligent ontology-
based solution for energy management in the home. In: Hassanien, A.E. (ed.) Studies in
Computational Intelligence, 801st edn, pp. 135–167. Springer, Cham, Switzerland (2019)
9. Saba, D., Berbaoui, B., Degha, H.E., Laallam, F.Z.: A Generic optimization solution for hybrid
energy systems based on agent coordination. In: Hassanien, A.E., Shaalan, K., Gaber, T., Tolba,
M.F. (eds.) Advances in Intelligent Systems and Computing, pp. 527–536. Springer, Cham,
Cairo, Egypte (2018)
10. Saba, D., Degha, H.E., Berbaoui, B., et al.: Contribution to the modeling and simulation of
multiagent systems for energy saving in the habitat. In: Proceedings of the 2017 International
Conference on Mathematics and Information Technology, ICMIT 2017 (2018)
11. Saba, D., Laallam, F.Z., Berbaoui, B., Fonbeyin, H.A.: An energy management approach in
hybrid energy system based on agent’s coordination. In: The 2nd International Conference on
Advanced Intelligent Systems and Informatics (AISI’16). Advances in Intelligent Systems and
Computing, Cairo, Egypt (2016)
12. Saba, D., Laallam, F.Z., Hadidi, A.E., Berbaoui, B.: Contribution to the management of energy
in the systems multi renewable sources with energy by the application of the multi agents
systems “MAS”. Energy Proc. 74, 616–623 (2015). https://doi.org/10.1016/J.EGYPRO.2015.
07.792
13. Saba, D., Maouedj, R., Berbaoui, B.: Contribution to the development of an energy management
solution in a green smart home (EMSGSH). In: Proceedings of the 7th International Conference
on Software Engineering and New Technologies—ICSENT 2018. ACM Press, New York, New
York, pp. 1–7 (2018)
14. Miller, C.C., Ireland, R.D.: Intuition in strategic decision making: friend or foe in the fast-paced
21st century? Acad. Manag. Exec. (2005)
15. Mukhopadhyay, A., Chatterjee, S., Saha, D., et al.: Cyber-risk decision models: to insure IT or
not? Decis. Support Syst. (2013). https://doi.org/10.1016/j.dss.2013.04.004
16. Uusitalo, L., Lehikoinen, A., Helle, I., Myrberg, K.: An overview of methods to evaluate
uncertainty of deterministic models in decision support. Environ. Model. Softw. (2015)
17. Burstein, F., Holsapple, C., Power, D.J.: Decision support systems: a historical overview.
In: Handbook on Decision Support Systems 1 (2008)
18. Ai, F., Dong, Y., Znati, T.: A dynamic decision support system based on geographical infor-
mation and mobile social networks: a model for tsunami risk mitigation in Padang, Indonesia.
Saf. Sci. 90, 62–74 (2016). https://doi.org/10.1016/J.SSCI.2015.09.022
19. Scotch, M., Parmanto, B., Monaco, V.: Evaluation of SOVAT: an OLAP-GIS decision support
system for community health assessment data analysis. BMC Med. Inform. Decis. Mak. (2008).
https://doi.org/10.1186/1472-6947-8-22
20. Lerner, J.S., Li, Y., Valdesolo, P., Kassam, K.S.: Emotion and decision making. Annu. Rev.
Psychol. (2015). https://doi.org/10.1146/annurev-psych-010213-115043
21. Yamashige, S.: Introduction to decision theories. In: Advances in Japanese Business and
Economics (2017)
22. Lands II, S.: Better Choices: You Can Make Better Choices. AuthorHouse (2015)
23. Schwarz, N.: Emotion, cognition, and decision making. Cogn. Emot. (2000)
24. Simon, H.A.: A behavioral model of rational choice. Q. J. Econ. (1955). https://doi.org/10.
2307/1884852
25. Rabin, J., Jackowski, E.M.: Handbook of Information Resource Management. M. Dekker
(1988)
26. Banning, M.: A review of clinical decision making: models and current research. J. Clin. Nurs.
(2008). https://doi.org/10.1111/j.1365-2702.2006.01791.x
27. Schweizer, R., Johanson, J.: Internationalization as an entrepreneurial process. Artic. J. Int.
Entrep. (2010). https://doi.org/10.1007/s10843-010-0064-8
28. de Witte, B.: The decision-making process. In: A Companion to European Union Law and
International Law (2016)
29. Zarte, M., Pechmann, A., Nunes, I.L.: Decision support systems for sustainable manufacturing
surrounding the product and production life cycle—a literature review. J. Clean. Prod. (2019)
30. Scott Morton, M.S.: Management decision systems; computer-based support for deci-
sion making. Division of Research, Graduate School of Business Administration, Harvard
University (1971)
31. Jones, J.W., McCosh, A.M., Morton, M.S.S., Keen, P.G.: Management decision support
systems. Decision support systems: an organizational perspective. Adm. Sci. Q. (1980). https://
doi.org/10.2307/2392463
32. Wang, M.S.-Y., Courtney, J.F.: A conceptual architecture for generalized decision support
system software. IEEE Trans. Syst. Man Cybern. SMC 14, 701–711 (1984). https://doi.org/10.
1109/TSMC.1984.6313290
33. Zimmermann, O., Gschwind, T., Küster, J., et al.: Reusable Architectural Decision Models for
Enterprise Application Development, pp. 15–32. Springer, Berlin, Heidelberg (2007)
34. Plataniotis, G., de Kinderen, S., Ma, Q., Proper, E.: A conceptual model for compliance
checking support of enterprise architecture decisions. In: 2015 IEEE 17th Conference on
Business Informatics. IEEE, pp. 191–198 (2015)
35. Schaub, M., Matthes, F., Roth, S.: Towards a conceptual framework for interactive enterprise
architecture management visualizations. In: Lecture Notes in Informatics (LNI), Proceedings—
Series of the Gesellschaft fur Informatik (GI) (2012)
36. Marin, C.A., Monch, L., Leitao, P., et al.: A conceptual architecture based on intelligent services
for manufacturing support systems. In: 2013 IEEE International Conference on Systems, Man,
and Cybernetics. IEEE, pp. 4749–4754 (2013)
37. Harper, P.R.: A review and comparison of classification algorithms for medical decision making.
Health Policy (New York) (2005). https://doi.org/10.1016/j.healthpol.2004.05.002
38. Zaraté, P.: Outils pour la décision coopérative. Hermès Science (2013)
39. Nof, S.Y.: Collaborative control theory and decision support systems. Comput. Sci. J. Mold.
(2017)
40. Jarrahi, M.H.: Artificial intelligence and the future of work: human-AI symbiosis in organiza-
tional decision making. Bus. Horiz. (2018). https://doi.org/10.1016/j.bushor.2018.03.007
41. Saba, D., Degha, H.E., Berbaoui, B., Maouedj, R.: Development of an Ontology Based Solution
for Energy Saving Through a Smart Home in the City of Adrar in Algeria, pp. 531–541. Springer,
Cham (2018)
42. Saba, D., Zohra Laallam, F., Belmili, H., et al.: Development of an ontology-based generic
optimisation tool for the design of hybrid energy systems. Int. J. Comput. Appl. Technol.
55, 232–243 (2017). https://doi.org/10.1504/IJCAT.2017.084773
43. Bollweg, L., Bollweg, L., Kurzke, M., et al.: When robots talk—improving the scalability of
practical assignments in moocs using chatbots. EdMedia + Innov. Learn. (2018)
44. Saba, D., Sahli, Y., Berbaoui, B., Maouedj, R.: Towards smart cities: challenges, components,
and architectures. In: Hassanien, A.E., Bhatnagar, R., Khalifa, N.E.M., Taha, M.H.N. (eds.)
Studies in Computational Intelligence: Toward Social Internet of Things (SIoT): Enabling
Technologies, Architectures and Applications. Springer, Cham, pp. 249–286 (2020)
45. Lemley, J., Bazrafkan, S., Corcoran, P.: Deep learning for consumer devices and services:
pushing the limits for machine learning, artificial intelligence, and computer vision. IEEE
Consum. Electron. Mag. (2017). https://doi.org/10.1109/MCE.2016.2640698
The Virtues and Challenges
of Implementing Intelligent
‘Student-Success-Centered’ System
Fatema Abdulrasool
1 Introduction
Universities exist to guarantee student success. There are different definitions and
facets of student academic success. The most prevailing definition is linked to
student retention and graduation rates [41]. “Institutions of higher education have
been concerned about the quality of education and use different means to analyze
and improve the understanding of student success, progress, and retention” [9].
Student success is a complicated endeavor that requires a comprehensive framework
taking into account the technical, social, organizational, and cultural aspects [14,
41]. Information management directly aids the accomplishment of the university mission
[10].
In the most recent (2019) EDUCAUSE Top 10 IT Issues list, the issues were categorized
into three main themes: Empowered Students, Trusted Data, and Business Strategies.
Student success was ranked the second most pressing issue in higher education
institutions, while the student-centered institution occupied the fourth position [41].
Forty years ago, teaching and learning were conducted in person and
without any use of technology. The introduction of distance learning in 1998 signaled
the beginning of technology utilization in support of higher education operations. Tech-
nology has affected not only the student academic experience, but also the whole opera-
tions of higher education institutions (HEIs). Currently, the whole student life-cycle
is managed and reformed by advanced technologies. The technologies used can posi-
tively or negatively impact the institution's ability to attract, admit, enroll, and retain
F. Abdulrasool (B)
University of Tsukuba, Ibaraki 305-8577, Japan
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license 315
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_14
students as well as other stakeholders and partners. Although advances in technology
have made e-campus initiatives possible, they also intensify the need for
technology protection and monitoring [23, 39].
A sound business model ensures business continuity, which in the 21st century
thrives on cost-efficient use of technology. In higher education, understanding data is
the key to achieving the most pressing priority: student success. Universities produce,
process, transmit, store, and collect streams of information. The focus is on orga-
nizing, standardizing, and safeguarding data, and then applying it to advance the
institutional mission [23, 41].
Implementing AI techniques in the higher education environment is in its early stages.
Analyzing higher education data started later than other sectors because institutional
data was not stored electronically until recently. Further, the organizational structure
challenged the adoption of intelligent knowledge systems [6].
The use of AI to analyze educational data has increased due to several factors:
enhancements in AI techniques; the automation of student data; and increasing
global competition, which requires more innovative ways to deal with budgetary
constraints and increase return on investment [5, 40].
In striving to improve student outcomes, the focus is not only on the student's academic
journey but also on their life circumstances. To that end, trusted and meaningful
data help in monitoring student achievement and guiding the decision-making
process. The extensive use of IT in all university operations (teaching and learning,
research, and business functions) and the increasing funding challenges necessitate
the integration of IT in the overall institutional strategy and business model [23, 41].
“HEIs are facing a growing governance crisis which might cause a decrease in staff
and students retention rates. ‘Good’ university governance does not simply happen. It
is the product of continuous efforts to find the most appropriate governance structures,
protocols and processes. It is also about timing and judgement: it requires boards
of governors to recognize when a governance model is not working, why and how
to repair it. Ultimately, governance models are created by people to govern people.
They are only as good as they who devise and apply them, as well as those who live
by them” [52].
In this study, we discuss the virtues and challenges of embedding AI tech-
niques in university processes to improve student success rates. We also discuss
the role of a good governance system in ensuring the achievement of the intended
goals. The paper is arranged as follows.
Section 2 describes the meaning of student success; Sect. 3 explains business
intelligence in universities; Sect. 4 describes the strategic role of IT in universities;
Sect. 5 details the virtues of AI in university; Sect. 6 lists the challenges facing the
adoption of AI in universities; Sect. 7 provides a full description of IT Governance
concept; Sect. 8 discusses the differences between agility and IT Governance; Sect. 9
discusses the intelligent “student-success-centered” system; finally, the conclusion
and the references.
2 Student Success
“Student success is a complex endeavor. Universities embrace the aspiration, yet they
are still grappling with big questions about how to define, measure, and structure
student success, all while keeping the student at the center” [41].
There are different definitions and facets of student academic success. The most
prevailing definition is linked to student graduation and retention rates. The different
preventive and corrective techniques implemented by universities to guarantee
student success fall into three categories:
1. Lagging (backward): retention and graduation rates are considered lagging
indicators; they describe events that have already occurred and fail to explain
why and how those events happened.
2. Leading (predictive): focus on analyzing students' ongoing performance in
areas such as course performance, full-time continuous enrollment, accumulated
GPA, number of registered courses per semester, attendance, and e-learning
activities. These indicators are preventive measures that can help identify at-risk
students and direct university resources toward supporting them before they
quit their studies.
3. Actionable (behavioral): study student behaviors such as timely registration,
participation in advising sessions, and active engagement in social networks
and university events [41].
Many studies have been conducted to understand the factors that could elevate
student success rates. Some are related to the student profile and previous educational
background, while others are related to the student's current academic progress.
Progression reports extracted from the institutional Learning Management System
(LMS) can be utilized to develop proactive programs to identify and support at-risk
students [7, 12].
Student retention is a very complex problem that requires a comprehensive
system to solve. Research directed toward understanding this phenomenon has
intensified [21]. Reports show that one fourth of students fail to continue
their studies. Universities are leveraging technology to identify at-risk students and
effectively support them to ensure their success [21, 33].
There are many indicators that can predict a student's performance even before
enrollment. AI can help analyze them, find hidden patterns, and create knowl-
edge that helps guide students during the program selection phase. College entry
exam results, previous academic history, and student profile information (gender,
age, country, etc.) are examples of data analyzed to create that knowledge [9, 31, 41].
Data-driven technology brings new opportunities to help student success even after
graduation by connecting them with companies that offer jobs or internship positions
[29]. Further, it helps students in developing employability skills.
3 Business Intelligence
Nowadays universities are abandoning their obsolete, scattered systems and imple-
menting university-wide ERP systems that integrate all administrative functions into
one comprehensive system with a centralized data warehouse. If implemented effec-
tively, such a system can help the institution achieve the strategic objectives stated
in its strategic plan [42, 44].
Student Information Systems (SIS) are used to automate processes related to
admissions, course scheduling, exams scheduling, curricula follow up and develop-
ment, study plans, students transfer from one program to another from within the
university or transfers from other universities, graduation, student extra-curriculum
activities, student complaints and enquiries, academic advising, counselling, etc.
Some universities might take their SIS to new levels and include students’ hobbies,
faculty research interests and publications, faculty conference attendance, etc., and
thus moving from a basic student information system to an academic information
system.
SIS play a major role in achieving university objectives by providing the
required information to all university stakeholders especially students, faculty, deans,
chairpersons, vice presidents, the president and the board of directors.
Some of the university systems are closely interconnected like grading and regis-
tration systems, whereas other systems such as alumni and complaint systems require
less integration due to the static nature of their data. Unfortunately, software vendors
have failed to develop a comprehensive software solution that covers end-to-end
university business processes [50].
The daily and accumulated information creates a wealth of knowledge which can
be queried in different ways to provide the various stakeholders with the reports
and information they need. For example, student information systems can be used
to monitor student progress, identify and resolve difficulties, and thus improve
students' retention and completion rates. Another example of student information
system utilization is the study of course-taking patterns. The university's senior
management can use the student information system for planning, e.g., future
recruitment of faculty, identifying sources of future student recruitment, or training
opportunities for student internships. It is therefore evident that keeping a good
student information system and having proper
IT governance for such systems will have a direct influence on achieving university
objectives.
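As a minimal sketch of querying SIS data for such monitoring reports (the records and field names are invented for illustration), a cohort retention indicator might be computed as follows:

```python
# Toy SIS query: computing a retention indicator from student records.
# Records and field names are invented for illustration.
records = [
    {"id": 1, "enrolled_2023": True, "enrolled_2024": True},
    {"id": 2, "enrolled_2023": True, "enrolled_2024": False},
    {"id": 3, "enrolled_2023": True, "enrolled_2024": True},
    {"id": 4, "enrolled_2023": True, "enrolled_2024": True},
]

# Cohort = students enrolled in the base year; retained = still enrolled a year on.
cohort = [r for r in records if r["enrolled_2023"]]
retained = [r for r in cohort if r["enrolled_2024"]]
retention_rate = len(retained) / len(cohort)
print(f"Retention rate: {retention_rate:.0%}")
```

In a real deployment this would be a query against the SIS database rather than an in-memory list, but the report it produces — a lagging retention metric per cohort — is exactly the kind of output the paragraph above describes.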
The increasing adoption of enterprise management systems such as ERP,
data warehousing, supply chain management systems, and customer relationship
management systems have expanded the definition of IT infrastructure. As a result,
organizations transformed their fundamental mission from applications develop-
ment towards platform building and solutions delivery. The traditional style of
managing IT infrastructures for cost-effectiveness and efficiency has been extended
to incorporate issues related to global reach and range, flexibility, scalability, and
openness to emerging technologies [47].
The processes of selecting, implementing, and evaluating university ERP systems
are considered challenging tasks due to the decentralized organizational structure
of universities, which weakens communication and collaboration between IT and
business functions. As a result, risks increase while benefits are not fully realized
and resources are wasted. Institutions that successfully implement a high-quality,
efficient, effective, and widely accepted ERP system may gain tremendous benefits
and will surely enhance the university's competitive capabilities in a turbulent
marketplace [42, 44].
Universities are paying increasing attention to enterprise-wide business
intelligence to fulfill the growing demand for performance management. Business
intelligence is defined as “an umbrella term that includes the applications, infrastructure
and tools, and best practices that enable access to and analysis of information to
improve and optimize decisions and performance” [50].
The implementation of business intelligence tools that enhance analytical
capabilities is very complicated due to the complex system landscapes of univer-
sities, the autonomy of departments, and the decentralization of information.
Although access to the university's scattered information has improved with
the adoption of document management tools, the issue of data maintenance has been
raised [50].
Classification is an important aspect of data mining (DM) and machine learning
(ML) and is used to forecast the value of a variable based on previously known
historical variables. Statistical classification, decision trees, rule induction, fuzzy rule
induction, and neural networks are classification algorithms used to predict student
performance. In this study we do not go into the specifics of these algorithms, since
that is beyond our scope. Figure 1 shows the subcategories of these algorithms. Some
of these algorithms have proved better than others at predicting student success [45].
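As a framework-free illustration of the decision-tree idea, a single learned split can be sketched in plain Python; the features (GPA), the records, and the resulting threshold below are all invented for illustration only.

```python
# Minimal sketch of a decision-tree-style classifier for student success.
# Features, records, and thresholds are hypothetical, for illustration only.

def best_split(records, feature):
    """Find the threshold on `feature` that best separates pass/fail."""
    values = sorted(set(r[feature] for r in records))
    best = (None, -1)
    for t in values:
        left = [r for r in records if r[feature] <= t]
        right = [r for r in records if r[feature] > t]
        if not left or not right:
            continue
        # score = how many records a majority vote on each side gets right
        correct = sum(max(sum(1 for r in side if r["passed"]),
                          sum(1 for r in side if not r["passed"]))
                      for side in (left, right))
        if correct > best[1]:
            best = (t, correct)
    return best[0]

students = [
    {"gpa": 3.6, "attendance": 0.95, "passed": True},
    {"gpa": 3.1, "attendance": 0.90, "passed": True},
    {"gpa": 2.1, "attendance": 0.55, "passed": False},
    {"gpa": 1.8, "attendance": 0.40, "passed": False},
]
threshold = best_split(students, "gpa")          # learned split: gpa <= 2.1
predict = lambda r: r["gpa"] > threshold         # True = predicted to pass
```

A full decision tree learner would simply apply this split search recursively to each side; the sketch shows only the core step the cited algorithms share.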
Zacharis [54] forecasted student course performance by examining the data stored
in the Moodle server using a neural network algorithm. The researcher studied student
interaction with the system in terms of number of emails, content created with the wiki,
number of accessed materials, and online quiz scores. The prediction model shows
an accuracy level of up to 98.3%.
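The class of model described above, a network over the four interaction features, can be sketched at its simplest as a single trainable neuron; the data, scaling, and training setup below are invented for illustration and do not reproduce the cited study.

```python
# Single-neuron sketch of a predictor over the four interaction features
# named in the text (emails, wiki contributions, materials accessed, quiz
# score). All numbers are invented; a real model would have hidden layers.

def train_perceptron(data, epochs=50, lr=0.1):
    w = [0.0] * 4
    b = 0.0
    for _ in range(epochs):
        for x, label in data:          # label: 1 = passed, 0 = failed
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = label - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# (emails, wiki edits, materials accessed, quiz score) -- scaled to [0, 1]
data = [
    ([0.9, 0.8, 0.9, 0.95], 1),
    ([0.7, 0.6, 0.8, 0.80], 1),
    ([0.2, 0.1, 0.3, 0.40], 0),
    ([0.1, 0.0, 0.2, 0.30], 0),
]
w, b = train_perceptron(data)
classify = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```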
In a previous study, neural networks, logistic regression, discriminant analysis, and
structural equation modelling were compared to determine the most accurate method
for predicting student performance. Results show that the neural network method is
320 F. Abdulrasool
the most accurate method, followed by discriminant analysis [32]. Another study found
that the decision tree is the easiest to understand and implement [45].
Online technologies that may threaten traditional universities may also expand
their capacities. Traditional universities have natural advantages in delivering online
learning by simply utilizing the available resources. They already have the required
assets, experience, and human resources [15].
Management of universities has realized the importance of utilizing intelligent
systems to improve overall performance, including improving student retention
and graduation rates. These systems can help in creating student admission poli-
cies, predicting the number of students to be enrolled in the forthcoming semester,
5 Virtues of AI in Universities
Computing capacities and the level of technology utilization define the future of
HEIs. Advances in Artificial Intelligence bring new possibilities and challenges
for the teaching and learning, research, and governance of HEIs. AI techniques can be
used in many aspects of universities' administrative functions, such as helpdesks, auto-
mated timetabling, and decision making. AI also helps in the university's core missions
of teaching and learning and of research, through means like AI data analysis tools and
personalized online learning [43].
The following sections illustrate the benefits of implementing AI in universities.
5.1 Admission
5.2 Registration
Artificial Intelligence can improve the student experience and, therefore, student satisfac-
tion. AI techniques that utilize a student's education history, profile, and e-learning perfor-
mance, together with historical data about other students, can help students in planning their
study plan and registration process. They can also initiate early warnings if the pattern
shows that the student is at risk of not completing the program, or if a mismatch is found
between the student's skills and the academic program. Many universities have recognized
the importance of AI in helping students with registration processes; Elon University
is an example. It provides students with a tool for tracking the registration process and
helps them in planning the upcoming semester's courses [7].
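Such an early-warning check can be sketched as a minimal rule-based filter; the thresholds, field names, and records below are entirely hypothetical, and a production system would learn such rules from historical data rather than hard-code them.

```python
# Hypothetical early-warning rules for at-risk students; all thresholds
# and record fields are illustrative only.

def at_risk(student):
    """Return the reasons a student's pattern suggests non-completion."""
    reasons = []
    if student["gpa"] < 2.0:
        reasons.append("low GPA")
    if student["credits_earned"] / student["credits_attempted"] < 0.67:
        reasons.append("low completion ratio")
    if student["failed_core_courses"] >= 2:
        reasons.append("skills mismatch with program core")
    return reasons

roster = [
    {"id": "S1", "gpa": 3.2, "credits_earned": 30, "credits_attempted": 33,
     "failed_core_courses": 0},
    {"id": "S2", "gpa": 1.7, "credits_earned": 12, "credits_attempted": 24,
     "failed_core_courses": 2},
]
# Keep only students for whom at least one rule fired.
warnings = {s["id"]: at_risk(s) for s in roster if at_risk(s)}
```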
5.4 Research
Research computing deals with the governance of the set of software, hardware, and
human expertise that supports research practices. Many years ago, access to
high-performance research facilities was restricted to elite researchers, since it was
very expensive. As costs declined and technologies advanced, access for all
researchers has been assured [19].
There are many software packages dedicated to helping researchers interpret
and relate data. Newer packages include AI techniques which have proved more
accurate than humans at analyzing data and selecting the best regression model.
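The model-selection step described above can be sketched framework-free: fit each candidate model on a training split and keep the one with the lowest validation error. The candidate set, the closed-form simple-linear-regression fit, and the toy data are illustrative assumptions.

```python
# Pick the best regression model by held-out validation error.
# Candidate models and data are synthetic, for illustration only.

def fit_mean(xs, ys):
    """Baseline: predict the training mean everywhere."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Closed-form simple linear regression (slope = cov/var)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [0, 1, 2, 3], [1.0, 3.1, 4.9, 7.2]   # roughly y = 2x + 1
val_x, val_y = [4, 5], [9.0, 11.1]

candidates = {"mean-only": fit_mean(train_x, train_y),
              "linear": fit_linear(train_x, train_y)}
best = min(candidates, key=lambda name: mse(candidates[name], val_x, val_y))
```

Real packages search far larger model families, but the select-by-validation-error loop is the same.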
Student engagement has been defined as “the time and effort students devote to activ-
ities that are empirically linked to desired outcomes of college and what institutions
do to induce students to participate in these activities”.
Many studies have been conducted to understand the impact of deploying intel-
ligent technologies on student engagement; Beeland's [8] study is an example. That
study aimed to evaluate whether the use of interactive whiteboards affects
student engagement. The results show a positive relationship between the use
of interactive whiteboards in classrooms and student engagement [8].
AI can proactively help in improving student-faculty-university interaction, which
in turn improves student success [14, 41].
5.6 Advising
5.7 Summary
6 Challenges of AI in Universities
Although interest in using AI technologies to support student success has been
increasing drastically, it has not received full attention from university top
management. Successful implementation of a comprehensive, student-success-
centered AI system cannot be achieved without the active engagement of top
management, who can influence the system and the university culture [48].
Information quality (accuracy, completeness, and integrity) and algorithm bias can
affect the outcomes of the AI function. Ethical and privacy issues pertaining to the use of
students' personal data must be addressed. Transparency, security, and accountability
are major concerns of information privacy. Noncompliance with privacy rules may
jeopardize an institution's reputation and existence [4, 40, 55].
Many factors that affect the accuracy of the AI function can be avoided through
the implementation of a good IT Governance system that ensures data quality,
business/IT alignment, clear communication between the business and IT functions, and
compliance with internal and external laws, rules, and regulations [55].
The governance and management of high-performance computing and campus
cyberinfrastructure have been deemed inappropriate [19].
7 IT Governance
Despite the huge amount of money that universities invest in IT, it never seems
to achieve its full potential. Universities now recognize the importance of
IT Governance in supporting their mission of attaining the full potential of their IT
spending [25, 30]. The concept of IT Governance focuses on sustainability in
controlling, managing, and monitoring IT activities through five driving mechanisms:
strategic IT-business alignment, value delivery, IT resource management,
IT risk management, and performance measurement [49]. IT Governance is often
the weakest link in a corporation's overall governance structure. It represents one
of the fundamental functional governance models receiving a significant increase in
attention from business management [11].
IT Governance is defined as "a set of relationships and processes designed
to ensure that the organization's IT sustains and extends the organization's strate-
gies and objectives, delivering benefits and maintaining risks at an acceptable level"
[24, 34].
IT Governance can be generally applied to any organization, e.g., banks, schools,
universities, industrial companies, etc. However, we believe that each domain or
type of business may necessitate its own style of IT Governance which makes the
IT Governance model or methodology more applicable and able to yield better
outcomes.
IT Governance can be deployed using a mixture of structures (strategies and
mechanisms for connecting IT with business), processes (IT monitoring procedures),
and relational mechanisms (participation and collaboration between management)
[11, 38].
Data quality is a complex research area. The most important data quality dimen-
sions are accuracy, completeness, and security/privacy/accessibility. Figure 2,
created by COBIT, details the data quality dimensions [17].
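Two of these dimensions can be sketched as simple programmatic checks; the field names, the assumed valid range, and the records below are hypothetical, and real data-quality tooling covers many more dimensions.

```python
# Toy checks for two data quality dimensions named above: completeness
# and a range-based accuracy check. Field names and ranges are invented.

def completeness(records, field):
    """Fraction of records with a non-missing value for `field`."""
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

def accuracy(records, field, lo, hi):
    """Fraction of present values that fall inside the valid range."""
    values = [r[field] for r in records if r.get(field) is not None]
    return sum(1 for v in values if lo <= v <= hi) / len(values)

students = [
    {"id": 1, "gpa": 3.4},
    {"id": 2, "gpa": None},   # missing value hurts completeness
    {"id": 3, "gpa": 7.9},    # outside the assumed 0.0-4.0 range
    {"id": 4, "gpa": 2.8},
]
gpa_completeness = completeness(students, "gpa")     # 3 of 4 present
gpa_accuracy = accuracy(students, "gpa", 0.0, 4.0)   # 2 of 3 in range
```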
Data Governance is a subset of overall IT Governance. The Data Management
Association (DAMA) defines Data Governance as "the exercise of authority, control and
shared decision-making (planning, monitoring and enforcement) over the manage-
ment of data assets. Data Governance is high-level planning and control over data
management" [37].
As the proliferation of technology and communication continues, governance
issues have become ever more all-embracing. Data and information are what
enable decision making; therefore, Data Governance is the starting point in any
discussion concerning governance frameworks [37].
Data Governance cannot be separated from IT Governance, and it needs a holistic
approach. The university's top management, including the president and the board,
must be deeply involved in Data Governance endeavors, since it can be used as
a strategic tool to enhance the organization's ability to compete in the highly diverse
marketplace [23].
According to Couto et al. [20], IT Governance practices involving the organization's
procedures, supervision, monitoring, and control are proven to have a positive impact on
organizational performance. The main challenge, however, arises from the presumed
rigidity of a relatively static IT Governance approach in coping with
rapidly changing industry needs and requirements. This issue of IT Governance
and agility has been described as a tension between adaptation and anticipation [20].
The significance of enterprise agility has been emphasized by Tallon and Pinson-
neault [51], who found, through a survey of 241 organizations, a positive correlation
between alignment and agility and between agility and organizational performance.
The study was able in particular to confirm the substantial effect of agility in volatile
business environments [51].
Adapting IT to a rapidly changing environment has been perceived as a chal-
lenging task in the key aspects of development, design, testing, and implementation.
The concept of agility has been misunderstood and confused with other terminology,
leading to further difficulties in understanding its scope and implementing it. Unlike flex-
ibility, which deals with known changing factors, agility is better viewed as the ability
to respond to a yet unknown context. Ultimately, these challenging variables raise
further questions on whether the current well-established IT Governance structures
are responsive to unknown regulatory and economic changes in any presumed busi-
ness setting, including the educational sector and universities. Figure 3 illustrates the
differences between IT Governance and agility.
University resources, policies, and processes affect student success. All university
functions must be strategically planned and orchestrated towards achieving its
core mission [31].
A conceptual framework that focuses on intelligent governance of AI in univer-
sities was developed (Fig. 4). The kernel of this model is student success, which
is an ongoing process that requires regular evaluating, directing, and monitoring.
Any change in the university's internal policies or the environment surrounding the
educational institution must be examined.
Moving away from the traditional perception of student success, the model defines student
success in terms of its ability to proactively support and guide students during their
entire academic life, which will surely improve retention and completion rates.
Student success can be forecasted at early stages, even before the student's enroll-
ment. With the availability of reliable data, AI can suggest the appropriate academic
program for each student. It also helps students even after graduation by connecting
them with the marketplace.
A successful framework must bear in mind the needs of all stakeholders. Students,
parents, faculty, administrative staff, and top management are the key stakeholders,
and they may not only have different needs, but these needs may also conflict with
one another. Tripartite IT Governance has been introduced as a solution
to avoid these conflicts.
King [29] specified five steps that must be embedded in an institution's strategy for it
to become an intelligent campus. Top management involvement in setting goals and
executing plans is crucial. The advocated five steps are:
1. IT/Business alignment.
2. A portfolio of personalized services.
3. Data strategy.
4. Resource Planning.
5. Agility [29].
Like student success governance, IT Governance is also a continuous process
which focuses on three main areas: evaluating, directing, and monitoring IT initiatives.
Policies, laws, and procedures guide the planning, implementation, and monitoring
of IT projects. Best practices are used to enlighten management about the best
way to maximize the returns on their IT investments while risks are controlled
and resources are optimized.
Universities are facing many issues with regard to the control, management, and
governance of IT. These issues result from the decentralized structure of univer-
sities, which makes it very difficult to trace, evaluate, and monitor systems inter-
twined with business processes scattered throughout the different departments. The
new technologies that are proliferating bring new challenges and possibilities for
teaching, learning, and research, as well as for university administration processes; hence,
models to justify additional investment and long-term sustainability plans are needed.
Legislation intended to protect and secure information assets is increasing
drastically, and universities must develop plans and policies to assure
compliance, which is core to their sustainability and existence. The level of infor-
mation technology awareness has increased among university stakeholders, especially
students; this leads to a growing demand for the acquisition of more advanced technologies. As
reliance on IT increases, so does the risk of system failure. The clash between
the need to centralize IT and the resistance from scholars is increasing [18, 56].
Research shows that successful implementation of IT Governance relies not only on
the design of the IT Governance framework, but also on how well the IT strategy and
IT policy are communicated from the board. In the context of universities, for
the sake of minimizing capital risk and trial-and-error failure, universities have to
analyze the current status of their IT governance performance before taking any further
steps [27].
The IT Governance structure is spearheaded by the Chief Information Officer (CIO),
who is supposed to report directly to the university president and should have a seat on
the university council committee.
To govern and manage university initiatives that focus on student success, a
new vice-president-level position with a "student success" remit (VPSS) has been
created. This move proved to be fruitful, and it helped in lifting retention
and completion rates. The VPSS is accountable for evaluating, directing, and monitoring
the university's operation by ensuring the availability of well-communicated policies,
procedures, and plans that orchestrate student success. Although the VPSS is accountable
for student success, the whole institution is responsible for the management and
implementation of the student success system. The system covers university-wide
practices including advising, institutional research (IR), business intelligence, and
IT [41].
Communication and collaboration strategies and plans must be in place to guarantee
a successful alignment between student success operations and IT Governance
processes. The relationship between the VPSS and the CIO can positively or negatively
affect the overall performance of the university.
IT Governance focuses on information systems (IS) and their risk management and
performance. IT Governance helps those responsible for university governance and IT
management to understand, direct, and manage IS and IS security. Implementing univer-
sity governance in general, and IT Governance in particular, will put the university
in compliance with key legislation such as Sarbanes-Oxley in the USA.
Risk is an integral part of business processes. Universities must be proactive in
identifying and managing risk by developing a risk management plan, which will
contribute to their survival. Understanding and managing risk is an inherent part of the
business process. The plan aims to control, mitigate, and confront risks that await the
university before they become a threat. It contains a clear repository of risks
and controls and how they are related to business process management along the
dimensions of time and ownership. Reporting audit findings helps in attaining
compliance more efficiently. Compliance initiatives such as Sarbanes-Oxley
force universities to maintain a more transparent audit trail. A security plan must
be developed for information systems, with technical and management requirements
to assure data and communications integrity, confidentiality, accountability, access
controls, and data availability, as well as disaster recovery contingency measures [46].
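The risk repository described above can be sketched as a small data structure that ties each risk to a business process and an owner and ranks it by likelihood times impact; the entries and the 1-5 scales are invented for illustration.

```python
# Minimal sketch of a risk repository: each risk is linked to a business
# process and an owner, and ranked by likelihood x impact. All entries
# and scales are hypothetical.
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    process: str
    owner: str
    likelihood: int   # 1 (rare) .. 5 (almost certain)
    impact: int       # 1 (minor) .. 5 (severe)

    @property
    def score(self):
        return self.likelihood * self.impact

register = [
    Risk("Student data breach", "Registration", "CIO", 2, 5),
    Risk("ERP outage in enrolment week", "Registration", "IT Ops", 3, 4),
    Risk("Audit finding on access controls", "Finance", "VP Finance", 2, 3),
]
# Highest-scoring risks are confronted first.
by_priority = sorted(register, key=lambda r: r.score, reverse=True)
```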
IT Governance helps higher education attain regulatory compliance,
especially when it is facing heightened scrutiny from the public. Even though scholars
advocate top-down management as an effective technique to enforce an Information
Security Policy (ISP), the unique structure of HEIs necessitates another
approach to inspire compliance. Participatory leadership and shared governance that
involve employees in decision-making might be the key to fostering compliance in
HEIs. ISP enforcement achieved through coercive force does not blend well
with higher education culture [28]. Performance measurement of IT processes and IT
controls is one of the critical operational aspects of IT governance. The IT governance
process starts with a planning phase where IT and the business are aligned. The plan-
ning phase has a guiding impact on the succeeding operating phase, where the moni-
toring of IT resources, risks, and management takes place. The last stage is
the evaluation stage, where the KPIs (benefits, costs, opportunities, and risks) are
measured and evaluated [38].
Many countries have linked financial incentives to institutional performance;
Japan and the United States are examples [41]. The adoption of an independent
internal auditing function to enhance governance by a university has not been
examined thoroughly. The multi-theoretical approach to university governance and
the views of their chief executive officers to examine the extent to which internal
10 Conclusion
References
12. Bryant, T.: Everything depends on the data. EDUCAUSE Rev. https://er.educause.edu/articles/
2017/1/everything-depends-on-the-data
13. Castillo, J.R.: Predictive analytics for student dropout reduction at Pontificia Universidad Jave-
riana Cali. EDUCAUSE Rev. (2019). https://er.educause.edu/articles/2019/12/predictive-ana
lytics-for-student-dropout-reduction-at-pontificia-universidad-javeriana-cali
14. Cheston, A., Stock, D.: The AI-first student experience. EDUCAUSE Rev. (2017). https://er.
educause.edu/articles/2017/6/the-ai-first-student-experience
15. Christensen, C., Eyring, H.J.: The Innovative University: Changing the DNA of Higher
Education. Forum for the Future of Higher Education, pp. 47–53 (2011)
16. Christopher, J.: The adoption of internal audit as a governance control mechanism in Australian
public universities–views from the CEOs. J. Higher Educ. Policy Manag. 34(5), 529–541 (2012)
17. COBIT® 2019: Framework: Introduction & Methodology. ISACA (2019)
18. Coen, M., Kelly, U.: Information management and governance in UK higher education
institutions: bringing IT in from the cold. Perspect. Policy Pract. Higher Educ., 7–11 (2007)
19. Collins, S.: What Technology? Reflections on Evolving Services. Educause, pp. 68–88, 30 Oct
2009
20. Couto, E.S., Lopes, F.C., Sousa, R.D.: Can IS/IT governance contribute for business agility?
Procedia Comput. Sci. (2015). http://hdl.handle.net/11328/1311
21. De Carvalho Martinho, V.R., Nunes, C., Minussi, C.R.: Intelligent system for prediction of
school dropout risk group in higher education classroom based on artificial neural networks.
In: IEEE 25th International Conference on Tools with Artificial Intelligence. IEEE (2013).
https://doi.org/10.1109/ictai.2013.33
22. Dennis, M.J.: Artificial Intelligence and recruitment, admission, progression, and retention
22(9), 2–3 (2018)
23. Grajek, S.: Top 10 IT issues, 2019: the student genome project. EDUCAUSE Rev., 4–41 (2019)
24. Gunawan, W., Kalensun, E.P., Fajar, A.N.: Applying COBIT 5 in higher education. Mater. Sci.
Eng. 420, 12108 (2018)
25. Hicks, M., Pervan, G., Perrin, B.: A study of the review and improvement of IT governance
in Australian universities. In: CONF-IRM 2012 Proceedings. AIS Electronic Library (AISeL)
(2012)
26. Humble, N., Mozelius, P.: Artificial intelligence in education-a promise, a threat or a hype? In:
Proceedings of the European Conference on the Impact of Artificial Intelligence and Robotics,
pp. 149–156. EM-Normandie Business School Oxford, UK, England (2019)
27. Jairak, K., Praneetpolgrang, P.: Applying IT governance balanced scorecard and importance-
performance analysis for providing IT governance strategy in university. Inform. Manag.
Comput. Sec., 228–249 (2013)
28. Kam, H.-J., Katerattanakul, P., Hong, S.: IT governance framework: one size fits all? In: AMCIS
2016: Surfing the IT Innovation Wave-22nd Americas Conference on Information, San Diego
(2016)
29. King, M.: The AI revolution on campus. EDUCAUSE Rev., 10–22 (2017)
30. Ko, D., Fink, D.: Information technology governance: an evaluation of the theory-practice gap.
Corp. Gov. Int. J. Bus. Soc., 662–674 (2010)
31. Kuh, G.D., Bridges, B. K., Hayek, J.C.: What Matters to Student Success: A Review of the
Literature. America: National Postsecondary Education Cooperative (2006)
32. Lin, J.J., Imbrie, P., Reid, K.J.: Student retention modelling: an evaluation of different methods
and their impact on prediction results. In: Proceedings of the Research in Engineering Education
Symposium, Palm Cove, QLD (2009)
33. Lin, S.-H.: Data Mining for Student Retention Management. CCSC: Southwestern Conference,
pp. 92–99. The Consortium for Computing Sciences in Colleges (2012)
34. Iliescu, F.-M.: Auditing IT governance. Informatica Economică 14(1), 93–102 (2010)
35. Marginson, S.: Dynamics of national and global competition in higher education. Higher Educ.,
1–39 (2006)
36. Miller, T., Irvin, M.: Using artificial intelligence with human intelligence for student
success. EDUCAUSE Rev. (2019). https://er.educause.edu/articles/2019/12/using-artificial-int
elligence-with-human-intelligence-for-student-success
37. Mosley, M.: DAMA-DMBOK Functional Framework. Data Management Association (2008)
38. Nicho, M., Khan, S.: IT governance measurement tools and its application in IT-business
alignment. J. Int. Technol. Inform. Manag. 26(1), 81–111 (2017)
39. Norris, A.: The catch 22 of technology in higher education. EDUCAUSE Rev. (2018). https://
er.educause.edu/blogs/sponsored/2018/8/the-catch-22-of-technology-in-higher-education
40. Pardo, A., Siemens, G.: Ethical and privacy principles for learning analytics. B. J. Educ.
Technol., 438–450 (2014)
41. Pelletier, K.: Student success: 3 big questions. EDUCAUSE Rev. (2019). https://er.educause.
edu/articles/2019/10/student-success–3-big-questions
42. Pollock, N., Cornford, J.: ERP systems and the university as a ‘Unique’ organisation. Inform.
Technol. People 17(1), 31–52 (2004)
43. Popenici, S.A., Kerr, S.: Exploring the impact of artificial intelligence on teaching and learning
in higher education. Res. Pract. Technol. Enhanc. Learn. (2017). https://doi.org/10.1186/s41
039-017-0062-8
44. Rabaa’i, A.A., Bandara, W., Gable, G.G.: ERP systems in the higher education sector: a descrip-
tive case study. In: 20th Australian Conference on Information Systems, pp. 456–470. ACIS,
Melbourne (2009)
45. Romero, C., Ventura, S., Espejo, P.G., Hervás, C.: Data mining algorithms to classify students.
In: The 1st International Conference on Educational Data Mining, Montréal, Québec, Canada,
pp. 8–17 (2008)
46. Rosca, I.G., Nastase, P., Mihai, F.: Information systems audit for university governance in
Bucharest Academy of Economic Studies. Informatica Economica 14(1), 21–31 (2010)
47. Sambamurthy, V., Zmud, R.W.: Research commentary: the organizing logic for an enterprise’s
IT activities in the digital era-a prognosis of practice and a call for research. Inform. Syst. Res.
11(2), 105–114 (2000)
48. Shulman, D.: Personalized learning: toward a grand unifying theory. EDUCAUSE Rev., 10–11
(2016)
49. Subsermsri, P., Jairak, K., Praneetpolgrang, P.: Information technology governance practices
based on sufficiency economy philosophy in the Thai university sector. Inform. Technol. People,
195–223 (2015)
50. Svensson, C., Hvolby, H.-H.: Establishing a business process reference model for universities.
Procedia Technol., 635–642 (2012)
51. Tallon, P.P., Pinsonneault, A.: Competing perspectives on the link between strategic information
technology alignment and organizational agility: insights from a mediation model. MIS Q. 35,
463–486 (2011)
52. Trakman, L.: Modelling university governance. Higher Educ. Q., 63–83 (2008)
53. Urh, M., Vukovic, G., Jereb, E., Pintar, R.: The model for introduction of gamification into
e-learning in higher education. In: 7th World Conference on Educational Sciences (WCES-2015),
pp. 388–397. Procedia-Social and Behavioral Sciences, Athens, Greece (2015)
54. Zacharis, N.Z.: Predicting student academic performance in blended learning using artificial
neural networks. Int. J. Artif. Intell. Appl. (IJAIA), 17–29 (2016)
55. Zeide, E.: Artificial intelligence in higher education: applications, promise and perils, and
ethical questions. EDUCAUSE Rev., 31–39 (2019)
56. Zhen, W., Xin-yu, Z.: An ITIL-based IT service management model for Chinese universities.
In: International Conference on Software Engineering Research, pp. 493–497 (2007)
Boosting Traditional
Healthcare-Analytics with Deep
Learning AI: Techniques, Frameworks
and Challenges
1 Introduction
Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) have
gained a lot of traction in the recent past, especially in the healthcare sector. The
amalgamation of electronic health record (EHR) systems with these technologies is
advancing the healthcare sector into a new realm. Analytics platforms and ML algorithms
running in the background of a clinical setup use the data stored in EHRs, making
treatment options and outcomes more precise and predictable than ever before. Google has
developed an ML algorithm to help identify cancerous tumors on mammograms. To
improve radiation treatment, Google's DeepMind Health is working with Univer-
sity College London Hospital (UCLH) to develop ML algorithms capable of distin-
guishing healthy and cancerous tissues. To identify skin cancer, Stanford is using a
DL algorithm. The uses of new AI solutions in healthcare are limitless, ranging from
simplifying administrative processes, treating diseases, and predicting the possibility of a
patient contracting a certain disease, to giving personalized care to patients [1].
The purpose of this chapter is to provide valuable insights into the Deep Learning
techniques used in clinical setups. First, a quick overview of Artificial Intelligence,
Machine Learning, and Deep Learning is presented. The next sections discuss
the DL tools, frameworks, and techniques, followed by their limitations and oppor-
tunities. Use cases of DL techniques in general, and finally their contributions in the
healthcare sector, are highlighted to emphasize their significance for providing better
healthcare outcomes.
P. S. Mathew (B)
Bishop Cotton Women’s Christian College, Bengaluru, India
e-mail: [email protected]
A. S. Pillai
School of Computing Sciences, Hindustan Institute of Technology and Science, Chennai, India
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license 335
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_15
336 P. S. Mathew and A. S. Pillai
2 Background
While the terms Artificial Intelligence (AI), Machine Learning (ML), and Deep
Learning (DL) are often used interchangeably, they are different. The concept of
AI came first, followed by ML, and lastly by ML's advanced sub-branch, DL. DL is
gaining a lot of traction and popularity due to its ability to solve complex problems.
Machine Learning makes use of algorithms that detect patterns and learn directly from
data and experience to make predictions without being explicitly programmed.
The three important classes of ML algorithms are:
Supervised Learning: In supervised learning, a system is trained with labelled
data. The system uses these training data to learn the relationship of given inputs
to a given output. It is used to predict the category of a given data point.
Unsupervised Learning: In unsupervised learning, the system is trained with unla-
beled data. Clusters are formed based on the similarity of the data, and data points
are assigned to them.
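The contrast between these two classes can be sketched framework-free on toy one-dimensional data: the supervised learner below uses the labels, while the unsupervised one recovers the same grouping from the bare points. All values are invented for illustration.

```python
# Toy contrast: supervised learning (labels given) vs unsupervised
# learning (labels discovered) on the same 1-D points.

def nearest_centroid_fit(points, labels):
    """Supervised: average the points of each labelled class."""
    centroids = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        centroids[lab] = sum(members) / len(members)
    return centroids

def kmeans_1d(points, c0, c1, iters=10):
    """Unsupervised: alternate assignment and centroid update (k = 2)."""
    for _ in range(iters):
        a = [p for p in points if abs(p - c0) <= abs(p - c1)]
        b = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0, c1 = sum(a) / len(a), sum(b) / len(b)
    return c0, c1

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]

supervised = nearest_centroid_fit(points, labels)   # uses the labels
unsupervised = kmeans_1d(points, c0=0.0, c1=6.0)    # ignores the labels
```

Both recover centroids near 1.0 and 5.0, but only the supervised model can name the classes.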
Boosting Traditional Healthcare-Analytics with Deep Learning … 337
A deep learning framework is a library or toolkit for defining deep learning models
without having to worry about the specifics of the underlying algorithms. There are
several deep learning frameworks available, and each is built to cater to different
objectives. Some of the deep learning frameworks [7–15] are:
3.1 TensorFlow
It is an open source Google project and one of the most widely adopted and most
popular deep learning frameworks. The library does numerical computation using
data flow graphs. Its comprehensive and flexible architecture allows deploying
computation to one or more CPUs or GPUs without modifying code [9]. It is available
both on desktop and mobile. It can be used for text-based detection/summarization,
image/sound recognition, and time series/video analysis [12].
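The data-flow-graph idea this library is built on can be sketched framework-free: the graph is declared first and evaluated later, which is what allows an engine such as TensorFlow to place the computation on one or more CPUs or GPUs. The node types below are a minimal invented subset, not the library's API.

```python
# Framework-free sketch of a data flow graph: declare nodes first,
# evaluate later -- the execution model described above.

class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self, feed):
        if self.op == "input":
            return feed[self.inputs[0]]          # look up a fed value
        args = [n.eval(feed) for n in self.inputs]
        if self.op == "add":
            return args[0] + args[1]
        if self.op == "mul":
            return args[0] * args[1]
        raise ValueError(self.op)

# Build the graph y = (a * b) + c without computing anything yet.
a = Node("input", "a")
b = Node("input", "b")
c = Node("input", "c")
y = Node("add", Node("mul", a, b), c)

result = y.eval({"a": 2.0, "b": 3.0, "c": 1.0})
```

Because the graph exists as data before evaluation, an engine is free to optimize it and distribute its nodes across devices.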
3.2 Keras
3.3 Caffe
It is another popular open source deep learning framework, used for visual recognition
and classification. Caffe stands for Convolutional Architecture for Fast Feature Embed-
ding. It is supported with interfaces for C, C++, Python, and MATLAB. It offers
fantastic speed for processing images, and its modularity makes it a right choice for
modelling CNNs (Convolutional Neural Networks) or image processing problems. It
also works with CUDA deep neural networks (cuDNN) [7]. Caffe provides meagre
support for recurrent networks and language modelling, and it does not support fine-
grained network layers. Facebook open-sourced a version of Caffe known as Caffe2
to help researchers and developers train huge machine learning models. It is now
part of PyTorch [8, 12].
3.4 CNTK
The Microsoft Cognitive Toolkit (CNTK) is an open-source DL library consisting
of building blocks that describe neural networks as a series of computational steps
via a directed graph. It is created to deal with huge datasets and facilitates efficient
training for handwriting, voice and facial recognition. It supports both CNNs
(Convolutional Neural Networks) and RNNs (Recurrent Neural Networks) [8].
3.5 MXNet

MXNet supports long short-term memory (LSTM) networks along with both RNNs
and CNNs [11, 12].
3.6 PyTorch

PyTorch is an open-source framework which runs on Python. It uses CUDA along
with C/C++ libraries for processing, to scale production of model building and to
provide overall flexibility. PyTorch uses dynamic computational graphs. It works
best for small projects and prototyping purposes, while for cross-platform solutions
TensorFlow is a better option than PyTorch. It can be used in deep learning
applications such as image detection/classification, NLP and reinforcement
learning [7, 8].
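The "dynamic computational graph" can be seen in a few lines, assuming PyTorch is installed (the tensors and branch are illustrative): the graph is built as Python executes, so even data-dependent control flow is differentiable.

```python
import torch  # assumes PyTorch is installed

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# The graph is recorded on the fly as this line runs; whichever branch
# executes is the one that gets differentiated.
y = (x ** 2).sum() if x.sum() > 0 else (x ** 3).sum()
y.backward()                     # backprop through the branch that ran
print(x.grad.tolist())           # d/dx of sum(x^2) = 2x -> [2.0, 4.0, 6.0]
```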
3.7 Theano
Theano is a deep learning framework written in Python, enabling its users to use
Python libraries to define, optimize and evaluate mathematical expressions with
multidimensional arrays. It is tightly integrated with NumPy for data computations
and uses GPUs for performing data-intensive computations [7, 13].
3.8 Deeplearning4j
Deeplearning4j (DL4J) is a distributed deep learning framework written mainly in
Java and Scala. DL4J supports different neural networks, such as CNNs (Convolu-
tional Neural Networks), RNNs (Recurrent Neural Networks), RNTNs (Recursive
Neural Tensor Networks) and LSTMs (Long Short-Term Memory networks). It
integrates with Kafka, Hadoop and Spark using any number of GPUs or CPUs.
MapReduce is used to train the network, while large matrix operations rely on other
libraries such as ND4J (N-Dimensional Arrays for Java). This framework can be
used for image recognition, fraud detection, text mining, parts-of-speech tagging,
and NLP. As it is implemented in Java, it is more efficient than comparable frame-
works developed in Python. For image recognition using multiple GPUs it is as fast
as Caffe [12].
3.9 Chainer
Chainer is a flexible Python-based deep learning framework often applied to NLP
tasks. It supports CUDA computation along with multiple GPUs, provides better
GPU performance than TensorFlow, and is faster than other Python-based frame-
works. It supports both RNNs and CNNs. It is mainly used for sentiment analysis,
machine translation, speech recognition, etc. ChainerMN outperformed MXNet,
CNTK and TensorFlow for ImageNet classification using ResNet-50, giving the
best performance in a multi-node setting [10].
Some other deep learning frameworks are Gluon, Sonnet, Paddle, Fast.ai, nolearn,
BigDL, Lasagne, Elephas, Deeplearning.scala, Apache SINGA etc. [7, 8] (Table 1).
No single framework can be chosen as the best. Several factors should be investigated
before zeroing in on a library, such as modularity, ease of use, speed, documentation,
community support, programming language, popularity, application and ease of
deployment. Theano is one of the oldest and most stable frameworks. Chainer
provides excellent documentation support. With multiple GPUs and CPUs, MXNet
offers decent computational scalability, which is great for enterprises. Keras and
PyTorch are easy to use, while TensorFlow is the most popular deep learning frame-
work. Caffe is known for its image-processing speed. Deeplearning4j is best for
those comfortable with Java. CNTK provides better flexibility and scalability while
operating on multiple machines. The choice of framework can be made based on the
application and the factors that offer the best solution to the business challenge at
hand.
4 Deep Learning Techniques/Architectures
ML and DL algorithms have been used to find solutions for complex health problems
and to provide long-term quality care to patients. Medical images are valuable
data available in healthcare that, when combined with other demographic infor-
mation, can give valuable insights. Typical deep learning techniques are composed
of algorithms such as deep neural networks (DNN), deep feed-forward neural
networks (DFFNN), recurrent neural networks (RNN), long short-term memory
(LSTM)/gated recurrent units (GRU), convolutional neural networks (CNN), deep
belief networks (DBN), deep stacking networks (DSN), generative adversarial
networks (GAN), autoencoders and many more.
Table 1 Comparison of deep learning frameworks

Keras (released 2015; core language: Python; CUDA support: yes; supported models: CNN, RNN; multi-GPU: yes)
Advantages: 1. Quick and easy prototyping possible. 2. Built-in support for training on multiple GPUs. 3. Provides a simpler API for developers. 4. Lightweight architecture.
Disadvantages: 1. Not always easy to customize. 2. Backend support is restricted to engines such as TensorFlow, Theano, or CNTK.

Caffe (released 2013; core language: C++; CUDA support: yes; supported models: CNN, LSTM; multi-GPU: yes)
Advantages: 1. Specializes in computer vision. 2. Offers pre-trained models. 3. Scalable, fast and lightweight. 4. Image-processing and learning speed is great. 5. Fast and modular.
Disadvantages: 1. Lacks documentation. 2. Difficult to compile. 3. No support for distributed training. 4. Inadequate support for recurrent networks. 5. Cumbersome for huge networks.

CNTK (released 2016; core language: C++; CUDA support: yes; supported models: CNN, RNN, FFNN; multi-GPU: yes)
Advantages: 1. Good performance and scalability. 2. Efficient in resource usage.
Disadvantages: 1. Community support is limited. 2. Its capabilities on mobile are limited. 3. Difficult to install.

MXNet (released 2015; core languages: C++, Python, R, Julia, JavaScript, Scala, Go, Perl; CUDA support: yes; supported models: LSTM, RNN, CNN; multi-GPU: yes)
Advantages: 1. Highly scalable tool that can run on a variety of devices. 2. Supports multiple GPUs. 3. Code is easy to maintain and debug. 4. Ability to code in a variety of programming languages. 5. Offers cloud services.
Disadvantages: 1. Smaller community compared to other frameworks. 2. Learning to use it is difficult. 3. API is not very user-friendly.

PyTorch (released 2016; core languages: Python, C++, CUDA; CUDA support: yes; supported models: CNN, RNN; multi-GPU: yes)
Advantages: 1. Has many pre-trained models. 2. Supports data parallelism and distributed learning models. 3. Very flexible and easy to use. 4. Allows complex architectures to be built easily. 5. Supports dynamic computation graphs and efficient memory usage. 6. Supported by major cloud platforms.
Disadvantages: 1. Third party required for visualization. 2. For production an API server is required.

Theano (released 2007; core languages: Python, CUDA; CUDA support: yes; supported models: CNN, RNN; multi-GPU: yes)
Advantages: 1. Offers decent computational …
Disadvantages: 1. Difficult to learn …

Deeplearning4J (released 2014; core languages: Java, Scala, CUDA, C, C++, Python; CUDA support: yes; supported models: RBM, DBN, CNN, RNN, RNTN, LSTM; multi-GPU: yes)
Advantages: 1. Supported with distributed frameworks such as Hadoop/Spark. 2. Can process huge amounts of data without compromising on speed. 3. More efficient as it is implemented in Java.
Disadvantages: 1. As Java is not immensely popular, one cannot rely on growing codebases.

Chainer (released 2015; core language: Python; CUDA support: yes; supported models: RNN, CNN, RL; multi-GPU: yes)
Advantages: 1. Provides good documentation with examples. 2. Faster than other Python-oriented frameworks. 3. Supports CUDA computation. 4. Requires little effort to run on multiple GPUs. 5. Possible to modify an existing network on the go.
Disadvantages: 1. Difficult to debug. 2. Community is relatively small.
One of the most popular types of deep neural network architecture is the convolu-
tional neural network (CNN), which has multiple applications such as speech recog-
nition, image/object recognition, image processing, image segmentation, video anal-
ysis and natural language processing [15]. Some CNNs are approaching the accuracy
of diagnosticians when detecting important features in diagnostic imaging investi-
gations. A CNN eliminates the need for manual feature extraction as in the case
of ML algorithms: the relevant features are not pretrained but are learned while the
network trains on images, making CNNs more accurate for computer vision tasks.
A CNN breaks down an image into several attributes such as edges, contours,
strokes, textures, gradients, orientation and color, and learns them as representations
in different layers. It achieves this by using tens or hundreds of hidden layers, with
every hidden layer increasing the complexity of the learned image features.
A CNN is composed of an input layer, hidden layers consisting of convolutional
layers and pooling layers, a fully connected layer and an output layer. In the hidden
layers, each convolutional layer is followed by a pooling layer [17, 18]. The convo-
lution phase focuses on extracting features from the input layer. ConvNets are gener-
ally not limited to only one convolutional layer. Typically, the first convolutional
layer is responsible for capturing low-level features such as edges, strokes, color,
gradient and orientation, while each subsequent layer captures higher-level features,
to infer what the image is [7].
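The convolution step itself can be sketched in plain Python (a toy example, not from the chapter; the 4x4 image and the Sobel-style kernel are made up): a small filter slides over the image and produces a feature map that responds to a vertical edge.

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2-D convolution of a list-of-lists image."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

image = [[0, 0, 1, 1],     # left half dark, right half bright:
         [0, 0, 1, 1],     # a vertical edge down the middle
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
sobel_x = [[-1, 0, 1],     # kernel that responds to horizontal change
           [-2, 0, 2],
           [-1, 0, 1]]
print(conv2d(image, sobel_x))   # → [[4, 4], [4, 4]]
```

Every output cell is large exactly where the edge sits under the filter; a trained CNN learns such kernels instead of having them hand-designed.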
Pooling layers are placed between successive convolutional layers [19]. A pooling
layer is responsible for reducing the spatial size of the convolved feature; through
this dimensionality reduction, the computational power and the parameters needed
to process the data are decreased. The convolutional layers and the pooling layers
together form the several layers of a convolutional neural network. Depending on
the complexity of the images, the number of such layers may be increased to capture
further low-level details.
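The spatial reduction can be sketched in plain Python (a toy feature map, not from the chapter): 2x2 max pooling halves each spatial dimension while keeping the strongest activation in every window.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a list-of-lists feature map."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 3]]
print(max_pool_2x2(feature_map))   # → [[4, 2], [2, 7]]
```

A 4x4 map becomes 2x2, so the layers after it process a quarter of the values.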
The fully connected layer takes the result of the convolution/pooling layers and uses
it to classify the image. The classification is then forwarded to the output layer, in
which every neuron represents a classification label [17].
In short, a CNN can be thought of as a model performing two basic tasks on the
input: feature extraction and classification. The convolution and pooling layers
perform feature extraction, while the fully connected layers act as a classifier.
A DNN deploys a layered architecture with complex functions; for this reason, it
requires much more processing power and powerful hardware for such complex
computation.
An RNN is a neural network that can analyze sequences of data with variable-length
input. RNNs use knowledge of their previous state as an input for the current predic-
tion, and this process is repeated an arbitrary number of steps, allowing the network
to propagate information through time by means of its hidden state. In doing so,
RNNs exploit two sources of input, the present and the recent past, to produce the
output for new data. This feature gives RNNs something like a short-term memory.
As RNNs are highly effective with sequences of data that occur in time, they are
widely used in language translation, language generation, time-series analysis, and
analysis of text, sound, DNA sequences, etc. Although the RNN is a straightforward
and robust model, it suffers from the vanishing and exploding gradient problems
[13, 21].
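The "short-term memory" can be seen in a scalar sketch (pure Python; the weights are illustrative, not trained): the same cell is applied at every time step, and the hidden state carries the influence of an early input into later steps.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a vanilla RNN cell with scalar weights."""
    return math.tanh(w_x * x + w_h * h + b)

h = 0.0                          # initial hidden state
for t, x in enumerate([1.0, 0.0, 0.0]):
    h = rnn_step(x, h)
    print(f"t={t} h={h:.4f}")    # the impulse at t=0 decays but persists
```

The input is non-zero only at t=0, yet h stays non-zero afterwards; the repeated multiplication by w_h is also exactly why gradients vanish or explode over long sequences.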
The two variants of gated RNN architecture that help solve the gradient problems
of RNNs are Long Short-Term Memory (LSTM) and the Gated Recurrent Unit
(GRU). Both variants are efficient in applications that require processing of long
sequences [13, 21]. The GRU, which came later in 2014, is a simplified version of
the LSTM that gets rid of the output gate present in the LSTM model. For several
applications, the GRU performs like the LSTM, but as the GRU is simpler, it can be
trained quickly and executes more efficiently. It can be used in applications such as
text compression, handwriting recognition, gesture/speech recognition and image
captioning [22].
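One GRU step can be written out with scalar gates (pure Python; all weights are illustrative values, not trained parameters). Note that there is no separate output gate: the update gate z interpolates between the old state and a candidate state, which is the simplification relative to the LSTM.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, wz=(0.5, 0.5), wr=(0.5, 0.5), wc=(1.0, 1.0)):
    """One GRU step with scalar weights (wz, wr, wc act on x and h)."""
    z = sigmoid(wz[0] * x + wz[1] * h)               # update gate
    r = sigmoid(wr[0] * x + wr[1] * h)               # reset gate
    h_cand = math.tanh(wc[0] * x + wc[1] * (r * h))  # candidate state
    return (1 - z) * h + z * h_cand                  # interpolate old/new

h = 0.0
for x in [1.0, -1.0, 0.5]:
    h = gru_step(x, h)
print(round(h, 4))
```

Because the new state is a convex combination of the old state and a tanh-bounded candidate, gradients flow through the (1 - z) path largely unchanged, which is what tames the vanishing-gradient problem.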
A Restricted Boltzmann Machine (RBM) is a two-layer network with symmetric
connections between visible and hidden units. The two layers are connected as a
fully bipartite graph, where every node in the visible layer is connected to every
node in the hidden layer but no two nodes in the same layer are linked to each
other. This restriction allows for more efficient training algorithms that are easier
to implement than those of general Boltzmann machines. RBMs can be used for
dimensionality reduction, classification, regression and feature learning [13].
GANs are generative models for unsupervised and semi-supervised learning. A
GAN has two neural networks that compete against each other to create the most
convincingly realistic data. One of the networks is a generator while the other is
a discriminator. The generator generates data which is evaluated by the discrim-
inator. The discriminator's training data comes from two sources: real pictures,
and the fake ones produced by the generator. The generator produces better images
by maximizing the probability of the discriminator making a mistake, while the
discriminator tries to achieve higher accuracy in distinguishing the synthetic images
from the real ones. Some of the applications include image style transfer, high-
resolution image synthesis, text-to-image synthesis, image super-resolution, anomaly
detection, 3D object generation, music generation, acceleration of scientific simu-
lations and many more [10].
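The adversarial loop can be shown end to end on a 1-D toy (pure Python, not from the chapter; the "real" data, the linear generator and the logistic discriminator are all made up for illustration, with hand-derived gradients of the standard GAN losses):

```python
import math, random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

random.seed(0)
a, b = 1.0, 0.0          # generator G(z) = a*z + b
w, c = 0.1, 0.0          # discriminator D(x) = sigmoid(w*x + c)
lr = 0.05

for _ in range(2000):
    z = random.uniform(-1, 1)
    x_real = 3.0 + 0.1 * random.gauss(0, 1)   # "real" data near 3.0
    x_fake = a * z + b

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * ((1 - d_real) * x_real - d_fake * x_fake)
    c += lr * ((1 - d_real) - d_fake)

    # Generator step: ascend log D(fake), i.e. try to fool D.
    d_fake = sigmoid(w * x_fake + c)
    grad_logit = 1 - d_fake          # d log D / d logit
    a += lr * grad_logit * w * z     # chain rule through x_fake = a*z + b
    b += lr * grad_logit * w

print(round(b, 2))   # the generator's offset drifts toward the real data
```

The two updates pull in opposite directions, and the generator's output distribution is dragged toward the real one, which is the whole mechanism in miniature.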
From the deep learning models discussed, it can be noticed that, based on the type
and needs of the application, one can opt for DNN, CNN and RNN for deep super-
vised learning, and DAE, RBM and GAN for unsupervised learning. However, sometimes
Table 2 Deep learning architectures used in healthcare applications

CNN (network type: FFNN; training type: supervised)
- Pereira et al. [26]: brain tumor segmentation
- Prasoon et al. [27]: predicting the risk of osteoarthritis (OA)
- Gulshan et al. [28]: detecting diabetic retinopathy
- Esteva et al. [29]: classifying skin cancer
- Sathyanarayana et al. [30]: predicting the quality of sleep from physical activity using wearable data during awake time
- Alipanahi et al. [31]: DeepBind, predicting the specificities of DNA- and RNA-binding proteins

LSTM RNN (network type: RNN; training type: supervised)
- Lipton et al. [32]: diagnosing patients in a Pediatric Intensive Care Unit
- Wang et al. [33]: predicting Alzheimer's Disease stage/progression
- Beeksma et al. [34]: predicting life expectancy with electronic medical records

GRU RNN (network type: RNN; training type: supervised)
- Choi et al. [35]: using patient history to predict diagnoses and medications
- Choi et al. [36]: early detection of heart failure onset

RBM (network type: FFNN; training type: unsupervised)
- van Tulder and de Bruijne [37]: lung CT analysis
- Brosch et al. [38]: detecting modes of variation in Alzheimer's Disease
- Yoo et al. [39]: segmenting Multiple Sclerosis (MS) lesions

GAN (network type: FFNN; training type: unsupervised/semi-supervised)
- Phúc [40]: unsupervised anomaly detection on X-ray images
- [41]: learning implicit brain MRI manifolds

DAE (network type: FFNN; training type: unsupervised)
- Suk et al. [42]: Alzheimer's Disease/Mild Cognitive Impairment diagnosis
- Cheng et al. [43]: diagnosing breast nodules and lesions from ultrasound imaging
- Fakoor et al. [44]: predicting protein backbones from protein sequences
- Che et al. [45]: discovering and detecting characteristic patterns of physiology in clinical time series
- Miotto et al. [46]: predicting future clinical events

DNN (network type: FFNN; training type: supervised/unsupervised/semi-supervised)
- Alexander et al. [47]: predicting pharmacological properties of drugs and drug repurposing
GAN can also be used for semi-supervised learning tasks. Four deep generative
networks are the DBN, the Deep Boltzmann Machine (DBM), the Generative
Adversarial Network (GAN) and the Variational Autoencoder (VAE) [48].
For image classification, a DNN performs better than a shallow ANN because its
multiple hidden layers increase accuracy; however, DNNs suffer from the vanishing
gradient problem, which leads one to consider Convolutional Neural Networks
(CNNs). CNNs are preferred as they provide better visual processing models.
Suvajit Dutta et al., in their work, compared the performance of different NN models
for medical image processing without GPU support. Comparing computational
time, the FFNN took the longest to train, followed by DNN and CNN. As per their
study, with CPU training the DNN performed better than the FFNN and CNNs [49].
Tobore et al., in their study, point out that RNN and CNN have been increasingly
used in applications over the years, with an enormous growth rate for CNN. This can
be attributed to the success recorded on image data and the many available variants
of the model. CNN and RNN are the most commonly used DL techniques, as they
are often applied to data problems whose form and shape are either visual or time-
dependent, which they are known to handle effectively. PET and CT scan image
processing are among the dominant healthcare applications where CNN has proved
to provide the required performance. Although CNN is a feedforward NN in which
information flows in the forward direction only, in RNN the information flows in
both directions, as it operates on the principle of saving the output of the previous
layer and feeding it back to the input to predict the output. The principle of operation
of CNN is influenced by its consecutive layer organization, so it is designed to
recognize patterns across space. For this reason, CNN is ideal for recognizing
features in images (e.g. magnetic resonance imaging (MRI)), videos (e.g. moving
patterns in organs) and graphics (e.g. tumor representations). CNN has been used for
several medical applications, such as automatic feature extraction from endoscopy
images; as a classifier for detection of lesions in thoraco-abdominal lymph nodes
and interstitial lung disease; Alzheimer's disease detection; polyp recognition;
diabetic retinopathy; lung cancer; and cardiac imaging, especially for calcium score
quantification. The CNN architectures GoogLeNet, LeNet-5 and AlexNet, used for
automatic feature extraction and classification, achieved 95% and above accuracy
[50]. CNNs can be invariant to transformations such as translation, scale and rotation,
which allows abstracting an object's identity or category from the specifics of the
visual input, enabling the network to recognize a given object even when the actual
pixel values in the image vary significantly [51].
In contrast, the RNN technique is suited to recognizing patterns across time, making
it a perfect candidate for time-series analysis such as sound (e.g. heartbeat, speech),
text (e.g. medical records, gene sequences) and signals (e.g. the electrocardiogram
(ECG)).
The deep generative models DBN and DBM have been the least utilized techniques;
the major problems are that they are computationally expensive and have a low rate
of application. DBN and DBM architectures are derived from the RBM and are
initialized by layer-wise greedy training of RBMs. However, they are qualitatively
different: the connections between the layers are mixed directed/undirected in the
case of the DBN, while they are entirely undirected in the DBM.
Deep learning is a great tool that can be used in the discovery and development of
drugs. Alexander et al., in their work, proposed a deep neural network which used
transcriptomic data to recognize the pharmacological properties of multiple drugs
across different biological systems and conditions, crucial for predicting pharmaco-
logical properties of drugs and drug repurposing; the DNN achieved high classifi-
cation accuracy and outpaced the support vector machine (SVM) model [47].
Google's AlphaFold [4] was able to predict, with excellent precision and speed, the
3D structure of proteins, an important assessment in drug discovery. Similarly,
ChemGAN [53], a model that uses GANs, and DeepChem [54], an open-source
library, have been used for drug discovery.
Medical imaging techniques such as MRI scans, CT scans, ECG and X-rays are used
to diagnose a variety of diseases. Analysis of these medical images can help detect
different kinds of cancer at earlier stages with high accuracy, and it facilitates doctors
in analyzing and predicting disease and providing better treatment. The most
common CNNs, such as AlexNet and GoogleNet, designed for the analysis of natural
images, have proved their potential in analyzing medical images [55]. Some of the
areas where medical images have been used are the following. According to
missinglink.ai, Haenssle et al. trained a CNN model that could diagnose skin cancer
as accurately as an expert dermatologist by examining digital images of skin lesions.
Researchers at Enlitic created a device to detect lung cancer nodules in CT images
[56]. The Google AI algorithm LYmph Node Assistant (LYNA) can quickly and
accurately detect metastasized breast cancer from pathology images when compared
to human doctors, achieving a success rate of 99% [57].
EHRs do not just store basic patient information and support administrative tasks;
they include a range of data, including the patient's medical history, laboratory
reports, demographics, prescriptions and allergies, sensor measurements, immu-
nization status, radiology images, vital signs, etc. A medical concept created by
researchers uses deep learning to analyze data stored in EHRs and predict heart
failure up to nine months earlier than doctors [56]. Until recently EHR analysis was
done using ML techniques to convert the available data into knowledge, but these
have now been replaced by deep learning techniques.
Leveraging the EHR, Choi et al.'s Doctor AI framework was constructed to predict
future disease diagnoses along with subsequent medical interventions [35]. They
trained an RNN model on patients' observed clinical events and times to predict
patients' diagnoses, medicine prescriptions and future diagnoses. They found that
their system performed differential diagnosis with accuracy similar to physicians,
achieving up to 79% recall@30 and 64% recall@10. They then expanded their work
[36] by training a GRU network on sequences of clinical events derived from the
same skip-gram procedure, and found better performance for predicting the onset of
heart disease.
The DeepCare framework takes clinical concept vectors via a skip-gram embedding
approach, creating two separate vectors per patient admission: one for diagnosis
codes and another for intervention codes. After combining these vectors, they are
passed into an LSTM network for predicting the next diagnosis, the next intervention
and future readmission for both diabetes and mental health cohorts. Another system,
Deepr, based on a CNN, operates on discrete clinical event codes to predict unplanned
readmission following discharge. It examines patients' multiple visits, each having
several associated procedures and diagnoses. It demonstrated superior accuracy and
the capability to learn predictive clinical motifs. The Deep Patient framework essen-
tially uses stacked denoising autoencoders (SDA) to learn patient data representa-
tions from multi-domain clinical data. Med2Vec is another algorithm that can effi-
ciently learn code- and visit-level details from EHR datasets, improving prediction
accuracy [58, 59].
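The shape of this prediction task can be sketched with a count-based toy (pure Python; the visit sequences and ICD-style codes below are hypothetical examples, and real systems such as those above replace the count table with learned RNN/CNN representations):

```python
from collections import Counter, defaultdict

# Hypothetical visit sequences of diagnosis codes (illustrative only).
visits = [
    ["E11", "I10", "I50"],     # diabetes -> hypertension -> heart failure
    ["E11", "I10", "N18"],     # ... -> chronic kidney disease
    ["E11", "I10", "I50"],
]

# Count which code follows which across all observed sequences.
follows = defaultdict(Counter)
for seq in visits:
    for cur, nxt in zip(seq, seq[1:]):
        follows[cur][nxt] += 1

def predict_next(code):
    """Most frequently observed successor of the given code."""
    return follows[code].most_common(1)[0][0]

print(predict_next("I10"))   # → I50 (seen twice, versus once for N18)
```

A bigram table like this captures only one step of history; the appeal of LSTM/GRU models in this setting is precisely that they condition on the whole visit history instead.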
Medical insurance fraud claims can be analyzed using deep learning techniques.
With predictive analytics, it is possible to predict fraud claims that are likely to
happen in the future. Deep learning also helps the insurance industry identify the
target patients to send discounts to. Insurance fraud usually occurs in the form of
claims: a claimant can fake an identity, duplicate claims, overstate repair costs, or
submit false medical receipts and bills. According to Daniel and Prakash, RBMs
have proven efficient in identifying cases of treatment overutilization. They recom-
mend RBMs not just for detecting insurance fraud, but in any situation requiring
high-dimensional categorical outlier detection [60].
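One common way an RBM flags outliers is via its free energy, F(v) = -b.v - sum_j log(1 + exp(c_j + W_j.v)): records the model finds implausible get a high score. A minimal sketch in pure Python (the weights and the binary claim records are illustrative values, not a trained model):

```python
import math

W = [[2.0, -1.0, 0.0],     # one row of weights per hidden unit
     [0.0, 2.0, -1.0]]
b = [0.1, 0.1, 0.1]        # visible biases
c = [0.0, 0.0]             # hidden biases

def free_energy(v):
    """RBM free energy of a binary visible vector v."""
    vis = sum(bi * vi for bi, vi in zip(b, v))
    hid = sum(math.log(1 + math.exp(cj + sum(wi * vi
              for wi, vi in zip(Wj, v))))
              for Wj, cj in zip(W, c))
    return -vis - hid

typical = [1, 1, 0]        # pattern these weights "expect"
unusual = [0, 0, 1]        # pattern these weights penalise
print(free_energy(typical) < free_energy(unusual))   # → True
```

Ranking claims by free energy and reviewing the top of the list is the outlier-detection recipe in miniature; a real system would first train W, b and c on historical claims.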
Alzheimer's detection is one of the challenges that the medical industry faces. Deep
learning techniques are used to detect Alzheimer's disease (AD) at an early stage.
Wang et al. proposed predictive modelling with an LSTM RNN that can effectively
predict the AD progression stage by using patients' historical visits and medical
patterns [33]. Suk et al. proposed a method for latent feature representation with a
stacked auto-encoder (SAE) for Alzheimer's disease/Mild Cognitive Impairment
diagnosis. It uses a two-step learning scheme of greedy layer-wise pre-training and
fine-tuning to reduce the risk of falling into a poor local optimum, which is the major
limitation of the CNN [42].
Nvidia published a study on a deep learning-based project that used NVIDIA
Tesla K80 GPUs with the cuDNN-accelerated Caffe framework to drop breast
cancer diagnostic error rates by 85% [61]. Rajaraman et al. proposed a customized,
pre-trained CNN model that is scalable, with end-to-end feature extraction and classi-
fication, to aid improved malaria parasite detection in thin blood smear images [62].
Diabetic Retinopathy (DR) is an eye disease and a diabetes complication which
causes damage to the retina, resulting in blindness. If detected on time, at an early
stage, by a retinal screening test, it can be controlled and cured without difficulty.
However, manual screening to detect DR is often time-consuming and difficult, as
during the early stages patients rarely show any symptoms. Deep learning has proven
to provide useful and accurate solutions that can help prevent this condition. A CNN
model can work with data taken from retinal imaging and detect hemorrhages, the
early symptoms and indicators of DR, for detecting the condition [50]. Gulshan et al.
[28] applied a Deep Convolutional Neural Network (DCNN) to the Eye Picture
Archive Communication System (EyePACS-1) dataset and the Messidor-2 dataset
for automatic classification and detection of diabetic retinopathy in retinal fundus
images. The authors claimed 97.5% sensitivity and 93.4% specificity on EyePACS-1,
and 96.1% sensitivity and 93.9% specificity on Messidor-2, respectively.
Another method used the NVIDIA CUDA DCNN library on a Kaggle dataset
consisting of over 80,000 digital fundus images. The feature vectors were fed to the
Cu-DCNN, which classified the images into 5 classes using features like exudates,
hemorrhages and micro-aneurysms, achieving up to 95% specificity, 30% sensitivity
and 75% accuracy [50]. Chandrakumar and Kathirvel [65] trained a DCNN with the
publicly available Kaggle, DRIVE and STARE datasets to classify affected and
healthy retinal fundus images, reporting an accuracy of 94%. The Kaggle dataset
includes clinician-labelled images across 5 classes: No DR, Mild, Moderate, Severe
and Proliferative DR.
An infectious disease occurs when a person is infected by a pathogen transmitted
through another person or an animal. Forecasting an infectious disease can be quite
a daunting task, and deep learning methods for predicting infectious disease can be
useful for designing effective models. The aim of the study proposed by Chae et al.
[66] was to design an infectious disease (scarlet fever, malaria, chicken pox) predic-
tion model that is more suitable than existing models. The performance of ordinary
least squares (OLS) and autoregressive integrated moving average (ARIMA) anal-
yses was used as a baseline to assess the deep learning models. From the results,
both the DNN and LSTM made much better predictions than the OLS and ARIMA
models for all infectious diseases; the best performance was from the DNN models,
although the LSTM models made more accurate predictions in some cases.
Kollias et al. [67] designed novel deep neural architectures, composed of CNN
and RNN components and trained with medical imaging data, to obtain excellent
performance in diagnosis and prediction of Parkinson's disease.
Alipanahi et al. developed DeepBind, a deep learning model that predicts the
sequence specificities of DNA- and RNA-binding proteins. DeepBind can be used
to create computer models that reveal the effects of changes in the DNA sequence;
the information so obtained can be used to develop more advanced diagnostic tools
and medications. DeeperBind, a more recent model, predicts the protein-binding
specificity of DNA sequences using an LSTM recurrent convolutional network. It
employs more complex and deeper layers and showed better performance than
DeepBind for some proteins. Neither can be used to construct nucleic acid sequences,
as DeepBind and DeeperBind are classification models rather than generative
models [56, 70].
6.9 Sensing
In recent times, remote patient monitoring systems, wearable devices and telemoni-
toring systems are frequently used to provide real-time patient care. For such systems,
biosensors are an integral part, used to capture and transmit vital signs to healthcare
providers, to constantly monitor the patient's vitals to detect any abnormalities, to
keep track of health status before hospital admission, etc. Iqbal et al. used deep
deterministic learning (DDL) to classify cardiac diseases, such as myocardial
infarction (MI) and atrial fibrillation (AF), which require special attention. First,
they detected the R peak based on fixed threshold values and extracted time-domain
features. The extracted features were then used to recognize patterns, divided into
three classes with an ANN, and finally processed to detect MI and AF. For cardiac
arrhythmia detection, Yıldırım et al. proposed a CNN-based approach built on the
analysis of long-duration raw ECG signals. They used 10 s segments and trained the
classifier for 13, 15 and 17 cardiac arrhythmia diagnostic classes. Munir et al. intro-
duced FuseAD, an anomaly detection technique for streaming data. The initial step
was forecasting the next timestamp with autoregressive integrated moving average
(ARIMA) and CNN models in each time series; next, each timestamp was checked
as normal or abnormal by feeding the forecasted results into an anomaly detector
module [71].
One of the major preconditions for using deep learning is the need for a massive
training dataset, as the quality of a deep learning-based classifier relies greatly on
the amount of data. The availability of medical imaging data is often limited, making
it a hindrance to the success of deep learning in medical imaging [72]. Another
problem is that the biomedicine and healthcare domain is generally extremely
complicated: diseases are highly unpredictable, with little knowledge of what causes
or triggers them and how they progress in individual patients. Besides, the number
of patients for certain conditions can be limited, which can impact the learning and
prediction accuracy of the model [18].
Despite the insights one could get from the huge amount of data stored in EHR
systems, there are still risks involved, as sensitive patient information such as
addresses, claims details and treatment-related matters can be a soft target for
attacks. HIPAA (the Health Insurance Portability and Accountability Act) provides
legal rights to patients to protect their health-related data, so proper protocols must
be in place to ensure that EHR data is not misused. For this purpose, automatic deep
patient de-identification and the deployment of intelligent tools can be considered
another area of future research [58], although some work in this area is already
available: Dernoncourt et al. built an RNN-based de-identification system, evalu-
ated it with the i2b2 2014 and MIMIC de-identification datasets, and showed better
performance than existing systems. Later, a hybrid RNN model was developed for
clinical notes de-identification, with a bidirectional LSTM model deployed for
character-level representation [73].
Boosting Traditional Healthcare-Analytics with Deep Learning … 359
Longitudinal EHR data describes a patient's health condition over a period, giving
a global context, while the short-term dependencies among medical events provide
local context to the patient's history. Both contexts affect the hidden relations among
the clinical variables and future patient health outcomes. As the associations between
clinical events are complex, it is difficult to identify the true signals from the long-term
context. For such irregularities, LSTMs and GRUs can be used to solve the
challenge of extracting true signals from the long-term context, thanks to their ability
to handle long-term dependencies. Using gated structures [35] is one such example,
where LSTMs or GRUs were applied to model long-term dependencies between
clinical events and make predictions. The existing deep learning models for the
healthcare domain cannot handle the time factor in an input. Most of the time,
clinical data is not a static input, as disease activity in a patient keeps changing;
designing models that can handle such temporal healthcare data can solve this
issue. A wealth of EHR data is collected as continuous time series, available in the
form of vital signs and other timestamped data such as laboratory test results.
Applications that make use of vital signs include predicting diagnoses and
in-hospital mortality and distinguishing between gout and leukemia. The major
concern with this type of EHR data is the irregularity of scale, as signals are
measured at different time scales (hourly, monthly, or yearly); such irregularities
can bring down the performance of the model [58].
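The gated structure that lets LSTMs and GRUs retain long-term clinical context can be shown in a few lines. The following NumPy sketch implements a single GRU cell and steps it over a toy sequence of event vectors; the dimensions, random initialisation, and class name are illustrative assumptions, not any published model.

```python
import numpy as np

# A minimal GRU cell, shown only to illustrate the gated interpolation that
# preserves long-term state. All sizes and the initialisation are invented.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hidden_dim, input_dim + hidden_dim)
        self.Wz = rng.normal(0.0, 0.1, shape)  # update-gate weights
        self.Wr = rng.normal(0.0, 0.1, shape)  # reset-gate weights
        self.Wh = rng.normal(0.0, 0.1, shape)  # candidate-state weights

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)              # how much of the state to update
        r = sigmoid(self.Wr @ xh)              # how much of the past to reuse
        h_cand = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1.0 - z) * h + z * h_cand      # gated interpolation old/new

# Step the cell over a toy sequence of five identical clinical-event vectors.
cell = GRUCell(input_dim=4, hidden_dim=3)
h = np.zeros(3)
for _ in range(5):
    h = cell.step(np.ones(4), h)
```

Because the new state is a convex combination of the old state and a tanh candidate, the hidden activations stay bounded while still carrying information across many timesteps.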
The nature of EHR data is heterogeneous, as it comes from multiple modalities
such as clinical notes, medical reports, medical images, billing data, patient demographic
information, continuous time series of vital signs and laboratory measurements,
prescriptions, patient history, and more. Research suggests that finding patterns
across multimodal data can enhance the accuracy of diagnosis and prediction and the
overall performance of the learning system. However, multimodal learning is still a
challenge due to the heterogeneity of EHR data [73]. Besides disease-specific
multimodal data, some studies used multivariate time series data. For example, a CNN
was applied to multivariate electroencephalogram (EEG) signals for automated classification
of normal, preictal, and seizure subjects. An LSTM-based model was developed
using vital-sign series from the Medical Information Mart for Intensive Care III
(MIMIC-III) for the detection of sepsis. A hierarchical attention bidirectional gated recurrent
unit (GRU) model was used in a study where clinical documents from the MIMIC-III
dataset were automatically tagged with associated diagnosis codes.
To explain the classification from clinical notes to diagnosis codes, an interpretable
model based on a convolution-plus-attention architecture was introduced. In another
study, DFFNNs and CNNs were applied to free-text pathology reports to automatically
extract the primary cancer sites and their laterality [73].
Several studies focused on vector-based representations of clinical concepts to
reduce the dimensionality of the code space and reveal latent relationships between
similar types of discrete codes. Deep EHR methods for code representation include
the NLP-inspired skip-gram technique for predicting heart failure, a CNN model to
predict unplanned readmissions, an RBM-based framework for stratifying suicide
risk, DBMs for diagnosis clustering, and an LSTM-based technique for modelling
disease progression and predicting future risk for diabetes and mental health. While
code-based representations of clinical concepts and patient encounters are one way
to deal with the problem of heterogeneity, they ignore many important real-valued
measurements associated with items such as laboratory tests, intravenous medication
infusions, vital signs, and more. In the future, more research should focus on
processing diverse sets of data directly, rather than relying on codes from vocabularies
that are designed for billing purposes [58].
360 P. S. Mathew and A. S. Pillai
Multi-task learning (MTL) is also known to address the heterogeneity of EHR data.
It allows models to jointly learn data across multiple modalities: some neurons in the
neural network are shared among all tasks, while others are specialized for specific
tasks. Si and Roberts proposed an MTL framework based on a multi-level CNN that
demonstrates the feasibility of using MTL to efficiently handle patient clinical notes
and to obtain a vector-based patient representation across multiple predictions [74].
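The hard-parameter-sharing idea behind MTL (shared neurons feeding task-specific neurons) can be sketched as a single forward pass. The encoder size, the two task heads, and their names ("mortality", "readmission") are invented for this example and are not the architecture of Si and Roberts' multi-level CNN.

```python
import numpy as np

# Sketch of hard parameter sharing in multi-task learning: one shared
# encoder produces a patient representation consumed by several
# task-specific heads. All sizes and task names are illustrative.

rng = np.random.default_rng(42)

def mtl_forward(x, shared_W, heads):
    """Return one prediction per task from a single shared representation."""
    shared = np.tanh(shared_W @ x)                          # shared neurons
    return {task: W @ shared for task, W in heads.items()}  # per-task neurons

shared_W = rng.normal(size=(8, 16))          # shared encoder weights
heads = {
    "mortality": rng.normal(size=(1, 8)),    # hypothetical task 1 head
    "readmission": rng.normal(size=(1, 8)),  # hypothetical task 2 head
}
preds = mtl_forward(rng.normal(size=16), shared_W, heads)
```

During training, gradients from every task would update `shared_W`, which is what lets scarce labels in one task benefit from data in the others.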
Medical images are often overly complex. There are several kinds of medical imaging
devices, and each produces images with thousands of image features. Labels
indicate the target of interest, such as true clinical outcomes or true disease phenotypes.
Labels are not always captured well in EHR data and are thus unavailable
for training many models; label acquisition, on the other hand, requires
domain knowledge or highly trained domain experts. The lack of meaningful labels [75]
on EHR records is considered a major barrier to deep learning models [73]. For
supervised learning approaches, labels are manually crafted from the occurrences of
codes, such as diagnosis, procedure, and medication codes. Transfer learning could
offer alternative approaches. For example, [35] used an LSTM to model sequences of
diagnostic codes and demonstrated that, for the same task, the learned knowledge
could be transferred to new datasets. An autoencoder-variant architecture [71] was
applied to perform transfer learning from source tasks in which training labels are
ample but of limited clinical value to more specific target tasks, such as inferring
prescriptions from diagnostic codes.
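The transfer-learning recipe mentioned above (pretrain on a label-rich source task, then reuse the representation for a label-poor target task) can be sketched as follows. The encoder, its pretrained weights, and the least-squares head are all stand-ins invented for this example, not the autoencoder of [71].

```python
import numpy as np

# Sketch of representation transfer: an encoder pretrained on a label-rich
# source task is frozen, and only a small head is fitted on the scarce
# target labels. Names, sizes, and data are illustrative.

def encode(x, W):
    """Frozen pretrained encoder (stand-in for a learned representation)."""
    return np.tanh(W @ x)

def fit_head(encoded, labels):
    """Least-squares head on top of the frozen encoder outputs."""
    return np.linalg.lstsq(encoded, labels, rcond=None)[0]

rng = np.random.default_rng(0)
W_pretrained = rng.normal(size=(8, 32))       # assumed to come from the source task
target_x = rng.normal(size=(20, 32))          # few target-task examples
target_y = rng.normal(size=20)                # their scarce labels
encoded = np.array([encode(x, W_pretrained) for x in target_x])
head = fit_head(encoded, target_y)            # only the head is trained
```

Freezing the encoder means the target task needs only enough labels to fit the small head, which is exactly the appeal when clinical labels are expensive to acquire.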
Although deep learning models can produce accurate predictions, they are often
termed black-box models that lack interpretability and transparency. This is a
serious concern for both clinicians and patients, who are unwilling to accept
machine recommendations without proper reasoning to support them. To tackle this
issue, researchers have used a knowledge distillation approach, which compresses
the knowledge learned by a complex model into a simpler model that is easier to
deploy and interpret. Other mechanisms, such as attention and knowledge
injection via attention, have been used to add interpretability and to derive latent
representations of medical codes [73].
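The core of knowledge distillation is a loss that pushes the simple student model to match the complex teacher's temperature-softened class probabilities. The sketch below shows that loss; the logit values and the temperature are made up for the example.

```python
import math

# Illustrative distillation objective: cross-entropy between the teacher's
# and student's temperature-softened distributions. Values are invented.

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = [v / T for v in logits]
    m = max(z)                       # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy of the student against the softened teacher targets."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(pt * math.log(ps + 1e-12)
                for pt, ps in zip(p_teacher, p_student))
```

The loss is minimized when the student reproduces the teacher's distribution, so matching logits score strictly lower than mismatched ones.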
8 Conclusion
Over the past few years, deep learning for healthcare has grown tremendously.
From the many healthcare papers reviewed, deep learning models yield better
performance in many healthcare applications than traditional machine learning
methods.
Deep learning models require less effort in feature extraction than their ML
counterparts. Deep learning in healthcare can help physicians diagnose disease,
detect and classify tumors, and predict infectious disease outbreaks with higher
accuracy. Deep learning models have demonstrated success in several areas of
healthcare, especially image classification and disease prediction. The review shows
that CNNs are very well suited for medical image analysis, while other architectures
such as LSTMs, GRUs, and autoencoders have been used effectively in several
other healthcare tasks. Deep learning can pave the way for a next generation of
healthcare systems that are more robust and accurate in their predictions, scale to
huge EHR datasets, provide better healthcare outcomes, and support clinicians in
their activities. Although deep learning techniques perform well in many analytics
tasks, issues such as data heterogeneity, quality, irregularity, security, lack of
labels, and temporal modelling still need to be addressed in order to take full
advantage of deep learning systems.
References
1. Ed, C.: The Real-World Benefits of Machine Learning in Healthcare. Available from: https://
www.healthcatalyst.com/clinical-applications-of-machine-learning-in-healthcare (2017)
2. Anirudh, V.K.: What Are the Types of Artificial Intelligence: Narrow, General, and Super AI
Explained. Available from:https://it.toolbox.com/tech-101/what-are-the-types-of-artificial-int
elligence-narrow-general-and-super-ai-explained (2019)
3. Michael, C., Kamalnath, V., McCarthy, B.: An executive’s guide to AI. Available
from: https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/an-exe
cutives-guide-to-ai
4. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., et al.: Deep learning applications and
challenges in big data analytics. J. Big Data 2, 1 (2015). https://doi.org/10.1186/s40537-014-
0007-7
5. Richa, B.: Understanding the difference between deep learning & machine learning. Avail-
able from: https://analyticsindiamag.com/understanding-difference-deep-learning-machine-
learning/ (2017)
6. Sambit, M.: Why Deep Learning over Traditional Machine Learning? Available from: https://
towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-
1b6a99177063 (2018)
7. Dino, Q., He, B., Faria, B.C., Jara, A., Parsons, C., Tsukamoto, S., Wale, R.: IBM Redbooks.
International technical support organization. IBM PowerAI: Deep Learning Unleashed on IBM
Power Systems Serve (2018)
8. Mateusz, O.: Deep Learning Frameworks Comparison—Tensorflow, PyTorch, Keras,
MXNet, The Microsoft Cognitive Toolkit, Caffe, Deeplearning4j, Chainer. https://www.net
guru.com/blog/deep-learning-frameworks-comparison (2019)
29. Esteva, A., Kuprel, B., Novoa, R., et al.: Dermatologist-level classification of skin cancer with
deep neural networks. Nature 542, 115–118 (2017). https://doi.org/10.1038/nature21056
30. Sathyanarayana, A., Joty, S., Fernandez-Luque, L., et al.: Sleep quality prediction from
wearable data using deep learning. JMIR Mhealth Uhealth 4, e130 (2016)
31. Alipanahi, B., Delong, A., Weirauch, M.T., et al.: Predicting the sequence specificities of DNA-
and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015)
32. Lipton, Z.C., Kale, D.C., Elkan, C., et al.: Learning to diagnose with LSTM recurrent neural
networks. In: International Conference on Learning Representations, San Diego, CA, USA,
pp. 1–18 (2015)
33. Wang, T., Qiu, R.G., Yu, M.: Predictive modeling of the progression of Alzheimer’s Disease
with recurrent neural networks. Sci. Rep. 8, 9161 (2018). https://doi.org/10.1038/s41598-018-
27337-w
34. Beeksma, M., Verberne, S., van den Bosch, A., et al.: Predicting life expectancy with a long
short-term memory recurrent neural network using electronic medical records. BMC Med.
Inform. Decis. Mak. 19, 36 (2019). https://doi.org/10.1186/s12911-019-0775-2
35. Choi, E., Bahadori, M.T., Schuetz, A., et al.: Doctor AI: predicting clinical events via recurrent
neural networks. arXiv 2015. http://arxiv.org/abs/1511.05942v11 (2015)
36. Edward, C., Schuetz, A., Stewart, W.F., Sun, J.: Using recurrent neural network models for early
detection of heart failure onset. J. Am. Med. Inform. Assoc. 24(2), 361–370 (2017) (Published
online 2016 Aug 13). https://doi.org/10.1093/jamia/ocw112 (PMCID: PMC5391725)
37. van Tulder, G., de Bruijne, M.: Combining generative and discriminative representation learning
for lung CT analysis with convolutional restricted Boltzmann machines. IEEE Trans. Med.
Imaging 35(5), 1262–1272 (2016)
38. Brosch, T., Tam, R.: Manifold learning of brain MRIs by deep learning. Med. Image Comput.
Comput. Assist. Interv. 16(Pt 2), 633–640 (2013)
39. Yoo, Y., Brosch, T., Traboulsee, A., et al.: Deep learning of image features from unlabeled data
for multiple sclerosis lesion segmentation. In: International Workshop on Machine Learning
in Medical Imaging, Boston, MA, USA, pp. 117–124 (2014)
40. Phúc, L.: https://medium.com/vitalify-asia/gan-for-unsupervised-anomaly-detection-on-x-
ray-images-6b9f678ca57d (2018)
41. Bermudez, C., Plassard, A.J., Davis, T.L., Newton, A.T., Resnick, S.M., Landman, B.A.:
Learning implicit brain MRI manifolds with deep learning. Proc. SPIE Int. Soc. Opt. Eng.
10574–105741L (2018). https://doi.org/10.1117/12.2293515
42. Suk, H., Lee, S., Shen, D.: Latent feature representation with stacked auto-encoder for
AD/MCI diagnosis. Brain Struct. Funct. 220, 841–859 (2015). https://doi.org/10.1007/s00429-
13-0687-3
43. Cheng, J.-Z., Ni, D., Chou, Y.-H., et al.: Computer-aided diagnosis with deep learning archi-
tecture: applications to breast lesions in US images and pulmonary nodules in CT scans. Sci.
Rep. 6, 24454 (2016)
44. Fakoor, R., Ladhak, F., Nazi, A., et al.: Using deep learning to enhance cancer diagnosis and
classification. In: International Conference on Machine Learning, Atlanta, GA, USA (2013)
45. Che, Z., Kale, D., Li, W., et al.: Deep computational phenotyping. In: ACM International
Conference on Knowledge Discovery and Data Mining, Sydney, SW, Australia, pp. 507–516
(2015)
46. Miotto, R., Li, L., Kidd, B.A., et al.: Deep patient: an unsupervised representation to predict
the future of patients from the electronic health records. Sci. Rep. 6, 26094 (2016)
47. Alexander, A., Plis, S., Artemov, A., Ulloa, A., Mamoshina, P., Zhavoronkov, A.: Deep learning
applications for predicting pharmacological properties of drugs and drug repurposing using
transcriptomic data. Mol. Pharm. 13, 2524–2530 (2016). https://doi.org/10.1021/acs.molpharmaceut.6b00248
48. Md Zahangir, A., Taha, T.M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M.S., Hasan, M.,
Van Essen, B.C., Awwal, A.A.S., Asari, V.K.: A state-of-the-art survey on deep learning theory
and architectures. Electronics 8, 292. https://doi.org/10.3390/electronics8030292 https://www.
mdpi.com/journal/electronics (2019)
49. Suvajit, D., Manideep, B.C.S., Rai, S., Vijayarajan, V.: A comparative study of deep learning
models for medical image classification. In: 2017 IOP Conference Series: Materials Science and
Engineering, vol. 263, pp. 042097 (2017). https://doi.org/10.1088/1757-899x/263/4/042097
50. Muhammad Imran, R., Naz, S., Zaib, A.: Deep Learning for Medical Image Processing:
Overview, Challenges and Future. https://arxiv.org/ftp/arxiv/papers/1704/1704.06825.pdf
51. Dinggang, S., Wu, G., Suk, H.-I.: Deep learning in medical image analysis. Annu. Rev. Biomed.
Eng. 19, 221–248 (2017). https://doi.org/10.1146/annurev-bioeng-071516-044442
52. Slava, K.: Deep Learning (DL) in Healthcare. https://blog.produvia.com/deep-learning-dl-in-
healthcare-4d24d102d317 (2018)
53. Mostapha, B.: ChemGAN challenge for drug discovery: can AI reproduce natural chemical
diversity? arXiv preprint: 1708.08227v3
54. Ramsundar, B.: deepchem.io. https://github.com/deepchem/deepchem (2016)
55. Fourcade, A., Khonsari, R.H.: Deep learning in medical image analysis: a third eye for doctors.
J. Stomatology Oral Maxillofac. Surg. 120(4), 279–288 (2019)
56. Missinglink.ai. Available From: https://missinglink.ai/guides/deep-learning-healthcare/deep-
learning-healthcare/
57. Yun, L., Kohlberger, T., Norouzi, M., Dahl, G.E., Smith, J.L.: Artificial intelligence-based
breast cancer nodal metastasis detection insights into the black box for pathologists. Arch.
Pathol. Lab. Med. 143(7), 859–868 (2019). https://doi.org/10.5858/arpa.2018-0147-oa (Epub
Oct 8)
58. Benjamin, S., Tighe, P.J., Bihorac, A., Rashidi, P.: Deep EHR: A Survey of Recent Advances in
Deep Learning Techniques for Electronic Health Record (EHR) Analysis. arXiv 1706.03446v2
[cs.LG] 24 Feb 2018. https://arxiv.org/pdf/1706.03446.pdf (2018)
59. Luciano, C., Veltri, P., Vocaturo, E., Zumpano, E.: Deep learning techniques for electronic
health record analysis. In: 2018 9th International Conference on Information, Intelligence,
Systems and Applications (IISA) (2018). https://doi.org/10.1109/iisa.2018.8633647
60. Daniel, L., Santhana, P.: Deep learning to detect medical treatment fraud. In: Proceedings
of Machine Learning Research, KDD 2017: Workshop on Anomaly Detection in Finance,
vol. 71, pp. 114–120 (2017)
61. Tony, K.: Deep Learning Drops Error Rate for Breast Cancer Diagnoses by 85%. Nvidia. Avail-
able From: https://blogs.nvidia.com/blog/2016/09/19/deep-learning-breast-cancer-diagnosis/
(2016)
62. Rajaraman, S., et al.: Pre-trained convolutional neural networks as feature extractors toward
improved malaria parasite detection in thin blood smear images. PeerJ (2018). https://doi.org/
10.7717/peerj.4568
63. Ragab, D.A., Sharkas, M., Marshall, S., Ren, J.: Breast cancer detection using deep
convolutional neural networks and support vector machines. PeerJ 7, e6201 (2019). https://
doi.org/10.7717/peerj.6201
64. Khushboo, M., Elahi, H., Ayub, A., Frezza, F., Rizzi, A.: Cancer diagnosis using deep learning:
a bibliographic review. Cancers 11(9), 1235 (2019) https://doi.org/10.3390/cancers11091235
65. Chandrakumar, T., Kathirvel, R.: Classifying diabetic retinopathy using deep learning archi-
tecture. Int. J. Eng. Res. Technol. (IJERT) 05(06) (2016). http://dx.doi.org/10.17577/IJERTV
5IS060055
66. Chae, S., Kwon, S., Lee, D.: Predicting infectious disease using deep learning and big data. Int.
J. Environ. Res. Public Health 15(8), 1596 (2018). https://doi.org/10.3390/ijerph15081596
67. Kollias, D., Tagaris, A., Stafylopatis, A., et al.: Deep neural architectures for prediction in
healthcare. Complex Intell. Syst. 4, 119 (2018). https://doi.org/10.1007/s40747-017-0064-6
68. Wolfgang, K., Monti, R., Tamburrini, A., Ohler, U., Akalin, A.: Janggu—Deep Learning for
Genomics. bioRxiv preprint (2019) https://doi.org/10.1101/700450
69. Naveen, J.: Top 5 applications of deep learning in healthcare. Allerin. Available From: https://
www.allerin.com/blog/top-5-applications-of-deep-learning-in-healthcare (2018)
70. Im, J., Park, B., Han, K.: A generative model for constructing nucleic acid sequences binding
to a protein. BMC Genom. 20, 967 (2019). https://doi.org/10.1186/s12864-019-6299-4
71. Dubois, S., Romano, N., Jung, K., Shah, N., Kale, D.C.: The Effectiveness of Transfer Learning
in Electronic Health Records Data. Available From: https://openreview.net/forum?id=B1_
E8xrKe (2017)
72. Sayon, D.: A 2020 Guide to Deep Learning for Medical Imaging and the Healthcare
Industry. Nanonets. Available From: https://nanonets.com/blog/deep-learning-for-medical-ima
ging/ (2020)
73. Xiao, C., Choi, E., Sun, J.: Opportunities and challenges in developing deep learning models
using electronic health records data: a systematic review. J. Am. Med. Inform. Assoc. 25(10),
1419–1428 (2018). https://doi.org/10.1093/jamia/ocy068
74. Si, Y., Roberts, K.: Deep patient representation of clinical notes via multi-task learning for
mortality prediction. In: AMIA Joint Summits on Translational Science Proceedings. AMIA
Joint Summits Transl. Sci. 2019, 779–788 (2019)
75. Gloria, H.-J.K., Hui, P.: DeepHealth: Deep Learning for Health Informatics. arXiv:1909.00384
[cs.LG] (2019)
Factors Influencing Electronic Service
Quality on Electronic Loyalty in Online
Shopping Context: Data Analysis
Approach
1 Introduction
According to News24 [1], online learning tools such as Cisco Systems' Webex, Zoom,
Instructure's Canvas, and many other educational tech companies across the
U.S. reported an instant 700% increase in sales. Meanwhile, online booking and
travel sites dropped by 20.8%, the online grocery sector grew by 19.9%, and other
sectors fluctuated up and down (Impact of the Coronavirus on
Digital Retail: An Analysis of 1.8 Billion Site Visits—Contentsquare, 2020). To
maintain this growing shopper interest in online shopping, retailers should take
advantage and invest more in the adoption of e-commerce [2–4]. The worldwide fear of
coronavirus encourages them to offer their products and services through e-channels
and to enhance their customers' e-loyalty [5–7]. This challenges retailers to step
forward and draw attention to the safety of e-channel transactions [8–10]. The
rapid growth of online shopping has brought competitive attention to providing higher
service quality in order to retain customers [11–13]. The improvement and use
of Internet e-commerce have made it one of the most popular ways of doing business.
© The Editor(s) (if applicable) and The Author(s), under exclusive license 367
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_16
368 A. Al-Khayyal et al.
The importance of this study lies in improving the elements of the electronic channel
for the service provider, so as to enhance customers' e-loyalty. This study raises the
following research questions:
• RQ1: What are the main electronic service quality factors influencing customers'
e-satisfaction?
• RQ2: What are the main electronic service quality factors influencing customers'
e-trust?
• RQ3: What is the impact of customers' e-satisfaction and e-trust on
e-shopping?
• RQ4: How is customers' e-shopping related to their e-loyalty?
Following [14–19], this study presents the related literature review, methodology,
results and discussion, and conclusion. This study concludes that electronic service
quality strongly influences customer e-satisfaction and e-trust; as a result, customers
proceed with online shopping while developing electronic loyalty.
2 Literature Review
A careful literature review is a crucial phase before conducting any further research
study. It shapes the basis for knowledge accumulation, which in turn eases the
development and enhancement of theories, closes the existing gaps in research, and
reveals areas that previous research has missed [15].
Electronic service quality is defined as 'the extent to which a web site facilitates efficient
and effective shopping, purchasing, and delivery of products and services' [20,
21]. The importance of studying the dimensions of electronic service quality arises from
the extraordinary growth of online shopping [22–24]. High perceived
service quality will positively affect customer satisfaction and trust [2]; conversely,
low service quality will lead to adverse word-of-mouth and therefore reduce sales
and revenues as customers transfer to rivals [2, 25, 26]. The literature on
this topic brings attention to two scopes: first, how to evaluate, measure, and produce
scales for electronic service quality [21]; second, how electronic service quality
influences other factors such as satisfaction, trust, shopping, and loyalty [2].
Factors Influencing Electronic Service Quality on Electronic … 369
Scholars have adopted and adjusted different measurements for electronic service
quality. This study focuses on three electronic service quality scales, namely
E-S-QUAL, eTransQual, and eTailQ. E-S-QUAL is a multiple-item scale for
assessing electronic service quality [21]; it has four dimensions, namely
efficiency, fulfillment, system availability, and privacy. Efficiency describes the
ease and speed of accessing and using the site; fulfillment is the extent to which
the site's promises about order delivery and item availability are met; system
availability indicates the correct technical functioning of the site; and privacy is
the degree to which the site is safe and protects customer
information (Fig. 1).
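In practice, such multi-item scales are scored by averaging the Likert items assigned to each dimension. The sketch below shows that computation; the item-to-dimension mapping and the response values are invented for illustration and are not the published E-S-QUAL instrument.

```python
# Illustrative scoring of an E-S-QUAL-style questionnaire: average the 1-5
# Likert items belonging to each dimension. The item mapping is invented.

DIMENSIONS = {
    "efficiency": ["q1", "q2"],
    "fulfillment": ["q3", "q4"],
    "system_availability": ["q5"],
    "privacy": ["q6"],
}

def dimension_scores(responses):
    """Mean item score per dimension for one respondent."""
    return {
        dim: sum(responses[q] for q in items) / len(items)
        for dim, items in DIMENSIONS.items()
    }

scores = dimension_scores({"q1": 5, "q2": 4, "q3": 3, "q4": 3, "q5": 2, "q6": 5})
```

Dimension means can then feed regression or SEM models that relate service quality to e-satisfaction and e-trust.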
The eTransQual scale by [27] has five dimensions, namely responsiveness, reliability,
process, functionality/design, and enjoyment. It is a transaction process-based
approach to capturing service quality in online shopping. This model demonstrates
a significant positive impact on important outcome variables such as customer
satisfaction and perceived value. It also reports that enjoyment affects both
relationship duration and repurchase intention, which are major drivers of a customer's
lifetime value.
The eTailQ model by [28] has four dimensions: fulfillment/reliability,
website design, privacy/security, and customer service. This scale is described
as shown in Fig. 2. The model strongly predicts customer judgments of quality and
satisfaction, customer loyalty, and attitudes towards the site.
3 Methods
The inclusion and exclusion criteria described in Table 1 were applied to all the
research identified in the previous section. Therefore, all articles included in this
study had to meet these requirements.
The articles included in this systematic review were collected in February 2020
from five databases: ProQuest One Academic, EBSCO Host, Google Scholar,
Emerald, and Science Direct. The search terminology included the keywords (“Electronic
service quality” AND “Electronic loyalty”), (“E-service quality” AND “E-loyalty”),
(“E-service quality” AND “The E-S-QUAL”), (“E-service quality” AND “eTailQ
model”), (“Electronic service quality” AND “The E-S-QUAL”), and (“Electronic
service quality” AND “eTailQ model”). These searches found 126 articles using the
above-mentioned keywords. There were 21 articles identified as duplicates and removed,
reducing the total to 106 articles. Table 2 shows the results from the selected
databases with the numbers of included articles, and Fig. 3 shows the detailed
article-selection process for this study.
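The de-duplication step of this selection process can be sketched as merging the database hits on a normalised title key. The function and the example titles below are invented for illustration; the actual screening was done on the records described in Table 2.

```python
# Illustrative de-duplication of search hits from several databases:
# records are merged on a case- and whitespace-normalised title key.

def dedupe(titles):
    """Keep the first occurrence of each normalised title, in order."""
    seen, unique = set(), []
    for title in titles:
        key = " ".join(title.lower().split())  # normalise case and spacing
        if key not in seen:
            seen.add(key)
            unique.append(title)
    return unique
```

Running this over the combined result lists is what reduces the raw hit count to the set of distinct articles carried forward to quality assessment.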
A quality assessment checklist with eight questions was devised to provide a means
of assessing the quality of the articles downloaded for additional analysis
(N = 26), as seen in Table 3. The checklist in Table 3 was adapted from those used by
[14, 17, 30–33], who in turn adapted it from [34]. Each assessment question was scored
on a three-point scale, counting “Yes” as 1 point, “No” as 0 points, and “Partially” as
0.5 points. Therefore, each article could score between 0 and 8; the higher the
total score an article achieves, the higher the degree of quality with which it
relates to the research questions. Table 4 shows the quality assessment results for all
26 articles. It is clear that all the articles passed this assessment, indicating
that they are all qualified for further analysis and discussion. This assessment
was done with full respect for other researchers' work and efforts.
Table 4 shows the analysis of the selected articles to assess their quality using the
quality assessment questions.
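The three-point scoring rule described above reduces to a small lookup and sum. The sketch below reproduces that rule for one article's eight checklist answers; the answer list in the example is invented.

```python
# The quality-assessment scoring rule from the text:
# "Yes" = 1 point, "Partially" = 0.5 points, "No" = 0 points,
# summed over the eight checklist questions (total in [0, 8]).

SCORES = {"Yes": 1.0, "Partially": 0.5, "No": 0.0}

def article_score(answers):
    """Total quality score for one article's eight checklist answers."""
    assert len(answers) == 8, "the checklist has exactly eight questions"
    return sum(SCORES[a] for a in answers)
```

An article answering six questions with "Yes", one with "Partially", and one with "No" would therefore score 6.5 out of 8.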
Based on the 26 research articles from 2015 to 2020 included in this systematic
review, this study found that most of the studies on electronic service quality
were conducted in 2016. Figure 4 summarizes the numbers of studies conducted
between 2015 and 2020, Fig. 5 shows the numbers of studies by country, and Fig. 6
summarizes the selection frequencies of the main factors. The results of this
systematic review are stated based on the four research questions.
Numerous studies examined the relationship between electronic service quality
and customers' electronic satisfaction. This systematic review found that electronic
service quality factors influence customers' electronic trust [2, 40, 41, 45, 47].
On the other hand, [37] stated that electronic trust partially mediates the relationship
between electronic service quality and customer electronic satisfaction. However, [50]
reported that customers' electronic trust is not directly influenced by electronic service
quality. Nevertheless, customers develop electronic trust when they begin to sense
that a website is trustworthy [51–53].
• RQ3: What is the impact of customers' e-satisfaction and e-trust on
e-shopping?
• RQ4: How is customers' e-shopping related to their e-loyalty?
According to [40, 50, 54], customers should be satisfied with and trust a website in
order to proceed with shopping online and develop electronic loyalty.
This analysis aims to review and synthesize the literature on electronic service
quality, illustrate what is known about the topic, and propose a study model,
including hypotheses, for future research. After a careful review of 26 studies,
it was obvious that the main factors influencing electronic service quality are
website design, privacy, security, efficiency, and customer service. Limitations of
this review include the time allocated for this project, the small number of studies,
and the researchers' limited experience. The researchers may repeat this review by
changing the inclusion/exclusion criteria, expanding the time frame, and changing
the search terms in future research. Online shopping is one of the growing trends
among many other online markets. These markets are heading towards a major
technological transformation driven by new and innovative tools such as Artificial
Intelligence (AI), Data Science, Virtual Reality (VR), and Augmented Reality (AR).
Customer experience management is greatly influenced by obtaining customer
satisfaction through integrated artificial intelligence technology that provides
efficient customer service. These technologies will be addressed in future work.
IoT within Artificial Intelligence and Data Science
IoT Sensor Data Analysis and Fusion
Mohamed Sohail
1 Introduction
The Internet of Things (IoT) and Artificial Intelligence (AI) are considered this
decade's transformative technologies, holding great promise for tremendous societal,
technological, and economic benefits [1]. AI has great potential to revolutionize
how we live, work, learn, discover, and communicate. Modern AI research can further
our national priorities and day-to-day habits, including increased economic prosperity,
improved educational opportunities, enhanced quality of life, and strengthened
national and homeland security [2]. Because of these potential benefits, many govern-
ments and corporations have invested heavily in AI research and initiatives for many
years [3]. Yet, as with any significant technology in which all industries have an
interest, there are not only tremendous opportunities but also several challenges that
should be considered in guiding the overall direction of R&D in the next decade [4].
One of the industries most resistant to change is the data center industry,
but it cannot afford to resist for long. In large data centers, it is complicated
to monitor everything, and it takes too much effort to maintain the environment
in a steady state to support production; self-healing and self-optimizing data
centers have become a necessity in this era of data explosion [5]. A self-healing data
center is one that is capable of autonomously detecting potential component,
module, hardware, or software failures before they occur and adjusting its configura-
tion in order to achieve optimal function at all times [6]. The concept of detecting
malfunctions and data center failures before they happen is called condition-based
maintenance (CBM) or predictive maintenance (PdM) [7]. The approach requires
a combination of historical, simulation, and real-time information, expert heuristics,
M. Sohail (B)
Senior Engineer, Solutions Architecture, Dell Technologies, Cairo, Egypt
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license 381
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_17
382 M. Sohail
and model design data to enable such capabilities. As shown in Fig. 1, PdM and
CBM correlate to provide early detection of an event or failure.
The idea of using sensors and actuators to capture and analyze data, act on
critical events and relevant measurements (temperature, pressure, humidity,
vibration, proximity, etc.), and align them with AI and machine learning algorithms
accompanied by smart big data analytics is the basis for any approach to
self-healing mechanisms, especially in the data center industry [8, 9].
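As a minimal illustration of the CBM idea (not the chapter's actual implementation; the metric names and envelope values below are invented), such a check might compare fused sensor readings against per-metric operating ranges and flag anything that drifts out of bounds:

```python
# Hypothetical CBM/PdM sketch: fuse sensor readings and flag metrics
# whose values drift outside their safe operating envelope.
OPERATING_ENVELOPES = {          # per-metric safe ranges (illustrative values)
    "temperature_c": (10.0, 35.0),
    "humidity_pct": (20.0, 80.0),
    "vibration_g": (0.0, 0.5),
}

def cbm_flags(readings):
    """Return the metrics whose readings fall outside their envelope."""
    flags = []
    for metric, value in readings.items():
        low, high = OPERATING_ENVELOPES[metric]
        if not (low <= value <= high):
            flags.append(metric)
    return flags

reading = {"temperature_c": 41.2, "humidity_pct": 55.0, "vibration_g": 0.1}
print(cbm_flags(reading))  # -> ['temperature_c']
```

A real deployment would replace the static envelopes with models learned from historical and simulation data, as described above.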
One of this decade’s facts is that IoT grows at a very fast pace across all industries
[10]. The number of connected “things” (from the simple to the most complex) will
continue to expand over the near future.
All organizations now recognize the value of their data and its impact on their
business, whether to extract value from it or to optimize their operations [11].
Maximizing competitiveness will require deep analytics of data from all available
sources. IoT takes that ability to the next level, given the huge number of connected
objects predicted and estimated by many experts and major industry players [12].
According to Gartner's 2019 CIO Agenda survey [13], the share of organizations
that have deployed artificial intelligence (AI) grew from 4 to 14% between 2018 and
2019, as shown in Fig. 2. We are therefore facing an inevitable opportunity in almost
all industries.
In this chapter, we show how to combine AI techniques with IoT sensor data
analysis to create a robust model of condition-based and predictive maintenance in
a self-healing, IoT-enabled data center. The self-healing model leverages data
analytics techniques, trend analysis, and anomaly detection algorithms, aiming at a
simple approach that achieves high accuracy by leveraging the sensors, in the form
of the IoT devices themselves, in a crowdsourcing fashion to provide high-quality
results and insights.
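The trend-analysis and anomaly-detection building blocks mentioned above can be sketched, for example, as a rolling z-score test over a sensor stream. This is a deliberate simplification, and the window size and threshold are arbitrary choices, not values from this chapter:

```python
import statistics

def zscore_anomalies(series, window=5, threshold=3.0):
    """Flag points deviating more than `threshold` standard deviations
    from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mean = statistics.mean(hist)
        sd = statistics.pstdev(hist)
        if sd > 0 and abs(series[i] - mean) / sd > threshold:
            anomalies.append(i)
    return anomalies

# A stable temperature stream with one sudden spike at index 6.
temps = [22.0, 22.1, 21.9, 22.0, 22.2, 22.1, 30.5, 22.0]
print(zscore_anomalies(temps))  # -> [6]
```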
Sensor data fusion is defined as a mechanism for the synergistic combination of
data derived from various types of data sources, such as IoT sensors, to provide a
better understanding of the insights needed to solve stringent problems and to give
value to the generated data. The use of sensor/data fusion methodologies provides
many advantages, such as enabling the combination of multi-sourced information to
form a unified data lake and a complete picture of a specific issue, thanks to the
amount of collected information. Data fusion concepts are now widely adopted in
many areas, due to the growing need to know what lies behind a specific issue.
No one can ignore that data fusion has become a very wide subject with many
implementations and use cases. In our case study, we stress applying it in the data
center industry, where it can bring a lot of value given the change-resistant nature of
this old industry. We chose the data center field because it has a lot of potential from
a data generation point of view and represents the cornerstone of the private, hybrid,
and public cloud concepts.
In this chapter, we present an approach that has been tested with a prototype. A goal
is to expand the implementation to cover other heterogeneous components across a
wide array of technologies, including servers, coolers, and third-party storage arrays.
Figure 3 shows the high-level structure for aggregating sensors' data, starting from
the sensor nodes, transport nodes, edge, and data storage. This can also include the
fusion of logs generated by tested components plus externally deployed sensors, in
order to have multiple sources of data which can be analyzed to produce insightful
results.
The need for a self-healing and self-optimized data center is long-standing [14].
This new era of data explosion, and specifically the rising adoption of AI and IoT
data analysis, has made that far-off dream a reality [15, 16]. Unlike in the past, novel
techniques are very welcome nowadays for a variety of reasons, including cost
savings, optimal functioning, and dynamic configuration of the data center's different
resources [17]. It is also crucial for data center decision makers to have meaningful,
precise, and real-time information about the current conditions of their data centers,
in order to make informed decisions about technology renovation or an administrative
strategy covering both the time and effort needed for business continuity without
costly disruptions and long recovery time objectives [18].
To achieve this goal, we collect and process the data, including historical,
streaming/real-time, and simulation data, using an analytics engine built by integrating
various big data tools. The analytics engine's output is a prediction of the future state
of the targeted modules, taking into consideration future firmware releases, the
hardware lifecycle, and simulated failure analysis.
The prototype implementation included temperature, current, humidity, and
vibration sensors to measure the current working conditions in the data center. The
framework is elastic enough to include other types of sensors, which can be added
to enrich the results.
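As a hedged sketch of the "future state prediction" idea (not the engine's real algorithm), one can extrapolate a metric's linear trend by least squares and estimate how many steps remain before it crosses a failure threshold:

```python
def time_to_threshold(samples, threshold):
    """Estimate future steps until a linearly-trending metric crosses
    `threshold` (least-squares slope); None if the trend is not rising."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None
    if samples[-1] >= threshold:
        return 0
    # steps until the extrapolated line reaches the threshold
    return (threshold - samples[-1]) / slope

temps = [30.0, 31.0, 32.0, 33.0, 34.0]   # steadily rising intake temperature
print(time_to_threshold(temps, 40.0))    # -> 6.0
```

A production engine would of course use richer models and the additional data sources listed above, but the output has the same shape: a prediction of when a module leaves its safe state.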
4 Challenges
IoT data can be generated quite rapidly, its volume can be huge, and its types can
be various, so we face a series of challenges, especially regarding the types of
generated data and the need for an efficient way to fuse sensor data. The
ubiquitous sensors, RFID readers, and other devices involved in IoT systems
generate data rapidly, so the data must be stored with high throughput.
Furthermore, because the volume of data is very large and can increase rapidly,
a data storage solution for IoT data must not only store massive
data efficiently but also support horizontal scaling and the adoption of the various
data formats required by data fusion. Moreover, IoT data can
be collected from many different sources and consist of various structured and
unstructured data. Data storage components are therefore expected to have the
ability to deal with heterogeneous data sources.
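A minimal sketch of how such heterogeneous payloads could be wrapped in one schema-flexible envelope before storage; the envelope fields and source names are invented for illustration:

```python
import json
import time

def normalize(source, payload):
    """Wrap heterogeneous sensor payloads (dict, JSON string, raw value)
    in a common envelope so storage can stay schema-flexible."""
    if isinstance(payload, str):
        try:
            payload = json.loads(payload)       # structured JSON payload
        except json.JSONDecodeError:
            payload = {"raw": payload}          # opaque text (e.g. RFID read)
    elif not isinstance(payload, dict):
        payload = {"value": payload}            # bare numeric reading
    return {"source": source, "ts": time.time(), "data": payload}

print(normalize("rack-12/dht22", '{"temp": 23.5}')["data"])   # -> {'temp': 23.5}
print(normalize("rfid-gate-3", "TAG:0451")["data"])           # -> {'raw': 'TAG:0451'}
print(normalize("pdu-7/current", 12.4)["data"])               # -> {'value': 12.4}
```

Keeping the envelope small and the `data` field free-form is one way to satisfy both the high-throughput and the heterogeneous-format requirements discussed above.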
Today’s emerging problems.
• 59% of fortune 500 companies loose nearly 46$ million dollars per year.
• Reactive rather than pro-active troubleshooting.
• Lack of heterogenous data fusion capabilities.
• Alerts are too late for faster actions.
• Root Cause analysis is done after the issue is raised.
Why now?
• Pressure for more business agility and abstraction of data center hardware.
• Rapid adoption of software-defined data centers.
• Big data analytics is a top priority for customers at all levels.
• Infrastructure complexity requires AI-based automation to deliver self-healing
capabilities.
• The data center industry is resistant to change due to the complexity and
sophisticated nature of data centers.
• Data centers are often composed of system silos, similar to the Tower of Babel,
wherein every system communicates in a different language, which makes it
difficult to establish a unified language for communication.
• Data center operators will not have enough trust in these systems to rely on
them. A primary function of the operators is to ensure the availability of data
center services. Operators are rightfully hesitant to turn that responsibility over
to machines until they are confident that those machines can be trusted.
• The proposed solution will rely on CBM and PdM.
• The solution will rely on innovative ideas and a spatial prediction model that
gives system administrators proactive notifications about data center assets,
such as drive, array, performance, and host failures, through Software-Defined
Storage (SDS), by integrating multiple data sources.
• The solution is also based on a microservices architecture. It includes indepen-
dently deployable services that can deliver business value.
In some cases, storage systems continue to use legacy code and algorithms
to deliver features such as volume management, protection from media failure, and
snapshot management. The use of legacy code and algorithms means that the
implementation of each feature adds latency.
ENTERPRISE TIER
In this scenario, we use our backend data center management software to deter-
mine the areas that consume more power and apply our best practices to reduce that
power, or to seek peak power savings from the device based on these operational
statistics, as illustrated in the enterprise tier shown in Fig. 4.
Analysis of the historical data of disk failures will increase productivity and
accuracy of results, and can classify faults into categories. It will help assess the
expected impact on data center facilities and identify underlying issues such as:
• Rack location problems (overheating, high humidity, power interruption, dust),
which can indicate where most failures and downtimes come from.
• Manufacturer defects affecting disk lifetime, including random failures from
different locations.
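A toy example of this categorization, grouping a fabricated failure log by rack location and by vendor to see where failures concentrate:

```python
from collections import Counter

# Hypothetical historical disk-failure log entries (illustrative data only).
failures = [
    {"rack": "R01", "vendor": "A", "cause": "overheat"},
    {"rack": "R01", "vendor": "B", "cause": "overheat"},
    {"rack": "R07", "vendor": "A", "cause": "random"},
    {"rack": "R01", "vendor": "A", "cause": "humidity"},
]

by_rack = Counter(f["rack"] for f in failures)
by_vendor = Counter(f["vendor"] for f in failures)
print(by_rack.most_common(1))    # -> [('R01', 3)]
print(by_vendor.most_common(1))  # -> [('A', 3)]
```

A concentration by rack points to a facility issue (heat, humidity, power), while a concentration by vendor points to a manufacturer defect, exactly the two categories listed above.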
We employed many algorithms to help us in this journey. The algorithms analyzed
the data extracted from the sensors of the data center equipment (servers, arrays,
switches, etc.) together with the data from extra sensors within the center, such as
those on racks and cooling systems, and any other factors that can affect data center
operations. This gives us a broad understanding of the environment and enriches the
analytics engine with more data, which helps in getting more accurate results.
The extracted data is processed in a private cloud for proactive analysis of
possible failures. This includes automatic intervention with many systems, for
example the cooling systems and the power plants. In this test phase we used
temperature, humidity, and current sensors.
The self-healing algorithm will be responsible for adjusting the cooling system
based on the data read from the equipment and the implemented sensors.
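Such a self-healing cooling adjustment could be sketched, under invented thresholds and step sizes, as a simple feedback rule on the hottest reading:

```python
def adjust_cooling(setpoint_c, sensor_temps, target_c=24.0, step_c=0.5):
    """Nudge the cooling setpoint based on the hottest reading:
    cool more when over target, relax when comfortably under."""
    hottest = max(sensor_temps)
    if hottest > target_c:
        return setpoint_c - step_c      # increase cooling
    if hottest < target_c - 2.0:
        return setpoint_c + step_c      # relax cooling to save energy
    return setpoint_c                   # within the comfort band

print(adjust_cooling(20.0, [23.1, 26.4, 24.8]))  # -> 19.5
print(adjust_cooling(20.0, [21.0, 20.5, 21.2]))  # -> 20.5
```

The real algorithm described in this chapter would feed predictions, not just instantaneous readings, into this decision, but the closed loop between sensors and the cooling system has this shape.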
The framework was developed through three phases. Figure 5 illustrates the corre-
lation between the decision and push-notification phases and the maintenance phase.
The three phases are:
• “Simulation analysis”: In this phase, the framework simulates failures based
on data generated from the data center, to learn how to detect failures and take
appropriate action based on the failure type. In this phase, the framework can
also use simple data mining and machine learning algorithms to classify and
cluster the data received from the data center components and present all the
results in a smart visualization form. The importance of this step is to evaluate
the performance and accuracy of all the algorithms used in this phase before
applying them to the real environment.
• “Historical analysis”: This phase applies all the algorithms used in the previous
phase, plus mining of the equipment logs and historical data, meaning all the
data saved in the log files of the tested racks. In this phase, the framework can
be used to analyze historical data, detect any type of previous failure, apply
machine-learning algorithms, and visualize results, in order to build insights
about the reasons for failures that happened in the past. However, this framework
cannot be used for stream or real-time processing to handle failures that occur
in real time. We tackled this challenge in the next phase.
might face hardware failure due to the high temperature before getting a fire event
notification.
To address this problem, we enhanced our framework by using multi-agent tech-
nology, which allows an agent to move between machines in the cluster with the
required data, code, and status. This helps HDFS support hard real-time processing,
because the agent can run the code on the nearest machine without any permission
from the HDFS name node. Furthermore, another advantage of using an agent
system is a type of agent called an adaptive agent, which works as a learning agent
and helps the system choose the best solution from the history stored in the agent.
There is also another type of agent, called an interactive agent, which can detect any
failure in real time and forward the detected events to the adaptive agent so that
decisions can be made in real time. All machine-learning algorithms used by the
multi-agent system are based on Mahout.
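The interplay between the two agent types described above could be sketched as follows. The class and method names are hypothetical, and a real deployment would use Mahout-backed models rather than a simple lookup table:

```python
class AdaptiveAgent:
    """Learns which action resolved each failure type in the past."""
    def __init__(self):
        self.history = {}

    def learn(self, failure_type, action):
        self.history[failure_type] = action

    def decide(self, failure_type):
        # fall back to a human operator for unseen failure types
        return self.history.get(failure_type, "escalate-to-operator")

class InteractiveAgent:
    """Detects failures in real time and forwards them for a decision."""
    def __init__(self, adaptive):
        self.adaptive = adaptive

    def on_event(self, event):
        if event["severity"] >= 3:          # treat high-severity events as failures
            return self.adaptive.decide(event["type"])
        return "no-action"

adaptive = AdaptiveAgent()
adaptive.learn("overheat", "boost-cooling")
agent = InteractiveAgent(adaptive)
print(agent.on_event({"type": "overheat", "severity": 4}))    # -> boost-cooling
print(agent.on_event({"type": "disk-smart", "severity": 4}))  # -> escalate-to-operator
```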
#include <SPI.h>
#include <Ethernet.h>
#include "DHT.h"

#define DHTPIN 2        // digital pin connected to the DHT sensor
#define DHTTYPE DHT22   // DHT22 (AM2302)

DHT dht(DHTPIN, DHTTYPE);

// network configuration (placeholder values)
byte mac[] = {0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED};
IPAddress ip(192, 168, 1, 177);
char server[] = "xx-server.com";

EthernetClient client;
boolean lastConnected = false;
unsigned long lastConnectionTime = 0;

void setup() {
  Serial.begin(9600);
  Serial.println("DHTxx test!");
  dht.begin();
  Ethernet.begin(mac, ip);
  Serial.print("My IP address: ");
  Serial.println(Ethernet.localIP());
}

void httpRequest(float t, float h) {
  if (client.connect(server, 80)) {
    Serial.println("connecting...");
    // send the actual readings as query parameters
    client.print("GET /envparams?temp=");
    client.print(t);
    client.print("&humidity=");
    client.print(h);
    client.println(" HTTP/1.1");
    client.println("Host: xx-server.com");
    client.println("User-Agent: arduino-ethernet");
    client.println("Connection: close");
    client.println();
    lastConnectionTime = millis();
  }
  else {
    Serial.println("connection failed");
    Serial.println("disconnecting.");
    client.stop();
  }
}

void loop() {
  float h = dht.readHumidity();    // relative humidity (%)
  float t = dht.readTemperature(); // temperature in Celsius
  // push a reading every 10 seconds
  if (!client.connected() && millis() - lastConnectionTime > 10000UL) {
    httpRequest(t, h);
  }
  lastConnected = client.connected();
}
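On the receiving side, a backend only needs to parse the temperature and humidity query parameters from such a request line. A minimal, hypothetical illustration (the endpoint name `/envparams` follows the sketch above; everything else is an assumption):

```python
from urllib.parse import urlparse, parse_qs

def parse_envparams(request_line):
    """Extract (temperature, humidity) from a GET request line of the
    form 'GET /envparams?temp=...&humidity=... HTTP/1.1'."""
    path = request_line.split()[1]           # the URL part of the request line
    qs = parse_qs(urlparse(path).query)
    return float(qs["temp"][0]), float(qs["humidity"][0])

line = "GET /envparams?temp=23.40&humidity=51.10 HTTP/1.1"
print(parse_envparams(line))  # -> (23.4, 51.1)
```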
In this section, we illustrate the idea in more detail, giving a clear picture of the
data's different stages “from creation till execution”. This includes detailed
information about the workflow and how the sensor data aggregation and fusion
should behave.
Figure 7 shows the data journey across the different available tiers, starting from
data generation by multiple sensors, through the edge/gateway, to the data
center/cloud storage.
Figure 8 illustrates the various steps of the solution in detail; the process starts
with data generation and ends with the data visualization phase. In step 1 of Fig. 8,
we provide the workflow of the analytics engine to show how it runs.
Fig. 8 Holistic view of the sensor data fusion in a self-healed data center
6 Benefits
• Prediction of problems ahead of time, providing proactive solutions rather than
reactive fixes.
• Quick corrective actions can be taken based on the predictions to avoid issues in
the data center.
• Historical trend data is used to predict patterns.
• Helps troubleshooting engineers visualize the statistical data in a data center:
proactive rather than reactive troubleshooting by customer support, before the
customer notices.
• Reduced downtime cost per year and data center maintenance cost.
• Increased reliability of data center components due to early prediction and
correction of issues.
• Improved visibility into key performance indicators (KPIs) for customers.
• Networks, storage, and the virtual infrastructure inside a data center will be better
prepared to handle a flood of events, latency issues, and breakdowns.
• The predictions will be used as a self-healing technique using data analytics.
7 Conclusion
The idea of achieving a self-optimized and self-healed data center presents its own
flavor of technical challenges. With the novel techniques of IoT sensor data
fusion and analysis, we gained some ground towards our goal of making the dream
of a self-optimized data center come true. Self-optimization as a concept requires a
high degree of intelligence built into data center systems, leveraging fusion
analysis and sophisticated AI techniques, because we are never optimizing against
just one variable; we optimize for a whole, changing landscape of issues every
day. For this reason, employing AI should be seriously considered.
Optimization requires balancing and prioritizing sometimes-competing variables,
and that introduces a high degree of complexity we are not close to addressing today,
but we are heading in the right direction thanks to the novel and advanced techniques
we now have.
In this chapter, we wanted to fill the reader's imagination gap regarding the idea of
a self-healing, IoT-enabled data center and to emphasize the importance of shifting
some operations from manual tasks to AI-automated ones using IoT sensor data
analysis and fusion technology.
References
1. Allam, Z., Dhunny, Z.A.: On big data, artificial intelligence and smart cities. Cities 89, 80–91
(2019). https://doi.org/10.1016/j.cities.2019.01.032
2. Kirchner, F.: A survey of challenges and potentials for AI technologies. In: Kirchner, F., Straube,
S., Kühn, D., Hoyer, N. (eds.) AI Technology for Underwater Robots, pp. 3–17. Springer
International Publishing, Cham (2020)
3. Benedikt, L., Joshi, C., Nolan, L., et al.: Human-in-the-loop AI in government. In: Proceedings
of the 25th International Conference on Intelligent User Interfaces. ACM, New York, NY, USA,
pp. 488–497 (2020)
4. Laï, M.-C., Brian, M., Mamzer, M.-F.: Perceptions of artificial intelligence in healthcare: find-
ings from a qualitative survey study among actors in France. J. Transl. Med. 18, 14 (2020).
https://doi.org/10.1186/s12967-019-02204-y
5. Hosseini Shirvani, M., Rahmani, A.M., Sahafi, A.: A survey study on virtual machine migra-
tion and server consolidation techniques in DVFS-enabled cloud datacenter: Taxonomy and
challenges. J. King Saud Univ. Comput. Inform. Sci. 32, 267–286 (2020). https://doi.org/10.
1016/j.jksuci.2018.07.001
6. Munn, L.: Injecting failure: data center infrastructures and the imaginaries of resilience. Inform.
Soc. 1–10 (2020). https://doi.org/10.1080/01972243.2020.1737607
7. Fadaeefath Abadi, M., Haghighat, F., Nasiri, F.: Data center maintenance: applications and
future research directions. Facilities, ahead-of-print (2020). https://doi.org/10.1108/F-09-
2019-0104
8. El-Din, D.M., Hassanien, A.E., Hassanien, E.E.: Information integrity for multi-sensors data
fusion in smart mobility. In: Hassanien, A.E., Bhatnagar, R., Khalifa, N.E.M., Taha, M.H.N.
(eds.) Toward Social Internet of Things (SIoT): Enabling Technologies, Architectures and
Applications: Emerging Technologies for Connected and Smart Social Objects, pp. 99–121.
Springer International Publishing, Cham (2020)
9. Jindal, R., Kumar, N., Nirwan, H.: MTFCT: a task offloading approach for fog computing and
cloud computing. In: 2020 10th International Conference on Cloud Computing, Data Science
& Engineering (Confluence), pp. 145–149. IEEE (2020)
10. Poongodi, T., Rathee, A., Indrakumari, R., Suresh, P.: IoT sensing capabilities: sensor deploy-
ment and node discovery, wearable sensors, wireless body area network (WBAN), data acquisi-
tion. In: Peng, S.-L., Pal, S., Huang, L. (eds.) Principles of Internet of Things (IoT) Ecosystem:
Insight Paradigm, pp. 127–151. Springer International Publishing, Cham (2020)
11. Bonesso, S., Bruni, E., Gerli, F.: The organizational challenges of big data. In: Behavioral
Competencies of Digital Professionals: Understanding the Role of Emotional Intelligence,
pp. 1–19. Springer International Publishing, Cham (2020)
12. Hussein, D.M.E.-D.M., Hamed, M., Eldeen, N.: A blockchain technology evolution between
business process management (BPM) and Internet-of-Things (IoT). Int. J. Adv. Comput. Sci.
Appl. 9, 442–450 (2018). https://doi.org/10.14569/IJACSA.2018.090856
13. Gartner. https://www.gartner.com/en. Accessed 18 Apr 2020
14. Gupta, P., Gupta, P.K.: Tools for fault and reliability in multilayered cloud. In: Trust & Fault in
Multi Layered Cloud Computing Architecture, pp. 181–194. Springer International Publishing,
Cham (2020)
15. Singla, S.: AI and IoT in healthcare. In: Raj, P., Chatterjee, J.M., Kumar, A., Balamurugan, B.
(eds.) Internet of Things Use Cases for the Healthcare Industry, pp. 1–23. Springer International
Publishing, Cham (2020)
16. Amanullah, M.A., Habeeb, R.A.A., Nasaruddin, F.H., et al.: Deep learning and big data tech-
nologies for IoT security. Comput. Commun. 151, 495–517 (2020). https://doi.org/10.1016/j.
comcom.2020.01.016
17. Singh, A., Singh, R., Bhattacharya, P., et al.: Modern optical data centers: design challenges
and issues. In: Giri, V.K., Verma, N.K., Patel, R.K., Singh, V.P. (eds.) Computing Algorithms
with Applications in Engineering, pp. 37–50. Springer Singapore, Singapore (2020)
18. Hauksson, E., Yoon, C., Yu, E., et al.: Caltech/USGS Southern California Seismic Network
(SCSN) and Southern California Earthquake Data Center (SCEDC): data availability for the
2019 Ridgecrest sequence. Seismol. Res. Lett. (2020). https://doi.org/10.1785/0220190290
Internet of Things Sensor Data Analysis
for Enhanced Living Environments:
A Literature Review and a Case Study
Results on Air Quality Sensing
Gonçalo Marques
1 Introduction
G. Marques (B)
Instituto de Telecomunicações, Universidade da Beira Interior, Covilhã 6201-001, Portugal
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license 397
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_18
398 G. Marques
improve our daily routine and promote health and well-being [16]. IoT architectures
can provide ubiquitous and pervasive methods for environmental data acquisition
and use wireless communication technologies for enhanced connectivity [17–19].
Air quality has a material adverse impact on occupational health and well-being.
Therefore, air quality supervision is determinant for enhanced living environ-
ments and should be a requirement in all buildings and, consequently, an integral
part of the smart city context [20, 21]. Numerous research studies report the relevant
adverse health effects associated with reduced air quality levels, such as premature
death and respiratory and cardiovascular disease [22].
On the one hand, air quality not only plays a material role in human expo-
sure to pollutants but is also crucial for specific groups such as older adults and people
with disabilities [23, 24]. On the other hand, substantial scientific evidence of the
negative impacts of reduced air quality levels on health and well-being, particularly
on children and older adults, is available in the literature [25].
Every year, air pollution concentration levels are responsible for 3.2 million deaths and
a relevant increase in heart and asthma attacks, dementia, and cancer [26, 27].
The consequences of poor air quality are most severe in developing countries
where there is no regulation to control pollutant emissions. However, air quality
levels are also a problem in developed countries. Every year in the USA, approxi-
mately 60,000 premature deaths are reported and linked to reduced air quality levels,
and the healthcare costs related to air quality diseases reach $150 billion [28].
According to the European Environment Agency, in 2016, air pollution was
responsible for 400,000 premature deaths in the European Union (EU). Particulate
matter caused 412,000 premature deaths in 41 European countries, of which 374,000
occurred in the EU [29]. Moreover, the cost of the air pollutant emissions caused by
industrial facilities in 2012 has been estimated at no less than 59–189 billion euros
in the EU [30]. Even in locations with good air quality, short-term exposure episodes
are reported that lead to relevant health symptoms in sensitive groups such as the
elderly and children with asthma and cardiovascular problems [31, 32].
On the one hand, outdoor air quality is a significant public health challenge, taking
into account all the aforementioned facts [33]. On the other hand, indoor air quality
(IAQ) is also a critical problem for occupational health. The Environmental Protec-
tion Agency (EPA) recognises that indoor pollutant levels can be one hundred
times higher than outdoor levels and has placed air quality among the top five
environmental hazards to well-being [34].
Moreover, IAQ affects the most underprivileged people worldwide, who remain continuously exposed; in this respect, it can be compared with other public health problems such as sexually transmitted infections [25]. IAQ supervision must therefore be seen as an effective method for the study and assessment of occupational health and well-being. IAQ data can be used to detect patterns in indoor air quality and to design intervention plans to promote health [35]. The proliferation of IoT devices and systems makes it possible to create automatic methods with sensing, connection and processing capabilities for enhanced living environments [36].
Internet of Things Sensor Data Analysis for Enhanced Living … 399
Air quality sensing must be an essential element of smart cities and smart homes.
On the one hand, the IAQ can be estimated by providing a CO2 supervision system for enhanced public health and well-being [37]. Indoor CO2 levels can not only be used to evaluate the need for fresh air in a room but can also serve as a secondary indicator of unusual concentrations of other indoor air pollutants [38]. On the other hand, cities are responsible for a relevant portion of greenhouse gas emissions. Therefore, CO2 monitoring must be provided in both indoor and outdoor environments. Atmospheric CO2 concentrations have grown beyond 400 ppm, setting new records ever since they started to be analyzed in 1984 [39]. Outdoor CO2 monitoring can be relevant for planning traffic interventions and for detecting the emission of abnormal amounts of CO2 in real-time [40]. Moreover, real-time monitoring of CO2 levels at different locations in smart cities allows the identification of points of interest for interventions to decrease greenhouse gases.
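The real-time detection of abnormal CO2 emissions mentioned above can be sketched as a rolling-baseline check on the sensor stream. The window size, warm-up length and 50 ppm floor below are illustrative assumptions rather than values drawn from the literature:

```python
from collections import deque
from statistics import mean, pstdev

def co2_anomalies(readings, window: int = 60, n_sigma: float = 3.0):
    """Yield (index, ppm) for readings that exceed the rolling mean by
    more than n_sigma rolling standard deviations (50 ppm floor)."""
    recent = deque(maxlen=window)
    for i, ppm in enumerate(readings):
        if len(recent) >= 10:                          # need a baseline first
            mu, sigma = mean(recent), pstdev(recent)
            threshold = mu + max(n_sigma * sigma, 50.0)  # 50 ppm floor (assumption)
            if ppm > threshold:
                yield i, ppm
        recent.append(ppm)
```

A stream hovering around 420 ppm with a single 800 ppm spike would yield only the spike, which is the kind of event a traffic-related intervention would target.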
This paper not only presents a literature review on IoT sensor data analytics for enhanced living environments but also describes the design and development of a CO2 monitoring system based on an IoT architecture for enhanced living environments and public health. The main contribution of this paper is to present a comparison summary of the state of the art in IoT systems for air quality monitoring and to present the results of the design and implementation of a cost-effective system for air quality analytics using IoT sensor data.
The rest of the document is organized as follows: Sect. 2 presents a literature review on air quality monitoring systems; Sect. 3 presents the results of a case study on the design and implementation of a CO2 monitoring system based on IoT, and the conclusion is presented in Sect. 4.
Several IAQ management systems are presented in the state of the art; a review summary is presented in this section. The literature offers numerous low-cost IoT methods for air quality management that support open-source and wireless communication for data collection and transmission, and that also allow the supervision of different places at the same time through mobile computing technologies.
Over the past few years, numerous researchers have contributed to this field; however, it is not possible to include all those studies in this paper. This section highlights 12 studies conducted on air quality monitoring in recent years. The research studies were selected according to six criteria: (1) present a real case study of the design or implementation of an IoT architecture for air quality monitoring; (2) use low-cost sensors; (3) incorporate various open-source technologies; (4) be based on an IoT architecture; (5) be indexed in the Science Citation Index Expanded; and (6) be published in or after 2019.
400 G. Marques
The authors of [48] propose an air quality monitoring system based on IoT, which incorporates low-cost sensors for sulfur dioxide (SO2), NO2, CO, ozone (O3) and PM in real-time. The processing unit is based on a Raspberry Pi, and an ESP8266 is used for data transmission over Wi-Fi. The proposed system provides web compatibility for data consulting. A Kalman Filter (KF) algorithm is used to improve sensor accuracy by 27%.
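The Kalman filtering step reported in [48] for smoothing noisy low-cost sensor readings can be illustrated with a minimal scalar filter. The process and measurement noise variances below are placeholder assumptions, not the parameters used by the authors:

```python
class ScalarKalman:
    """Minimal 1-D Kalman filter for smoothing one noisy sensor channel."""

    def __init__(self, q: float = 0.01, r: float = 4.0):
        self.q = q      # process noise variance (assumed)
        self.r = r      # measurement noise variance (assumed)
        self.x = None   # current state estimate
        self.p = 1.0    # current estimate variance

    def update(self, z: float) -> float:
        if self.x is None:                 # initialise on the first sample
            self.x = z
            return self.x
        self.p += self.q                   # predict: uncertainty grows
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct towards measurement z
        self.p *= (1.0 - k)
        return self.x
```

Each filtered value lies between the previous estimate and the new measurement, which is how the filter attenuates sensor noise.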
An air quality monitoring system to supervise the indoor living environments of vulnerable groups such as older adults, children, pregnant women and patients is presented in [49]. The proposed model includes an artificial-intelligence-based approach for a multiple hazard gas detector (MHGD) system installed on a motor-vehicle-based robot. This motor vehicle can be controlled remotely and is developed with an Arduino microcontroller. The system includes iOS, Android and desktop software for data consulting. The system is capable of detecting ethanol (C2H5OH), trimethylamine ((CH3)3N) and hydrogen (H2). The proposed method is optimized to identify hazardous gases such as ethanol, tobacco smoke and off-flavour from spoiled food.
The authors of [50] propose an IoT architecture for vehicle pollution monitoring.
This system uses a Raspberry Pi as a processing unit and an MQ-2 gas sensor
for LPG and CO sensing features. Wi-Fi communication technologies are used for
data transmission. The data collected is stored in a ThingSpeak cloud service. The
proposed method provides web software for data consulting and incorporates e-mail notifications.
An IoT monitoring system which incorporates PM, CO2 , formaldehyde, Volatile
Organic Compounds (VOCs), Benzene (C6 H6 ), NO2 , O3 air quality sensors and
humidity, temperature, light and noise sensing capabilities is presented in [51].
A Raspberry Pi is used as the processing unit, with Ethernet technology for the Internet connection.
A study presented by the authors of [52] proposes the use of LoRa communication technology for PM supervision in Southampton, a city in the United Kingdom. The proposed IoT cyber-physical systems are based on the Raspberry Pi and incorporate four different PM sensors.
Air quality sensing is an essential requirement for smart cities and further for
global public health. Consequently, the design of cost-effective supervision solutions
is a trending and relevant research field. Table 1 presents a comparison summary of the studies analysed in the literature review.
The distribution of the number of studies according to the processor unit used
is presented in Fig. 1. The most used microcontroller in the analyzed studies is the
Raspberry Pi (N = 4) which corresponds to 30%, followed by ESP8266 (N = 2) and
Arduino (N = 2) responsible for 17% each.
Figure 2 presents the number of studies distributed according to the gas sensor used. Considering the studies analyzed in this literature review, CO is the most used sensor (N = 6), followed by NO2 (N = 5) and PM (N = 5). The least used sensors are NH3, SO2, CH3 and VOC, each used in only one study.
In sum, the literature review presents several automatic systems and projects for air quality sensing, which supports the relevance of the IoT architecture and confirms the significant role of mobile computing technologies in promoting health and well-being.
An air quality monitoring system has been developed to provide real-time CO2
supervision. The proposed system has been designed using an ESP8266 microcon-
troller which provides built-in Wi-Fi connectivity capabilities and includes an SGP30
Multi-Pixel Gas sensor as a sensing unit for CO2 monitoring. Moreover, the proposed
architecture incorporates an AM2315 temperature and humidity sensor. The system
architecture is based on IoT. The sensors are connected to the ESP8266 via I2C serial
communication protocol. A 32-bit MCU and a 10-bit analogue-to-digital converter
are incorporated in the ESP8266 microcontroller which also supports built-in Wi-Fi
connection features. For continuous humidity compensation of the CO2 sensor, an
AM2315 temperature and humidity sensor has been used to promote data accuracy
[53]. The SGP30 is a calibrated sensor developed by Sensirion. The technical data
of this sensor is available in [54, 55]. The SGP30 Multi-Pixel Gas sensor is a metal-oxide gas sensor which provides a calibrated output with 15% accuracy [54]. The sensor ranges for eCO2 and TVOC concentration are 400–60,000 ppm and 0–60,000 ppb, respectively. The eCO2 output is based on a hydrogen measurement. The sensor sampling interval is 1 s, and the average current consumption is 50 mA.
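The AM2315-based humidity compensation relies on feeding the SGP30 an absolute-humidity value (in g/m³), commonly derived from temperature and relative humidity via the Magnus formula. The conversion below is a standard sketch of that calculation, not necessarily the exact routine of the deployed firmware:

```python
import math

def absolute_humidity(temp_c: float, rel_humidity: float) -> float:
    """Approximate absolute humidity (g/m^3) from temperature (deg C)
    and relative humidity (%), using the Magnus formula."""
    # Saturation vapour pressure in hPa (Magnus parameterisation).
    svp = 6.112 * math.exp((17.62 * temp_c) / (243.12 + temp_c))
    vapour_pressure = svp * rel_humidity / 100.0   # actual vapour pressure, hPa
    # Ideal-gas conversion; 216.7 ~ 100 / R_v with R_v in J/(g*K).
    return 216.7 * vapour_pressure / (273.15 + temp_c)
```

For instance, air at 25 °C and 50% relative humidity holds roughly 11.5 g/m³ of water vapour, the figure the sensor would receive for compensation.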
The main objective of the proposed architecture is to promote health and well-being for enhanced living environments. The proposed method provides a continuous stream of IoT data that can be accessed using mobile computing technologies. Furthermore, this data can be used for air quality management to improve public health and enhance safety (Fig. 4). The proposed cyber-physical system, based on the ESP8266 module which supports the IEEE 802.11 b/g/n protocol, provides data acquisition; the data is stored in a Microsoft SQL Server database using ASP.NET web services.
The acquisition module firmware incorporates the Arduino Core, which is an
open-source framework that extends Arduino library support to the ESP8266 microcontroller.
Furthermore, the proposed data acquisition module has been developed using open-source frameworks and is a cost-effective system with numerous benefits compared with existing methods. The ESP8266 module used is based on the FireBeetle ESP8266 (DFRobot) board, which serves as the processing unit. The CO2 sensor is the SGP30 gas sensor module (Adafruit), and the temperature and humidity sensor is incorporated in an encased I2C module (Adafruit) which provides protection from wind and rain for outdoor use. Table 2 provides a technical description of the sensors used.
The system operation is presented in Fig. 5. The data collected is saved in real-time
every 30 seconds using web services.
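The 30-second upload step can be sketched as building a JSON body for the web service. The field names and device identifier below are illustrative assumptions, since the chapter does not specify the payload schema:

```python
import json

def build_payload(device_id: str, timestamp: int, eco2_ppm: int,
                  tvoc_ppb: int, temp_c: float, rel_humidity: float) -> str:
    """Serialize one 30-second sample as the JSON body posted to the
    web service (field names here are illustrative assumptions)."""
    return json.dumps({
        "device": device_id,
        "timestamp": timestamp,
        "eco2_ppm": eco2_ppm,
        "tvoc_ppb": tvoc_ppb,
        "temperature_c": temp_c,
        "humidity_pct": rel_humidity,
    })
```

The firmware would POST this body to the ASP.NET web service, which inserts the sample into the SQL Server database.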
The proposed method provides IoT sensor data in a continuous mode, offering a history of the variations of the monitored environment. The collected data can support the home or city administrator in delivering an accurate examination of the environment. The data collected can also be applied to assist decision-making on potential interventions for enhanced public health and well-being.
The proposed system provides an easy configuration of the wireless network. The system is configured as a Wi-Fi client. However, if it is incapable of connecting to the wireless network, or if no Wi-Fi networks are available, the proposed system turns to hotspot mode and starts its own wireless network. This hotspot can then be used to configure the Wi-Fi network to which the system will connect by entering the network access credentials.
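The client-to-hotspot fallback described above can be modelled as a small decision routine. Connection attempts are abstracted here as a callable, and the retry count is an assumption; this is a behavioural sketch, not the deployed ESP8266 firmware:

```python
from typing import Callable

def select_wifi_mode(try_connect: Callable[[], bool],
                     max_attempts: int = 3) -> str:
    """Try to join the configured network a few times; fall back to
    hotspot (access-point) mode so the user can enter credentials."""
    for _ in range(max_attempts):
        if try_connect():
            return "client"    # joined the configured Wi-Fi network
    return "hotspot"           # start an AP for (re)configuration
```

Once in hotspot mode, the device serves a configuration page over its own network; after credentials are saved, the same routine runs again and returns to client mode.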
A mobile application has been developed to provide real-time access to the collected data. This application is compatible with iOS 12 or newer and is represented in Fig. 6.
The mobile application serves two primary purposes: it provides data consulting features in real-time and allows the user to receive notifications.
As people typically keep mobile phones with them for everyday use, the proposed application is a powerful tool to access air quality levels. Consequently, users can retain chronological records of the monitored parameters for additional examination [56]. The introduced method is a decision-making tool for planning interventions sustained by the data obtained for enhanced public health.
The mobile software provides fast and intuitive access to the collected data, as mobile phones today have high performance and storage capacities. In this way, the
building or city manager can carry the air quality data of their environment for further analysis.
The mobile application provides data consulting in graphical or statistical modes.
The graphical representation of the data collected is presented in Fig. 7.
On the one hand, outdoor air quality is conditioned by the levels of numerous substances such as PM, nitrogen dioxide, hydrocarbons, carbon monoxide and ozone, which result from combustion sources [57]. On the other hand, air quality is also influenced by meteorological and ventilation conditions, such as low wind and convection, that promote pollutant concentration. This phenomenon is linked to the increase in carbon dioxide [58].
Fig. 8 Carbon dioxide mean levels collected during the research conducted grouped by hour
Fig. 9 Carbon dioxide mean levels collected during the research conducted grouped by weekday
According to the collected data, the highest levels of CO2 are collected between 12:00 and 14:00, which can be associated with the lunch break. At this time, people typically use vehicles to leave their jobs for lunch and then return to work.
Figure 9 represents the mean levels (ppm values) collected during the research, grouped by weekday. The highest concentration values are collected on Tuesday, and the lowest values are registered on Sunday. The concentration of CO2 can be associated with the activities carried out at the location of the data collection. Moreover, this data can be used to study people's behaviour and the activities carried out. Sunday is typically a quiet day when people stay at home since they do not work. Moreover, industry and companies are typically closed.
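Aggregations like those in Figs. 8 and 9 can be reproduced from the raw 30-second samples with a simple group-and-average step. The sketch below uses only the standard library; the (datetime, ppm) record layout is an assumption:

```python
from collections import defaultdict
from statistics import mean

def mean_by(samples, key):
    """Group (datetime, ppm) samples by key(dt) and average each group."""
    groups = defaultdict(list)
    for dt, ppm in samples:
        groups[key(dt)].append(ppm)
    return {k: mean(v) for k, v in sorted(groups.items())}

# Mean CO2 per hour of day (as in Fig. 8) or per weekday (as in Fig. 9):
# by_hour = mean_by(samples, lambda dt: dt.hour)
# by_weekday = mean_by(samples, lambda dt: dt.strftime("%A"))
```

The same helper covers both figures by changing only the grouping key.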
The results state that the proposed air quality monitoring system can be used to detect unhealthy scenarios at low cost. The IoT sensor data can be accessed using a mobile application, which provides an improved inspection of the monitored parameters' behaviour when compared with the statistical presentation. This IoT architecture offers chronological data evolution for improved air quality assessment, which is particularly appropriate to identify unhealthy scenarios and to plan interventions to improve health and well-being.
Numerous air quality monitoring systems are high-priced and provide only random sampling. Therefore, IoT sensor-based architectures can offer cost-effective solutions for continuous data collection in real-time and significantly promote enhanced living environments.
The proposed architecture is scalable, since new modules can be added according to the requirements of the scenario. The results are promising, as the proposed system can be used to provide a correct air quality assessment at low cost.
4 Conclusion
IoT sensor data analysis in smart cities will powerfully improve people's daily routines. Cyber-physical systems can be used to address complex challenges in multiple fields, as IoT can provide interoperability between different systems to develop a centralised urban-scale framework for enhanced living environments.
Furthermore, IoT architectures can provide ubiquitous and pervasive methods for environmental data acquisition and provide connectivity for data transmission. These systems incorporate multiple sensors which provide a continuous stream of relevant data. The data collected from IoT cyber-physical systems will not, on its own, have a high impact on public health and well-being, but its analysis together with the adoption of data science methods supported by artificial intelligence will significantly contribute to enhanced living environments. The data collection conducted by sensors connected to IoT systems provides an active and continuous stream of data to support multiple activities and significantly improve people's daily routines. Given the adverse effects of poor air quality on occupational health and well-being, IoT sensor data analytics must be seen as an integral part of society's everyday activities and must be incorporated in smart cities for enhanced living environments.
Nevertheless, IoT systems have some limitations regarding the quality of the data collected from low-cost sensors. However, several authors are adopting artificial intelligence methods to improve data accuracy, which is relevant for specific domains.
This paper has presented a literature review on IoT architectures for air quality monitoring and a case study of IoT sensor data analytics. The results are promising and establish CO2 sensor data from IoT architectures as an effective and efficient method to promote public health.
References
1. Giusto, D. (ed.): The Internet of Things: 20th Tyrrhenian Workshop on Digital Communica-
tions. Springer, New York (2010)
2. Marques, G., Pitarma, R., Garcia, N.M., Pombo, N.: Internet of things architectures, tech-
nologies, applications, challenges, and future directions for enhanced living environments and
healthcare systems: a review. Electronics 8, 1081 (2019). https://doi.org/10.3390/electronics8
101081
3. Atzori, L., Iera, A., Morabito, G.: The internet of things: a survey. Comput. Netw. 54, 2787–2805
(2010). https://doi.org/10.1016/j.comnet.2010.05.010
4. Marques, G.: Ambient assisted living and internet of things. In: Cardoso, P.J.S., Monteiro, J.,
Semião, J., Rodrigues, J.M.F. (eds.) Harnessing the Internet of Everything (IoE) for Accelerated
Innovation Opportunities, pp. 100–115. IGI Global, Hershey, PA (2019). https://doi.org/10.
4018/978-1-5225-7332-6.ch005
5. Caragliu, A., Del Bo, C., Nijkamp, P.: Smart cities in Europe. J. Urban Technol. 18, 65–82
(2011). https://doi.org/10.1080/10630732.2011.601117
6. Abdelaziz, A., Salama, A.S., Riad, A.M., Mahmoud, A.N.: A machine learning model for
predicting of chronic kidney disease based internet of things and cloud computing in smart
cities. In: Hassanien, A.E., Elhoseny, M., Ahmed, S.H., Singh, A.K. (eds.) Security in Smart
Cities: Models, Applications, and Challenges, pp. 93–114. Springer, Cham (2019). https://doi.
org/10.1007/978-3-030-01560-2_5
7. Schaffers, H., Komninos, N., Pallot, M., Trousse, B., Nilsson, M., Oliveira, A.: Smart cities
and the future internet: towards cooperation frameworks for open innovation. In: Domingue,
J., Galis, A., Gavras, A., Zahariadis, T., Lambert, D., Cleary, F., Daras, P., Krco, S., Müller, H.,
Li, M.-S., Schaffers, H., Lotz, V., Alvarez, F., Stiller, B., Karnouskos, S., Avessta, S., Nilsson,
M. (eds.) The Future Internet, pp. 431–446. Springer, Berlin (2011). https://doi.org/10.1007/
978-3-642-20898-0_31
8. Ahlgren, B., Hidell, M., Ngai, E.C.-H.: Internet of things for smart cities: interoperability and
open data. IEEE Internet Comput. 20, 52–56 (2016). https://doi.org/10.1109/MIC.2016.124
9. Chourabi, H., Nam, T., Walker, S., Gil-Garcia, J.R., Mellouli, S., Nahon, K., Pardo, T.A.,
Scholl, H.J.: Understanding Smart Cities: An Integrative Framework (2012). https://doi.org/
10.1109/HICSS.2012.615 (Presented at the January)
10. Allam, Z., Dhunny, Z.A.: On big data, artificial intelligence and smart cities. Cities 89, 80–91
(2019). https://doi.org/10.1016/j.cities.2019.01.032
11. Caravaggio, N., Caravella, S., Ishizaka, A., Resce, G.: Beyond CO2 : a multi-criteria analysis
of air pollution in Europe. J. Clean. Prod. 219, 576–586 (2019). https://doi.org/10.1016/j.jcl
epro.2019.02.115
12. Talari, S., Shafie-khah, M., Siano, P., Loia, V., Tommasetti, A., Catalão, J.: A review of smart
cities based on the internet of things concept. Energies 10, 421 (2017). https://doi.org/10.3390/
en10040421
13. Batty, M., Axhausen, K.W., Giannotti, F., Pozdnoukhov, A., Bazzani, A., Wachowicz, M.,
Ouzounis, G., Portugali, Y.: Smart cities of the future. Eur. Phys. J. Spec. Top. 214, 481–518
(2012). https://doi.org/10.1140/epjst/e2012-01703-3
14. Ning, H., Liu, H., Yang, L.T.: Cyberentity security in the internet of things. Computer 46,
46–53 (2013). https://doi.org/10.1109/MC.2013.74
15. Hernández-Muñoz, J.M., Vercher, J.B., Muñoz, L., Galache, J.A., Presser, M., Hernández
Gómez, L.A., Pettersson, J.: Smart cities at the forefront of the future internet. In: Domingue,
J., Galis, A., Gavras, A., Zahariadis, T., Lambert, D., Cleary, F., Daras, P., Krco, S., Müller, H.,
Li, M.-S., Schaffers, H., Lotz, V., Alvarez, F., Stiller, B., Karnouskos, S., Avessta, S., Nilsson,
M. (eds.) The Future Internet, pp. 447–462. Springer, Berlin (2011). https://doi.org/10.1007/
978-3-642-20898-0_32
16. Rashidi, P., Mihailidis, A.: A survey on ambient-assisted living tools for older adults. IEEE J.
Biomed. Health Inform. 17, 579–590 (2013). https://doi.org/10.1109/JBHI.2012.2234129
17. Adams, M.D., Kanaroglou, P.S.: Mapping real-time air pollution health risk for environmental
management: combining mobile and stationary air pollution monitoring with neural network
models. J. Environ. Manage. 168, 133–141 (2016). https://doi.org/10.1016/j.jenvman.2015.
12.012
18. Cetin, M., Sevik, H.: Measuring the impact of selected plants on indoor CO2 concentrations.
Pol. J. Environ. Stud. 25, 973–979 (2016). https://doi.org/10.15244/pjoes/61744
19. Shah, J., Mishra, B.: IoT enabled environmental monitoring system for smart cities. In: 2016
International Conference on Internet of Things and Applications (IOTA), pp. 383–388. IEEE,
Pune, India (2016). https://doi.org/10.1109/IOTA.2016.7562757
20. Marques, G., Saini, J., Pires, I.M., Miranda, N., Pitarma, R.: Internet of things for enhanced
living environments, health and well-being: technologies, architectures and systems. In: Singh,
P.K., Bhargava, B.K., Paprzycki, M., Kaushal, N.C., Hong, W.-C. (eds.) Handbook of Wireless
Sensor Networks: Issues and Challenges in Current Scenario’s, pp. 616–631. Springer, Cham
(2020). https://doi.org/10.1007/978-3-030-40305-8_29
21. Saini, J., Dutta, M., Marques, G.: A comprehensive review on indoor air quality monitoring
systems for enhanced public health. Sustain. Environ. Res. 30, 6 (2020). https://doi.org/10.
1186/s42834-020-0047-y
22. Stewart, D.R., Saunders, E., Perea, R.A., Fitzgerald, R., Campbell, D.E., Stockwell, W.R.:
Linking air quality and human health effects models: an application to the Los Angeles air
basin. Environ. Health Insights 11, 117863021773755 (2017). https://doi.org/10.1177/117863
0217737551
23. Walsh, P.J., Dudney, C.S., Copenhaver, E.D.: Indoor Air Quality. CRC Press, New York (1983)
24. Marques, G., Pitarma, R.: mHealth: indoor environmental quality measuring system for
enhanced health and well-being based on internet of things. J. Sens. Actuator Netw. 8, 43
(2019). https://doi.org/10.3390/jsan8030043
25. Bruce, N., Perez-Padilla, R., Albalak, R.: Indoor air pollution in developing countries: a major
environmental and public health challenge. Bull. World Health Organ. 78, 1078–1092 (2000)
26. World Health Organization: Ambient (outdoor) air quality and health (2014). Available at https://www.who.int/en/news-room/fact-sheets/detail/ambient-(out
door)-air-quality-and-health. Last accessed 26 Nov 2015
27. Wild, C.P.: Complementing the genome with an “Exposome”: the outstanding challenge of envi-
ronmental exposure measurement in molecular epidemiology. Cancer Epidemiol. Biomarkers
Prev. 14, 1847–1850 (2005). https://doi.org/10.1158/1055-9965.EPI-05-0456
28. National Weather Service: Why air quality is important. https://www.weather.gov/safety/air
quality. Last accessed 21 July 2019
29. European Environment Agency: Air quality in Europe: 2019 report (2019)
30. Holland, M., Spadaro, J., Misra, A., Pearson, B.: Costs of air pollution from european industrial
facilities 2008–2012—an updated assessment. EEA Technical report (2014)
31. Kaiser, J.: Epidemiology: how dirty air hurts the heart. Science 307, 1858b–1859b (2005).
https://doi.org/10.1126/science.307.5717.1858b
32. Weuve, J.: Exposure to particulate air pollution and cognitive decline in older women. Arch.
Intern. Med. 172, 219 (2012). https://doi.org/10.1001/archinternmed.2011.683
33. Liu, W., Shen, G., Chen, Y., Shen, H., Huang, Y., Li, T., Wang, Y., Fu, X., Tao, S., Liu, W.,
Huang-Fu, Y., Zhang, W., Xue, C., Liu, G., Wu, F., Wong, M.: Air pollution and inhalation
exposure to particulate matter of different sizes in rural households using improved stoves in
central China. J. Environ. Sci. 63, 87–95 (2018). https://doi.org/10.1016/j.jes.2017.06.019
34. Seguel, J.M., Merrill, R., Seguel, D., Campagna, A.C.: Indoor air quality. Am. J. Lifestyle Med.
1559827616653343 (2016)
35. Bonino, S.: Carbon dioxide detection and indoor air quality control. Occup. Health Saf. Waco
Tex. 85, 46–48 (2016)
36. Marques, G., Miranda, N., Kumar Bhoi, A., Garcia-Zapirain, B., Hamrioui, S., de la Torre Díez,
I.: Internet of things and enhanced living environments: measuring and mapping air quality
using cyber-physical systems and mobile computing technologies. Sensors 20, 720 (2020).
https://doi.org/10.3390/s20030720
37. Lu, C.-Y., Lin, J.-M., Chen, Y.-Y., Chen, Y.-C.: Building-related symptoms among office
employees associated with indoor carbon dioxide and total volatile organic compounds. Int. J.
Environ. Res. Public Health 12, 5833–5845 (2015). https://doi.org/10.3390/ijerph120605833
38. Zhu, C., Kobayashi, K., Loladze, I., Zhu, J., Jiang, Q., Xu, X., Liu, G., Seneweera, S., Ebi,
K.L., Drewnowski, A., Fukagawa, N.K., Ziska, L.H.: Carbon dioxide (CO2) levels this century
will alter the protein, micronutrients, and vitamin content of rice grains with potential health
consequences for the poorest rice-dependent countries. Sci. Adv. 4, eaaq1012 (2018). https://
doi.org/10.1126/sciadv.aaq1012
39. Myers, S.S., Zanobetti, A., Kloog, I., Huybers, P., Leakey, A.D.B., Bloom, A.J., Carlisle, E.,
Dietterich, L.H., Fitzgerald, G., Hasegawa, T., Holbrook, N.M., Nelson, R.L., Ottman, M.J.,
Raboy, V., Sakai, H., Sartor, K.A., Schwartz, J., Seneweera, S., Tausz, M., Usui, Y.: Increasing
CO2 threatens human nutrition. Nature 510, 139–142 (2014)
40. Afshar-Mohajer, N., Zuidema, C., Sousan, S., Hallett, L., Tatum, M., Rule, A.M., Thomas, G.,
Peters, T.M., Koehler, K.: Evaluation of low-cost electro-chemical sensors for environmental
monitoring of ozone, nitrogen dioxide, and carbon monoxide. J. Occup. Environ. Hyg. 15,
87–98 (2018). https://doi.org/10.1080/15459624.2017.1388918
41. Marques, G., Ferreira, C.R., Pitarma, R.: Indoor air quality assessment using a CO2 monitoring
system based on internet of things. J. Med. Syst. 43 (2019). https://doi.org/10.1007/s10916-
019-1184-x
42. Marques, G., Pitarma, R.: A cost-effective air quality supervision solution for enhanced living
environments through the internet of things. Electronics 8, 170 (2019). https://doi.org/10.3390/
electronics8020170
43. Dhingra, S., Madda, R.B., Gandomi, A.H., Patan, R., Daneshmand, M.: Internet of things
mobile-air pollution monitoring system (IoT-Mobair). IEEE Internet Things J. 6, 5577–5584
(2019). https://doi.org/10.1109/JIOT.2019.2903821
44. Marques, G., Pires, I., Miranda, N., Pitarma, R.: Air quality monitoring using assistive robots for
ambient assisted living and enhanced living environments through internet of things. Electronics
8, 1375 (2019). https://doi.org/10.3390/electronics8121375
45. Taştan, M., Gökozan, H.: Real-time monitoring of indoor air quality with internet of things-
based E-nose. Appl. Sci. 9, 3435 (2019). https://doi.org/10.3390/app9163435
46. Zhao, L., Wu, W., Li, S.: Design and implementation of an IoT-based indoor air quality detector
with multiple communication interfaces. IEEE Internet Things J. 6, 9621–9632 (2019). https://
doi.org/10.1109/JIOT.2019.2930191
47. Kaivonen, S., Ngai, E.C.-H.: Real-time air pollution monitoring with sensors on city bus. Digit.
Commun. Netw. 6, 23–30 (2020). https://doi.org/10.1016/j.dcan.2019.03.003
48. Lai, X., Yang, T., Wang, Z., Chen, P.: IoT implementation of Kalman Filter to improve accuracy
of air quality monitoring and prediction. Appl. Sci. 9, 1831 (2019). https://doi.org/10.3390/app
9091831
49. Wu, Y., Liu, T., Ling, S., Szymanski, J., Zhang, W., Su, S.: Air quality monitoring for vulnerable
groups in residential environments using a multiple hazard gas detector. Sensors 19, 362 (2019).
https://doi.org/10.3390/s19020362
50. Gautam, A., Verma, G., Qamar, S., Shekhar, S.: Vehicle pollution monitoring, control and
Challan system using MQ2 sensor based on internet of things. Wirel. Pers. Commun. (2019).
https://doi.org/10.1007/s11277-019-06936-4
51. Sun, S., Zheng, X., Villalba-Díez, J., Ordieres-Meré, J.: Indoor air-quality data-monitoring
system: long-term monitoring benefits. Sensors 19, 4157 (2019). https://doi.org/10.3390/s19
194157
52. Johnston, S.J., Basford, P.J., Bulot, F.M.J., Apetroaie-Cristea, M., Easton, N.H.C., Davenport,
C., Foster, G.L., Loxham, M., Morris, A.K.R., Cox, S.J.: City scale particulate matter moni-
toring using LoRaWAN based air quality IoT devices. Sensors 19, 209 (2019). https://doi.org/
10.3390/s19010209
53. Pedersen, T.H., Nielsen, K.U., Petersen, S.: Method for room occupancy detection based on
trajectory of indoor climate sensor data. Build. Environ. 115, 147–156 (2017). https://doi.org/
10.1016/j.buildenv.2017.01.023
54. Rüffer, D., Hoehne, F., Bühler, J.: New digital metal-oxide (MOx) sensor platform. Sensors 18, 1052 (2018). https://doi.org/10.3390/s18041052
55. Sensirion: Datasheet SGP30 Sensirion gas platform. https://www.sensirion.com/fileadmin/
user_upload/customers/sensirion/Dokumente/0_Datasheets/Gas/Sensirion_Gas_Sensors_S
GP30_Datasheet.pdf. Accessed 08 Feb 2020
56. Marques, G., Pitarma, R.: Smartwatch-based application for enhanced healthy lifestyle in
indoor environments. In: Omar, S., Haji Suhaili, W.S., Phon-Amnuaisuk, S. (eds.) Computa-
tional Intelligence in Information Systems, pp. 168–177. Springer, Cham (2019). https://doi.
org/10.1007/978-3-030-03302-6_15
57. Gurney, K.R., Mendoza, D.L., Zhou, Y., Fischer, M.L., Miller, C.C., Geethakumar, S., de la
Rue du Can, S.: High resolution fossil fuel combustion CO2 emission fluxes for the United
States. Environ. Sci. Technol. 43, 5535–5541 (2009). https://doi.org/10.1021/es900806c
58. Abas, N., Khan, N.: Carbon conundrum, climate change, CO2 capture and consumptions. J.
CO2 Util. 8, 39–48 (2014). https://doi.org/10.1016/j.jcou.2014.06.005
59. Marques, G., Pitarma, R.: IAQ evaluation using an IoT CO2 monitoring system for enhanced
living environments. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds.) Trends and
Advances in Information Systems and Technologies, pp. 1169–1177. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-77712-2_112
60. Karagulian, F., Barbiere, M., Kotsev, A., Spinelle, L., Gerboles, M., Lagler, F., Redon, N.,
Crunaire, S., Borowiak, A.: Review of the performance of low-cost sensors for air quality
monitoring. Atmosphere 10, 506 (2019). https://doi.org/10.3390/atmos10090506
61. Honeycutt, W.T., Ley, M.T., Materer, N.F.: Precision and limits of detection for selected
commercially available, low-cost carbon dioxide and methane gas sensors. Sensors 19, 3157
(2019). https://doi.org/10.3390/s19143157
62. Marques, G., Pitarma, R.: Using IoT and social networks for enhanced healthy practices
in buildings. In: Rocha, Á., Serrhini, M. (eds.) Information Systems and Technologies to
Support Learning, pp. 424–432. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-
03577-8_47
63. Marques, G., Pitarma, R.: An internet of things-based environmental quality management
system to supervise the indoor laboratory conditions. Appl. Sci. 9, 438 (2019). https://doi.org/
10.3390/app9030438
Data Science and AI in IoT Based Smart
Healthcare: Issues, Challenges and Case
Study
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_19
S. Saif et al.

The idea of the Internet of Things (IoT) comes from connecting various physical objects and devices to the world using the Internet. A basic example of such a device is HVAC (Heating, Ventilation, and Air Conditioning) monitoring and control, which enables a smart home. There are plenty of other domains where IoT plays an important role in improving our quality of living. In recent years this technology has been widely adopted by industry, healthcare, transportation, agriculture, etc. IoT enables physical objects to see, hear, and perform specific jobs, which makes these objects smart. Over time, many home and business applications will be based on IoT, which will not only improve the quality of living but also grow the world's economy. For instance, a smart home can help its residents automatically turn on the lights and fans when they reach home, control the air conditioning system, etc. Smart healthcare can reduce the need for visits to physicians for regular health checkups. Smart agriculture can help farmers water their plants automatically. Figure 1 shows the various trending application domains of IoT.
IoT is an essential technology that is set to improve significantly over time. There are several advantages to having things connected with each other. With the help of sensors, IoT can collect huge amounts of data that are valuable for analysis. For example, by analyzing the data of a smart refrigerator, information such as power consumption and temperature can be extracted so that power efficiency can be increased. The gathered data can thus support decision making. Another benefit is the ability to track and monitor things in real time: a patient suffering from a critical disease can be monitored remotely, and with the help of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) it is possible to design personalized drugs. IoT applications can also lighten workloads through automation. For example, in a smart home environment, the garage door can open automatically when the owner reaches home, greatly reducing human effort. IoT applications can further increase efficiency in terms of saving money and other resources. For instance, smart lights can turn themselves off if no one is present in the room, cutting electricity bills. Finally, IoT applications can improve quality of lifestyle, health, and wellness. For example, lightweight wearable health bands can help a person track body vitals such as temperature, heart rate, and oxygen levels.
The basic architecture of any IoT application consists of four layers: Sensing, Network, Processing, and Application. Figure 2 depicts this traditional architecture. The Sensing layer contains various sensors, such as biomedical, environmental, and electrical sensors. The Network layer consists of heterogeneous networking devices such as access points, routers, and switches. Components of the Processing layer include CPUs, GPUs, and cloud servers. The Application layer provides the output and feedback to the user.
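The four-layer flow above can be sketched as a minimal pipeline; the layer functions, the sensor name, and the 100-beat threshold below are illustrative assumptions, not part of the architecture itself:

```python
# Minimal sketch of the four-layer IoT architecture:
# Sensing -> Network -> Processing -> Application.

def sensing_layer():
    # A biomedical sensor reading (illustrative values).
    return {"sensor": "pulse", "value": 112}

def network_layer(packet):
    # Forwarding via access point / router, modeled as a pass-through.
    return dict(packet, hop="router-1")

def processing_layer(packet):
    # Cloud/CPU side: derive a simple status from the raw reading.
    status = "high" if packet["value"] > 100 else "normal"
    return dict(packet, status=status)

def application_layer(packet):
    # Output and feedback presented to the end user.
    return f"{packet['sensor']}: {packet['value']} ({packet['status']})"

reading = application_layer(processing_layer(network_layer(sensing_layer())))
print(reading)  # pulse: 112 (high)
```

Each function stands in for one layer; a real deployment would replace the pass-through network stage with actual transport and the processing stage with the analytics described later in this chapter.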
Data science is a relatively new term introduced to describe techniques for finding hidden patterns in a collection of data. Essentially, data science works with substantially large collections of data, often termed big data, so handling big data becomes an integral part of data science, and many algorithms have been developed to deal with massive collections of data. On the other hand, smart applications primarily deal with the Internet of Things (IoT), which may be a source of terabytes of data, if not more. Nowadays, smart cities, smart healthcare systems, etc. have become commonplace, specifically in developed nations, so handling the huge amount of data generated by IoT and smart devices has become a topic of concern and hence a buzzing topic of research as well. This chapter discusses the utility of data science and the different algorithms that can analyze data in the field of smart computing.
Nowadays, the use of big data has become quite commonplace because most business organizations, especially e-commerce and social networking sites, need to deal with huge collections of data in order to understand their business requirements. Ideally, big data means a volume of data that cannot be processed by traditional database methodologies and algorithms. Big data should be able to deal with data in heterogeneous formats: structured, semi-structured, and unstructured. In fact, most of the gathered data come in a massive, unstructured form, so big data may also be defined as the amount of data that cannot be stored and managed efficiently with traditional storage media. A big data set is traditionally identified by three V's: Volume, Velocity, and Variety. Volume identifies the amount of data stored; Velocity identifies the speed of data acquisition and aggregation; and Variety identifies the heterogeneity in the nature of the acquired data. The major challenge in the present scenario is that the rate at which data are acquired is growing astoundingly fast, and IT researchers and practitioners are struggling to keep up with it. The related challenge is to design appropriate systems that handle the data effectively as well as efficiently. At the same time, another major issue arises regarding the analysis of big data in connection with extracting relevant meaning for decision making.
A number of issues come up when discussing the challenges of working with big data, and extensive research is required to pinpoint the areas to be highlighted. A significant point has been put forward in [1], where the researchers mention that the usefulness of big data depends on the areas in which it is to be used; that is, the utility of big data depends on the requirements of the concerned enterprise. Not only this, the availability of technologies for tackling big data is also a point of concern [1]. In this context, it is quite possible that newly envisioned data are known neither to the knowledge workers nor to the designers. This leads to an additional challenge: technical staff need to improvise on interfaces, application organizations, metaphors, conceptual models, and metadata in such a way that the data can be presented comprehensibly to the end users of the system. A further tricky problem arises from unknown challenges that will appear as the scale increases and new analytics are developed [2]. Some of these challenges may be intractable with the tools and techniques presently used for big data.
A major issue in big data design is the output process, as pointed out in [3], where it is noted that it has always been easier to get data into a system than to get the required output from it. The same work has also shown that traditional transaction processing systems are not at all suitable for big data analytics, as the latter demands OnLine Analytical Processing (OLAP) instead of OnLine Transaction Processing (OLTP). So new tools and techniques are required to process and analyze big data, which is another challenging factor in working with it.
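The contrast can be sketched with toy data: OLTP touches individual records one at a time, while OLAP rolls the whole collection up into aggregates (the records below are illustrative assumptions):

```python
# OLTP-style access: read/update one record by key.
records = {
    1: {"region": "east", "sales": 120},
    2: {"region": "west", "sales": 80},
    3: {"region": "east", "sales": 50},
}
records[2]["sales"] += 10  # a single-row transaction

# OLAP-style access: aggregate over the whole collection.
totals = {}
for row in records.values():
    totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
print(totals)  # {'east': 170, 'west': 90}
```

Big data analytics generalizes the second pattern to volumes where the whole collection no longer fits on one machine, which is why OLTP-oriented systems fall short.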
The very basic nature of big data is its heterogeneity, and this creates another problem: for analysis purposes, all the gathered data must be converted into a suitable format. Converting heterogeneous data sets in structured, semi-structured, and unstructured formats into a common format requires an extensive and complex methodology.
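A toy sketch of such a conversion, assuming three tiny inputs (one per format) mapped onto one common schema; the field names and the naive text extraction are illustrative assumptions:

```python
import csv
import io
import json

# Structured input: a CSV row parsed into a dict.
csv_text = "id,temp\n1,36.6\n"
structured = next(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured input: a JSON document.
semi = json.loads('{"id": "2", "temp": 37.1}')

# Unstructured input: free text, mined with a naive token rule.
text = "patient 3 recorded temp 38.2"
tokens = text.split()
unstructured = {"id": tokens[1], "temp": tokens[4]}

# Common format: one schema for all three sources.
common = [{"id": str(r["id"]), "temp": float(r["temp"])}
          for r in (structured, semi, unstructured)]
print(common)
```

Real pipelines face exactly this step at scale, where the "naive token rule" becomes full information extraction and the schema mapping must tolerate missing and conflicting fields.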
Periodically updating the content of a big data set leads to another issue in handling big data. As the outcome of the analysis has to be effectively and periodically updated, the corresponding data set needs to be modified as and when the case arises. The most challenging issue here is to determine the exact interval after which modifications have to be reflected in the data set. Different researchers have raised different factors concerning this, but fundamentally the decision support system running in the background should be responsible for identifying the duration of this period, which can be either static or dynamic as the case may be.
Data science and data analytics are together being utilized in different applications related to smart computing. Smart computing means adding computing power to connected devices like mobile phones, televisions, watches, etc. These devices become smart devices when they are provided with computing power and internet connectivity, and together they are often termed the Internet of Things (IoT). The main focus of smart computing is to provide real-time solutions to real-world problems through a seamless integration of the hardware, software, and network platforms by which the devices are connected. This section discusses the usefulness of the methodologies provided by both data science and data analytics in the domain of smart computing.

Fig. 3 Data feed from an IoT infrastructure for the analytics through a big data storage medium
Figure 3 depicts the association between an IoT platform and the data analysis part. The massive amount of data generated in an IoT infrastructure is fed into and stored on a big data storage platform. The rightmost component of Fig. 3 collects the data from the storage platform and uses it for analysis and for successive report generation; this component utilizes the methodologies of data science and data analytics. The usefulness of this diagram is further explained next.
Smart devices are capable of producing an enormous volume of data; the problem lies in a rational, useful, and efficient analysis of these data. If they can be properly analyzed, a huge benefit may usher into daily life, and from a business organization's point of view, the effectiveness may bring a more profitable scenario.
One much-discussed application area of late is the power grid. Since traditional fossil fuels face depletion and decarbonization demands that the power system reduce carbon emissions, the smart grid has been identified as an effective solution for electrification in society [6]. Different methodologies provided by data science and data analytics are being utilized for better regulation and dispatch of renewable energy, and this has become a major research area among researchers engaged in the development of smart applications. Smart applications can further be extended to the electricity billing process [7]. Whereas the traditional billing system depends on manual collection of data, a smart grid system can be used to analyze usage patterns and the status of the electricity networks. This analysis can later be utilized for demand forecasting and energy generation optimization as well.
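A minimal sketch of such demand forecasting, assuming a toy moving-average model over smart-meter history rather than the methods surveyed in [6, 7]; the readings are illustrative values:

```python
# Naive day-ahead demand forecast from smart-meter usage history:
# tomorrow's forecast is the mean of the last k days.
def forecast_next(usage_kwh, k=3):
    window = usage_kwh[-k:]
    return sum(window) / len(window)

daily_usage = [30.0, 32.0, 31.0, 35.0, 36.0, 34.0]  # kWh per day
print(forecast_next(daily_usage))  # 35.0
```

Production forecasters replace the moving average with seasonal and weather-aware models, but they consume exactly this kind of per-meter usage series.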
Another significant application area of smart computing using data science and data analytics is the smart healthcare system. With the tremendous growth of IoT devices, tracking data from different smart devices has become quite commonplace, especially in areas considered to be smart cities [8]. Data science and data analytics are together utilized to provide innovative ideas and solutions to healthcare problems after analyzing the data received through smart devices present at the patients' ends. The effective union of data science, data analytics, and IoT with telemedicine, e-health, and m-health for patient monitoring has improved significantly in the last decade and has thus reformed the personalized healthcare system by constantly monitoring patients' conditions irrespective of their locations [9]. At the same time, the system is cost effective as well. So the usefulness of data science and data analytics has been tremendous, and the subsequent innovations are considered to resolve significant issues of remote patient monitoring. The real challenge lies in constantly acquiring data related to patients' health and thereafter analyzing it; different algorithms of data science and data analytics serve this purpose. Various biomedical sensors used as IoT devices in patient monitoring systems provide vital biological and genetic information. Different health-related parameters like body temperature, blood pressure, pulse rate, blood glucose level, ECG, EEG, etc. can be tracked using smart devices associated with the smart healthcare system. The data received from the smart healthcare devices through a network are then analyzed in a centralized system through numerous data analytics techniques in order to assess the status of the concerned patients, and accordingly either a diagnosis or an advice is prescribed. This application is specifically useful in areas where there is a scarcity of medical staff or where the ageing population is substantially large. According to an estimate of the World Health Organization (WHO), the required and actual numbers of medical staff throughout the world will be about 82 million and 67 million, respectively, by 2030 [10]. This huge shortage of medical staff can be fruitfully tackled with the help of smart healthcare systems in collaboration with data science and data analytics as mentioned above. Table 1 lists some emergent areas of research in the field of smart healthcare systems, the corresponding data science or data analytics algorithms used, and the citations of the work. The issues and their respective remedial methodologies as depicted in Table 1 are discussed next.
A random forest-based algorithm for the prediction of type-2 diabetes has been proposed in [11], where the authors highlight the use of a technique called density-based spatial clustering. In the same paper, the researchers propose another technique, synthetic minority over-sampling, to predict the hypertension issue prevalent among patients. The techniques mentioned showed a significant improvement in precision and accuracy on three benchmark datasets.
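The exact pipeline of [11] is not reproduced here, but the core of synthetic minority over-sampling can be sketched in a few lines: each synthetic sample interpolates between two minority-class points. This is a simplification that draws the partner at random rather than from the k nearest neighbors, and the two-feature data are toy values:

```python
import random

def smote_like(minority, n_new, seed=0):
    """Generate n_new synthetic samples by interpolating between
    random pairs of minority-class points (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

minority = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.2)]
new_points = smote_like(minority, n_new=5)
print(len(new_points))  # 5
```

The synthetic points lie on segments between existing minority samples, which is what lets the later random forest see a balanced class distribution without duplicating rows verbatim.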
For the prediction of tuberculosis, a research work has shown the use of both a regression model and the cuckoo search optimization algorithm [12]. This novel work showed a goodness-of-fit with adjusted R-squared of more than 0.96.
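The adjusted R-squared metric reported in [12] can be computed from any model's fitted values; a minimal sketch with toy data, assuming the regression itself has already been fitted elsewhere:

```python
def adjusted_r2(y, y_pred, p):
    """Adjusted R-squared for n samples and p predictors."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Penalize R-squared for the number of predictors used.
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Toy case: predictions close to observations give a high score.
y      = [2.0, 4.1, 6.0, 8.2, 9.9]
y_pred = [2.1, 4.0, 6.1, 8.0, 10.0]
print(round(adjusted_r2(y, y_pred, p=1), 4))
```

Unlike plain R-squared, the adjusted form penalizes extra predictors, which is why it is the fairer figure for comparing models of different sizes.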
Another application of data analytics is the detection of the optic disc in retinal images [13]. In this work, the authors used a statistical edge detection algorithm; more than 250 samples were tested, and the accuracy rate was more than 95%. According to the authors, the proposed methodology could be utilized in the detection of various retinal diseases like glaucoma, diabetic retinopathy, etc.
A combined approach using a deep convolutional neural network and a support vector machine was taken up for the classification of lung cancers in [14]. This work used more than 2000 pulmonary computed tomography images for the algorithm and the analysis of the obtained results. Here too the authors claim an accuracy of over 90%.
A research article titled "Skin aging estimation scheme based on lifestyle and dermoscopy image analysis" showed the use of two data science algorithms, viz. polynomial regression and color histogram intersection [15], to attack the problems of skin condition tracing and lifestyle-based skin texture aging estimation. The authors involved more than 350 volunteers in the performance evaluation, and the results indicated an accuracy of more than 90%. Another research work [16] shown in Table 1 describes a multi-objective optimization problem that was formulated for the support vector machine classification of organ inflammations and solved by a genetic algorithm. In this work, to achieve a better accuracy, the authors used a combination of support vector machine kernels; effectively, the accuracy obtained through this approach was shown to be more than 90%.
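The kernel-combination idea in [16] can be sketched directly, since a non-negative weighted sum of valid kernels is again a valid kernel; the weights, gamma, and data points below are illustrative assumptions:

```python
import math

def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def combined_kernel(x, z, w_lin=0.3, w_rbf=0.7):
    # A non-negative weighted sum of kernels is itself a valid kernel,
    # so it can be plugged into any SVM solver.
    return w_lin * linear_kernel(x, z) + w_rbf * rbf_kernel(x, z)

x, z = (1.0, 0.0), (0.0, 1.0)
print(round(combined_kernel(x, z), 4))
```

In the cited work the weights themselves become decision variables, which is what turns kernel selection into the multi-objective problem their genetic algorithm solves.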
An emerging application area of data science and data analytics is the development of smart cities. A smart city is a relatively new term for a city having smart solutions in its infrastructure, including electric supply, water supply, solid waste management, traffic mobility, health, education, e-governance, etc. According to [17], a smart city should consist of the following dimensions:
• Smart people
• Smart mobility
• Smart environment
• Smart living
• Smart governance
• Smart economy
Smart applications refer to those applications that assimilate data-driven and action-oriented insights with the experience of the user. These insights are delivered as features in smart applications that empower users to complete tasks more efficiently and effectively. For these smart applications, IoT is the key enabler, and the confluence of AI with IoT is believed to bring myriad changes to the technological space around mankind. IoT sensors generate and transmit a huge amount of data (big data), and the conventional processes for collecting, analyzing, and processing this huge volume of data are falling short day by day. This urges the introduction of AI techniques alongside IoT smart applications. With the help of AI, machines acquire the ability to simulate the basic and complex cognitive functions of a human being through hardware implementations, software programs, and applications. Combining AI with IoT can improve the operational efficiency of smart applications through simple steps like tracking (collection of data), monitoring (analyzing the data), controlling, optimizing (training the model), and automating (testing and predicting). When programmed with ML (its subset), AI can be widely used in IoT for analyzing the huge data generated and making appropriate decisions.
As IoT generates a huge amount of data, it is essential to ensure that only adequate data is collected and processed; otherwise the process becomes time-consuming and affects operational efficiency. AI, with several steps of data mining like data integration, data selection, data cleaning, data transformation, and pattern evaluation, can accomplish this task. A significant number of IoT applications are powered by AI, such as the smart healthcare industry, where it can be useful for analyzing symptoms and for the prediction and detection of severe diseases. Smart agricultural industries can use AI techniques to perform several tasks like identifying and predicting crop rotation, assessing risk management, evaluating soil properties, predicting climate change, and many more. Smart homes are provided with many features that might be enabled with AI, like enhanced security, automation of household equipment, and recognition of the activities of the dwellers, especially the elderly and the disabled.
give accurate results. Therefore it is suitable for big data analytics because, as datasets grow, deep learning can give better outcomes and exhibit improved efficiency. It is used in various tasks like natural language processing, image recognition, language translation, etc. [21].
Thus patients detected with a high risk of stroke can be diagnosed and sent for
additional checkups.
Apart from disease detection, motion analysis is widely used in computer vision to capture the behavior of moving objects in image sequences. In cardiology, images of the heart taken using cardiac magnetic resonance imaging (CMRI) can be used to detect cardiac patterns with deep learning, revealing risks of heart failure for the patient [27]. Patients with ACHD (Adult Congenital Heart Disease) can also be identified, based on the severity and complexity of the disease, using machine learning algorithms trained on huge datasets for prognosis assessment and therapy guidance [28]. Artificial intelligence can be useful in ambient assisted living, especially for elderly and disabled people, as HAR (Human Activity Recognition) has advanced substantially in modern times. HAR can combine features from a smartphone accelerometer as well as heart-rate sensors worn by users, and finer variations of heart-rate relative to the users' resting heart-rate can be estimated using an ensemble model based on several classifiers [19, 29].
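A minimal sketch of such an ensemble's final stage, assuming simple majority voting over hypothetical classifier outputs (the cited works may combine their classifiers differently):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine labels from several classifiers by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers labeling one activity window
# built from accelerometer and heart-rate features.
votes = ["walking", "walking", "resting"]
print(majority_vote(votes))  # walking
```

Each base classifier sees the same feature window; the vote smooths out individual misclassifications, which is the usual motivation for ensembles in HAR.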
When it comes to severe neurodegenerative disorders like Parkinson's disease, there are quite a few artificial intelligence applications that can diagnose, monitor, and manage the disease using data describing the motion of the lower and upper extremities. Based on this body kinematics, therapy assessment and progression prediction of Parkinson's disease can be done, especially in the early stages [30]. Another disease that is common nowadays is Alzheimer's, which involves high-dimensional data in computer vision; deep learning, with its advanced neuro-imaging techniques, can detect and automate the classification of Alzheimer's disease. The combination of an SAE (Stacked Auto-Encoder) for feature selection with machine learning can give an accuracy of more than 98% for Alzheimer's disease and 83% for MCI (mild cognitive impairment), the prodromal stage of the disease. Deep learning approaches with fluid biomarkers and multimodal neuroimaging can give results with better accuracy [31]. Diabetes is another severe ailment that causes other disorders in the human body. DKD, or Diabetic Kidney Disease, is one of them; it can be detected using natural language processing as well as longitudinal big data with machine learning. Artificial intelligence combined with the deep learning concept of convolutional neural networks could extract features for more than 60,000 diabetic patients, based on their medical records, using a convolutional auto-encoder and logistic regression analysis, which resulted in more than 70% accuracy [32]. Nevertheless, one of the biggest threats to mankind is cancer, which causes death at any stage of its severity. Research reveals that deep learning technologies like convolutional neural networks can be applied in cancer imaging to assist pathologists in detecting and classifying the disease at earlier stages, thus improving the patient's chances of survival, especially for lung, breast, and thyroid cancer [33]. The contributions of AI to the healthcare paradigm discussed above are presented in Fig. 6.
In the last few years, there has been exponential growth in fields like e-commerce, the Internet of Things, healthcare, social media, etc., which generate more than 2.5 quintillion bytes of data every day. Possible sources of data include purchases made through online shopping, various sensors installed in smart cities, health information of the human body, etc. These huge data are termed big data, and processing them to extract meaningful information is known as data science. Some of the potential applications are shown in Fig. 7. In this section we present a few case studies on the application of data science in smart applications.
In [34], the authors used clickstream data to predict the shopping behavior of e-commerce users, which can reduce the cost of digital marketing and increase revenue. They observed that supervised machine learning (SML) techniques are not suitable for the analysis of this kind of clickstream data due to its sequential structure. They also applied a model based on a Recurrent Neural Network (RNN) to real-world e-commerce data for campaign targeting. Experimental results compare the RNN-based method with traditional SML-based techniques; based on these results, the authors developed an ensemble of sequence and conventional classifiers which outperforms both in terms of accuracy.
One of the potential applications of IoT is the smart home, where energy consumption is monitored through sensors. The authors of [35] proposed an energy management scheme which can decrease costs in the residential, commercial, and industrial sectors while still meeting energy demand. Sensors in home appliances gather energy consumption data, which are sent to a centralized server; at the server end, further processing and analysis are done through a big data analytics package. Since 60% of the energy demand in Arab countries comes from air conditioning, the authors focused on managing devices related to air conditioning. They built a prototype and deployed it in a small area for testing.
In smart healthcare, wearable sensors generate enormous data, which need to be stored securely as well as processed for knowledge extraction. An architecture has been proposed by the authors of [36] to store and process scalable health data for healthcare applications. The proposed architecture consists of two sub-architectures, namely Meta Fog-Redirection (MF-R) and Grouping and Choosing (GC). Apache Pig and Apache HBase are used in the MF-R architecture to efficiently collect and store health data. On the other hand, GC is responsible for securing the integration of fog with cloud computing. Key management and the categorization of data as Normal, Critical, or Sensitive are also ensured by the GC architecture. A MapReduce-based prediction model has been applied to the health data for heart disease prediction.
In recent years, diabetes diagnosis and personalized treatment have been proposed in [37], where vital data of patients suffering from diabetes were used. Body vitals such as blood glucose, temperature, ECG, and blood oxygen were collected using wearable sensors. 12,366 persons were involved in the experiment, and 757,732 health vitals were collected. After removing irrelevant data, 716,173 records of 9594 persons remained; of these, only 469 persons were found to be suffering from diabetes, and the rest were normal. Based on this dataset, three machine learning algorithms were used to establish different models for the diagnosis of diabetes and to provide personalized treatment.
In [38], the authors proposed a method to detect urban emergencies through crowdsourcing. Crowdsourcing is a technique for the collection, integration, and analysis of the enormous data generated by various sources such as devices, vehicles, sensors, humans, etc. Due to the growth in the use of social media, it has become a big data store; the authors used geographic data and events of users to identify urban emergency events in real time. Their proposed model is called 5W (What, Where, When, Who, and Why). They used a social medium called "Weibo", which is similar to "Twitter" and popular in China, and gathered data based on the 5W model for emergency event detection. Case studies on real datasets were also conducted, and the results show that the proposed model achieves good performance and effectiveness in the analysis and detection of such emergency events.
Fraud detection in the finance sector is an important application of data science. The authors of [39] introduced a fraud risk management model for the well-known organization Alibaba by analyzing big data. They prepared a fraud risk monitoring system based on real-time big data processing models; it can identify frauds by analyzing huge volumes of user behavior and network data. They termed the system AntBuckler, and it is responsible for identifying and preventing all kinds of malicious user activity related to payment transactions. It uses the RAIN score engine to determine the risk factor of a user and presents it through user-friendly visualizations.
When it comes to smart manufacturing, data science makes a big contribution. In [40], the authors presented a system architecture for analyzing manufacturing processes based on event logs. The well-known big data toolbox Hadoop was used for process mining. They also prepared a prototype to which an event log is supplied; it can discover the manufacturing process and prepare an animation based on the discovered process model. Based on the activities, it shows the working time of each activity and the total execution time for each case.
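The per-activity and per-case timing described above can be sketched over a toy event log; the rows below are illustrative, not data from [40]:

```python
from datetime import datetime

# A toy event log: (case id, activity, start, end).
log = [
    ("case1", "cutting", "2020-01-01 08:00", "2020-01-01 08:30"),
    ("case1", "welding", "2020-01-01 08:30", "2020-01-01 09:15"),
    ("case2", "cutting", "2020-01-01 09:00", "2020-01-01 09:20"),
]

def minutes(start, end):
    fmt = "%Y-%m-%d %H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).seconds // 60

# Accumulate working time per activity and execution time per case.
activity_time, case_time = {}, {}
for case, activity, start, end in log:
    m = minutes(start, end)
    activity_time[activity] = activity_time.get(activity, 0) + m
    case_time[case] = case_time.get(case, 0) + m

print(activity_time)  # {'cutting': 50, 'welding': 45}
print(case_time)      # {'case1': 75, 'case2': 20}
```

Process-mining tools such as those built on Hadoop perform this aggregation at scale and additionally reconstruct the control flow between activities from the event ordering.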
Machine Learning (ML) is a data analysis technique used for building analytical models. It is an application of Artificial Intelligence (AI) that makes a system learn and improve automatically without being explicitly programmed every time. Similarly, Deep Learning (DL) is a branch of machine learning based on artificial neural networks. It is more capable of unsupervised learning, where data is mostly unlabeled or unstructured, whereas ML-based techniques require structured data; so the main difference between ML and DL is the way data is represented in the system. Table 2 shows a small comparison between ML and DL, and the basic architectures are shown in Fig. 8. In this section we present a few case studies on applications of ML and DL.
Machine learning and deep learning have been widely adopted in various domains such as healthcare, banking, natural language processing, information retrieval, etc.
One popular application of machine learning is drug discovery. In [41] the authors discuss quantitative structure-activity relationship (QSAR) modeling, an ML-based drug discovery method. It is very effective in identifying potentially biologically active molecules from many candidate compounds. Commercial drugs such as CCT244747, PTC725, RG7800, and GDC-0941 were discovered with ML techniques in 2012, 2014, 2016, and 2015, respectively. In recent years, ML-based drug discovery has increasingly given way to deep learning methods, since DL is capable of powerful parallel computing using GPUs, which helps in handling the accumulation of massive amounts of biomedical data.
Another drug discovery approach, based on chemoinformatics, was proposed in
2018. The authors in [42] show how a compound database can be used to extract
chemical features by characterizing compounds in terms of chemical substructure
fragments. Based on the absence or presence of these substructure fragments, a
chemical fingerprint is created; finally, the QSAR approach is used to train a machine
learning model so that compound properties can be predicted.
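The presence/absence fingerprint idea can be illustrated with a toy sketch. The fragment list and plain substring matching below are hypothetical simplifications; real chemoinformatics pipelines (e.g., RDKit) perform graph-based substructure matching on the molecular structure rather than text matching on the SMILES string.

```python
# Toy substructure-presence fingerprint. FRAGMENTS is a hypothetical fragment
# library; matching fragments as substrings of a SMILES string is only an
# illustration of the presence/absence encoding, not real substructure search.
FRAGMENTS = ["C(=O)O", "c1ccccc1", "N", "Cl"]

def fingerprint(smiles: str, fragments: list = FRAGMENTS) -> list:
    """Return a binary vector: 1 if the fragment occurs in the molecule text."""
    return [1 if frag in smiles else 0 for frag in fragments]

# A vector like this could then serve as the feature input of a QSAR model.
aspirin = "CC(=O)Oc1ccccc1C(=O)O"   # aspirin written as SMILES
print(fingerprint(aspirin))          # [1, 1, 0, 0]
```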
A framework for the prediction of chronic diseases such as heart failure, kidney
failure, and stroke using deep learning techniques has been proposed by the authors
in [43]. Electronic Health Record (EHR) data, containing both free-text medical
notes and structured information, are used in their work. The framework accepts
negations and numerical values present in the text and does not require any
disease-specific feature extraction. The authors compare the performance of various
deep learning architectures such as CNNs and LSTMs. Experimental results on a
cohort of one million patients show that models using text perform better than
models using only structured data.
The smart grid is another potential application of Machine Learning. The authors
of [44] propose a hybrid technique to predict the next-day electricity consumption
of air conditioners. This forecast helps power grids balance power demand
efficiently. Their technique combines a linear autoregressive integrated moving
average (ARIMA) model with a nonlinear, nature-inspired meta-heuristic model.
Training and testing data were collected from an office using various environmental
sensors, an infrared sensor, and smart meters. Experimental results show that the
hybrid model outperforms conventional linear and nonlinear models in prediction
accuracy, achieving a correlation factor R = 0.71 and an error rate of 4.8%.
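The reported figures correspond to a correlation coefficient between forecast and actual consumption and a percentage error. A minimal sketch of how such metrics are computed is given below; the consumption values are illustrative, not data from [44].

```python
import math

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Illustrative next-day consumption forecast (kWh) vs. actual demand.
actual   = [12.0, 15.0, 14.0, 18.0, 20.0]
forecast = [11.5, 15.5, 13.0, 18.5, 21.0]
print(round(pearson_r(actual, forecast), 3), round(mape(actual, forecast), 2))
```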
Deep learning algorithms are used by the authors of [45] to propose a Distributed
Intelligent Video Surveillance system (DIVS). The proposed system uses a
three-layer architecture: the first layer consists of monitoring cameras connected to
edge servers, the second layer contains additional edge servers, and the third layer
holds cloud servers connected to the edge servers of the second layer. The system
employs parallel training, model synchronization, and workload balancing, and this
parallel processing accelerates video analysis. A CNN model is used for vehicle
classification, while an LSTM is applied for traffic flow prediction. Both models run
on the edge nodes, which enables parallel training.
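The exact synchronization protocol of [45] is not reproduced here; the sketch below shows the general idea behind periodic model synchronization in such systems, where each edge node trains a local model copy and a coordinator merges the parameters, weighted by each node's share of the workload (a federated-averaging-style merge; all names are illustrative).

```python
def synchronize(models, sample_counts):
    """Merge per-node parameter vectors into one global model, weighting each
    node by the number of samples it trained on (its workload share)."""
    total = sum(sample_counts)
    merged = [0.0] * len(models[0])
    for params, count in zip(models, sample_counts):
        for i, p in enumerate(params):
            merged[i] += p * (count / total)
    return merged

# Three edge nodes holding 2-parameter local models with unequal workloads.
local_models = [[0.2, 1.0], [0.4, 1.2], [0.6, 1.4]]
counts = [100, 100, 200]
print(synchronize(local_models, counts))  # ≈ [0.45, 1.25]
```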
Anomaly detection in network traffic is another important application of Machine
Learning. In [46], the authors propose anomaly detection in wide-area network
meshes. Two machine learning algorithms, a Boosted Decision Tree (BDT) and a
simple feed-forward neural network, are applied to network data. The authors use
data from perfSONAR servers collected through the Open Science Grid. The
objective of this work is to improve network performance by detecting anomalies
such as packet loss and throughput degradation. The performance evaluation shows
that the Boosted Decision Tree works better in terms of both speed and accuracy of
anomaly detection.
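The trained BDT and neural-network models of [46] are not reproduced here; as a minimal baseline for the same task, the sketch below flags measurements that deviate strongly from typical behavior using a z-score threshold, which is the kind of decision boundary those classifiers learn in a more robust, data-driven way.

```python
import math

def zscore_anomalies(series, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations away from the series mean."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [i for i, v in enumerate(series) if abs(v - mean) > threshold * std]

# Throughput samples (Gb/s) with one sudden drop, e.g. caused by packet loss.
throughput = [9.8, 9.9, 10.1, 10.0, 9.7, 2.1, 10.2, 9.9]
print(zscore_anomalies(throughput, threshold=2.0))  # [5]
```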
Deep learning approaches have also been widely adopted in IoT. One recent work
[47] shows the use of deep learning for traffic load prediction and efficient channel
allocation in IoT applications. Since IoT generates a huge volume of data, higher
transmission speed becomes a necessity, and the conventional method of assigning
fixed channels for data transmission is no longer effective. The authors propose an
intelligent deep-learning-based Traffic Load (TL) prediction system to forecast
future traffic load and congestion in the network.
They also describe a channel allocation algorithm, termed DLPOCA, based on
Partially Overlapping Channel Assignment (POCA). Three types of traffic load
prediction are proposed: Central control-based Traffic load Prediction (CTP),
Semi-Central control Traffic load Prediction (S-CTP), and Distributed control
Traffic load Prediction (DTP). A Deep Belief Architecture (DBA) and a deep
Convolutional Neural Network (CNN) are used for this purpose.
In [48], the authors apply machine learning algorithms such as decision trees, naive
Bayes, and maximum entropy to detect radiology reports in which follow-up
imaging is recommended. This is an application of Natural Language Processing
combining feature engineering with machine learning, and it is particularly valuable
for patients with possible cancer who require follow-up; most radiology reports
contain follow-up information. A dataset of 6000 free-text reports was used for
training and testing, and 1500 features were determined using NLP. Experimental
results show that the decision tree achieves an F1 score of 0.458 and an accuracy of
0.862, better than the other two algorithms.
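An F1 score well below the accuracy, as reported above, is typical of imbalanced data, where few reports belong to the positive (follow-up) class. F1 is the harmonic mean of precision and recall and is computed from confusion-matrix counts as sketched below; the counts used are illustrative, not those of [48].

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts for an imbalanced test set: accuracy can look high while
# F1 stays modest, because the positive class is rare.
tp, tn, fp, fn = 30, 820, 60, 90
print(round(accuracy(tp, tn, fp, fn), 3), round(f1_score(tp, fp, fn), 3))
```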
6 Conclusion
This chapter has presented a review of Internet of Things, ML, DL, and AI based
smart healthcare aimed at improving human living, especially for elderly and
diseased people who need support. Continuous monitoring of health vitals for
regular checkups, without elderly people suffering from age-related ailments having
to physically visit a doctor, makes these IoT-enabled smart applications usable and
convenient for them. Health vitals such as blood pressure, blood sugar, heart rate,
pulse, body temperature, and ECG signals can easily be transmitted to a remote
medical cloud server, from which a doctor can access analysis reports produced
using AI, ML, and DL techniques, or view the health vitals directly if needed.
Accordingly, if medicinal advice is sufficient, it can easily be given to the patient or
a relative through SMS or a smartphone application (app); otherwise, if
hospitalization is required, that too can be communicated immediately from the
remote end. Health data analysis using ML, DL, and AI has opened up several
revolutionary research directions such as disease prediction, disease detection, and
drug discovery. Smart health-sensor-based motion monitoring also helps identify
activity, posture, and fall events from the remote end while maintaining privacy, so
that alerts can be generated to ensure timely support. Realizing all these features,
however, comes with the challenge of handling big health data efficiently: issues
such as data redundancy, erroneous or insignificant data, and the accuracy of
diagnosis or analysis must be addressed by executing the necessary processing,
such as data integration, data selection, data cleaning, data transformation, and
pattern evaluation, applying AI together with the steps of data mining. The case
studies presented on applications of data science and machine learning illustrate
how practical systems and applications are realized through the Internet of Things,
data science, and AI.
References
1. Stonebraker, M., Hong, J.: Researchers’ big data crisis; understanding design and functionality.
Commun. ACM 55(2), 10–11 (2012)
2. Kaisler, S., Armour, F., Espinosa, J.A., Money, W.: Big Data: issues and challenges moving
forward. In: Proceedings of the 46th Hawaii International Conference on System Sciences,
pp. 995–1004 (2013)
3. Jacobs, A.: Pathologies of Big Data. Commun. ACM 52(8), 36–44 (2009)
4. Saif, S., Biswas, S., Chattopadhyay, S.: Intelligent, secure big health data management using
deep learning and blockchain technology: an overview. In: Deep Learning Techniques for
Biomedical and Health Informatics. Studies in Big Data 68, 187–209 (2020)
5. Kaisler, S.: Advanced analytics. CATALYST Technical Report, i_SW Corporation, Arlington,
VA (2012)
6. Ak, R., Fink, O., Zio, E.: Two machine learning approaches for short-term wind speed time-
series prediction. IEEE Trans. Neural Netw. Learn. Syst. 27(8), 1734–1747 (2016)
7. Zhang, Y., Huang, T., Bompard, E.F.: Big data analytics in smart grids: a review. Energy Inform.
(2018)
8. Lytras, M.D., Chui, K.T., Visvizi, A.: Data analytics in smart healthcare: the recent
developments and beyond. Appl. Sci. (2019)
9. Lytras, M.D., Visvizi, A.: Who uses smart city services and what to make of it: toward
interdisciplinary smart cities research. Sustainability (2018)
10. Scheffler, R., Cometto, G., Tulenko, K., Bruckner, T., Liu, J., Keuffel, E.L., Preker, B., Stilwell,
B., Brasileiro, J., Campbell, J.: Health workforce requirements for universal health coverage
and the sustainable development goals. World Health Organization (2016)
11. Ijaz, M.F., Alfian, G., Syafrudin, M., Rhee, J.: Hybrid prediction model for type 2 diabetes
and hypertension using DBSCAN-based outlier detection, synthetic minority over sampling
technique (SMOTE), and random forest. Appl. Sci. (2018)
12. Wang, J., Wang, C., Zhang, W.: Data analysis and forecasting of tuberculosis prevalence rates
for smart healthcare based on a novel combination model. Appl. Sci. (2018)
13. Ünver, H.M., Kökver, Y., Duman, E., Erdem, O.A.: Statistical edge detection and circular hough
transform for optic disk localization. Appl. Sci. (2019)
14. Polat, H., Mehr, H.D.: Classification of pulmonary CT Images by using hybrid 3D-deep
convolutional neural network architecture. Appl. Sci. (2019)
15. Rew, J., Choi, Y.H., Kim, H., Hwang, E.: Skin aging estimation scheme based on lifestyle and
dermoscopy image analysis. Appl. Sci. (2019)
16. Chui, K.T., Lytras, M.D.: A novel MOGA-SVM multinomial classification for organ
inflammation detection. Appl. Sci. (2019)
17. Moustaka, V., Vakali, A., Anthopoulos, L.G.: A systematic review for smart city data analytics.
ACM Comput. Surv. 51(5), Article 103 (2018)
18. Nuaimi, E.A., Neyadi, H.A., Mohamed, N., Al-Jaroodi, J.: Applications of big data to smart
cities. J. Internet Serv. Appl. 6–30 (2015)
19. Shapiro, S.C.: Encyclopedia of Artificial Intelligence, 2nd edn. Wiley, New York (1992)
20. Reddy, S.: Use of artificial intelligence in healthcare delivery. In: eHealth-Making Health Care
Smarter. IntechOpen (2018)
21. Jakhar, D., Kaur, I.: Artificial intelligence, machine learning and deep learning: definitions and
differences. Clin. Exp. Dermatol. 45(1), 131–132 (2020)
22. Cullell-Dalmau, M., Otero-Viñas, M., Manzo, C.: Research techniques made simple: deep
learning for the classification of dermatological images. J. Invest. Dermatol. 140(3), 507–514
(2020)
23. Liu, Y., Gadepalli, K., Norouzi, M., Dahl, G.E., Kohlberger, T., Boyko, A., Venugopalan, S.,
et al.: Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.
02442 (2017)
24. Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., et al.: Chexnet:
radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:
1711.05225 (2017)
25. Ismael, S.A.A., Mohammed, A., Hefny, H.: An enhanced deep learning approach for brain
cancer MRI images classification using residual networks. Artif. Intell. Med. 102 (2020)
26. Cheon, S., Kim, J., Lim, J.: The use of deep learning to predict stroke patient mortality. Int. J.
Environ. Res. Pub. Health 16(11) (2019)
27. Bello, G.A., Dawes, T.J.W., Duan, J., Biffi, C., De Marvao, A., Howard, L.S.G.E., Simon, J.,
Gibbs, R., et al.: Deep-learning cardiac motion analysis for human survival prediction. Nat.
Mach. Intell. 1(2), 95–104 (2019)
28. Diller, G.-P., Kempny, A., Babu-Narayan, S.V., Henrichs, M., Brida, M., Uebing, A., Lammers,
A.E., et al.: Machine learning algorithms estimating prognosis and guiding therapy in adult
congenital heart disease: data from a single tertiary centre including 10 019 patients. Eur. Heart
J. 40(13), 1069–1077 (2019)
29. Saha, J., Chowdhury, C., Roy Chowdhury, I., Biswas, S., Aslam, N.: An ensemble of condition
based classifiers for device independent detailed human activity recognition using smartphones.
Inf. MDPI 9(4), 94, 1–22 (2018)
30. Belić, M., Bobić, V., Badža, M., Šolaja, N., Ðurić-Jovičić, M., Kostić, V.S.: Artificial
intelligence for assisting diagnostics and assessment of Parkinson’s disease–a review. Clin.
Neurol. Neurosurg. (2019)
31. Jo, T., Nho, K., Saykin, A.J.: Deep learning in Alzheimer’s disease: diagnostic classification
and prognostic prediction using neuroimaging data. Front. Aging Neurosci. 11 (2019)
32. Makino, M., Yoshimoto, R., Ono, M., Itoko, T., Katsuki, T., Koseki, A., Kudo, M., et al.:
Artificial intelligence predicts the progression of diabetic kidney disease using big data machine
learning. Sci. Rep. 9(1), 1–9 (2019)
33. Coccia, M.: Deep learning technology for improving cancer care in society: new directions in
cancer imaging driven by artificial intelligence. Technol. Soc. 60 (2020)
34. Koehn, D., Lessmann, S., Schaal, M.: Predicting online shopping behaviour from clickstream
data using deep learning. Expert Syst. Appl. 150 (2020)
35. Al-Ali, A.R., Zualkernan, I.A., Rashid, M., Gupta, R., Alikarar, M.: A smart home energy
management system using IoT and big data analytics approach. IEEE Trans. Consumer
Electron. 63(4), 426–434 (2017)
36. Manogaran, G., Varatharajan, R., Lopez, D., Kumar, P.M., Sundarasekar, R., Thota, C.: A
new architecture of Internet of Things and big data ecosystem for secured smart healthcare
monitoring and alerting. Future Gener. Comput. Syst. 375–387 (2017)
37. Chen, M., Yang, J., Zhou, J., Hao, Y., Zhang, J., Youn, C.: 5G-smart diabetes: toward person-
alized diabetes diagnosis with healthcare Big Data clouds. IEEE Commun. Mag. 56(4), 16–23
(2018)
38. Xu, Z., Liu, Y., Yen, N., Mei, L., Luo, X., Wei, X., Hu, C.: Crowdsourcing based description
of urban emergency events using social media big data. IEEE Trans. Cloud Comput. (2016)
39. Chen, J., Tao, Y., Wang, H., Chen, T.: Big data based fraud risk management at Alibaba. J.
Finance Data Sci. 1(1), 1–10 (2015)
40. Yang, H., Park, M., Cho, M., Song, M., Kim, S.: A system architecture for manufacturing
process analysis based on big data and process mining techniques. In: IEEE International
Conference on Big Data (Big Data). Washington, DC, pp. 1024–1029 (2014)
41. Zhang, L., Tan, J., Han, D., Zhu, H.: From machine learning to deep learning: progress in
machine intelligence for rational drug discovery. Drug Discov. Today 22(11), 1680–1685 (2017)
42. Lo, Y.-C., Rensi, S.E., Torng, W., Altman, R.B.: Machine learning in chemoinformatics and
drug discovery. Drug Discov. Today 23(8), 1538–1546 (2018)
43. Liu, J., Zhang, Z., Razavian, N.: Deep EHR: chronic disease prediction using medical notes.
arXiv preprint arXiv:1808.04928 (2018)
44. Chou, J., Hsu, S., Ngo, N., Lin, C., Tsui, C.: Hybrid machine learning system to forecast
electricity consumption of smart grid-based air conditioners. IEEE Syst. J. 13(3), 3120–3128
(2019)
45. Chen, J., Li, K., Deng, Q., Li, K., Yu, P.S.: Distributed deep learning model for intelligent video
surveillance systems with edge computing. IEEE Trans. Ind. Inform. (2019)
46. Zhang, J., Gardner, R., Vukotic, I.: Anomaly detection in wide area network meshes using two
machine learning algorithms. Future Gener. Comput. Syst. 93, 418–426 (2019)
47. Tang, F., Fadlullah, Z.M., Mao, B., Kato, N.: An intelligent traffic load prediction-based
adaptive channel assignment algorithm in SDN-IoT: a deep learning approach. IEEE Internet
Things J. 5(6), 5141–5154 (2018)
48. Lou, R., Lalevic, D., Chambers, C., Zafar, H.M., Cook, T.S.: Automated detection of radiology
reports that require follow-up imaging using natural language processing feature engineering
and machine learning classification. J. Digit Imaging 33, 131–136 (2020)
IoT Sensor Data Analysis and Fusion
Applying Machine Learning
and Meta-Heuristic Approaches
A. Saha
Department of Information Technology, Techno Main Salt Lake, Kolkata, West Bengal, India
e-mail: [email protected]
C. Chowdhury
Computer Science and Engineering, Jadavpur University, Kolkata, India
e-mail: [email protected]
M. Jana
Department of Computer Science, Bijay Krishna Girls’ College, Howrah, India
e-mail: [email protected]
S. Biswas (B)
Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology,
Kolkata, West Bengal, India
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_20
1 Introduction
By definition, the Internet of Things (IoT) is the network of physical objects that
contain embedded technology to communicate and sense or interact with their
internal states or the external environment [1]. IoT generates massive amounts of
heterogeneous data, and the collection, processing, and storage of such big volumes
of data is cumbersome for humans. Further, tasks like properly classifying the data
collected by IoT sensors, defining accurate patterns and behaviors, processing them
for assessment and prediction, and finally sending results back to the device for
decision making all need additional mechanisms such as ML to provide embedded
intelligence to IoT. ML encompasses a wide range of applications, including
computer vision, bioinformatics, fraud detection, intrusion detection systems (IDS),
face recognition, speech recognition, and many more. In this chapter, several smart
application domains that require IoT data analysis are discussed, followed by the
influence of ML on such smart applications. Further, it is discussed how the data
analysis of IoT sensors can be improved by applying meta-heuristic approaches,
and subsequently a detailed analysis of the overall effect of ML and meta-heuristics
on smart applications is presented for ready reference.
IoT applications mostly follow a three-tier architecture (as shown in Fig. 1), where
the first layer is the end-user layer, consisting of sensors carried by people or placed
at dedicated locations. The smartphone is a potential source of IoT data; for smart
home applications, consumer electronic devices with embedded sensors are also
potential data sources alongside smartphones. The sensed data from the end users
are sent to an edge device, which transmits them to cloud servers over the Internet.
The edge devices can also perform certain computations, such as data preprocessing
and extracting knowledge from data, so that the cloud servers receive only the
extracted knowledge and not the raw data. In this way, not only is Internet traffic
reduced, but data privacy over the Internet can also be preserved. The cloud servers
then perform data analysis to find meaningful patterns and accordingly provide
services back to the end-user layer.
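The edge-side preprocessing described above can be as simple as collapsing a window of raw sensor readings into a few summary statistics, so that only the extracted knowledge, not the raw stream, crosses the Internet. A minimal sketch follows; the field names are a hypothetical schema, not a standard.

```python
def summarize_window(readings):
    """Collapse a window of raw sensor samples into a compact summary
    suitable for uploading to the cloud (hypothetical schema)."""
    n = len(readings)
    return {
        "count": n,
        "mean": round(sum(readings) / n, 2),
        "min": min(readings),
        "max": max(readings),
    }

# A window of heart-rate samples stays on the edge; four numbers go upstream.
window = [72.0, 74.0, 71.0, 90.0, 73.0, 72.0]
print(summarize_window(window))
```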
IoT devices generate data in massive volumes (millions of terabytes), called Big
Data, with varied characteristics and velocities: audio, video, images, logs, posts,
emails, health records, social networking interactions, and many more. The steps of
data analysis are to capture, store, manage, analyze, and visualize these data in
massive volumes, so as to efficiently and effectively support decision-making
processes, now and in the future.
Several components of a smart city need data analysis, each for its own reasons. For
example, in a smart parking system, a real-time solution that detects empty parking
spaces in a geographical area with the help of sensors and cameras and guides
vehicles to the right positions, data analysis can inform a driver about the nearest
and most suitable parking slot for his or her car. In smart traffic, the proper
management of traffic signals on the road with the help of IoT sensors and efficient
real-time analysis can be beneficial for both citizens and government. Traffic
information received from these sensors can provide useful information such as the
number of vehicles on the road, their current locations, and the distance between
two vehicles at any instant of time. Further, in case of an accident, sensors can send
timely alerts to the police and doctors simultaneously. Smart cities must also ensure
a smart environment, which refers to an ecosystem with embedded sensors,
actuators, and other computational devices to monitor and control environmental
factors such as air pollution, noise, effective water supply, the planning of green
areas, and many more. Environmental IoT sensors should thus be able to report
anomalies such as elevated levels of toxic gases in the air, excess noise, or an
imbalance of gases such as O2, CO, CO2, and SO2. Smart waste management is
another important component of a smart city, where sensors are placed inside waste
bins all across the city to measure filling levels and provide notification when bins
need to be emptied. IoT can optimize data-driven collection services by analyzing
historical data to identify filling patterns, especially in densely populated areas of
the city; it can also optimize driver routes and garbage pick-up schedules, which
significantly reduces operational costs.
Beyond the smart city, data analysis is equally essential for other application
domains such as the smart home and smart healthcare. A smart home is a home
where appliances (lights, fans, fridge, TV) and other devices are controlled
automatically and remotely over Internet connections, via smartphones or other
networked devices. The home may be monitored continuously through several
sensors, such as smoke detectors for fire alarms and temperature detectors for
extreme weather. Sensors can also report electricity consumption, gas usage, water
consumption, and so on, for a trouble-free household. Smart healthcare refers to the
technology and devices (sensors, actuators) that provide better tools for diagnosis
and better treatment for patients as a whole. The technology used in smart
healthcare can record health information through deployed sensors, store and
process it automatically, and send it to remote medical servers over the Internet [2].
Smart agriculture is another application domain where data analytics can play a
huge role: it refers to managing agriculture through ICT and sensors, which can
collect valuable information about the soil, light, water, temperature, humidity of
the farm, and much more, thus minimizing or eliminating the human labor needed
to gather it.
Machine learning and deep learning algorithms have gained immense popularity in
applications such as smart healthcare, computer vision, surveillance, and natural
language processing. These techniques are resource hungry; deep learning
algorithms in particular demand high computational power, memory, and battery
power due to their algorithmic complexity. Training deep learning models on the
huge datasets created by edge devices, such as smartphone sensors or IoT-enabled
sensors, likewise demands a high level of resources. For learning in the cloud, these
huge datasets must be transferred to the cloud server over the Internet, which is
vulnerable to security attacks and threats that may violate the security and privacy
of sensitive data such as patients' health vitals. Long-distance transmission also
causes latency between data acquisition and data processing and consumes more
bandwidth and energy. One more vital point in this era of the Internet of Things is
scalability, which can be better addressed by computing learning algorithms
through edge computing [3, 4], exploiting the hierarchical structure formed by edge
devices, edge compute nodes such as local servers, and the distant cloud server.
This also helps avoid the single point of failure posed by a central cloud server. In
short, the advantages of learning at advanced edge devices equipped with ample
resources (memory, processing ability, energy source, IoT compatibility, etc.) are:
(i) localized learning and hence low latency; (ii) data are processed for training
learning models close to their point of origin, so security attacks and threats against
confidentiality, integrity, authentication, and data privacy during transmission over
the open Internet are much reduced; (iii) correctness/accuracy of learning at edge
devices: learning on the dataset acquired at the edge reduces the probability of
erroneous data caused by illegal modification through security attacks on the open
wireless channel during transmission from edge devices to the cloud server, and
learning on correct data ensures proper and correct knowledge building.
In spite of these obvious advantages of edge computing, it is rarely used on its own;
rather, learning at the cloud server, or a combination of edge devices and cloud
server (as shown in Fig. 3), is preferred for several reasons. Edge devices such as
sensor nodes with computation ability, memory, and battery power, smartphones,
tablet computers, and laptops have limited resources. Machine and deep learning
models need extensive training on a huge dataset (70–80% of the total dataset) for
knowledge building, and these training algorithms are resource hungry. The trained
models must also be tested on the remaining 20–30% of the dataset, which is itself
not a small volume. Executing such a volume of computation on resource-
constrained edge devices may not be possible at all, or may be possible only a few
times. Besides this, some applications, such as smart healthcare or smart
transportation, are time critical and hence have very low latency tolerance; such
specialized applications may need to be based on edge computing.
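The holdout protocol mentioned above (70–80% of the data for training, 20–30% for testing) can be sketched as follows; shuffling with a fixed seed before the split keeps the experiment reproducible and avoids ordering bias.

```python
import random

def holdout_split(dataset, train_frac=0.8, seed=42):
    """Shuffle a dataset and split it into training and test portions."""
    items = list(dataset)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

train, test = holdout_split(range(100), train_frac=0.8)
print(len(train), len(test))  # 80 20
```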
Deep learning neural network models are, however, not yet prevalent in edge
computing. Executing heavyweight deep learning and machine learning models on
a cloud server using high volumes of Electronic Health Record (EHR) data has
proved convenient in terms of resource consumption and rigorous learning, since
complex algorithms can be executed repeatedly to find the optimum result, with as
many rounds, hidden layers, or datasets as needed to avoid overfitting or
underfitting and thereby achieve higher accuracy. Executing deep learning models
in edge computing requires comparative performance evaluation on heterogeneous
hardware, including simple devices (e.g., the Raspberry Pi), smartphones, home
gateways, and edge servers; much of the current work has focused either on
powerful servers or on smartphones [5].
In the past few years, the Internet of Things (IoT) has developed rapidly for
people's comfort. The IoT connects various physical devices, hardware, and
software through the Internet, its main purpose being the exchange of data among
these devices. The IoT enhances people's standard of living: smart cities offer many
opportunities, such as predicting traffic jams, reducing congestion and pollution,
detecting traffic violations, and finding suitable parking spaces; smart home
services improve safety and connect home appliances; and healthcare gains
continuous monitoring and reporting, assured real-time delivery of medical data,
disease prediction and identification, and remote medical assistance. Efficient
processing and intelligent analysis of IoT sensor data are therefore needed so that
valuable information can be extracted for monitoring, support, disease
identification, prediction, and diagnosis. Besides smart healthcare, there are several
other smart applications in which enabling technologies such as sensors, IoT, and
the cloud handle data acquisition, transmission, and storage, but intelligent ML
algorithms are applied to handle the huge data volumes. The following are some
ML applications for handling IoT sensor data in several smart applications.
Smart Healthcare: Smart healthcare is one of the most important paradigms under
IoT. Nowadays it is not limited to patients admitted to hospitals, but also covers
patients cared for at home or in care-giving centers; hence a framework must be
designed that gives equal importance to all patients. IoT together with cloud
computing gives us the opportunity to provide a patient-centric environment where
high-quality treatment can be delivered on a low-cost budget. In general, the IoT
architecture consists of sensor devices to collect real-time data from a patient; a
suitable transmission medium (e.g., Wi-Fi, Bluetooth) to transmit data in a real-time
environment; a cloud platform, which provides large storage and complex
computing capacity where various machine learning approaches are used to predict
and detect diseases; and a user-friendly front end through which a doctor can easily
monitor many patients at the same time. To obtain highly accurate results, the
IoT-cloud integration must be done effectively.
There are several types of sensor devices used in IoT environments for monitoring
patients' activities: temperature sensors, pressure sensors, blood-oxygen sensors,
blood-glucose sensors, accelerometers, Electrocardiogram (ECG),
Electroencephalogram (EEG), altimeters, GPS location trackers, and so on. They
can work either individually or in combination. In the case of heart-related
problems, the ECG signal is used for detection. In [6], the authors show how ECG
signals with SVM and other ML classifiers are used for arrhythmic beat
classification, as shown in Fig. 4. Two common heart conditions, myocardial
ischaemia and cardiac arrhythmia, can be identified through electrocardiogram
(ECG) based analysis. The ECG signals, collected from the MIT-BIH databases,
are preprocessed using ML-based filters, i.e., the LMS algorithm. The delayed LMS
algorithm updates the weights in pipelined form:

w(n + 1) = w(n) + µ x(n − kD) e(n − kD)

where n is the time step, w(n) and w(n + 1) are the old and updated weights, µ is
the step size, x(n) is the filter input, e(n) is the error signal, and kD is the number of
delays used in the pipelining stage. The error is computed as e(n − kD) =
d(n − kD) − y(n − kD), and the output estimate as y(n − kD) = wT(n − kD) x(n − kD).
Finally, a Normalized LMS (NLMS) algorithm is used for the coefficient update,
w(n + 1) = w(n) + µn x(n) e(n), where the normalized step size is defined as
µn = µ / (p + xT(n) x(n)). After noise removal, the ECG signal is used for feature
extraction with the Discrete Wavelet Transform (DWT); the authors chose the
coiflet wavelet for extraction of the R-peak. The R-peak is used to compute the RR
interval, i.e., the time between successive R-peaks for continuous heart rate
monitoring (with a practical resolution of about ±1 ms), from which heart rate
variability (HRV) is determined. The extracted HRV features can be represented in
the time domain or the frequency domain; the authors use 14 time-domain and
frequency-domain HRV features. A support vector machine (SVM) classifies the
extracted HRV features based on the frequency domain (Fig. 4), using three
frequency bands (VLF, LF, and HF) and the frequency ratio LF/HF. In total, 200
HRV records are classified, of which 180 (90%) form the training set and 20 (10%)
the test set. The SVM classifier gives 96% accurate results, higher than any other
ML technique; hence HRV data can easily be classified as normal or arrhythmic
with an SVM classifier.
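The weight-update, error, and output equations above can be sketched as a plain (non-delayed, non-pipelined) LMS filter; the delayed and normalized variants differ only in when the update is applied and in how the step size is scaled. The system-identification demo below uses synthetic data, not ECG signals.

```python
import random

def lms_filter(x, d, mu=0.05, order=4):
    """Plain LMS: y(n) = w^T x(n), e(n) = d(n) - y(n),
    w(n+1) = w(n) + mu * x(n) * e(n)."""
    w = [0.0] * order
    errors = []
    for n in range(len(x)):
        # Tap-delay-line input vector [x(n), x(n-1), ...], zero-padded at start.
        x_vec = [x[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        y = sum(wi * xi for wi, xi in zip(w, x_vec))
        e = d[n] - y
        w = [wi + mu * e * xi for wi, xi in zip(w, x_vec)]
        errors.append(e)
    return w, errors

# Identify an unknown 2-tap system d(n) = 0.5 x(n) + 0.2 x(n-1); the error
# shrinks as w adapts toward [0.5, 0.2, 0, 0].
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
d = [0.5 * x[n] + 0.2 * (x[n - 1] if n else 0.0) for n in range(2000)]
w, errors = lms_filter(x, d)
print([round(wi, 3) for wi in w])
```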
Generally, machine learning techniques are categorized into two types: supervised
learning and unsupervised learning. In supervised learning, a set of a person's
statistical data, such as height, weight, smoking status and lifestyle disease (e.g. diabetes),
is used to train a model. The dataset may be discrete or continuous. For a
discrete category, the model answers either positive or negative based on probability;
any probability greater than the threshold value is classified as positive. For
a continuous value, a regression algorithm is used to predict, for example, a person's life
expectancy or a dose of chemotherapy. In unsupervised learning, datasets are analyzed and grouped
into clusters that represent similar types of data points; other, uncommon data are
simply excluded.
IoT Sensor Data Analysis and Fusion Applying Machine Learning … 449
Here the authors follow supervised learning and develop a model using the
R statistical programming environment. They use the “Breast Cancer Wisconsin Diag-
nostic Dataset”, which is available from the University of California, Irvine (UCI). It is a
clinical dataset of fine-needle aspiration (FNA) of cell nuclei taken from breast
masses. The features of the FNA samples, which are available as digitized images, are
extracted using linear programming, pattern separation, decision tree construction,
etc. The collected FNA samples are divided into malignant and benign classes, and
the classification is done with respect to the features of the dataset.
Machine learning is then applied to the dataset. The dataset downloaded
from the UCI repository is represented as a matrix and missing data are removed;
67% of the whole dataset is used for training and 33% for evaluation. Various
machine learning algorithms are used for training. The first, a generalized linear
model (GLM), is used to reduce the number of features, because the number of features is greater
than the number of instances; the features are generally selected using the LASSO.
A support vector machine (SVM) is then used to separate the two classes with
a hyperplane. An artificial neural network (ANN) takes the features in the dataset as input
and passes them through hidden layers before presenting the final decision at the output layer.
Each layer of the ANN computes its output according to the equation
Y = activation(weight × input + bias)
and the weights are adjusted iteratively to minimize the error.
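The feature-selection-then-classification pipeline described above, a LASSO-style penalized GLM to prune features followed by an SVM, can be sketched roughly as follows. This is an illustrative reconstruction using scikit-learn's bundled copy of the Wisconsin dataset, not the authors' R code; an L1-penalized logistic regression stands in for the GLM/LASSO step, and the regularization strength C=0.1 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# 67% training / 33% evaluation, as in the text.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)

scaler = StandardScaler().fit(X_tr)

# L1-penalised logistic regression as the LASSO-style feature selector:
# features whose coefficients shrink to zero are dropped.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(scaler.transform(X_tr), y_tr)
keep = np.flatnonzero(lasso.coef_[0])
print(f"kept {keep.size} of {X.shape[1]} features")

# SVM trained only on the surviving features.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
svm.fit(X_tr[:, keep], y_tr)
acc = svm.score(X_te[:, keep], y_te)
print(f"test accuracy: {acc:.3f}")
```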
Deep Neural Networks are also used for high-dimensional data such as image
recognition. The ML algorithms are evaluated using three performance metrics:
Sensitivity = true positives/actual positives
Specificity = true negatives/actual negatives
Accuracy = (true positives + true negatives)/total predictions
A confusion matrix is produced using the SVM and is compared to the true classifica-
tions. The result will be either positive or negative based on the threshold value.
When new data with various parameters such as thickness and cell size is applied
to the trained model, it returns nearly accurate predictions.
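A small helper makes the three metrics concrete; the confusion-matrix counts below are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)             # true positives / actual positives
    specificity = tn / (tn + fp)             # true negatives / actual negatives
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Hypothetical counts for illustration, not from the chapter's experiments.
sens, spec, acc = classification_metrics(tp=85, fp=5, tn=95, fn=15)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# sensitivity=0.85 specificity=0.95 accuracy=0.90
```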
Neurological disorders are increasingly common, affecting a significant population;
approximately 700 million people in the world currently suffer from epilepsy.
Electroencephalogram (EEG) signals are used not only to detect such neuro-
logical disorders but also to study a patient's clinical history for disease
diagnosis. In [7], the authors present “Differentiation between Normal and Epileptic
EEG using k-Nearest-Neighbors technique”. This paper shows the classification
of EEG signals as either normal or epileptic in nature using the k-nearest-neighbor
(kNN) algorithm. Electrodes placed according to the international 10–20 system are used to collect the EEG
signals. The EEG database is taken from http://www.oracle.com/technetwork/java/
index.html. The collected data is sampled into five sets with respect to some physical
activities, i.e.
tree, and Random tree are used. The accuracy of this system is 96.11%. The vocal
impairments of PD patients are analyzed using ML techniques such as neural networks,
DMneural, regression and decision trees, where the neural network gives an accu-
racy of 92.9%. Vocal features are also extracted to discriminate between healthy
people and PD patients. When all the methods are combined, PD is predicted
with accuracy ranging from 57.1 to 77.4%, so there is room to improve predic-
tion accuracy. Here too, a particular dataset is used with respect to time and all ML
techniques are applied to it; however, there is scope to collect more datasets coming
from different sensor devices and to fuse those heterogeneous data sets. Meta machine
learning can be used to identify which ML algorithm is appropriate for a particular
context, and hyper-heuristic methods can be applied to identify a compatible ML
algorithm.
Sometimes heterogeneous sensor devices are used to collect complex signals
generated by the human body. Smartphones have been used extensively for pervasive
computing, especially for Human Activity Recognition (HAR), as discussed in [9], where a
two-phase HAR framework is proposed to identify both dynamic and static
activities through feature selection and ensemble classifiers. Alhussein et al. [10]
propose a framework that takes input from sensors or smart devices and reports
a patient's state based on a deep learning algorithm. The framework includes
various types of sensors such as an accelerometer, ECG, EEG, GPS location, altimeter and
thermometer. The collected data is sent to smart devices, i.e. a smartphone
or laptop, via a local area network; these smart devices transmit the data to the cloud
via 4G, 5G or Wi-Fi. The data is then analyzed for seizure detection and a detailed
analysis is sent to doctors. The authors used the freely available CHB-MIT dataset
for EEG epileptic seizure detection, collected from Children's Hospital, Boston. It
contains data from 23 epilepsy patients with a total of 686 EEG recordings,
of which 198 contain one or more seizures. A deep CNN is used to extract
features from the raw data, with higher-level features represented as combinations of
lower-level features. In a CNN, the convolution layer consists of filters (kernels) that
take a 2D input and output feature maps. A three-layer neural network, or
auto-encoder, is then trained so that the input is mapped to the hidden layer and
the output of the hidden layer is mapped to the output layer; this is an unsupervised
learning algorithm. The authors used a cropped-training technique to train on the input data,
though there is a risk of overfitting. The results of this architecture are
Average value of sensitivity = 93%
Average value of recognition accuracy = 99.2%
Overall accuracy of deep CNN-stacked auto-encoder = 99.5%
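The convolution step described above, a filter sliding over a 2D input to produce a feature map, can be sketched in a few lines. This is a toy illustration on a synthetic array, not the authors' deep CNN; as in most deep learning libraries, the "convolution" is implemented as a cross-correlation.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in CNN layers):
    slide the kernel over the image and accumulate elementwise products,
    producing one feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    fmap = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            fmap[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return fmap

# Toy 2-D input standing in for a time-frequency patch of an EEG recording.
signal = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])  # responds to horizontal changes

fmap = conv2d(signal, edge_kernel)
print(fmap.shape)  # (5, 5)
```

A real CNN stacks many such filters and learns their weights; here the single hand-written kernel simply detects a constant horizontal gradient.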
At present, adopting a meta-heuristic approach for IoT sensor data analysis can be
a smart move for the future. To solve optimization problems like the Travelling Salesman
Problem, where the best solution must be extracted from all feasible solutions,
heuristic or meta-heuristic approaches are more suitable than linear or non-linear
programming. Heuristic algorithms look for acceptable and reasonable solutions, but
this is quite subjective and depends on the type of optimization problem [15].
The term “meta” in meta-heuristics refers to a higher-level heuristic approach that
explores and exploits a larger search space, performing multiple iterations of the
subordinate heuristic and finding a near-optimal solution in less time. These itera-
tions evolve multiple solutions and continue until solutions close to the predefined
convergence criterion are found. The newly found near-optimal solution is accepted and
the system is considered to have reached a converged state.
In most cases, heuristic approaches do not guarantee the best or global optimum of a
problem. For example, in the hill-climbing method, while going uphill the search
often gets stuck in a local optimum, as the search space is limited and movement
is influenced by the initial point. Meta-heuristic techniques like simulated annealing
solve this problem by expanding the search space, with possible movement both
uphill and downhill from any random starting point. Thus getting trapped in local
optima can be avoided in meta-heuristics, which widens the horizon of feasible
solutions to an optimization problem.
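A minimal simulated-annealing sketch makes the uphill/downhill acceptance rule concrete. The test function, starting point and cooling schedule below are arbitrary choices for illustration, not drawn from the chapter.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=10.0, cooling=0.999,
                        iters=20000, seed=0):
    """Minimise f by always accepting downhill moves and accepting uphill
    moves with probability exp(-delta/T), so the search can escape local
    optima; the temperature T is reduced gradually."""
    rng = random.Random(seed)
    x, fx, t = x0, f(x0), t0
    best_x, best_f = x, fx
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)   # random neighbour
        delta = f(cand) - fx
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fx + delta
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                          # cooling schedule
    return best_x, best_f

# Multimodal test function (1-D Rastrigin): many local minima,
# global minimum 0 at x = 0.
f = lambda x: x * x + 10 * (1 - math.cos(2 * math.pi * x))
x, fx = simulated_annealing(f, x0=4.0)
print(f"x = {x:.3f}, f(x) = {fx:.3f}")
```

Plain hill climbing started at x0 = 4 would stall in the nearest local basin; the occasional uphill acceptances let annealing hop between basins toward lower minima.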
There are several meta-heuristic algorithms, such as genetic algorithms, evolu-
tionary algorithms, random optimization, local search, reactive search, greedy algo-
rithms, hill climbing, best-first search, simulated annealing, ant colony optimization,
stochastic diffusion search, harmony search, variable neighborhood search and many
more, which are widely used by researchers to solve optimization problems. Of
these, the most popular, mainly inspired by nature or social behavior, are as follows:
Genetic Algorithm: This algorithm is a classic choice for problems that are
complex, non-linear, stochastic, and non-differentiable. The genetic algorithm is
a good optimization technique that iterates across different stages, such as selec-
tion of individuals, and applies genetic operators like crossover and mutation to
obtain an optimal solution. Based on the age-old theory of the survival of the fittest, it
simply uses natural selection as its basis. A set of individuals known as the initial popu-
lation is chosen, where each individual is characterized by a set of parameters
or variables called genes. These genes are joined to form a string, or chromosome,
which is a potential solution to the optimization problem and is typically binary encoded.
A fitness function is applied to the chromosomes in the population and thus
the fitness of each solution is calculated. The survival or selection of an individual to
the next iteration, as a parent, is determined purely by its fitness value; the
individuals with the lowest fitness values are eliminated from the population.
Genetic operators then work on these parents, so that the good characteristics
of the parents are propagated to the next generation (the offspring).
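The selection-crossover-mutation loop above can be sketched on a toy problem: maximizing the number of 1-bits in a binary chromosome (the classic "OneMax"). The population size, mutation rate and the added elitism step are illustrative choices, not taken from the chapter.

```python
import random

rng = random.Random(42)
GENES, POP, GENERATIONS = 20, 30, 60

def fitness(chrom):
    """Toy fitness: the number of 1-bits ('OneMax')."""
    return sum(chrom)

def select(pop):
    """Roulette-wheel (fitness-proportionate) selection of one parent."""
    weights = [fitness(c) for c in pop]
    return rng.choices(pop, weights=weights, k=1)[0] if sum(weights) else rng.choice(pop)

def crossover(a, b):
    """Single-point crossover of two parent chromosomes."""
    point = rng.randrange(1, GENES)
    return a[:point] + b[point:]

def mutate(chrom, rate=0.02):
    """Flip each gene with a small probability."""
    return [g ^ 1 if rng.random() < rate else g for g in chrom]

# Random initial population of binary-encoded chromosomes.
population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    elite = max(population, key=fitness)   # elitism: keep the current best
    population = [elite] + [
        mutate(crossover(select(population), select(population)))
        for _ in range(POP - 1)
    ]

best = max(population, key=fitness)
print(f"best fitness: {fitness(best)}/{GENES}")
```

The same skeleton carries over to the smart-city uses discussed later; only the chromosome encoding and fitness function change (e.g. a parking space as a chromosome with slots as genes).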
Particle Swarm Optimization: Similar to the Genetic Algorithm, Particle Swarm Opti-
mization is another popular technique for solving complex optimization problems that
considers a population of individuals, or swarm, rather than a single individual. Every
member of the swarm is a particle, with a position and velocity in the search space
that are modified at each step of the algorithm based on its own best previous
location (p-best) as well as the best previous location of the entire swarm (g-best).
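The p-best/g-best update rule can be sketched as follows, minimizing the sphere function as an arbitrary test problem; the inertia and acceleration coefficients are common textbook values, not from the chapter.

```python
import random

def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm: each particle's velocity is pulled toward its
    own best position (p-best) and the swarm's best position (g-best)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]                                  # inertia
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])  # p-best pull
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))    # g-best pull
                pos[i][d] += vel[i][d]
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f

# Sphere function: global minimum 0 at the origin.
best, best_f = pso(lambda p: sum(x * x for x in p))
print(f"best value = {best_f:.6f}")
```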
Ant Colony Optimization: Ant Colony Optimization is another population-based
meta-heuristic approach in which ants (software agents) find the best solution to an
optimization problem, which is analogous to finding the best path on a weighted
graph. The agents move on that graph, influenced by a pheromone model (a set
of parameters associated with the edges and nodes of the graph), and the final solution
is reached by continuously modifying these parameters during runtime.
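A toy sketch of this pheromone-guided search, assuming a small hand-made graph: ants repeatedly build paths from a source to a destination, and shorter paths receive larger pheromone deposits, so later ants favor their edges.

```python
import random

# Tiny weighted graph: adjacency dict {node: {neighbor: edge_weight}}.
graph = {0: {1: 1.0, 2: 4.0, 3: 5.0},
         1: {0: 1.0, 3: 1.0},
         2: {0: 4.0, 3: 1.0},
         3: {0: 5.0, 1: 1.0, 2: 1.0}}
SRC, DST = 0, 3

rng = random.Random(1)
pheromone = {(u, v): 1.0 for u in graph for v in graph[u]}

def build_path():
    """One ant walks from SRC to DST, choosing edges with probability
    proportional to pheromone * (1 / edge weight)."""
    path, node = [SRC], SRC
    while node != DST:
        choices = [v for v in graph[node] if v not in path]
        if not choices:                       # dead end (not reachable here)
            return None, float("inf")
        weights = [pheromone[(node, v)] / graph[node][v] for v in choices]
        node = rng.choices(choices, weights=weights, k=1)[0]
        path.append(node)
    return path, sum(graph[a][b] for a, b in zip(path, path[1:]))

best_path, best_len = None, float("inf")
for _ in range(50):                           # iterations
    for e in pheromone:
        pheromone[e] *= 0.9                   # evaporation
    for _ in range(10):                       # ants per iteration
        path, length = build_path()
        if path is None:
            continue
        deposit = 1.0 / length                # shorter paths deposit more
        for a, b in zip(path, path[1:]):
            pheromone[(a, b)] += deposit
            pheromone[(b, a)] += deposit
        if length < best_len:
            best_path, best_len = path, length

print(best_path, best_len)
```

On this graph the colony converges on the path 0 → 1 → 3 of total weight 2, rather than the direct but heavier edge 0 → 3.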
The application of meta-heuristics can give substantial advantages to Big Data analysis
in IoT. The power to solve complex optimization problems gives meta-heuristics
the capability to improve the data-intensive processes of IoT. In addition, these
approaches are suitable for providing security and robustness to IoT data, apart from
making its processing faster and more efficient than heuristic approaches.
Smart City: IoT technologies and applications contribute enormously to the
urbanization of cities, making them “smart” (as shown in Fig. 5). Moreover,
meta-heuristic approaches can take the IoT framework one step further, with their
ability to solve the complex optimization problems faced while analyzing data collected
by IoT devices and in decision-making processes. The various aspects of a smart city
where meta-heuristics can be widely applied are as follows:
Smart Parking: Smart parking, a component of a smart city, can use IoT and a meta-
heuristic algorithm like the Genetic Algorithm to solve congestion problems in shop-
ping malls. The Genetic Algorithm can provide an optimal solution for finding the best
parking space, as well as the parking slot, for any vehicle entering the mall [16]. The
parking space is treated as a chromosome and the parking slots as genes. Using
the roulette-wheel selection method, a three-parent crossover operator and a bit-string
mutation operator, the optimal parking space and slot can be found, which will
reduce the waiting time of mall visitors. Other meta-heuristic approaches like
Ant Colony Optimization [17] can also be useful for finding free parking slots, where
the current location of the car and the desired parking location determine the
right choice.
Smart Traffic Management: Meta-heuristic algorithms can treat on-road traffic
management as a complex optimization problem and find optimal solutions for
reducing congestion and prioritizing traffic according to real-time demand. The Genetic Algo-
rithm is particularly useful for developing intelligent traffic management at intersections.
In [18], from the large number of vehicles on the road at intersections (the initial popula-
tion), the fittest can be selected by a human-community-based Genetic Algorithm, using the
data collected by cameras fitted at the intersections. The genes are represented by
various traffic parameters such as vehicle mean speed, traffic flow and traffic density.
Smart Environment: In highly populated cities, air pollution is a major problem.
A proper air-monitoring system can be designed with energy-efficient and
low-cost sensors, as proposed in [19], and a hierarchy-based Genetic Algorithm can
be used to optimize energy dissipation for an extended network lifetime. It can be used
for real-time air-quality monitoring for IoT devices as well.
Smart Waste Management: The smart city concept makes municipal waste manage-
ment more effective and efficient through IoT infrastructure [20]. Here, a meta-
heuristic approach like the Genetic Algorithm can be an effective tool for optimizing garbage
collection, by tracking garbage trucks, especially at overloaded places in
the city. It is also important to identify locations to dump and process solid
waste, since failing to do so may cause unwanted hazards to health, the environment and
the ecosystem. In [21], this issue is addressed with a Genetic Algorithm that can predict
locations through geo-points in the region of Coimbatore, from where the data is
collected. The fitness criterion is formulated according to the Ministry of Urban Development,
Government of India.
Apart from the smart city, other IoT application domains are profoundly influenced
by meta-heuristic approaches in improving the functionality and utility of
the framework. They are as follows:
Smart Home: Meta-heuristic approaches like the Genetic Algorithm can be used
to design an automated and adaptive lighting system for the home that turns lights on
or off automatically, provides the user's desired brightness, and thereby saves
energy. A control algorithm based on the Genetic Algorithm treats
a light-switching pattern as a chromosome containing genes, with a fitness function
that decides which light-switching patterns are eliminated after crossover and
mutation [22]. Other meta-heuristic approaches like Particle Swarm Optimization
can also contribute much to the betterment of a smart home. In [23], Particle
Swarm Optimization is applied with a ZigBee tree network in smart homes to ensure
minimum energy consumption of the deployed sensor nodes and a minimum-cost
path between them. The application of meta-heuristics in ZigBee reduces its router
dependency and increases energy efficiency as well as network lifetime. Interestingly,
the Genetic Algorithm can also be applied with ZigBee, along with Particle Swarm
Optimization, for routing, so that the optimal route can be found meta-heuristically
when the complexity of the network increases [24].
Smart Healthcare: Meta-heuristic approaches like Particle Swarm Optimization and
the Genetic Algorithm can play a significant role in smart healthcare. The Genetic Algorithm
can be used extensively to reduce ambulance response times for emergency cases in a partic-
ular geographical area [25]. The algorithm can compare the number of ambulances
with their current and future optimal locations and reduce response times for patients
by conducting an adequate spatial analysis. The Genetic Algorithm can also effectively
handle home caregiver scheduling and routing in a smart city [26]; here the chromo-
somes (individuals) are represented by multiple vectors. In IoT-based smart health-
care it is also important to store the patient's medical data for further analysis and
future monitoring. Hence a virtual machine selection and migration algorithm, along
with Particle Swarm Optimization, is proposed in [27] that effectively stores data in
a minimal number of virtual machines, with lower energy consumption and response time, thus
serving more users at a time. Here, security is a major concern, especially in an
IoT framework, while storing a patient's medical images; it can be provided by
Particle Swarm Optimization by optimizing standard cryptographic solutions [28].
Capturing data from mobile devices (as users requiring smart healthcare are on the move)
to cloud resources is a challenging task in an IoT framework. Meta-heuristics can play
an important role here in migrating data to a virtual machine for mobile patients, with
an Ant Colony Optimization-based joint VM migration model [29].
Smart Agriculture: Research has revealed a significant contribution of meta-
heuristic approaches to smart agriculture. The Genetic Algorithm can typically be applied
to tasks like weather prediction before harvesting crops. Farmers can plan pre- and
post-harvest agricultural strategies based on the predicted weather conditions, and can also
get crop-damage alerts as well as water management [30]. Swarm intelligence is also
a major contributor to smart farming; in [31], the authors show how sensor-
node deployment in agriculture faces path loss as a major threat due to the presence of
tall trees and dense grass on a farm. Particle Swarm Optimization can be
combined with two path-loss models in order to find the optimal coefficients of the
functions, improving the models' accuracy. This in turn improves the RSSI of the
propagated signal, for proper data communication among the sensor nodes deployed
in the farmland. In developing countries like China, agricultural afforestation
is essential for reducing carbon in the environment, but it would affect food
productivity.
There are several other IoT application domains where meta-heuristic approaches
can help solve optimization problems in a better manner. They are as follows:
Smart Energy: Energy management is vital for both facilities and utilities, as the
demand for energy supply increases day by day. IoT is capable of revolutionizing
energy management by making it “smart”. In smart cities, energy management in
buildings (smart buildings) can be handled by an efficient control scheme. Multi-
agent control systems can be developed with stochastic intelligent optimization, and
the Genetic Algorithm can be the best choice for this [32].
Smart Industry: Smart Industry (Industry 4.0) may be defined as an industrial
framework with radical digitalization, connecting people, products and machines
through emerging technologies like IoT. Supply chain management is an integral
part of any industry, and to optimize the inventory cost of these supply chains, an
improved Genetic Algorithm with better search efficiency has been proposed that can deal
with IoT-based supply chains [33]. Other meta-heuristic approaches like Simulated
Annealing can be used in combination with the Genetic Algorithm to obtain, through
optimization, better paths for an AGV transportation system [34]. This can benefit
a manufacturing industry through product automation, reducing unnecessary labor cost,
and moving toward Industry 4.0.
Smart Business: Meta-heuristic approaches like the Genetic Algorithm can be imple-
mented to maximize economic profit in a business, as against traditional methods like
the Lewis model. The real-parameter objective function can be optimized using
the Genetic Algorithm to trade off between total revenue and total cost [35]. The Genetic
Algorithm is applicable in the hospitality industry as well, to understand customer
preferences and satisfaction with online travel agency websites; as mentioned
in [36], the application of the Genetic Algorithm can help a business understand customer
preferences based on different criteria for different segments.
This is summarized in Fig. 6. It is apparent from the figure that GA is applied in
almost every aspect of the smart applications considered in this chapter.
in the constructive phase of GRASP [37]. While evaluating the initial population,
machine learning techniques like the Markov fitness model, neural networks and polyno-
mial regression can be utilized to find computationally inexpensive objective
functions for a problem. For example, in [38] an Extreme Learning Machine is used as
a surrogate model to approximate the fitness values of the majority of
individuals in the population in a Genetic Algorithm, and in [39] a fuzzy-logic-based
multi-objective fitness function is proposed to make the Genetic Algorithm perform better.
For population management in evolutionary algorithms, clustering analysis is one
of the most common techniques used in practice. Meta-heuristic algorithms like
the Genetic Algorithm can use machine learning techniques like clustering analysis to
manage the population and also to select parents for producing offspring [40].
Lately, meta-heuristics have been widely used to improve machine learning
tasks in various aspects of their application. For example, in classification problems
of supervised learning, several tasks like feature selection and extraction, and param-
eter fine-tuning, can be done using meta-heuristics. In computer vision problems
like Bag of Visual Words [41], an evolutionary algorithm has been proposed that
can automatically learn and detect supervised as well as unsupervised weighting
schemes. Meta-heuristic approaches like the Genetic Algorithm can be employed to
select important features in an intrusion detection system for machine learning clas-
sifiers like the decision tree [42]. In ANNs, the difficulty of choosing optimal network
parameters is considered a major challenge, which can be solved by adaptive
multi-objective genetic evolutionary algorithms that optimize such parameters and
escape local optima [43]. Meta-heuristic approaches like the Genetic Algorithm are
powerful enough for simultaneous feature selection and parameter optimization
for the Support Tucker Machine; as proposed in [44], the algorithm can delete
irrelevant information from tensor data and provide better accuracy in general. The
influence of meta-heuristics on machine learning can be useful in health care as
well. For example, in [45] a multi-objective Genetic Algorithm is proposed to select
appropriate features for microarray cancer data classification using an SVM classi-
fier. Interestingly, other meta-heuristic approaches like Particle Swarm Optimization
can also be used for feature selection and maximizing performance in multi-label
classification problems [46].
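A rough sketch of GA-based feature selection for a classifier, in the spirit of the wrapper approaches above but reproducing no particular paper: a chromosome is a bit-mask over the feature columns and fitness is cross-validated accuracy. The dataset, population size and generation count are arbitrary illustrative choices.

```python
import random
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = random.Random(0)
N = X.shape[1]

def fitness(mask):
    """Cross-validated accuracy of a decision tree on the selected features."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, idx], y, cv=3).mean()

# Each chromosome is a bit-mask: gene i == 1 keeps feature i.
pop = [[rng.randint(0, 1) for _ in range(N)] for _ in range(12)]
for _ in range(10):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:6]                          # truncation selection
    children = []
    for _ in range(6):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, N)              # single-point crossover
        child = a[:cut] + b[cut:]
        child = [g ^ 1 if rng.random() < 0.05 else g for g in child]  # mutation
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
best_acc = fitness(best)
print(f"selected {sum(best)} features, CV accuracy {best_acc:.3f}")
```

Because fitness is simply "whatever the downstream classifier achieves", the same wrapper works unchanged with an SVM, a kNN or any other estimator.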
Meta-heuristic approaches can also be reasonably viable for providing better solutions
to regression problems. To begin with, neural networks can be evolved
using the Genetic Algorithm, a process known as neuro-evolution, with the objective
of recreating the central nervous system of a natural organism for an artificial creature in a virtual
environment. The Genetic Algorithm can create the architecture of the artificial creature
based on a fitness function and also train it to behave like a living organism [47]. Deep
learning is an emerging field in machine learning, and meta-heuristic approaches
like the Genetic Algorithm can competitively train deep neural networks so that
The classic combination of machine learning with meta-heuristics has the power to
make the IoT framework even more effective and efficient for society (as shown in
Fig. 7). For example, in smart traffic management it is possible to reduce traffic
of this meta-heuristic approach with a machine learning technique also exhibits anti-
jamming capability with fault tolerance, and can forecast accurately with a shortened
operation time.
Researchers have also found extensive contributions of meta-heuristic
approaches combined with machine learning algorithms in ensuring a smart envi-
ronment. A real-time, fine-grained IoT air-quality sensing system is proposed
in [58] that minimizes the average joint error of the real-time air-quality map. Here a deep Q-
learning solution is used for the power-control problem while sensing tasks online,
whereas the genetic algorithm is used for the location selection problem when
efficiently deploying the sensors; k-means clustering is used for initialization and
the genetic algorithm for improvement.
Air pollutants like benzene are hazardous to human health, potentially leading to blood cancer.
Swarm intelligence meta-heuristic approaches like Particle Swarm Optimization can
be used to predict the concentration of this harmful pollutant in the air, along with
machine learning techniques like fuzzy logic, considering a triangular mutation oper-
ator. The proposed model successfully improves the searching capability of Particle
Swarm Optimization and increases the speed of convergence [59]. In smart cities,
airborne pollution can be dangerous to human health, and the non-linear behavior
of such pollution paves the way for meta-heuristics to be applied to combat
it. In [60], machine learning techniques like neuro-fuzzy systems are
combined with improved Ant Colony Optimization to improve the prediction
of airborne pollution. It is observed that this approach can predict contaminants in
the air through shortest-path search and uses pheromone to approximate real-time data for
accurate results.
Meta-heuristics and machine learning can also make a smart home even smarter.
The most important criterion of a smart home is ensuring security, which can be
accomplished through an IDS (Intrusion Detection System) in smart homes. This setup
also benefits elderly people by detecting their daily activities, for monitoring and
intervention when required. In [61], the authors employ a Genetic K-means
algorithm to detect unknown patterns (human behaviors) and thus help in accurate
detection of attacks, if any, through pattern-analysis techniques. The combination of
the Genetic Algorithm with K-means helps eliminate redundant input features, for
accurate prediction of malicious patterns and faster response. In [62], the authors
propose a recurrent-output neural network that predicts any unusual behavior of the
elderly person under observation. The Genetic Algorithm is integrated into the learning
step in order to improve the accuracy of the results, so that the actual situation can
be appropriately monitored remotely. The algorithm works on real-time
data fetched from IoT sensors. In a smart home, it is also possible to locate users
inside the premises (residents or criminals) through the data collected by routers
from the Wi-Fi signal strength of personal devices worn by people. Machine
learning techniques like an ANN can be trained using weights obtained from Particle
Swarm Optimization, and an accurate optimization strategy can be developed [63].
Ant Colony Optimization can be implemented in smart homes along with machine
learning techniques like the decision tree, as seen in [64]. This combination can be useful
in home automation systems, speech recognition, and device control with embedded
systems.
A substantial amount of work has been done in the field of smart healthcare that takes
into account both meta-heuristic and machine learning concepts. It is observed that in
smart cities, a global vision can merge several aspects together, such as AI, ICT, machine
learning, Big Data and IoT, and this confluence can successfully predict and detect
certain deadly diseases like Alzheimer's, tuberculosis and dementia. Genetic Algo-
rithms can be used extensively along with SVM, KNN and RF to accomplish
the task [64]. Common health issues like diabetes are faced by millions of
people all across the world. Hence an ensemble (group) learning approach is proposed
through reinforcement learning that accurately predicts diabetes mellitus, and a meta-
heuristic approach like the Genetic Algorithm is used to select hybrid features for
the task [65]. Other research, like [66], applies a multi-objective evolutionary fuzzy
classifier to the Pima Indians Diabetes Dataset and uses Goldberg's Genetic Algo-
rithm to predict type 2 diabetes. The feature selection method using the Genetic Algo-
rithm helps reduce the number of features in the dataset, so prediction can be
done more accurately in less time. It is observed that meta-heuristics and machine
learning can be useful for predicting cardiovascular diseases as well. In [67], a
hybrid method to enhance the neural network is proposed, where meta-heuristic
approaches like the Genetic Algorithm enhance the initial weights of the NN and
improve accuracy and performance in detecting severe heart disease in patients.
Unlike angiography, this method is cost-effective and has almost no side effects.
Other meta-heuristic approaches like Particle Swarm Optimization can successfully
predict coronary heart disease using ensemble classifiers like AdaBoost, Random
Forest and bagged trees [68]. Here Particle Swarm Optimization is used as a feature
selection method to remove the least-ranked features, and the ensemble methods improve
classification performance; this predicts and detects heart diseases accurately and earlier
than other methods. Ant Colony Optimization is another meta-heuristic approach that can be
useful in predicting the risk of a disease. In Big Data it is difficult to extract features
from huge datasets, especially if the data is unstructured, as shown in [69]. So an
improved Ant Colony Optimization is presented along with a neural network machine
learning technique, so that the best features can be selected for better prediction
accuracy.
Smart Agriculture: It is observed that the confluence of machine learning with meta-
heuristics can make significant contributions to smart agriculture as well.
The Genetic Algorithm can be used successfully in precision agriculture, as shown in
[70], where translation of satellite imagery using a convolutional neural network
enables accurate decision making; this can also be used to reduce carbon dioxide
emissions and minimize land degradation. Another approach effective in
smart farming is the integration of Particle Swarm Optimization with a hybrid
machine learning model based on an ANN, as in [71], which improves the
performance of common combine harvesters, which are essential to the agri-
cultural industry. The combination of a meta-heuristic approach like Particle Swarm
Optimization with an ANN provides higher accuracy than a simple ANN. Ant colony
8 Conclusion
This chapter presents a detailed review of data-intensive, IoT-based smart appli-
cations, especially the smart city and its associated domains. Learning from the heteroge-
neous data from various sources, and prediction from it, are crucial to building a robust
system. Different machine learning techniques, along with their utilities in IoT-based
smart applications, are discussed in detail. Interestingly, meta-heuristic techniques
are found to have been applied in many such applications, either as a standalone
method for optimization or in combination with machine learning techniques, to
create smarter applications for a smarter world of the Internet of Things.
Table 1 Comparison of different works on IoT-based smart applications that incorporate both machine learning and meta-heuristic techniques for data analysis

| References | IoT application area | Meta-heuristic approach | Machine learning approach | Remarks |
|---|---|---|---|---|
| Wang et al. [55] | Smart traffic | Genetic algorithm | Linear regression | Reduces waiting time, balances traffic pressure at intersections |
| Zhang et al. [56] | Smart traffic | Genetic algorithm | SVR & random forest | Accurate forecasting of short-term traffic flow |
| Song [57] | Smart traffic | Ant colony optimization | ANN | Fault tolerance, anti-jamming, shortened operation time |
| Hu et al. [58] | Smart environment | Genetic algorithm | Q-learning, K-means | Reduces average joint error, finds suitable locations for limited sensors |
| Kaur et al. [59] | Smart environment | Particle swarm optimization | Fuzzy logic | Improves searching capability of PSO and speeds up convergence |
| Martinez-Zeron et al. [60] | Smart environment | Ant colony optimization | Neuro-fuzzy | Improves prediction accuracy, reduces prediction errors |
| Sandhya and Julian [61] | Smart home | Genetic algorithm | K-means | Identifies patterns and attacks in IDS, reduces redundant features |
| Narayanan et al. [62] | Smart home | Genetic algorithm | RO-neural network | Detects unusual behavior of elderly patients |
| Soni and Dubey [63] | Smart home | Particle swarm optimization | ANN | Locates residents and criminals through wearable devices near homes |
| Chui et al. [64] | Smart home | Ant colony optimization | Decision tree | Home automation, speech recognition, device control |
| Abdollahi et al. [65] | Smart healthcare | Genetic algorithm | Ensemble learning | Diagnoses and predicts diabetes accurately through hybrid feature selection |
| Vaishali et al. [66] | Smart healthcare | Genetic algorithm | MO evolutionary fuzzy | Reduced features, accurate prediction in less time |
| Arabasadi et al. [67] | Smart healthcare | Genetic algorithm | Neural network | Enhances initial weights of NN, cost-effective and no side effects |
| Yekkala et al. [68] | Smart healthcare | Particle swarm optimization | Ensemble methods | Improved feature selection, improved classification, early diagnosis |
| Joel et al. [69] | Smart healthcare | Ant colony optimization | Neural networks | Effective feature selection from Big Data |
| Arkeman et al. [70] | Smart agriculture | Genetic algorithm | Convolutional NN | Translation of satellite imagery, reduces CO2 emission, minimizes land degradation |
| Nádai et al. [71] | Smart agriculture | Particle swarm optimization | ANN | Performance enhancement of combine harvesters, better accuracy |
| Carrillo et al. [72] | Smart agriculture | Ant colony optimization | ELM | Features as nodes of a graph, light optimization for low complexity |
References
1. URL: https://www.gartner.com/en/information-technology/glossary/internet-of-things
2. Mallick, A., Saha, A., Chowdhury, C., Chattopadhyay, S.: Energy efficient routing protocol for
ambient assisted living environment. Wirel. Pers. Commun. 109(2), 1333–1355 (2019)
3. Shi, W., Cao, J., Zhang, Q., Li, Y., Xu, L.: Edge computing: Vision and challenges. IEEE
Internet Things J. 3(5), 637–646 (2016)
4. Li, H., Ota, K., Dong, M.: Learning IoT in edge: Deep learning for the Internet of Things with
edge computing. IEEE Netw. 32(1), 96–101 (2018)
5. Chen, J., Ran, X.: Deep learning with edge computing: a review. Proc. IEEE 107(8) (2019)
6. Venkatesan, C., Karthigaikumar, P., Paul, A., Satheeskumaran, S., Kumar, R.J.I.A.: ECG
signal preprocessing and SVM classifier-based abnormality detection in remote healthcare
applications. IEEE Access 6, 9767–9773
7. Oliva, J.T., Garcia Rosa, J.L.: Differentiation between normal and epileptic EEG using k-nearest-
neighbors technique. In: Machine Learning for Health Informatics, pp. 149–160. Springer,
Cham (2016)
8. Miljkovic, D., Aleksovski, D., Podpečan, V., Lavrač, N., Malle, B., Holzinger, A.: Machine
learning and data mining methods for managing Parkinson’s disease. In: Machine Learning for
Health Informatics, pp. 209–220. Springer, Cham (2016)
9. Saha, J., Chowdhury, C., Biswas, S.: Two phase ensemble classifier for smartphone based
human activity recognition independent of hardware configuration and usage behaviour.
Microsyst. Technol. 24(6), 2737–2752 (2018)
10. Alhussein, M., Muhammad, G., Shamim Hossain, M., Umar Amin, S.: Cognitive IoT-cloud
integration for smart healthcare: case study for epileptic seizure detection and monitoring.
Mobile Netw. Appl. 23(6), 1624–1635
11. Sumalee, A., Wai Ho, H.: Smarter and more connected: future intelligent transportation system.
IATSS Res. 42(2), 67–71
12. Lin, Y., Wang, P., Ma, M.: Intelligent transportation system (its): concept, challenge and oppor-
tunity. In: 2017 IEEE 3rd International Conference on Big Data Security on Cloud (bigdatase-
curity), IEEE International Conference on High Performance and Smart Computing (hpsc),
and IEEE International Conference on Intelligent Data and Security (ids), pp. 167–172. IEEE
(2017)
13. Lin, T., Rivano, H., Le Mouël, F.: A survey of smart parking solutions. IEEE Trans. Intell.
Transp. Syst. 18(12), 3229–3253
14. Roman, C., Liao, R., Ball, P., Ou, S., de Heaver, M.: Detecting on-street parking spaces in
smart cities: performance evaluation of fixed and mobile sensing systems. IEEE Trans. Intell.
Transp. Syst. 19(7), 2234–2245
15. Khosravanian, R., Mansouri, V., Wood, D.A., Reza Alipour, M.: A comparative study of
several metaheuristic algorithms for optimizing complex 3-D well-path designs. J. Petrol.
Expl. Product. Technol. 8(4), 1487–1503
16. Thomas, D., Kovoor, B.C.: A genetic algorithm approach to autonomous smart vehicle parking
system. Procedia Comput. Sci. 125, 68–76 (2018)
17. Balzano, W., Stranieri, S.: ACOp: an algorithm based on ant colony optimization for parking slot
detection. In: Workshops of the International Conference on Advanced Information Networking
and Applications, pp. 833–840. Springer, Cham (2019)
18. Hnaif, A.A., Nagham, A.-M., Abduljawad, M., Ahmad, A.: An intelligent road traffic manage-
ment system based on a human community genetic algorithm. In: 2019 IEEE Jordan Inter-
national Joint Conference on Electrical Engineering and Information Technology (JEEIT),
pp. 554–559. IEEE (2019)
19. Khedo, K.K., Chikhooreeah, V.: Low-cost energy-efficient air quality monitoring system using
wireless sensor network. In: Wireless Sensor Networks-Insights and Innovations. IntechOpen
(2017)
IoT Sensor Data Analysis and Fusion Applying Machine Learning … 467
20. Paulchamy, B., Babu Thirumangai Alwar, E., Anbarasu, K., Hemalatha, R., Lavanya, R.,
Manasa, K.M.: IOT based waste management in smart city. Asian J. Appl. Sci. Technol. 2(2),
387–394
21. Ramasami, K., Velumani, B.: Location prediction for solid waste management—a Genetic
algorithmic approach. In: 2016 IEEE International Conference on Computational Intelligence
and Computing Research (ICCIC), pp. 1–5. IEEE (2016)
22. Ngo, M.H., Viet Cuong Nguyen, X., Duong, Q.K., Son Nguyen, H.: Adaptive Smart Lighting
Control based on Genetic Algorithm, pp. 320–325
23. Fernando, S.L., Sebastian, A.: IoT: Smart Home using ZigBee clustering minimum spanning
tree and particle swarm optimization (MST-PSO). Int. J. Inf. Technol. (IJIT) 3(3) (2017)
24. Jiang, D., Yu, L., Wang, X., Xie, F., Yu, Y.: Design of the smart home system based on
the optimal routing algorithm and ZigBee network. PloS One 12(11)
25. Sasaki, S., Comber, A.J., Suzuki, H., Brunsdon, C.: Using genetic algorithms to optimise current
and future health planning-the example of ambulance locations. Int. J. Health Geograph. 9(1),
4 (2010)
26. Borchani, R., Masmoudi, M., Jarboui, B.: Hybrid genetic algorithm for home healthcare routing
and scheduling problem. In: 2019 6th International Conference on Control, Decision and
Information Technologies (CoDIT), pp. 1900–1904. IEEE (2019)
27. Ambigai, S.D., Manivannan, K., Shanthi, D.: An efficient virtual machine migration for smart
healthcare using particle swarm optimization algorithm. Int. J. Pure Appl. Math. 118(20),
3715–3722
28. Elhoseny, M., Shankar, K., Lakshmanaprabu, S.K., Maseleno, A., Arunkumar, N.: Hybrid
optimization with cryptography encryption for medical image security in internet of things.
Neural Comput. Appl., 1–15
29. Islam, Md.M., Abdur Razzaque, Md., Mehedi Hassan, M., Ismail, W.N., Song, B.: Mobile
cloud-based big healthcare data processing in smart cities. IEEE Access 5, 11887–11899 (2017)
30. Gumaste, S.S., Kadam, A.J.: Future weather prediction using genetic algorithm and FFT for
smart farming. In: 2016 International Conference on Computing Communication Control and
automation (ICCUBEA), pp. 1–6. IEEE (2016)
31. Jawad, H.M., Jawad, A.M., Nordin, R., Kamel Gharghan, S., Abdullah, N.F., Ismail, M., Jawad
Abu-Al Shaeer, M.: Accurate empirical path-loss model based on particle swarm optimization
for wireless sensor networks in smart agriculture. IEEE Sens. J. (2019)
32. Shaikh, P.H., Mohd Nor, N.B., Nallagownden, P., Elamvazuthi, I., Ibrahim, T.: Intelligent multi-
objective control and management for smart energy efficient buildings. Int. J. Electr. Power
Energy Syst. 74, 403–409 (2016)
33. Wang, Y., Geng, X., Zhang, F., Ruan, J.: An immune genetic algorithm for multi-echelon
inventory cost control of IOT based supply chains. IEEE Access 6, 8547–8555 (2018)
34. Fan, C., Li, S., Guo, R., Wu, Y.: Analysis of AGV optimal path problem in smart factory
based on genetic simulated annealing algorithm. In: 4th Workshop on Advanced Research and
Technology in Industry (WARTIA 2018). Atlantis Press (2018)
35. Chatterjee, S., Nag, R., Dey, N., Ashour, A.S.: Efficient economic profit maximization: genetic
algorithm based approach. In: Smart Trends in Systems, Security and Sustainability, pp. 307–
318. Springer, Singapore (2018)
36. Hao, J.-X., Yan, Yu., Law, R., Fong, D.K.C.: A genetic algorithm-based learning approach to
understand customer satisfaction with OTA websites. Tour. Manag. 48, 231–241 (2015)
37. De Lima, F.C., De Melo, J.D., Doria Neto, A.D.: Using the Q-learning algorithm in the construc-
tive phase of the GRASP and reactive GRASP metaheuristics. In: 2008 IEEE International
Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence),
pp. 4169–4176. IEEE (2008)
38. Guo, P., Cheng, W., Wang, Y.: Hybrid evolutionary algorithm with extreme machine learning
fitness function evaluation for two-stage capacitated facility location problems. Expert Syst.
Appl. 71, 57–68 (2017)
39. Téllez-Velázquez, A., Molina-Lozano, H., Villa-Vargas, L.A., Cruz-Barbosa, R., Lugo-
González, E., Batyrshin, I.Z., Rudas, I.J.: A feasible genetic optimization strategy for parametric
interval type-2 fuzzy logic systems. Int. J. Fuzzy Syst. 20(1), 318–338 (2018)
468 A. Saha et al.
40. Chehouri, A., Younes, R., Khoder, J., Perron, J., Ilinca, A.: A selection process for genetic
algorithm using clustering analysis. Algorithms 10(4), 123 (2017)
41. Escalante, H.J., Ponce-López, V., Escalera, S., Baró, X., Morales-Reyes, A., Martínez-
Carranza, J.: Evolving weighting schemes for the bag of visual words. Neural Comput. Appl.
28(5), 925–939
42. Azad, C., Mehta, A.K., Jha, V.K.: Evolutionary decision tree-based intrusion detection system.
In: Proceedings of the Third International Conference on Microelectronics, Computing and
Communication Systems, pp. 271–282. Springer, Singapore (2019)
43. Ibrahim, A.O., Mariyam Shamsuddin, S., Abraham, A., Noman Qasem, S.: Adaptive memetic
method of multi-objective genetic evolutionary algorithm for backpropagation neural network.
Neural Comput. Appl. 31(9), 4945–4962
44. Zeng, D., Wang, S., Shen, Y., Shi, C.: A GA-based feature selection and parameter optimization
for support tucker machine. Procedia Comput. Sci. 111, 17–23 (2017)
45. Rani, M.J., Devaraj, D.: Microarray data classification using multi objective genetic algo-
rithm and SVM. In: 2019 IEEE International Conference on Intelligent Techniques in Control,
Optimization and Signal Processing (INCOS), pp. 1–3. IEEE (2019)
46. Zhang, Y., Gong, D.-w., Sun, X.-y., Guo, Y.-n.: A PSO-based multi-objective multi-label feature
selection method in classification. Scientific Reports 7(1), 1–12 (2017)
47. Jha, S.K., Josheski, F.: Artificial evolution using neuroevolution of augmenting topologies
(NEAT) for kinetics study in diverse viscous mediums. Neural Comput. Appl. 29(12), 1337–
1347
48. Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolu-
tion: Genetic algorithms are a competitive alternative for training deep neural networks for
reinforcement learning. arXiv preprint arXiv:1712.06567 (2017)
49. Ojha, V.K., Abraham, A., Snášel, V.: Metaheuristic design of feedforward neural networks: A
review of two decades of research. Eng. Appl. Artif. Intell. 60, 97–116
50. Said, A., Ayaz Abbasi, R., Maqbool, O., Daud, A., Aljohani, N.R.: CC-GA: A clustering
coefficient based genetic algorithm for detecting communities in social networks. Appl. Soft
Comput. 63, 59–70 (2018)
51. Aadil, F., Bashir Bajwa, K., Khan, S., Majeed Chaudary, N., Akram, A.: CACONET: Ant
colony optimization (ACO) based clustering algorithm for VANET. PloS One 11(5), e0154080
(2016)
52. Bagherlou, H., Ghaffari, A.: A routing protocol for vehicular ad hoc networks using simulated
annealing algorithm and neural networks. J. Supercomput. 74(6), 2528–2552 (2018)
53. Govindarajan, K., Selvi Somasundaram, T., Suresh Kumar, V.: Particle swarm optimization
(PSO)-based clustering for improving the quality of learning using cloud computing. In: 2013
IEEE 13th International Conference on Advanced Learning Technologies, pp. 495–497. IEEE
(2013)
54. Dutta, P., Saha, S., Pai, S., Kumar, A.: A protein interaction information-based Generative
Model for enhancing Gene clustering. Scientific Reports 10(1), 1–12 (2020)
55. Wang, H., Liu, J., Pan, Z., Takashi, K., Shimamoto, S.: Cooperative traffic light controlling
based on machine learning and a genetic algorithm. In: 2017 23rd Asia-Pacific Conference on
Communications (APCC), pp. 1–6. IEEE (2017)
56. Zhang, L., Alharbe, N.R., Luo, G., Yao, Z., Li, Y.: A hybrid forecasting framework based on
support vector regression with a modified genetic algorithm and a random forest for traffic flow
prediction. Tsinghua Sci. Technol. 23(4), 479–492 (2018)
57. Song, L.: Improved intelligent method for traffic flow prediction based on artificial neural
networks and ant colony optimization. J. Convergence Inf. Technol. 7(8), 272–280 (2012)
58. Hu, Z., Bai, Z., Bian, K., Wang, T., Song, L.: Implementation and optimization of real-time
fine-grained air quality sensing networks in smart city. In: ICC 2019–2019 IEEE International
Conference on Communications (ICC), pp. 1–6. IEEE (2019)
59. Kaur, P., Singh, P., Singh, K.: Air pollution detection using modified Triangular mutation based
particle swarm optimization (2019)
1 Introduction
In recent years, climate change has created adverse effects on various sectors of societies around the world. A significant increase in drought cycles has created severe problems for the agricultural sector. Deforestation due to natural or man-made fires, illegal over-logging and illegal construction in the forests has created severe problems for the natural environment. The central plateau of Iran is one of the areas in which climate change increasingly creates significant problems. The deforestation of the Zagros area, the drought in the agricultural centers of the southern provinces of Iran, the drying of the major lakes in central Iran and the air pollution resulting from dust particles are only some of these problems, which are increasing in scale by the year.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_21
472 B. Majidi et al.
people and businesses to adhere to climate-smart and sustainable policies. The goal of the ecoCystem is to use the power of artificial intelligence combined with robotics and sensor networks in order to provide solutions for rural, agricultural and natural ecosystems. The main modules of the proposed ecoCystem are the Geolyzer and the Modular Rapidly Deployable Internet of Robotic Things (MORAD IoRT). The Geolyzer platform is the focus of this paper. The Geolyzer platform automatically indexes and processes satellite and drone data in order to provide descriptive, prescriptive and predictive solutions for rural communities. Until recently, the processing of visual information gathered by drones was performed after transmission of the visual data to a ground station, either by a computer or manually by a human expert. Advances in onboard processing systems in recent years give drones the ability to process visual data onboard and make intelligent decisions in flight. In this paper, a computationally light deep neural network is used for fast and accurate interpretation of the rural environment for use in the decision management of an autonomous UAV. Several training scenarios are investigated, and various applications of such a system in agriculture, forestry and climate change management are proposed. The experimental results show that the proposed system has high accuracy for various agricultural and forestry applications. These drones can act as agents in MORAD IoRT [1–4]. MORAD IoRT is a collection of intelligent unmanned vehicles which can monitor the environment around them using a collection of sensors, including visual sensors. These robots use localized and distributed intelligence to model the environment, to make decisions based on their goals, to act on these decisions and to communicate with each other over the internet and wireless networks. This mobile sensor network is used for localized model generation and validation of the Geolyzer models.
The rest of this paper is organized as follows. Related works are discussed in Sect. 2. The Geolyzer is detailed in Sect. 3. The experimental design and the simulation scenarios are discussed in Sect. 4. Finally, Sect. 5 concludes the paper.
2 Related Works
In the past few years, deep neural networks have provided good solutions for many complex pattern recognition and visual interpretation problems. The application of deep neural networks in satellite and high-altitude remote sensing has been investigated extensively. Ammour et al. [5] proposed an asymmetric adaptation neural network (AANN) for cross-domain classification of remote sensing images. They used three remote sensing data sets (Merced, KSA and ASD) with six scenarios. Chen et al. [6] proposed two semantic segmentation frameworks (SNFCN and SDFCN) which use deep Convolutional Neural Networks (CNNs). The proposed frameworks increase accuracy compared to the FCN-8 and SegNet models, and post-processing raises the accuracy by a further 1–2%. Chen et al. [7] presented a feature fusion framework based on deep neural networks. The proposed method uses a CNN to extract features and a fully connected DNN which fuses the heterogeneous features generated by the preceding CNN. They showed that the deep fusion model achieves better classification accuracy.
Cheng et al. [8] proposed a method using discriminative CNNs (D-CNNs) to improve performance on remote sensing image scene classification. In contrast to traditional CNNs, which minimize only the cross-entropy loss, D-CNNs are trained by optimizing a new discriminative objective function. Experimental results show that this method performs better than existing methods. Gavai et al. [9] used the MobileNets model to categorize flower datasets, which can minimize the time needed for flower classification. Gong et al. [10] investigated Deep Structural ML (DSML) for remote sensing classification, using structural information during the training of remote sensing images. Experiments over six remote sensing datasets show better performance than other algorithms.
Ben Hamida et al. [11] investigated the application of deep learning to the classification of remote sensing hyperspectral datasets and proposed a new 3D DL approach for joint spectral and spatial information processing. Kussul et al. [12] compared CNNs with the traditional fully connected Multi-Layer Perceptron (MLP). Li et al. [13] presented a new approach to retrieving large-scale remote sensing images based on deep hashing neural networks and showed that it performs better than state-of-the-art methods on two public datasets. Scott et al. [14] investigated the application of deep CNNs in conjunction with two techniques, transfer learning and data augmentation, for remote sensing imagery. They achieved land-cover classification accuracies of 97.8 ± 2.3%, 97.6 ± 2.6%, and 98.5 ± 1.4% with CaffeNet, GoogLeNet, and ResNet, respectively, on the UC Merced dataset.
Shao et al. [15] proposed a remote sensing fusion method based on a deep convolutional network that can extract spectral and spatial features from source images. This method performs better than traditional methods. Shi et al. [16] proposed a framework for remote sensing image captioning leveraging fully convolutional networks. The experimental results show that this method generates robust and comprehensive sentence descriptions at a desirable speed. Xiao et al. [17] presented a new method which uses multi-scale feature fusion to represent the information of each region using the GoogLeNet model. In the next stage this feature is the input to a support vector machine, and finally a simplified localization method is used. The experimental results demonstrate that the feature fusion outperforms other methods. Xie et al. [18] proposed a new deep learning method for detecting clouds in remote sensing images, using the Simple Linear Iterative Clustering (SLIC) method for segmentation and a deep CNN with two branches to extract multi-scale features. This method can detect clouds with high accuracy.
Xu et al. [19] investigated a two-branch CNN and showed that it can achieve better classification performance than previously proposed methods. Yang et al. [20] proposed a new detection algorithm named Region-based Deep Forest (RDF), which consists of a simple region proposal network and a deep forest ensemble, and showed that it performs better than state-of-the-art methods on numerous thermal satellite image datasets. Yu et al. [21] proposed an unsupervised
Geo-Spatiotemporal Intelligence for Smart Agricultural … 475
In the past few years there has been a significant increase in the amount of low-cost geospatial data in the form of satellite and drone data. The resolution and coverage of this data are increasing rapidly. Intelligent analytics over this massive data can help increase productivity across a large group of industries such as the agriculture, livestock and environment industries. In this paper, two projects related to the processing of remote sensing images for agricultural and environmental management are discussed. The first project is a computationally light deep neural network used for fast and accurate interpretation of the rural environment by an autonomous unmanned aerial vehicle. The second project is a platform for large-scale interpretation of satellite images for agricultural and environmental applications, called SatPlat. In order to make effective use of the massively available geospatial data, SatPlat has a custom-made big spatial data processing stack to provide descriptive, prescriptive and predictive solutions for agriculture and environment management.
Various architectures have been proposed for deep neural networks. MobileNets [9] is a light model that can perform pattern recognition tasks efficiently on a device with limited resources. Therefore, it can be used in a portable solution such as recognition of objects on board a drone. MobileNets is a model based on depthwise separable convolutions. As demonstrated in Fig. 1, in this architecture a standard convolution is factorized into a depthwise convolution and a 1 × 1 pointwise convolution. It uses these two filtering steps to reduce computation with only a small reduction in accuracy.
If the kernel size is $D_K \times D_K$, there are $M$ input channels and $N$ output channels, and the feature map size is $D_F \times D_F$, then in a standard CNN each output position is calculated as:

$$G_{k,l,n} = \sum_{i,j,m} K_{i,j,m,n} \cdot F_{k+i-1,\,l+j-1,\,m} \qquad (1)$$
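As a concrete check of Eq. (1), a direct loop-based NumPy implementation is sketched below; stride 1 and "valid" padding are assumptions made here, and real frameworks use far faster routines.

```python
import numpy as np

def standard_conv(F, K):
    """Direct standard convolution following Eq. (1):
    G[k, l, n] = sum over i, j, m of K[i, j, m, n] * F[k+i-1, l+j-1, m]
    (1-based in the text, 0-based here).

    F: input feature map, shape (DF, DF, M)
    K: kernel, shape (DK, DK, M, N)
    Returns G with spatial size DF - DK + 1 (stride 1, no padding).
    """
    DK, _, M, N = K.shape
    out = F.shape[0] - DK + 1
    G = np.zeros((out, out, N))
    for k in range(out):
        for l in range(out):
            patch = F[k:k + DK, l:l + DK, :]              # (DK, DK, M)
            # contract over i, j and m simultaneously
            G[k, l, :] = np.tensordot(patch, K, axes=([0, 1, 2], [0, 1, 2]))
    return G

# all-ones input and kernel: every output element equals DK * DK * M = 18
G = standard_conv(np.ones((4, 4, 2)), np.ones((3, 3, 2, 1)))
print(G.shape)  # (2, 2, 1)
```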
Fig. 1 MobileNets architecture: (a) standard convolution filters ($D_K \times D_K \times M$, $N$ filters); (b) depthwise convolutional filters ($D_K \times D_K \times 1$, $M$ filters); (c) $1 \times 1 \times M$ pointwise convolution filters ($N$ filters)
Fig. 2 (a) Standard convolutional layer: 3 × 3 convolution followed by batch normalization and ReLU; (b) depthwise separable convolutions: a 3 × 3 depthwise convolution and a 1 × 1 convolution, each followed by batch normalization and ReLU
The computational cost of a standard convolution is

$$D_K \times D_K \times M \times N \times D_F \times D_F \qquad (2)$$

whereas the depthwise convolution stage costs only

$$D_K \times D_K \times M \times D_F \times D_F \qquad (4)$$

Figure 2 shows this comparison. In the final architecture, each layer is followed by a batch-normalization layer, and the final fully connected layer feeds into a softmax for classification.
3.2 SatPlat
In the past few years there has been a significant increase in the amount of available low-cost satellite data. The intelligent analytics of this massive data can help
of this data is considered an enterprise-level professional big data solution, developed for the first time in Iran by the SatPlat team.
For the first section of the experimental results, the study area is in the Mazandaran and Kerman regions of Iran. The data from these regions is classified into four classes: agricultural fields (C1), drought-stricken areas (C2), construction in forest territory (C3) and dirt roads in forest territory (C4). The training data set consists of 5000 images from Google Earth. Google Earth was chosen for training due to the availability of visual information across a large range of training classes. A sample of each class is presented in Fig. 3. In this paper, the MobileNets deep neural network is implemented using the TensorFlow machine learning environment. Figure 4 shows the TensorBoard architecture of the deep neural network used.
The results are tested on a dataset of aerial images captured from a UAV flying over Melbourne, Australia [23–26] (Fig. 5).
In order to investigate the effect of the altitude of the UAV on the accuracy of interpretation, two deep neural networks are trained. The first network is trained using 2800 images from 100 m altitude from Google Earth. The results are then tested on the actual UAV images, and the accuracy and cross-entropy are presented in Fig. 6. The results for each class using the 100 m training dataset are presented in Table 1. The second network is trained using 2000 images from 50 m altitude from Google Earth. The results are then tested, and the accuracy and cross-entropy are presented in Fig. 7. The experimental results show that the proposed model has high accuracy for the intended application. Furthermore, the experimental results show that decreasing the altitude of the training set increases the accuracy of the interpreted images.
The results for each class using the 50 m training dataset are presented in Table 2.
4.2 SatPlat
In this section, various projects performed by the SatPlat team are discussed, and the viability of the proposed solution for commercial applications is investigated. The SatPlat projects can be categorized in three levels: government-related projects (B2G), projects in which SatPlat collaborated with other businesses (B2B) and, finally, direct-to-customer (B2C) initiatives in which SatPlat engaged farmers directly and provided solutions for smallholders.
The first main B2G project of SatPlat, which led to the development and testing of the platform, was with the Agricultural Insurance Fund (AIF) of Iran. The AIF has many human agents, and the traditional method of verifying insurance claims is manual inspection by these agents. This project started in August 2018, and its goal was the development of autonomous end-to-end agricultural insurance fraud detection along with other insurance-related sub-projects. The first phase of the project was a pilot investigating 17 land parcels with a total area of 200 ha in the Sabzevar-Bijar area of Kurdistan province of Iran. The goal of this phase was automatic detection, using satellite imagery, of whether these parcels were under wheat cultivation or left fallow. The temperature stress was then calculated for these land parcels, and the required recommendations for irrigation deficiency in drought conditions were presented to the farmers.
In this project, Sentinel-2 and Landsat 8 satellite images are used for the generation of recommendations. For validation of the recommendations, the SatPlat team interviewed the farmers during and after the project, and a set of questionnaires was
used for data collection and improvement of the SatPlat platform. After the project, the recommendations of the SatPlat platform were validated by the AIF, and the results were considered satisfactory. After the initial pilot, the AIF requested SatPlat's services for other projects. The insurance claim verification was extended to land use detection in the Kerman, Ardabil and Mahabad regions with a total land size of 200 ha. The outcome of the project was the development of autonomous remote sensing models and solutions for automatic crop type and crop health investigation using satellite imagery. Figure 8 shows some of the results of these projects.
Another SatPlat project for the AIF was automatic monitoring of change detection during the massive floods of March 2019 in Khuzestan province of Iran using satellite imagery. Figure 9 shows the flooded area calculated in this project.
After this pilot, SatPlat had initial models for land use and crop mask detection, which were expanded to other geographical regions in 2020 projects. The 2018–2019 pilots resulted in an end-to-end platform for providing autonomous services to large-scale agricultural organizations. This project led to an investment from the Iranian NS-Fund, which was awarded for this new initiative with the AIF in November 2019.
Fig. 6 The results for the 100 m altitude dataset

For B2B projects, SatPlat collaborated with several agricultural Internet of Things (IoT) companies and provided these companies with insights for the optimization of their smart irrigation solutions. The SatPlat team modelled the requirements of all the stakeholders of smart rural communities, from farmers to other Agritech companies. In another service, SatPlat provided artificial intelligence and digital image processing solutions for agricultural drone service providers. SatPlat provided image processing services to an agricultural drone company for management and tree modelling on a pistachio farm in Yazd during September 2019. The pilot was implemented on 1 ha of a 500 ha pistachio farm. SatPlat also provided remote sensing insights for the smart irrigation platform of the Rayan Arvin Algorithm Company for 40 ha of apple orchards in Mashhad in July 2019. The insights from this project led to improved smart irrigation solutions for orchards. Smart irrigation solutions are also used for various other crops. Figure 10 shows the results of pistachio tree counting and corn field irrigation decision support.
Fig. 8 VCI index and land use calculation using satellite imagery for the AIF

For B2C projects, in order to design a platform usable by different groups of farmers in Iran (from smallholder farmers to larger agricultural companies) and across various cultural and geographical domains (from the Chabahar area on the shores of the Persian Gulf to Ardabil in the north of Iran), SatPlat interviewed various farming groups. For example, SatPlat provided solutions for a private-sector farmer with an 80 ha farm at Shahin Shahr in the Isfahan region of Iran in July 2019. The project led to new insights for the SatPlat team about the arid regions on the farm (known locally as oily regions). In another project, SatPlat estimated the crops of a pistachio farm using remote sensing in the Rafsanjan region in September 2019. Some of the indices provided by the SatPlat smartphone application are presented in Fig. 11.
The SatPlat website currently has 237 active farmers as its users. Most of these
farmers required remote help for registering their lands on the website, and the
SatPlat team gathered information and insights about the improvements required
for connecting with smallholder farmers. An extremely valuable insight was that,
after receiving help for the initial registration of their land, the farmers
managed to register and model their other farms by themselves. During the last
two months, only 5% of the farmers managed to perform the entire registration
process by themselves. Of the remaining farmers, 15% required help for
registering their farm and close to 40% required help for registering
themselves. This indicates that the user experience of the platform should be
improved and that educational material and courses should be provided for the
farmers. SatPlat also provided introductory solutions for various companies and
governmental organizations in order to educate these organizations about the
possible applications of remote sensing and artificial intelligence. For
Fig. 10 a The results of Pistachio tree counting and b the remote sensing insight provided by SatPlat for smart irrigation
Fig. 11 Some of the indices provided by SatPlat for a farm during the farming year: a NDVI, b NDRE, c NDWI and d LAI
Fig. 13 SatPlat team at Bakutel Expo 2019, Azerbaijan, with President Ilham Aliyev
5 Conclusion
In this paper, two artificial intelligence based solutions for the application
of aerial and satellite imagery in forestry and agriculture are presented. The
presented machine learning based remote sensing solutions give local governments
and various businesses the ability to have an encompassing view of large
agricultural, horticultural and grazing fields and of forests. This ability
enables them to make optimal decisions in adverse scenarios caused by climate
change and drought. A series of projects and solutions for validation of the
proposed framework are discussed in this paper.
SaveMeNow.AI: A Machine Learning
Based Wearable Device for Fall Detection
in a Workplace
1 Introduction
Slips, trips and falls are among the main causes of accidents in a workplace in all
the countries of the world. For this reason, many fall detection approaches have
been proposed in the past literature. A possible taxonomy for them can be based
on the environment surrounding the user and the employed sensors. According to
this taxonomy, we can distinguish between ambient sensor based approaches, vision
based approaches and wearable based approaches [24].
Ambient sensor based approaches analyze the recordings of audio and video
streams from the work environment [33, 37] and/or track vibrational data derived
from the usage of pressure sensors [2, 29]. They are minimally intrusive for the final
user; however, they have high costs and can generate many false alarms.
Vision based approaches [23, 25] exploit image processing techniques, which
use cameras to record workers and detect their falls. They are not intrusive and can
achieve high accuracy. However, they require installing cameras in every room to be
monitored and can return many false alarms.
E. Anceschi · M. C. De Donato
Filippetti S.p.A., Falconara Marittima (AN), Italy
e-mail: [email protected]
M. C. De Donato
e-mail: [email protected]
G. Bonifazi · E. Corradini · D. Ursino (B) · L. Virgili
DII, Polytechnic University of Marche, Ancona, Italy
e-mail: [email protected]
G. Bonifazi
e-mail: [email protected]
E. Corradini
e-mail: [email protected]
L. Virgili
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license 493
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_22
494 E. Anceschi et al.
Wearable based approaches make use of wearable devices [17, 22, 34] which
workers are provided with, and, in some cases, they are combined with Machine
Learning algorithms to process data provided by these devices [26, 30]. They are
cost-effective and easy to install and set up. Moreover, they are strictly tied to
people and can detect falls regardless of the environment where workers are operating.
However, they can be bulky and intrusive, and their energy power and computation
capabilities are limited. Finally, analogously to what happens for the approaches
belonging to the other two categories, they could generate many false alarms [19]
because realizing a model that accurately represents the daily activities of workers
is difficult.
Nevertheless, we think that the advantages provided by this category of approaches
are extremely relevant and that the problems currently affecting them are challenging
open issues that, if successfully addressed, can open many opportunities for preventing,
or at least quickly responding to, accidents in a workplace. For this reason, in this
paper, we aim to contribute in this context by presenting a wearable device called
SaveMeNow.AI. This device aims at maintaining all the benefits of previous wearable
devices proposed for the same purposes while, simultaneously, avoiding most, or all,
of the problems characterizing them.
The hardware at the core of SaveMeNow.AI is SensorTile.box.1 This is a device
containing the ultra-low-power microcontroller STM32L4R92 and several sensors.
Among them the one of interest for us is LSM6DSOX,3 which is a six-axis Inertial
Measurement Unit (hereafter, IMU) and Machine Learning Core.
The fall detection approach we propose in this paper, which defines the behavior of
the SaveMeNow.AI device of a given worker, receives the data continuously provided
by its six-axis IMU and processes it by means of a customized Machine Learning
algorithm, conceived to determine if the corresponding worker has fallen or not.
In the affirmative case, it immediately sends an alarm to all nearby workers, who
receive it through the SaveMeNow.AI devices they wear. This approach, once
defined, trained and tested, can be natively implemented in the Machine Learning
Core of LSM6DSOX.
As we will see in the following, SensorTile.box is very small and not bulky and,
as we said above, it is provided with an ultra-low-power microcontroller. We
implemented the Machine Learning approach presented in this paper in it and,
therefore, optimized the exploitation of this device's limited computation power.
Finally, as we will show below, the accuracy of the defined fall detection
approach is very satisfying, and false alarms are very few. As a consequence,
SaveMeNow.AI is capable of addressing all four open issues for wearable devices
mentioned above.
This paper is organized as follows: In Sect. 2, we define and, then, illustrate the
implementation and testing of the approach underlying SaveMeNow.AI. In Sect. 3,
we illustrate all the features characterizing both the hardware and the software of
1 https://www.st.com/en/evaluation-tools/steval-mksbox1v1.html.
2 https://www.st.com/en/microcontrollers-microprocessors/stm32l4r9-s9.html.
3 https://www.st.com/en/mems-and-sensors/lsm6dsox.html.
SaveMeNow.AI: A Machine Learning Based Wearable … 495
In this section, we illustrate the customized Machine Learning approach for fall
detection that we defined for SaveMeNow.AI. As a preliminary activity, we describe
the data sources used for both its training and its testing.
In recent years, thanks to the pervasive diffusion of portable devices (e.g., smart-
phones, smartwatches, etc.), wearable based fall detection approaches have been
increasingly investigated by the scientific community [6]. Thanks to this great interest,
it is possible to find online many public datasets to perform analyses on slips, trips
and falls or to find new approaches for their detection and management. After having
analyzed many of these datasets, we decided to select four of them for our training
and testing activities. In particular, we chose datasets that would help us define
a generalized model, able to adapt to the activities carried out by workers and
operators from various sectors who perform very different movements during
their tasks.
The first dataset used is “SisFall: a Fall and Movement Dataset” (hereafter, SisFall),
created by SISTEMIC, the Integrated Systems and Computational Intelligence research
group of the University of Antioquia [32]. This dataset consists
of 4505 files, each referring to a single activity. All the activities are grouped in 49
categories: 19 refer to ADLs (Activities of Daily Living) performed by 23 adults,
15 concern falls (Fall Activities) that the same adults had, and 15 regard ADLs per-
formed by 14 participants over 62 years of age. Data was collected by means of a
device placed at the hips of the volunteers. This device consists of different types of
accelerometers (ADXL345 and MMA8451Q) and of a gyroscope (ITG3200).
The second dataset used is “Simulated Falls and Daily Living Activities” (here-
after, SFDLAs), created by Ahmet Turan Özdemir of the Erciyes University and by
Billur Barshan of the Bilkent University [26]. It consists of 3060 files that refer to 17
volunteers who carried out 36 different kinds of activity. Each of them was repeated
by each volunteer about 5 times. The 36 categories of activities are, in turn, parti-
tioned in 20 Fall Activities and 16 ADLs. The dataset was recorded using 6 positional
devices placed on the volunteers’ head, chest, waist, right wrist, right thigh and right
ankle. Each device consists of an accelerometer, a gyroscope and a magnetometer.
The third dataset used is “CGU-BES Dataset for Fall and Activity of Daily Life”
(hereafter, CGU-BES) created by the Laboratory of Biomedical Electronics and
Systems of the Chang Gung University [8]. This dataset contains 195 files that refer
to 15 volunteers who performed 4 Fall Activities and 9 ADLs. Data was collected
by a system of sensors consisting of an accelerometer and a gyroscope.
The fourth, and last, dataset used is the “Daily and Sports Activities Dataset”
(hereafter, DSADS) of the Department of Electrical and Electronic Engineering of
the Bilkent University [1]. This dataset comprises 9120 files obtained by sampling
152 activities carried out by 8 volunteers. Each activity had a duration of about
5 min, split into 5-s recordings. This dataset does not contain fall activities, but sport
activities. We chose it in order to make our model generalizable and, therefore, more
adaptable to most of the various situations that may occur in the working environment.
Data was collected through 5 sensors containing an accelerometer, a gyroscope and
a magnetometer, positioned on different parts of the volunteer’s body.
From these four datasets, we decided to extrapolate only the accelerometric and
gyroscopic data. This choice was motivated by two main reasons. The first concerns
data availability; in fact, the only measurements common to all datasets are
acceleration and rotation. The second regards the ability of Machine Learning models
to obtain better performance than thresholding-based models when using accelero-
metric data, as described in [13]. By merging the acceleration and rotation data
extrapolated from the four datasets we obtained a new dataset whose structure is
shown in Table 1. It stores data from 8579 activities. 4965 of them do not represent
falls, while the remaining 3614 denote falls. Each activity has associated a file that
stores the values of the 6 parameters of interest for a certain number of samples.
Since data comes from different datasets, the number of samples associated with the
various activities is not homogeneous; in fact, it depends on the length of the activity
and the sampling frequency used in the dataset where it was originally registered.
With regard to this aspect, it should be noted that having datasets characterized by
different activity lengths and sampling frequencies does not significantly affect the
final result, as long as the sampling frequency is very high compared to the activity
length, as is the case for all our datasets. This is because our features are only
slightly influenced by the number of samples available. This happens not only for the maximum
and the minimum values, which is intuitive, but also for the mean value and the vari-
ance, because, in this case, as the number of samples increases, both the numerator
and the denominator of the corresponding formulas grow in the same way.
After building the new dataset, we applied a Butterworth Infinite Impulse Response
(i.e., IIR) second order low-pass filter with a cut-off frequency of 4 Hz to the data
stored therein. The purpose of this task was to keep the magnitude of the frequency
response as flat as possible in the pass band, so as to remove noise. The choice
of the Butterworth filter was motivated by its simplicity and low computational cost
[15]. These features make it perfect for a possible future hardware implementation.
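The filtering step described above can be sketched with SciPy. The 100 Hz sampling frequency used here is a placeholder assumption, since each source dataset has its own sampling rate:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_4hz(samples, fs=100.0):
    """Second-order Butterworth low-pass filter with a 4 Hz cut-off.

    fs is the sampling frequency in Hz; 100 Hz is an assumed placeholder,
    as each source dataset uses its own rate.
    """
    b, a = butter(N=2, Wn=4.0, btype="low", fs=fs)
    # filtfilt applies the filter forward and backward (zero-phase).
    return filtfilt(b, a, samples)

# Example: a 1 Hz component passes, a 20 Hz component is attenuated.
t = np.arange(0, 5, 0.01)
noisy = np.sin(2 * np.pi * 1 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
clean = lowpass_4hz(noisy, fs=100.0)
```

The zero-phase `filtfilt` variant is used here only for clarity; a causal implementation would be needed on the device itself.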
After performing data cleaning, through which we eliminated excess data, and
data pre-processing, through which we reduced the noise as much as possible, we
proceeded to the feature engineering phase. In particular, given a parameter ζ , whose
sampled data was present in our dataset, we considered 4 features, that is the max-
imum value, the minimum value, the mean value and the variance of ζ . If n is the
number of samples of ζ present in our dataset and ζ [k] denotes the value of the kth
sample of ζ , 1 ≤ k ≤ n, the definition of the 4 features is the one shown in Table 2.
As shown in Table 1, the parameters present in our dataset are 6, corresponding to
the values of the X , Y and Z axes returned by the accelerometer and the gyroscope.
As a consequence, having 4 features for each of the 6 parameters at our disposal,
each activity has 24 associated features.
Finally, in a very straightforward way, each activity has an associated two-class
label, whose possible values are Fall Activity and Not Fall Activity.
The result of all these operations is an 8579 × 25 matrix that represents the training
set used to perform the next classification activity.
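The feature-engineering step above can be sketched as follows; the random input merely stands in for one activity's accelerometer and gyroscope samples:

```python
import numpy as np

def activity_features(samples):
    """Compute the 4 features (max, min, mean, variance) for each of the
    6 parameters (accelerometer X/Y/Z, gyroscope X/Y/Z) of one activity.

    samples: array of shape (n, 6), one row per sample.
    Returns a flat vector of 24 features.
    """
    feats = [samples.max(axis=0), samples.min(axis=0),
             samples.mean(axis=0), samples.var(axis=0)]
    return np.concatenate(feats)  # 4 features x 6 parameters = 24 values

# One synthetic activity with 200 samples of the 6 parameters.
rng = np.random.default_rng(0)
row = activity_features(rng.normal(size=(200, 6)))
# Appending the Fall / Not Fall label to such a row yields one of the
# 8579 rows of the 8579 x 25 training matrix described above.
```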
In this section, we illustrate some of the analyses that we conducted on the support
dataset and that allowed us to better understand the reference scenario and, then, to
better face the next challenges.
The first activity we performed was the creation of the correlation matrix between
features, which is reported in Fig. 1.
What clearly emerged when looking at this matrix was the presence of some
evident negative correlations between the maximum and minimum values of some
parameters. Moreover, a positive correlation between the maximum values (resp.,
minimum values, variances) calculated on the various axes and on the two sensors
could be noticed. Finally, there were some parameters that had no significant correla-
tion, either positive or negative. This is particularly evident for all the cases in which
the feature “mean value” is involved. From this analysis, we intuitively deduced
that exactly these last parameters would have played a fundamental role in the next
classification activity.
To verify if this last intuition was right, we ran a Random Forests algorithm [5]
with a 10-Fold Cross Validation [14] that allowed us to generate the list of features
sorted according to their relevance in identifying the correct class of activities.
In particular, in order to compute the relevance of features, this algorithm operates
as follows. Given a decision tree D having N nodes, the relevance ρi of a feature f i
is computed as the decrease of the impurity of the nodes splitting on f i weighted by
the probability of reaching them [12]. The probability of reaching a node n j can be
computed as the ratio of the number of samples reaching n j to the total number of
samples. The higher ρi , the more relevant f i will be. Formally speaking, ρi can be
computed as:
ρi = (Σ_{nj ∈ N_fi} ϑj) / (Σ_{nj ∈ N} ϑj)
ϑj = wj Cj − wl Cl − wr Cr
Here, wj denotes the weighted number of samples reaching node nj, Cj the impurity
of nj, and wl, Cl and wr, Cr the corresponding quantities for the left and right
children of nj.
The value of ρi can be normalized to the range [0, 1]. For this purpose, it must be
divided by the sum of the relevances of all features.
ρ̂i = ρi / (Σ_{fk ∈ F} ρk)
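scikit-learn's RandomForestClassifier exposes this normalized, impurity-based relevance (mean decrease in impurity) through its feature_importances_ attribute. A small sketch on synthetic data, not on the actual fall dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 24-feature training matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 24))
y = (X[:, 0] + X[:, 5] > 0).astype(int)  # classes driven by features 0 and 5

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
# Impurity-decrease relevance per feature, already normalized to sum to 1.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]  # most relevant features first
```

On this synthetic task, the informative features (0 and 5) end up at the top of the ranking, mirroring how the mean-acceleration features emerged in our analysis.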
Fig. 3 Activities labeled as Not Fall and Fall against the mean and the maximum accelerations on
the Y axis
each activity labeled as Not Fall, while a blue cross is visualized for each activity
labeled as Fall. Looking at this diagram, we can observe that the activities labeled
as Not Fall have a very negative mean acceleration and a much lower maximum
acceleration than the ones labeled as Fall. This allows us to conclude that Random
Forests actually returned a correct result when it rated these two features as the most
relevant ones. In fact, their combination makes it particularly easy to distinguish falls
from not falls.
After having constructed a dataset capable of supporting the training task of our
Machine Learning campaign, the next activity of our research was the definition
of the classification approach to be natively implemented in the Machine Learning
Core of LSM6DSOX, i.e., the sensor at the base of SaveMeNow.AI. The first step of
this activity was to verify whether one (or more) of the existing classification
algorithms, already proposed, tested, verified and accepted by the scientific
community, obtained satisfactory results in our specific scenario. Indeed, in that
case, it appeared natural to us to adopt a well-known and already accepted approach
instead of defining a new one, whose complete evaluation in real scenarios would
have required an ad-hoc experimental campaign in our context, publication in a
journal, and the consequent evaluation, and possible adoption, by research groups
all over the world, in order to find possible weaknesses that could have been
overlooked during our campaign.
In order to evaluate the existing classification algorithms, we decided to apply the
classical measures adopted in the literature, i.e., Accuracy, Sensitivity and Specificity.
If we indicate by: (i) T P the number of true positives, (ii) T N the number of true
negatives, (iii) F P the number of false positives, and (iv) F N the number of false
negatives, these three measures can be defined as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)
Accuracy corresponds to the fraction of correct forecasts over the total input,
and represents the overall performance of the algorithm. Sensitivity denotes the
fraction of positive samples that are correctly identified. In our scenario, it stands for
the fraction of Fall Activities that are properly identified by the algorithms. Finally,
Specificity corresponds to the fraction of negative samples correctly identified, so it
represents the fraction of Not Fall Activities properly identified by the algorithms.
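The three measures follow directly from the four counts; the counts in this example are arbitrary illustrations, not results from our campaign:

```python
def classification_measures(tp, tn, fp, fn):
    """Accuracy, Sensitivity and Specificity from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of Fall Activities identified
    specificity = tn / (tn + fp)   # fraction of Not Fall Activities identified
    return accuracy, sensitivity, specificity

# Arbitrary example counts (not from our experiments):
acc, sens, spec = classification_measures(tp=8, tn=5, fp=1, fn=2)
```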
In Table 3, we report a summary of all the tested classification algorithms; in
particular, we show the mean values of Accuracy, Sensitivity and Specificity obtained
through a 10-Fold Cross Validation.
Depending on the application scenario, a metric can be more important than
another one. In our case, in which we want to detect falls in a work environment,
Sensitivity has a higher importance than Specificity. In fact, a missed alarm (cor-
responding to a Not Fall prediction of a Fall Activity) leads to a lack of assistance
to the worker. Furthermore, a false alarm can be mitigated by providing the worker
with the possibility to interact with the device and turn off the alarm.
From the analysis of Table 3, we can observe that the Machine Learning model
with the highest Accuracy (and, therefore, the best overall performance) is the
Decision Tree—C4.5. This model obtains excellent results also in terms of
Sensitivity and Specificity. Another interesting result was obtained through
Quadratic Discriminant Analysis, which achieves a Specificity value equal to
0.9680. However, this last algorithm obtains low values for Accuracy and
Sensitivity, which led us to discard it.
Table 3 Accuracy, sensitivity and specificity values achieved by several classification algorithms
when applied to our dataset
Algorithm Accuracy Sensitivity Specificity
Decision tree—C4.5 0.9487 0.9391 0.9566
Decision tree—CART 0.9128 0.8910 0.9223
Multilayer perceptron 0.9270 0.8829 0.9363
k-Nearest neighbors (k = 3) 0.8790 0.8747 0.9263
Logistic regression 0.7707 0.8599 0.7057
Quadratic discriminant analysis 0.7664 0.4956 0.9680
Linear discriminant analysis 0.7557 0.4956 0.9663
Gaussian naive bayes 0.7175 0.4947 0.8989
Support vector machine 0.7141 0.4103 0.9486
Based on all these considerations, we decided that, among the classification
algorithms of Table 3, the best one for our scenario was the Decision Tree—C4.5.
Furthermore, we judged the performance it achieved good enough to be adopted for
our case without the need to design a new ad-hoc classification algorithm, which
would hardly have achieved better performance and would have been exposed to all
the problems mentioned at the beginning of this section.
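The 10-Fold Cross Validation used to obtain Table 3 can be sketched with scikit-learn. Note that sklearn's DecisionTreeClassifier implements an optimized CART rather than C4.5, and the synthetic data below only stands in for our 8579 × 25 matrix:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Synthetic stand-in for the 24-feature fall / not-fall dataset.
X, y = make_classification(n_samples=500, n_features=24, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
# One accuracy score per fold; Table 3 reports the mean of such scores.
scores = cross_val_score(tree, X, y, cv=10, scoring="accuracy")
mean_accuracy = scores.mean()
```

The same loop, with `scoring` swapped for sensitivity- and specificity-style scorers, yields the other two columns of Table 3.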
In this section, we explain how we realized SaveMeNow.AI starting from the device
SensorTile.box. Specifically, in Sect. 3.1, we describe the main characteristics of the
hardware adopted. Then, in Sect. 3.2, we outline how we implemented the logic of
our approach in the device. Finally, in Sect. 3.3, we show how we tested it.
The choice of the IoT device for implementing our approach and constructing
SaveMeNow.AI was not simple. Indeed, it had to comply with several requirements.
First, as outlined previously, the device had to be small and ergonomic in order
to be worn by a user. Second, it had to include an Inertial Measurement Unit
(i.e., IMU) containing, in turn, an accelerometer and a gyroscope, as well as a
Bluetooth module able to manage the Bluetooth Low Energy (i.e., BLE) protocol.
One of the possible devices compliant with all these requirements is SensorTile.box
provided by STMicroelectronics. In Fig. 4, we report a picture of this device.
SensorTile.box was designed for supporting the development of wearable IoT
devices. It contains a BLE v4.2 module and an ultra-low-power microcontroller
STM32L4R9 that manages the following sensors:
The MLC configuration is not trivial, because it implies also configuring the
sensors of LSM6DSOX and setting all its registers. To perform this task,
STMicroelectronics provides a software tool called Unico. This is a Graphical User Interface
allowing developers to manage and configure sensors, like accelerometers and gyro-
scopes, along with the Machine Learning Core of LSM6DSOX. The output of Unico
is a header file containing the configurations of all the registers and all the informa-
tion necessary for the proper functioning of the Machine Learning models. Indeed,
thanks to Unico, it is possible to set the configuration parameters of the MLC and
the sensors of LSM6DSOX, like the output frequency of MLC, the full scale of the
accelerometer and gyroscope, the sample window of reference for the computation
of features, and so on. We report our complete configuration in Table 4.
With this configuration, at each clock of MLC, the output of the classification
algorithm implemented therein is written to a dedicated memory register. In this
way, it is possible to read this value and, if it is set to Fall (which implies
that the worker wearing the device has presumably fallen), to activate the alarm. At
this point, all the problems concerning the communication between SaveMeNow.AI
devices in presence of an alarm come into play.
In Fig. 6, we show a possible operation scenario of such an alarm. Each Save-
MeNow.AI device continuously checks its status and determines whether or not
there is a need to send an alarm. If the MLC component of the SaveMeNow.AI
device worn by a worker reports a fall, the device itself sends an alarm in broadcast
mode. All the other SaveMeNow.AI devices that are in the signal range receive the
alarm and, then, trigger help (for example, workers wearing them go to see what
happened). If no SaveMeNow.AI device is in the range of the original alarm signal,
the alarm is managed by the Gateway Devices. These must be positioned in such a
way as to cover the whole workplace area. A Gateway Device is always in a receiving
state and, when it receives an alarm, it sets a 30 s timer. After this time interval, if no
SaveMeNow.AI device was active in the reception range of the original alarm, the
Gateway Device itself sends an alarm and activates rescue operations.
As mentioned above, communications are managed through the Bluetooth pro-
tocol, in its low-energy version, called BLE. Each SaveMeNow.AI device has two
roles, i.e., Central and Peripheral. The BLE protocol is ideal for our scenario because
it allows SaveMeNow.AI to switch its role at runtime. During its normal use, a Save-
MeNow.AI device listens to any other device; therefore, it assumes the role of Central.
When the worker who wears it falls, and its MLC component detects and reports this
fall, it switches its role from Central to Peripheral and starts sending the advertising
data connected to the alarm activation.
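The Central/Peripheral switching above can be illustrated with a plain-Python state sketch; this only mimics the protocol logic and is in no way the firmware of SaveMeNow.AI:

```python
class DeviceSim:
    """Toy simulation of a SaveMeNow.AI device's BLE role switching."""

    def __init__(self, name):
        self.name = name
        self.role = "central"   # normal use: listening for other devices
        self.received = []      # alarms heard while in the Central role

    def mlc_output(self, label, peers):
        """React to one MLC classification ('Fall' or 'Not Fall')."""
        if label == "Fall":
            # Switch role and advertise the alarm in broadcast mode.
            self.role = "peripheral"
            for peer in peers:
                peer.received.append(self.name)
        else:
            self.role = "central"

a, b, c = DeviceSim("A"), DeviceSim("B"), DeviceSim("C")
a.mlc_output("Fall", peers=[b, c])   # A falls; B and C receive the alarm
```

A Gateway Device would behave like a permanently Central node, adding the 30 s timer described above before escalating the alarm.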
Fig. 6 A possible emergency scenario
After having deployed the logic of our approach in the SensorTile.box, we proceeded
with the testing campaign. Specifically, we selected 30 volunteers, 15 males and 15
females, of different ages and weights, and asked them to perform different kinds of
activity. In particular, the considered activities include all the ones mentioned in
the past literature. They are reported in Table 5. Some of them could be labeled as
Fall Activity, whereas other ones could be labeled as Not Fall Activity. In all these
activities, SaveMeNow.AI was put at the waist of the volunteers.
In Table 6, we report the confusion matrix obtained after all the activities, and the
corresponding output provided by SaveMeNow.AI.
From the analysis of this table, we can observe that the number of real Fall
Activities was 1,205; 1,170 of them were correctly recognized by SaveMeNow.AI,
whereas 35 of them were wrongly categorized by our system. On the other hand, the
number of real Not Fall Activities was 595; 540 of them were correctly recognized by
SaveMeNow.AI, whereas 55 of them were wrongly labeled by our system. Observe
that the number of real Fall Activities is much higher than that of real Not Fall
Activities. This imbalance is justified because, in our scenario, Sensitivity is much
more important than Specificity. Starting from these results, we have that the Sensitivity of
SaveMeNow.AI: A Machine Learning Based Wearable … 507
Table 5 A taxonomy for Not Fall Activities (on the left) and Fall Activities (on the right)
SaveMeNow.AI is equal to 0.97; its Specificity is equal to 0.91. Finally, its Accuracy
is 0.95.
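These three values follow directly from the confusion matrix figures reported above; a quick check in Python (the variable names are ours):

```python
# Values taken from the confusion matrix discussed above (Table 6).
TP = 1170  # Fall Activities correctly recognized
FN = 35    # Fall Activities missed by the system
TN = 540   # Not Fall Activities correctly recognized
FP = 55    # Not Fall Activities wrongly flagged as falls

sensitivity = TP / (TP + FN)                  # true positive rate
specificity = TN / (TN + FP)                  # true negative rate
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(round(sensitivity, 2))  # 0.97
print(round(specificity, 2))  # 0.91
print(round(accuracy, 2))     # 0.95
```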
After having tested SaveMeNow.AI, we can conclude that its performance is very
satisfying during both the training and the test phases. In our opinion, the dataset
used for training played a key role in obtaining these successful results, because it
contained very heterogeneous activities, which allowed us to create a generalized
model. Indeed, our model is able to distinguish sport activities from fall activities,
which is a difficult task to achieve. A careful reader could point out that a generalized
model like ours sacrifices performance to generalizability. However, we observe that
Sensitivity (which is the most important parameter to evaluate in our scenario) is very
high; only Specificity is not particularly high. This could lead to some false alarms
that, in most cases, could be directly stopped by the worker wearing the alarming
device. On the other hand, in a work environment, which is the reference application
scenario of our approach, activities like running or jumping are quite common and
could generate many false alarms if the model were not sufficiently generalized to
handle them, at least partially.
4 Related Literature
Ambient-sensor-based approaches exploiting vibrational data focus on the use of
pressure sensors. For example, in [2], the authors design a floor vibration-based fall
detector. It considers the vibrations caused by objects moving on the floor,
because the vibrations generated by a human fall are different from the ones related to
normal activities. In this perspective, they use a special piezoelectric sensor, coupled
with the floor, and generate a binary fall signal in case of a fall event.
Another proposal in this setting can be found in [29]. Here, the authors propose
to use a floor sensor based on near-field imaging. This sensor detects the locations
and patterns of people by measuring the impedances with respect to a matrix of thin
electrodes under the floor. Then, a collection of features is computed starting from
the cluster of observations associated with a person. In this way, a Bayesian filter and
a Markov chain can be adopted to estimate the posture of the user and, finally, to
detect a possible fall.
The approaches based on ambient sensors are not intrusive for the final user.
However, they have two main disadvantages. The former regards their cost, while
the latter concerns the difficulty of installing them, because it is necessary to set up
the whole room with sensors.
The second type of fall detection approaches concerns those based on vision
[10, 11, 21, 23, 25]. The reasoning underlying this kind of system is that cameras
are increasingly present in our daily environment and are less intrusive than other
kinds of objects (for instance, the ones that should be worn by the user). In [23], the
authors present a fall detector for smart homes based on artificial vision algorithms.
The overall system is developed through a single-board computer with an external
camera, placed in the room to be monitored. The approach consists of different
phases. First, it acquires an image and separates the subject from the background.
Then, it uses a Kalman filter to reduce noise in the data. Afterwards, it starts to study
the changes in the human actions. Finally, it applies a Machine Learning algorithm
to the data obtained to classify the current state of the subject.
Another interesting fall detection system is reported in [25], where the authors pro-
pose a framework for indoor scenarios using a single-camera system. This approach
is based on the analysis of motion orientation, motion magnitude and human shape
changes. According to the authors, the duration of a fall is often less than 2 s, starting
when the balance is lost and ending when the fallen person lies completely on the
floor. Specifically, this system works as follows: when it detects an abnormally large
motion, whose direction is less than 180°, it continues to monitor the next 50 frames. Then,
if there is a downward movement, followed by the AR ratio (a body width-to-height
ratio) and the inclination angle of the person's major axis exceeding their thresholds,
a fall might have happened. Then, it monitors the next 25 frames
and, if no further movement, or just a small movement, occurs, it concludes that the
motion is a fall. If none of the above conditions is satisfied, no warning signal is sent
out and the monitoring continues.
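The frame-monitoring logic of [25] just described can be sketched as a small decision procedure. The frame counts (50 and 25) and the 180° direction check come from the text; the threshold values, field names and data layout below are purely illustrative assumptions:

```python
# A hedged sketch of the single-camera decision logic of [25] described
# above. The frame counts (50 and 25) and the 180-degree direction check
# come from the text; the threshold values, field names and data layout
# are illustrative assumptions.

def is_fall(frames, large_motion=5.0, ar_threshold=1.0,
            tilt_threshold=45.0, still_motion=0.5):
    """frames: list of dicts with keys 'magnitude', 'direction' (degrees),
    'downward' (bool), 'ar' (width-to-height ratio), 'tilt' (degrees)."""
    for t, f in enumerate(frames):
        # Step 1: abnormally large motion whose direction is below 180 deg.
        if not (f["magnitude"] > large_motion and f["direction"] < 180):
            continue
        # Step 2: monitor the next 50 frames for a downward movement with
        # the AR ratio and the inclination angle exceeding their thresholds.
        window = frames[t + 1:t + 51]
        hit = next((i for i, g in enumerate(window)
                    if g["downward"] and g["ar"] > ar_threshold
                    and g["tilt"] > tilt_threshold), None)
        if hit is None:
            continue
        # Step 3: monitor the next 25 frames; little or no movement
        # confirms the fall.
        tail = frames[t + 2 + hit:t + 27 + hit]
        if all(g["magnitude"] <= still_motion for g in tail):
            return True
    return False
```

If none of the conditions is met for any frame, the function returns `False`, mirroring the "no warning signal is sent out" branch of the original description.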
Vision based approaches are really interesting and can achieve great accuracy
in the fall detection process. Their main drawback concerns the necessity of installing
cameras in each room to be monitored, which, in turn, leads to a high installation cost.
Finally, the last category of fall detection systems is based on wearable devices [3,
4, 16, 18, 20, 27, 31, 36]. These approaches rely on smart garments with embedded
sensors capable of detecting the motion and location of the user body. In the literature,
there are many interesting proposals, each employing different sensors. For
instance, in [17], the authors present a posture-based fall detection algorithm that
operates starting from the reconstruction of the posture of a user. Several wireless tags
are placed on some parts of the body, such as hips, ankles, knees, wrists, shoulders
and elbows. The locations of these tags are detected by a motion capture system,
so that it can reconstruct the complete posture of a person in 3D space. Finally,
acceleration thresholds, along with velocity profiles, are applied to detect falls.
A less invasive approach, based on an accelerometer, is presented in [22]. Here,
the authors use an integrated approach of waist-mounted accelerometers, so that a
fall is detected when a negative acceleration is suddenly increased, due to the change
in orientation from an upright to a lying position. A similar proposal can be found in
[34], where the authors design a wearable airbag containing an accelerometer and a
gyroscope. This airbag is inflated when acceleration and angular velocity thresholds
are exceeded.
There are also interesting fall detection proposals using Machine Learning algo-
rithms. An example is reported in [30], where the authors propose a fall detection
system consisting of a sensing unit (such as a mobile phone) and a threshold for
acceleration along three axes specific to a patient. The overall system is based on
monitoring the tri-axial accelerometer data in three different sliding time windows,
each one lasting one second. Depending on this information and the threshold related
to a patient, the authors exploit a Machine Learning algorithm to predict whether
the patient is falling or conducting a normal daily activity.
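The sliding-window monitoring just described can be sketched as follows; the sampling rate and the way the patient-specific threshold is applied are assumptions of this illustration, not details taken from [30]:

```python
import numpy as np

# Hedged sketch of monitoring tri-axial accelerometer data over three
# consecutive one-second sliding windows, in the spirit of [30]. The
# sampling rate and the way the patient-specific threshold is applied
# are assumptions of this illustration.

FS = 50  # samples per second (assumed)

def windows(acc, n=3, fs=FS):
    """Split the most recent n seconds of tri-axial data (shape (T, 3))
    into n one-second windows."""
    recent = acc[-n * fs:]
    return [recent[k * fs:(k + 1) * fs] for k in range(n)]

def exceeds_threshold(acc, threshold):
    """True if the acceleration magnitude in any window exceeds the
    patient-specific threshold."""
    for w in windows(acc):
        magnitude = np.linalg.norm(w, axis=1)   # per-sample magnitude
        if magnitude.max() > threshold:
            return True
    return False
```

In the full system, a positive check like this one would feed the Machine Learning classifier rather than directly raise an alarm.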
A similar proposal is reported in [26]. Here, the authors describe a fall detection
system with wearable motion sensor units fitted to the subject’s body at six different
positions. Each of these units comprises three tri-axial devices, i.e., an accelerometer,
a gyroscope, and a magnetometer. Then, six different Machine Learning algorithms
are tested to evaluate which one performs better than the others. Finally, the overall
system is tested in a real world scenario, obtaining interesting results. Even if this
approach achieves a high accuracy with an acceptable computation time, it could be
too invasive for the final user to adopt in everyday life.
Analogously to the other categories of fall detection approaches, the wearable
based ones have their advantages and disadvantages. The most important advantages
are their cost efficiency, their easy installation and setup. Furthermore, these systems
are not directly connected with only one place, but with a person; therefore, they can
identify falls regardless of the environment. On the other hand, some disadvantages
concern the low computation power and the high energy consumption characterizing
wearable devices. Another possible disadvantage could be the intrusiveness of the
system in the user’s life, even if researchers are constantly offering increasingly
small and ergonomic wearable devices.
In any case, since SaveMeNow.AI belongs to the category of wearable based
fall detection approaches, we consider it appropriate to present a further comparison
between it and several other approaches belonging to this category, particularly those
that, like ours, use accelerometers and gyroscopes.

Table 7 Comparison between SaveMeNow.AI and several wearable based fall detection
approaches proposed in past literature

Research | Sensors | Sensors' position | Algorithm | Results
SaveMeNow.AI | Accelerometer, gyroscope | Waist | Decision tree | Sensitivity: 0.97; Specificity: 0.91; Accuracy: 0.95
Pannurat et al. [27] | Accelerometer | Waist | Gaussian mixture model | Accuracy: 0.91
Sabatini et al. [31] | Accelerometer, gyroscope, barometric altimeter | Upper right iliac bone | Decision tree | Sensitivity: 0.80; Specificity: 0.99
Jian and Chen [16] | Accelerometer, gyroscope | Shirt | k-NN | Sensitivity: 0.95; Specificity: 0.96
Karantonis et al. [18] | Accelerometer | Waist | Binary classifier | Accuracy: 0.95
Zhang et al. [36] | Gyroscope | Waist | Decision tree | Specificity: 1.00
Bourke et al. [4] | Accelerometer | Waist | One-Class SVM | Accuracy: 0.96
Anania et al. [3] | Accelerometer | Jacket collar | Decision tree | Sensitivity: 0.98
Lai et al. [20] | Accelerometer | Waist, neck, right and left hands | Decision tree | Accuracy: 0.92
In Table 7, we report a comparison between SaveMeNow.AI and several of the
wearable based fall detection approaches proposed in the literature. This comparison
considers several characteristics, namely the position of the sensors, the adopted
Machine Learning algorithm and the results obtained. From the analysis of this table,
we can see that SaveMeNow.AI returns results equivalent to, or better than, those
characterizing the other approaches. In particular, the Sensitivity of SaveMeNow.AI
(which, we recall, is much more important than Specificity in our application scenario)
is higher than that of all the other approaches, except the one of [3], which presents a
slightly higher Sensitivity (0.98 against the 0.97 reached by SaveMeNow.AI), even if
its authors provide no information about Specificity and Accuracy.
5 Conclusion
the classical activities that a worker can perform in the workplace. Then, we tested
different classification algorithms and found that at least one of them, i.e., a Decision
Tree based on C4.5, can reach very satisfactory results when applied to the created
dataset.
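As a side note on the classifier choice: C4.5 grows a decision tree by choosing, at each node, the split that maximizes information gain, i.e., the reduction in label entropy. A minimal, dependency-free sketch of this criterion follows; the toy labels are invented for illustration and are not taken from our dataset:

```python
import math

# C4.5 grows a decision tree by choosing, at each node, the split that
# maximizes information gain (entropy reduction). The toy labels below
# are invented for illustration; they are not taken from our dataset.

def entropy(labels):
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(labels, left, right):
    """Entropy reduction obtained by splitting `labels` into the two
    branches `left` and `right`."""
    n = len(labels)
    return entropy(labels) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))

labels = ["fall"] * 4 + ["not_fall"] * 4
print(information_gain(labels, labels[:4], labels[4:]))  # 1.0 (perfect split)
```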
After this, we selected an IoT device available on the market and we natively implemented
the logic of our approach on it. As for this aspect, we observe that the choice
of SensorTile.box as the starting IoT device on which to implement SaveMeNow.AI
greatly helped us reach our goals. Indeed, thanks to SensorTile.box, we were
able to implement all the operations for data collection, data pre-processing, feature
engineering and classification directly into the STM32L4R9 microcontroller and the
LSM6DSOX sensor contained in this device. This allowed us to obtain relevant
energy savings and to optimize the limited computation power characterizing our
device, as well as all the other wearable ones available on the market.
Afterwards, we tested SaveMeNow.AI in a real world scenario and found that
its performance is very satisfying, especially for Sensitivity. Finally, we proposed
a comparison between SaveMeNow.AI and several wearable based fall detection
approaches proposed in the literature.
Regarding some possible future developments of our research, we note that, cur-
rently, the only sensor of SensorTile.box used in SaveMeNow.AI is LSM6DSOX.
However, other sensors in the device may be useful to monitor some parameters to
predict and/or report possible emergency situations in a workplace. For example,
humidity, pressure and temperature sensors could be used for this purpose. In addi-
tion, the set of SaveMeNow.AI devices worn by operators in a delimited place can
be seen as a Wireless Sensor Network that could be used, similarly to what is proposed
in [28], to detect emergency situations, such as fires or harmful gas leaks.
Another interesting development could be the implementation of a routing system
that can show a rescuer the shortest route to the fallen worker. Finally, SaveMeNow.AI
could be transformed into a non-invasive garment that allows a worker to perform
operations and movements in total freedom. The simplest solution would be the
insertion of the various sensors into a shirt that, once worn, would allow the
accelerometric and gyroscopic data to be acquired in a way that moves rigidly with
the body, making data processing even more accurate. Last, but not least, other sensors could
be added to evaluate vital parameters, such as blood pressure and heartbeat. This
would open up new frontiers in the use of SaveMeNow.AI which would also (at least
partially) become a medical device.
Acknowledgments This work was partially funded by the Department of Information Engineering
at the Polytechnic University of Marche under the project “A network-based approach to uniformly
extract knowledge and support decision making in heterogeneous application contexts” (RSAB
2018), and by the Marche Region under the project “Human Digital Flexible Factory of the Future
Laboratory (HDSFIab)—POR MARCHE FESR 2014-2020—CUP B16H18000050007”.
References
1. Altun, K., Barshan, B., Tunçel, O.: Comparative study on classifying human activities with
miniature inertial and magnetic sensors. Pattern Recognit. 43(10), 3605–3620 (2010)
2. Alwan, M., Rajendran, P.J., Kell, S., Mack, D., Dalal, S., Wolfe, M., Felder, R.: A smart
and passive floor-vibration based fall detector for elderly. In: Proceedings of the International
Conference on Information & Communication Technologies (ICICT’06), Damascus, Syria,
vol. 1, pp. 1003–1007. IEEE (2006)
3. Anania, G., Tognetti, A., Carbonaro, N., Tesconi, M., Cutolo, F., Zupone, G., De Rossi, D.:
Development of a novel algorithm for human fall detection using wearable sensors. In: Sensors,
pp. 1336–1339. IEEE (2008)
4. Bourke, A.K., Lyons, G.M.: A threshold-based fall-detection algorithm using a bi-axial gyro-
scope sensor. Med. Eng. Phys. 30(1), 84–90 (2008)
5. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
6. Casilari, E., Santoyo-Ramón, J., Cano-García, J.: Analysis of public datasets for wearable fall
detection systems. Sensors 17(7), 1513 (2017)
7. Chaccour, K., Darazi, R., El Hassans, A.H., Andres, E.: Smart carpet using differential piezore-
sistive pressure sensors for elderly fall detection. In: Proceedings of the International Confer-
ence on Wireless and Mobile Computing, Networking and Communications (WIMOB’15),
Abu-Dhabi, United Arab Emirates, pp. 225–229. IEEE (2015)
8. Chan, H.L.: CGU-BES Dataset for Fall and Activity of Daily Life, p. 8 (2018)
9. Chandra, I., Sivakumar, N., Gokulnath, C.B., Parthasarathy, P.: IoT based fall detection and
ambient assisted system for the elderly. Clust. Comput. 22(1), 2517–2525 (2019)
10. Cucchiara, R., Prati, A., Vezzani, R.: A multi-camera vision system for fall detection and alarm
generation. Expert Syst. 24(5), 334–345 (2007)
11. Diraco, G., Leone, A., Siciliano, P.: An active vision system for fall detection and posture
recognition in elderly healthcare. In: Proceedings of the Design, Automation & Test in Europe
Conference & Exhibition (DATE’10), Dresden, Germany, pp. 1536–1541. IEEE (2010)
12. Genuer, R., Poggi, J.M., Tuleau-Malot, C.: Variable selection using random forests. Pattern
Recognit. Lett. 31(14), 2225–2236 (2010)
13. Gibson, R.M., Amira, A., Ramzan, N., Casaseca de-la Higuera, P., Pervez, Z.: Multiple com-
parator classifier framework for accelerometer-based fall detection and diagnostic. Appl. Soft
Comput. 39, 94–103 (2016)
14. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan
Kaufmann (2011)
15. Hussain, F., Umair, M.B., Ehatisham ul Haq, M., Pires, I.M., Valente, T., Garcia, N.M., Pombo,
N.: An Efficient Machine Learning-based Elderly Fall Detection Algorithm (2019). arXiv
preprint 1911.11976
16. Jian, H., Chen, H.: A portable fall detection and alerting system based on k-NN algorithm and
remote medicine. China Commun. 12(4), 23–31 (2015)
17. Kaluža, B., Luštrek, M.: Fall detection and activity recognition methods for the confidence
project: a survey, A:22–25 (2009)
18. Karantonis, D.M., Narayanan, M.R., Mathie, M., Lovell, N.H., Celler, B.G.: Implementation
of a real-time human movement classifier using a triaxial accelerometer for ambulatory moni-
toring. IEEE Trans. Inform. Technol. Biomed. 10(1), 156–167 (2006)
19. Kwolek, B., Kepski, M.: Human fall detection on embedded platform using depth maps and
wireless accelerometer. Comput. Methods Program. Biomed. 117(3), 489–501 (2014)
20. Lai, C.F., Chang, S.Y., Chao, H.C., Huang, Y.M.: Detection of cognitive injured body region
using multiple triaxial accelerometers for elderly falling. Sensors 11(3), 763–770 (2010)
21. Mastorakis, G., Makris, D.: Fall detection system using Kinect’s infrared sensor. J. Real-Time
Image Process. 9(4), 635–646 (2014)
22. Mathie, M.J., Coster, A.C.F., Lovell, N.H., Celler, B.G.: Accelerometry: providing an inte-
grated, practical method for long-term, ambulatory monitoring of human movement. Physiol.
Meas. 25(2), R1 (2004)
23. De Miguel, K., Brunete, A., Hernando, M., Gambao, E.: Home camera-based fall detection
system for the elderly. Sensors 17(12), 2864 (2017)
24. Mubashir, M., Shao, L., Seed, L.: A survey on fall detection: principles and approaches. Neu-
rocomputing 100, 144–152 (2013)
25. Nguyen, V.A., Le, T.H., Nguyen, T.H.: Single camera based fall detection using motion and
human shape features. In: Proceedings of the Symposium on Information and Communication
Technology (SoICT’16), Ho Chi Minh, Vietnam, pp. 339–344 (2016)
26. Özdemir, A.T., Barshan, B.: Detecting falls with wearable sensors using machine learning
techniques. Sensors 14(6), 10691–10708 (2014)
27. Pannurat, N., Thiemjarus, S., Nantajeewarawat, E.: A hybrid temporal reasoning framework
for fall monitoring. IEEE Sens. J. 17(6), 1749–1759 (2017)
28. Qandour, A., Habibi, D., Ahmad, I.: Wireless sensor networks for fire emergency and gas
detection. In: Proceedings of the International Conference on Networking, Sensing and Control
(ICNSC’12), Beijing, China, pp. 250–255. IEEE (2012)
29. Rimminen, H., Lindström, J., Linnavuo, M., Sepponen, R.: Detection of falls among the elderly
by a floor sensor using the electric near field. IEEE Trans. Inform. Technol. Biomed. 14(6),
1475–1476 (2010)
30. Saadeh, W., Altaf, M.A.B., Altaf, M.S.B.: A high accuracy and low latency patient-specific
wearable fall detection system. In: Proceedings of the International Conference on Biomedical
& Health Informatics (BHI’17), Orlando, FL, USA, pp. 441–444. IEEE (2017)
31. Sabatini, A.M., Ligorio, G., Mannini, A., Genovese, V., Pinna, L.: Prior-to-and post-impact
fall detection using inertial and barometric altimeter measurements. IEEE Trans. Neural Syst.
Rehabil. Eng. 24(7), 774–783 (2015)
32. Sucerquia, A., López, J.D., Vargas-Bonilla, J.F.: SisFall: a fall and movement dataset. Sensors
17(1), 198 (2017)
33. Tabar, A.M., Keshavarz, A., Aghajan, H.: Smart home care network using sensor fusion and
distributed vision-based reasoning. In: Proceedings of the International Workshop on Video
Surveillance & Sensor Networks (VSSN’06), Santa Barbara, CA, USA, pp. 145–154 (2006)
34. Tamura, T., Yoshimura, T., Sekine, M., Uchida, M., Tanaka, O.: A wearable airbag to prevent
fall injuries. IEEE Trans. Inform. Technol. Biomed. 13(6), 910–914 (2009)
35. Wang, F., Wang, Z., Li, Z., Wen, J.R.: Concept-based short text classification and ranking.
In: Proceedings of the International Conference on Information and Knowledge Management
(CIKM’14), Shanghai, China, pp. 1069–1078. ACM (2014)
36. Zhang, T., Wang, J., Xu, L., Liu, P.: Fall detection by wearable sensor and one-class SVM
algorithm. In: Intelligent Computing in Signal Processing and Pattern Recognition, pp. 858–
863. Springer (2006)
37. Zhuang, X., Huang, J., Potamianos, G., Hasegawa-Johnson, M.: Acoustic fall detection using
Gaussian mixture models and GMM supervectors. In: Proceedings of the International Con-
ference on Acoustics, Speech and Signal Processing (ICASSP’09), Taipei, Taiwan, pp. 69–72.
IEEE (2009)
Artificial Intelligence and Security
Fraud Detection in Networks
1 Introduction
The value of fraudulent transactions in the EU was about 1.8 billion euros in 2016 for
card fraud alone [20]. Many other financial criminal activities, like money laundering
(e.g. through multiple or over-invoicing), corruption-related transfers, VAT evasion,
identity theft, affect individuals, banks and state activities like tax collection. As the
number of transactions increases and criminal behavior becomes more sophisticated,
fraud detection requires more attention and time from human experts employed
by banks or state authorities. The need for performant automatic tools that at least
select the most likely fraudulent activities, but also aim to detect new types of
ill-intentioned activities, is imperative. This is by no means limited to
financial crimes. Insurance fraud and e-commerce or social network misconduct, such
as fake reviews, are other examples of domains where securing activities is critical.
Money transactions (payments, transfers, cash withdrawals, etc.) can be described
by a vector of characteristics. The suspect transfers—the anomalies—may be
regarded as outliers in a given set of training vectors. However, treating transac-
tions as independent vectors is an over-simplification, due to the intricacies of many
types of criminal behavior. It is much more appropriate to treat the transactions in their
natural form, that of a graph whose nodes are the financial entities (individuals, firms,
banks) and whose edges are transactions data (amount, time, payment mode, etc.).
A directed edge between two nodes illustrates that there is a money transfer in the
respective direction, where the weight on the edge is the transferred amount. Graphs
allow us to model inter-dependencies, capture the relational nature of transactions and
are also a more robust tool, as fraudsters usually do not have a global view of the
graph. Some frauds, in other words, imply a particular scheme of relationships that
can be revealed only by looking at the underlying graph structure. In those cases,
taken in isolation, an individual transaction may display little or no indication of being
fraudulent but its nature becomes apparent when the larger context is considered.
This review is concerned with such frauds that take place over networks. Most of
the time, the data consists of an attributed graph, with attributes either on the nodes,
edges or both. Fraud detection methods thus need to integrate both relational knowl-
edge, accessible from the network structure, and the numeric features that describe
graph components. Depending on the application, anomalies can be considered at
the level of a node, an edge, or a subgraph. Our present review is concerned mostly
with finding unusual subgraph patterns, as they frequently appear in the types of
application we consider. We survey several types of topological features that can be
exploited in order to differentiate legitimate graph entities from fraudulent ones and
examine how relational information can be used in tandem with other numerical data
sources.
Robustness to camouflage strategies is becoming one of the requirements for fraud
detection methods. In some cases disguise is impossible without knowledge of the
network structure, which is unavailable to all users, including fraudsters. Other methods,
still, are more prone to be affected by deceitful actions, even in the absence of this type
of information. Throughout the survey, we signal the solutions that take camouflage
into consideration.
Whenever applicable, we also focus on solutions that take into account the evolution
of a graph. The need for including the temporal dimension is two-fold. On the one
hand, in some cases it is the temporal pattern that defines the anomaly. Several types
of frauds, including credit card frauds, network attacks and spam, involve some sort
of high-frequency activity. It is not the activity per se that gives away its fraudulent
nature, but the fact that it occurs often or in a specific pattern. If no temporal informa-
tion were considered, the behavior could pass as legitimate. On the other hand, there
is the pressing need to react as fast as possible to the occurrence of a misconduct.
Fraudulent event detection therefore requires temporal representation.
As an overall perspective dictated by the above anomaly environments, fraud
detection methods are thus expected to:
• function in (quasi) real-time; with the definition of real-time differing slightly from
one application to another, what is common to all fraud detection methods is the
need that the estimates they provide are actionable in due time.
• result in high specificity; since the discovery of frauds is intended to be coupled
with corrective actions that in one form or another restrict access to the network,
solutions are preferred that do not hinder the legitimate activity.
1.1.1 Notations
We now briefly define some of the notions most present in our survey.
Definition The adjacency matrix of a graph G is the matrix A ∈ R^{V×V}, with elements
in {0, 1}, where V is the number of vertices and A_{ij} = 1 if there is an edge between
vertices i and j, and A_{ij} = 0 otherwise. For multi-graphs, A_{ij} is equal to the number
of edges from i to j.
Definition The vertex-indexed Laplacian matrix L of a graph G is L = D − A, where
D is the diagonal matrix of vertex degrees. For undirected graphs, L is symmetric
with zero row sums.
Definition The spectrum of a graph G is the spectrum of the adjacency matrix A.
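These definitions can be checked on a toy graph; the 4-cycle below is an arbitrary example chosen for this sketch, not taken from the surveyed works:

```python
import numpy as np

# A small undirected graph: the 4-cycle 0-1-2-3-0.
V = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

A = np.zeros((V, V), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1               # symmetric: undirected graph

D = np.diag(A.sum(axis=1))              # diagonal degree matrix
L = D - A                               # Laplacian, L = D - A

print(L.sum(axis=1))                    # zero row sums: [0 0 0 0]
print(np.linalg.eigvalsh(A))            # spectrum of the 4-cycle: -2, 0, 0, 2
```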
520 P. Irofti et al.
Several surveys exist that tackle similar topics. We briefly state the main differences
from our present review. Whenever appropriate, we focus on works newer than those
covered by existing surveys.
Dating from 2010, [21] focuses on fraud detection techniques from an audit perspective;
however, it does not cover only machine learning approaches. A classification-oriented
survey on financial fraud identification can be found in [47], yet it does
not review solutions for graphs. Accuracy and specificity summaries obtained by
various machine learning methods on different types of financial frauds can be found
in [68]. The survey in [53] presents supervised and unsupervised methods and has
a dedicated section on similar applications. Credit card fraud detection methods are
reviewed in [13] and more recently in [61].
Fraud detection can be cast as an anomaly detection (AD) task, since frauds are
rare events that distinguish themselves from normal behavior. As such, methods for
anomaly identification on graphs represent a great pool of solutions to the problem of
fraud detection. Nonetheless, it must be noted that in the absence of an application-
driven definition of a graph anomaly, some general methods can be ineffective for our
task. A comprehensive survey on graph anomaly detection can be found in [2]. The
taxonomy is constructed along the following lines: (a) quantitative descriptions of
graph anomalies and qualitative explanations/explorations; (b) methods that consider
static graphs and those that deal with the evolution of the graphs; (c) graph analysis
based on the structure of the graph and on communities [2]. The distinction between
structure-based and community-based analysis is relevant to the works we present
herein and is also a point where our approach diverges from the one in [2].
In the case of simple, non-attributed graphs, the above cited survey defines
structure-based methods as those which seek information in the characteristics of the
graph to define (ab)normality: node-level measures such as degree, or between-node
measures such as the number of common neighbors. Community-based methods, on
the other hand, assume in their perspective that anomalous nodes do not belong to a
community, and are found to be linking communities together. While the assumption
is true when working with a broad definition of anomalies and may apply to some
types of network frauds as well (especially those involving fake accounts of different
sorts), it does not cover a wide range of misconducts that are performed by otherwise
legitimate entities (e.g. financial frauds). Moreover, particularly in large networks
it is to be expected that not all legitimate nodes belong to communities [9]. The
definition of a community, as well as that of an anomaly is application-dependent.
The two above categories are also used in [2] when considering the case of
attributed graphs, albeit with a slightly different meaning. Attributed graphs contain
additional information, either on the nodes, thus describing features of the respective
entity or on the edges, characterizing the relationship between two entities. There-
fore, in this case, structure-based methods are, for the authors of [2], those methods
that look for unusual subgraph patterns. The definition is particularly relevant to the
fraud detection problem, since frauds are often performed by a group of entities, as a
scheme. As mentioned earlier, taken individually, the events making up the scheme
may look legitimate, yet they form an anomalous structure. More on several such
structures, successfully identified on a dataset of real financial transactions (though
without dealing with attributed graphs), can be found in [17].
Returning to the perspective of [2], community-based methods are now con-
cerned with finding the odd node out in a given community. The authors introduce
a third category of relational learning. Unlike the regular learning paradigm, where
independence is assumed between entities, relational learning seeks to incorporate
connectivity information, for example by looking at one node’s neighbours as well.
2 Locality
The large variety of anomaly types arising in networked environments has led
to different research directions, depending on the application. In this section we survey
strands of the literature that revolve around the idea of detecting anomalies in attributed
networks through the promotion of particular subgraphs having an unusual structure, such
as high density or specific connectivity patterns like rings, cliques or heavy paths.
2.1 Communities
Perhaps the most widespread approach when considering anomalous patterns is that
of using community information to train a supervised learning system.
In [52] a normality measure is used to quantify the topological quality of com-
munities as well as the focus attributes of communities in attributed graphs. In other
words, normality quantifies the extent to which a neighborhood is internally con-
sistent and externally separated from its boundary. The proposed method discovers
a given neighborhood’s latent focus through the unsupervised maximization of its
normality. The communities for which a proper focus cannot be identified
receive a low score and are deemed anomalous.
A modularized anomaly detection hierarchical framework has been developed
in [17] to detect static anomalous connected subgraphs, with high average weights.
For this purpose, particular community detection strategies are tailored based on
140 features (including Laplacian spectral information) and network comparison
tests (such as NetEMD). Then, a classification via random forests or simple sum of
individual (feature-based) scores is performed to highlight the anomalous subgraphs.
In directed trading networks, blackhole and volcano patterns represent groups of nodes with only inlinks from, or only outlinks towards, the rest of the nodes, respectively. These kinds of patterns, which often have a fraudulent nature, are isolated in [40] through pruning (divide et impera) schemes based on the structural features of blackholes and volcanoes.
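A minimal sketch of these two patterns (the pruning schemes of [40] are not reproduced here; the set-level checks below are just the definitions):

```python
import networkx as nx

def is_blackhole(G, S):
    """A node set S is a blackhole pattern if no edge leaves S
    towards the rest of the directed graph."""
    S = set(S)
    return not any(v not in S for u in S for v in G.successors(u))

def is_volcano(G, S):
    """Dual pattern: no edge enters S from outside."""
    S = set(S)
    return not any(v not in S for u in S for v in G.predecessors(u))

G = nx.DiGraph([(1, 2), (3, 2), (2, 4), (4, 2)])   # flows pour into {2, 4}
print(is_blackhole(G, {2, 4}), is_volcano(G, {2, 4}))   # True False
```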
522 P. Irofti et al.
In [56], community detection methods are combined with supervised learning for
detecting money laundering groups in financial transactions. Community detection
begins with the extraction of (possibly overlapping) connected components from the attributed transaction multi-graph. Since typical fraudulent communities contain a small number of vertices (fewer than 150), the excessively large connected components extracted from the AUSTRAC dataset [37] are further decomposed through a k-step neighborhood strategy. This entire process leads to a collection of small communities which are classified through a supervised learning scheme (Support Vector Machines and Random Forests).
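The decomposition step can be sketched as follows; the `max_size` threshold and `k` are illustrative parameters, and `ego_graph` stands in for the k-step neighborhood strategy:

```python
import networkx as nx

def small_communities(G, max_size=150, k=2):
    """Extract connected components; decompose oversized ones into
    k-step neighborhoods around each of their nodes."""
    out = []
    for comp in nx.connected_components(G):
        if len(comp) <= max_size:
            out.append(comp)
        else:
            sub = G.subgraph(comp)
            for v in sub:
                out.append(set(nx.ego_graph(sub, v, radius=k)))
    return out

G = nx.path_graph(6)                       # one component of size 6
parts = small_communities(G, max_size=4, k=1)
print(len(parts))                          # 6: one neighborhood per node
```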
Within communities, uncommon subgraphs can be mined using structural infor-
mation, as shown in [7]. The authors propose adaptations of the dictionary learning
problem to incorporate connectivity patterns. One such adaptation involves imposing
that the dictionary atoms express a Laplacian structure, thus creating a dictionary of
elementary relational patterns.
Evolutionary networks are considered in [10], where a community detection strat-
egy is used to highlight anomalies based on the temporal quantitative evolution of
network communities.
The intuition behind searching for dense blocks in graphs as signs of anomalies is
that some frauds are performed by repetitive activity bursts. When looking at the
graph connectivity, these activities form a dense subgraph that stands out from the
sparser normal activity. In yet other cases, the density is a consequence of multiple
malicious users acting similarly and synchronously, a behavior known as lockstep.
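As a generic stand-in for such dense-block searches (not the method of any single paper surveyed here), Charikar's classical greedy peeling algorithm gives a 1/2-approximation of the densest subgraph by average degree:

```python
import networkx as nx

def densest_subgraph(G):
    """Greedy peeling: repeatedly remove the minimum-degree node and
    keep the intermediate subgraph with the best edge/node density."""
    H = G.copy()
    best, best_density = set(H), H.number_of_edges() / max(len(H), 1)
    while len(H) > 1:
        v = min(H, key=H.degree)
        H.remove_node(v)
        d = H.number_of_edges() / len(H)
        if d > best_density:
            best, best_density = set(H), d
    return best, best_density

G = nx.complete_graph(4)                   # a dense block of lockstep activity
G.add_edges_from([(0, 10), (10, 11)])      # plus a sparse normal tail
S, d = densest_subgraph(G)
print(sorted(S), d)                        # [0, 1, 2, 3] 1.5: the clique stands out
```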
The dense subgraph detection problem is approached in [75]. The authors consider a hierarchical framework of subgraph detection, where the initial graph is successively filtered in k steps until a dense cluster results. After modeling this problem as the maximization of a nonconvex quadratic finite sum (with k terms) over the integers, several relaxations are applied: (i) ordering in node vectors (equivalent to replacing the binary constraint {0, 1} with the convex interval [0, 1]); (ii) penalization of the hierarchical density order, such that the k-th subgraph is denser than the (k − 1)-th. Furthermore, as is typical for continuous nonconvex problems, a block-coordinate gradient descent algorithm with Armijo stepsize policy is presented and its convergence to a stationary point of the QP model is proved. For numerical evaluation, the authors use the AMiner co-authorship citation network and a financial bank accounts network. Detection of large dense clusters is demonstrated, along with superiority over existing 2-hierarchy strategies.
The approach in [5] considers tensors for modelling large-scale networked data. Starting from the fact that the formation of dense blocks is the result of certain entities being shared between two or more entries in the tensor, the authors construct an Information Sharing Graph (ISG) illustrating these relations. Since dense blocks in the tensor model lead to dense subgraphs in the ISG, an efficient D-Spot algorithm that detects
Fraud Detection in Networks 523
these dense subgraphs is proposed. Along with some theoretical guarantees for the
subgraphs densities generated by the algorithm, the empirical evaluation shows that
D-Spot has better accuracy than other tensor-based schemes on synthetic and real
datasets such as Amazon, DARPA, Yelp and AirForce.
The work in [32] sets out to detect suspicious nodes in a directed graph that have synchronized and abnormal connectivity patterns. First, a mapping is proposed that embeds the data into a chosen feature space. Then, synchronicity and normality measures are introduced: similarity is computed between the points resulting from the embedding, as well as the normality of the given data features relative to the rest of the data. Parabolic lower bounds are shown for the synchronicity-normality function.
Superior performance (precision, recall, robustness) over well-known state-of-the-
art static graph anomaly detection techniques such as Oddball and OutRank is shown
on three real-world datasets, namely TwitterSG, WeiboJanSG and WeiboNovSG, all
of which are complete graphs with billions of edges.
Coordinated activity can also be detected in edge streams. The problem of near
real-time detection of fraudulent edges is addressed in [19]. The authors propose a
combined metric for labeling attacks that takes into account activity bursts, as well as
path weights between source and destination regions of the graph. The solution keeps
a memory-efficient edge sample for comparison and uses a random walk method with a modified score in order to label fraudulent edges. Furthermore, the work in [72] sets out to uncover both structural and weight changes in graphs. A structural anomaly is considered to occur when the process of adding or deleting edges is not smooth, namely when the first and second derivatives of the node scores are large. The score of a node
is given a PageRank-like interpretation and is updated dynamically as modifications
appear on the graph.
In [55] constrained cycles are detected in dynamic graphs and labeled as fraudulent activities in a financial payments system (fake transactions). The authors consider a directed attributed graph with varying edge structure over time. For each incoming edge between vertices (u, v), efficient algorithms are given to generate all fixed k-length cycles between u and v. Based on empirically observed issues in the case when high-degree vertices (hot points) are encountered in the generated paths, some indexing procedures are proposed to boost the time performance of the brute-force depth-first search algorithm. The evaluation data is based on real activity from Alibaba's
e-commerce platform, containing both static and dynamic edges, resulting in a graph with approximately 10^9 vertices and 10^9 edges. Results show a somewhat improved performance, guaranteed by the indexing procedure.
An altogether different way of looking at graphs is through the concept of k-core
structures, which represent “the maximal subgraph in which all vertices have degree
at least k” [59]. The authors study a number of real world networks (such as social
networks and citation networks) and observe several patterns related to the coreness
property. One such model, named the “Mirror Pattern”, correlates the core number
to the degree of a vertex. The authors are thus able to identify anomalies deviating
from this pattern, specifically lockstep-type attacks, as vertices with low degree and
high coreness.
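A rough sketch of scoring deviations from such a degree-coreness relation, using a least-squares fit in log-log space (an illustrative simplification, not the exact procedure of [59]):

```python
import numpy as np
import networkx as nx

def mirror_pattern_scores(G):
    """Score each vertex by its residual from the least-squares line fit
    of log(core number) against log(degree): the "Mirror Pattern" holds
    for most vertices, so large residuals flag potential anomalies."""
    core = nx.core_number(G)
    nodes = [v for v in G if G.degree(v) > 0]
    x = np.log([G.degree(v) for v in nodes])
    y = np.log([max(core[v], 1) for v in nodes])
    slope, intercept = np.polyfit(x, y, 1)
    return {v: float(abs(yi - (slope * xi + intercept)))
            for v, xi, yi in zip(nodes, x, y)}

G = nx.barbell_graph(5, 2)                 # two 5-cliques joined by a short path
scores = mirror_pattern_scores(G)
suspect = max(scores, key=scores.get)      # vertex farthest from the pattern
```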
A bipartite graph is a graph whose vertex set can be split into two disjoint subsets such that every edge connects a node from the first subset to one from the second. In some
fraud detection problems, a bipartite (sub)graph occurs naturally as a result of scams,
or is a convenient way of representing the data.
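A quick illustration of the definition on a hypothetical user-product toy graph:

```python
import networkx as nx

# Interaction graphs such as user-product data are bipartite by
# construction; networkx can verify the property (equivalently,
# the absence of odd cycles) and recover the two sides.
G = nx.Graph([("user1", "productA"), ("user2", "productA"), ("user1", "productB")])
print(nx.is_bipartite(G))                  # True
left, right = nx.bipartite.sets(G)
print(len(left) + len(right))              # 4: every node sits on one side
```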
One reason for the evasive nature of network frauds is the difficulty of finding the
right focus: looking at individual nodes/edges may not reveal anything suspicious,
and likewise, considering too large a set of entities may obscure fraudulent activities
occurring within the group [69]. The work in [49] considers two types of identities in auction networks besides honest users: fraudsters and accomplices. The latter category supports the fraudsters, but also acts as camouflage, by adding legitimate activities to the fraudsters' repertory. The networks these two types of users create form a bipartite
core within the large graph. The authors then develop a belief propagation algorithm,
which infers the identity of a node by evaluating the neighbours. An adaptation is also
provided that efficiently solves the identification problem when the graph structure
evolves in time.
Cases when the two classes of nodes involve mutual relations are approached in
[42], where bipartite graphs are also used to represent data. The fraudulent instances
considered here are assumed to satisfy some given empirically observed traits such
as: fraudsters engage as much firepower as possible to boost customer objects, sus-
picious objects seldom attract non-fraudulent users and fraudulent attacks are well
represented by bursts of activity. Further, they detect fraudulent blocks corresponding to both vertex sets in the bipartite graph and formulate a metric that measures
to what extent a given block obeys the fixed traits. By maximizing this metric over
the entire data, suspicious blocks are labeled. The experiments show that the solution
achieves significant accuracy improvements on synthetic and real data, compared to
other fraud detection methods.
More specific bipartite reviewer-product data is considered in [14], where, using unsupervised algorithmic heuristics, the authors aim to find fraudulent groups of reviewers that typically write fake reviews to promote or demote certain products. DeFrauder detects suspicious groups through several coherent behavioral signals of reviewers, based on particular quantitative measures such as reviewer tightness, neighbor tightness and product tightness. Groups are then ranked by spamicity degree through a dedicated ranking strategy. Experiments on four real-world labeled datasets (including Amazon and Playstore) show that the DeFrauder algorithm outperforms certain baselines, having 11.35% higher accuracy in group detection.
Furthermore, deep learning is used in [66] to design novel graph fraud detection
methods. The data, representable as a bipartite graph (e.g. nodes are users on one side
and products on the other), is embedded into a latent space such that the representa-
tions of the suspicious users in the same fraud block sit as close as possible, while the
representations of the normal users are distributed uniformly in the remaining latent
space. In this way, the additional density-based detection methods might easily detect
the fraud blocks. In fact, the deep model from [66] involves an autoencoder used to
reconstruct the “user” nodes information from the bipartite graph and, at the same
time, to ensure that in the low dimensional latent space the anomalous instances are
sufficiently similar with respect to a proposed similarity measure. Thus, the objec-
tive function of the minimization problem contains the nonlinear composite terms
associated to reconstruction and similarity. Experimental evaluations are given for
some synthetic datasets and a real-world network attack dataset. The tests show that
the model is able to robustly detect multiple fraud blocks without predefining the
number of blocks, in comparison with other state-of-the-art baselines (which do not rely on deep learning) such as HoloScope, D-cube and others.
3 Hybrid Clustering
The HitFraud algorithm [8] performs collective fraud detection by modeling transactions in a heterogeneous information network, relying on meta-paths that are introduced in order to extract semantic relations among multiple transactions (using the paths connecting them) and to aggregate label information. Over baseline methods such as Support Vector Machine (SVM) and Random Forests, HitFraud provides a boost of 7.93% in recall and 4.63% in F-score on the EA data.
An integrated anomaly detection framework for attributed networks is proposed
in [41]. A preliminary clustering strategy is presented, which provides the degrees to which attributes are associated with each cluster. Then, a subsequent unsupervised
learning procedure is applied based on the representation of the links and data
attributes by the set of outcome vectors from the clustering stage. Finally, the abnor-
mal attributes and their corresponding degrees of abnormality are computed on the
basis of these representations. In [3], clusters are constructed from nodes that have
“similar connectivity” and exhibit “feature coherence”, based on the intuition that
clusters are a way to compress the graph. As such, the Minimum Description Length
(MDL) principle is used to derive a cost function that encodes both the connectivity
matrix and the feature matrix.
Edge-attributed networks are considered in [58] from an information-theoretic
perspective similarly based on MDL. The algorithm consists of a combination
between an aggregation step of neighborhood attributes information and a clustering
step used to provide an abnormality score on each node using the aggregated data.
MDL is also used in [64], where normative graph substructures are identified by
taking into account some coverage rules, and their number of occurrences is established. Anomalous substructures are selected from those with the fewest occurrences in the graph.
In [65] certain types of anomalies are detected based on scoring each node (or
an entire subgraph) using statistical neighborhood information, such as the distance
between the attributes of the node and its neighbors. A combination of this scoring
procedure with a deep autoencoder is also provided. Several social network statistical
metrics and clustering techniques are also used in [11] to detect fraud in a factoring
company.
The model from [38] defines a normal instance as one that has a sparse representation on a set of representative instances. The problem of anomaly detection is thus cast as a minimization problem over the representation residual. The network structure is included in the model through a Laplacian-type quadratic penalty. Furthermore, the model developed in [51] selects a subset of representative instances in the space of attributes that are closely tied to the network topology, based on CUR decomposition, and then measures the normality of each instance via residual analysis.
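A simplified sketch of such residual-based scoring, with an l1 penalty standing in for the sparsity mechanisms of [38] and [51] (their actual models, including the Laplacian penalty and CUR selection, are not reproduced):

```python
import numpy as np
from sklearn.linear_model import Lasso

def residual_scores(X, R, alpha=0.01):
    """Score each instance (row of X) by the residual of its sparse
    representation over the representative instances R."""
    scores = []
    for x in X:
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
        model.fit(R.T, x)                  # solve x ~ R.T @ coefficients, sparsely
        scores.append(float(np.linalg.norm(x - R.T @ model.coef_)))
    return np.array(scores)

rng = np.random.default_rng(0)
R = rng.normal(size=(5, 8))                # 5 representative instances in R^8
X = np.vstack([R[0] + R[1],                # representable -> low residual
               3 * rng.normal(size=8)])    # arbitrary point -> high residual
s = residual_scores(X, R)
print(s[1] > s[0])                         # True: the second row is anomalous
```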
The authors of [39] adopt a finite mixture model to interpret the underlying
attribute distributions in a vertex-attributed graph. In order to accommodate graph
structure, network entropy regularizers are proposed to control the mixture pro-
portions for each vertex, which facilitates assigning vertices into different mixture
components. A deterministic annealing expectation-maximization algorithm is used to estimate the model parameters.
4 Perspectives
So far, we have surveyed methods that consider a panoptic view of the graph, where
topological information is typically conveyed by connectivity matrices such as the
adjacency matrix and graph Laplacian or is summarized using different network
statistics. We turn now to alternative ways of exploiting connectivity information, as
well as to the case where a single graph has limited descriptive power.
An egonet is the induced subgraph formed by a single node together with its neighbors. The authors of [1] have done extensive research on several real-world networks and found that, using carefully extracted but otherwise intuitive features for describing an egonet, unforeseen normality power laws appear. These patterns describe dependencies such as between the number of nodes and edges, between the weights and number of edges, and between the principal eigenvalue and total edge weight. The power laws, which have been tested on both uni- and bipartite, weighted and unweighted graphs, encourage the use of various outlier detection metrics. A mix of a distance-based heuristic and a density-based score is used in [1]. The method is used to identify several types of graph anomalies such as (near-)cliques, stars, heavy vicinities and heavy links.
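An Oddball-style sketch of one such power law, the node-edge relation over egonets, with outlierness measured as distance from the fitted line (a simplification of the scores used in [1]):

```python
import numpy as np
import networkx as nx

def egonet_outlier_scores(G):
    """Fit the power law E_ego ~ c * N_ego^theta between egonet node and
    edge counts, then score each node by its distance from the fit in
    log-log space; (near-)cliques and stars land far from the line."""
    nodes, N, E = list(G), [], []
    for v in nodes:
        ego = nx.ego_graph(G, v)
        N.append(len(ego))
        E.append(ego.number_of_edges())
    logN, logE = np.log(N), np.log(np.maximum(E, 1))
    theta, logc = np.polyfit(logN, logE, 1)
    return dict(zip(nodes, np.abs(logE - (theta * logN + logc))))

G = nx.karate_club_graph()
scores = egonet_outlier_scores(G)
suspect = max(scores, key=scores.get)      # node with the most unusual egonet
```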
The work in [57] uses the egonet approach and a thresholding technique to detect
anomalous cliques, considering statistical measures of the underlying graph model.
A similar solution uses egonets for fraud detection in e-banking transaction data [67].
The authors use the Mahalanobis distance to label anomalous accounts.
A combination of egonet attributes and node features is used in [25], together
with a set of recursive features. The latter consist of aggregated means and sums of
different network metrics. The method thus constructs an abstract characterization of
a node that serves in classification and de-anonymization (identity resolution) tasks.
A different connectivity formulation can also be obtained by casting the multi-attribute representation of a graph as a multi-view problem. This implies looking
at the relationship between the entities (nodes) from different perspectives, namely
one for each attribute. The authors of [48] propose a metric for quantifying how
suspicious a group of nodes is, by extending previous single-view metrics such as
mass (sum of edge weights) or average degree to the multi-view setting. Aggregat-
ing across views led to 89% precision in detecting organizations that violated the
Snapchat Terms of Service through different fraud schemes.
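A much-simplified sketch of extending the single-view mass metric across views (the actual suspiciousness metric of [48] is more elaborate); the two toy views below are made up:

```python
import numpy as np

def multiview_mass(adjs, group):
    """Per-view "mass" (total edge weight inside the group), aggregated
    across views by a simple mean."""
    idx = np.ix_(group, group)
    masses = [float(A[idx].sum()) / 2 for A in adjs]   # undirected graphs
    return masses, float(np.mean(masses))

# Two views over 5 entities: view 1 links the group densely, view 2
# adds one heavy edge inside it.
view1 = np.zeros((5, 5)); view1[np.ix_([0, 1, 2], [0, 1, 2])] = 1.0
np.fill_diagonal(view1, 0.0)
view2 = np.zeros((5, 5)); view2[0, 1] = view2[1, 0] = 4.0
masses, agg = multiview_mass([view1, view2], [0, 1, 2])
print(masses, agg)                         # [3.0, 4.0] 3.5
```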
If the nodes (or edges) of a network are highly heterogeneous, working with the entire graph might leave out important relationship attributes. Consider the case where nodes represent actors that interact differently across domains: for example, an individual's connections with colleagues, friends and family can be represented as three separate graphs, or as a single graph with multiple edges between nodes (e.g. colleagues can also be friends). In other words, the underlying graph is multi-relational.
In [22] anomalies are viewed as nodes that perform inconsistently across multiple sources (e.g. a close family member that is not friendly). Each source S_i is used to build a simple graph G_i containing the same vertices V but with different edges E_i. Letting v_{ij} denote vertex j in graph G_i, the multigraph G is built by connecting the identical vertices from all sources, i.e. there is an edge between v_{ij} and v_{kj} for k ≠ i, ∀j ∈ V. Let m_{ij} be the constraint placed between two sources S_i and S_j, with M_{ij} = m_{ij} I. The adjacency matrix of the resulting graph is built by placing the adjacency matrix of each graph G_i on the diagonal and M_{ij} on the corresponding off-diagonal entries. With this, the authors build the Laplacian matrix and use it to select the k largest eigenvectors, which they further split into two parts P and Q, thus performing soft spectral clustering. Finally, they assign an anomaly score to each vertex v_j through the cosine distance between the corresponding vectors p_j and q_j, respectively. We note that a similar multi-source approach, called unmasking, has been applied in computer vision for deep anomaly detection in video content [27].
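A compact sketch of this construction on two toy sources, using the coupled adjacency matrix directly rather than the Laplacian of [22] (the raw score values are therefore illustrative only):

```python
import numpy as np

def multisource_scores(adjs, m=0.5, k=2):
    """Couple two same-vertex-set adjacency matrices on a block diagonal
    with m*I off-diagonal blocks, embed via the top-k eigenvectors of the
    combined matrix, and score each vertex by the cosine distance between
    its two per-source embeddings."""
    n = adjs[0].shape[0]
    A = np.block([[adjs[0], m * np.eye(n)],
                  [m * np.eye(n), adjs[1]]])
    _, vecs = np.linalg.eigh(A)
    U = vecs[:, -k:]                       # eigenvectors of the k largest eigenvalues
    P, Q = U[:n], U[n:]
    cos = np.einsum('ij,ij->i', P, Q) / (
        np.linalg.norm(P, axis=1) * np.linalg.norm(Q, axis=1) + 1e-12)
    return 1.0 - cos                       # large -> inconsistent across sources

A1 = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0.]])
A2 = np.array([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0.]])
s = multisource_scores([A1, A2])           # vertex 3 changes behavior between sources
```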
5 Challenges
In this section we briefly mention two general issues that occur in fraud detection
applications. The first concerns the case where the real graph is actually not known
or is incomplete. In practice, this situation can arise for several reasons, including
privacy protocols. The second issue is that of missing positive examples, which often
occurs as a consequence of anomalies being extremely scarce.
While some methods for fraud detection are graph-agnostic and treat transactions as
regular signals, as the field matures, it becomes a requirement to include information
on the underlying network. In some applications however, the real network is not
available. Dictionary learning (DL) [16] offers a solution for estimating the graph,
as well as classifying the signals. In DL, one seeks for a sparse representation of the
samples, as well as the basis (dictionary) for that representation. When the signals lie
on a weighted graph, the dictionary is also required to capture the underlying graph
structure. One approach is to include the graph Laplacian, which incorporates graph
patterns, into the learning problem.
Often, signals are similar when the nodes they rest on are connected, contributing to the smoothness property of the graph. Integrating smoothness ensures that the structure of the dictionary captures the graph topology; it is generally obtained via a regularization term that controls the similarity between dictionary atoms.
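The smoothness regularizer referred to here is commonly the Dirichlet energy tr(DᵀLD) of the atoms over the graph; a minimal numerical illustration:

```python
import numpy as np
import networkx as nx

# The Dirichlet energy tr(D^T L D) is a standard smoothness regularizer:
# it is small when atom values vary little across connected nodes.
G = nx.path_graph(4)
L = nx.laplacian_matrix(G).toarray().astype(float)

D_smooth = np.ones((4, 2))                 # constant atoms: perfectly smooth
D_rough = np.array([[1, -1], [-1, 1], [1, -1], [-1, 1]], dtype=float)

energy = lambda D: float(np.trace(D.T @ L @ D))
print(energy(D_smooth), energy(D_rough))   # 0.0 24.0
```

Adding such a term to the dictionary learning objective penalizes atoms that oscillate across edges, so the learned dictionary reflects the graph topology.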
Oftentimes, the datasets for fraud detection are extremely unbalanced, containing only a few illegitimate activities. Moreover, new fraudulent schemes may constantly
appear, as a response to increasing anti-fraud methods and policies. In those cases,
no positive examples exist in the databases.
One-class classification problems refer precisely to those two-class problems where the main class is well sampled (the "normal" samples) and the other is severely undersampled because of its extremely diverse nature (the "abnormal" samples). The main objective of one-class classification techniques is to distinguish between a set of target objects and all remaining objects, which are defined as anomalies or outliers [60]. One-class Support Vector Machine (OC-SVM) is an effective boundary-based classification method which provides an optimal hyperplane with maximum margin between the data points and the origin. A new data sample is classified as normal if it is located within the boundary and, conversely, as abnormal if it lies outside the boundary [4, 18, 35, 62]. The method has been applied to money laundering applications and to finding fraudulent credit card transactions [24, 33].
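A minimal OC-SVM sketch on synthetic data (illustrative parameters only):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 2))   # training data: normal class only

# nu upper-bounds the fraction of training points treated as outliers.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal)

print(clf.predict([[0.1, -0.2]]))          # [1]  -> inside the boundary: normal
print(clf.predict([[8.0, 8.0]]))           # [-1] -> outside: flagged as anomalous
```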
One approach to solving the problem of missing positive examples in training data can be found in [34], which uses the Markov decision framework to estimate the distribution of the missing anomalous samples. Experiments show encouraging
results even in the case where the estimated distribution is not the real one. A newer
approach uses generative adversarial networks (GAN) to generate positive examples
[76]. The method was tested on a Wikipedia database with the aim of identifying
editing vandals. Results show improved precision and accuracy over three baseline
algorithms: one-class SVM, one-class nearest neighbors and one-class Gaussian
process.
Other approaches consider human intervention. In [15], the anomaly detection
problem in interactive attributed networks is approached by allowing the system
to proactively communicate with the human expert in making a limited number
of queries about ground truth anomalies. The problem is formulated in the multi-
armed bandit framework and, after applying some basic clustering methods, it aims to maximize the number of true anomalous nodes presented to the human expert within the given number of queries. The results show certain improvements compared to similar approaches.
In [54] it is shown that unsupervised anomaly detection is an undecidable problem,
requiring priors to be assumed on the anomaly distribution. In the expert-feedback context, a new layer extension is analyzed, which can be applied on top of any unsupervised anomaly detection system based on deep learning to transform it into an active anomaly detection system. In other words, the strategy is to iteratively select a number of the most probable anomalous samples to be audited, wait for the expert to provide their labels, and continue training the system using the new information. Various improvements have been shown over state-of-the-art deep approaches.
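The iterate-audit-retrain strategy can be sketched generically; the scorer and the expert oracle below are placeholders, not the deep model of [54]:

```python
import numpy as np

def active_loop(score_fn, X, expert_label, budget=10, batch=2):
    """Audit the highest-scoring unlabeled samples in small batches,
    collect expert labels, and let the scorer see them on the next round."""
    labeled = {}
    for _ in range(budget // batch):
        scores = score_fn(X, labeled)
        ranked = np.argsort(-scores)
        picks = [int(i) for i in ranked if int(i) not in labeled][:batch]
        for i in picks:
            labeled[i] = bool(expert_label(i))
    return labeled

# Toy run: the score is distance from the mean; the "expert" (a stand-in
# for the human analyst) confirms only the two planted far-away points.
X = np.vstack([np.zeros((8, 2)), np.full((2, 2), 5.0)])
score_fn = lambda X, labeled: np.linalg.norm(X - X.mean(axis=0), axis=1)
found = active_loop(score_fn, X, expert_label=lambda i: i >= 8, budget=4, batch=2)
print(sorted(i for i, lab in found.items() if lab))   # [8, 9]
```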
6 Conclusions
Acknowledgements This work was supported by BRD Groupe Societe Generale through Data
Science Research Fellowships of 2019.
References
1. Akoglu, L., McGlohon, M., Faloutsos, C.: Oddball: Spotting anomalies in weighted graphs.
In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 410–421 (2010)
2. Akoglu, L., Tong, H., Koutra, D.: Graph based anomaly detection and description: a survey.
Data Min. Knowl. Discov. 29, 626–688 (2014)
3. Akoglu, L., Tong, H., Meeder, B., Faloutsos, C.: PICS: Parameter-free Identification of Cohe-
sive Subgroups in Large Attributed Graphs. In: Proceedings of the 2012 SIAM International
Conference on Data Mining, pp. 439–450 (2012)
4. Amer, M., Goldstein, M., Abdennadher, S.: Enhancing one-class support vector machines for
unsupervised anomaly detection. In: Proceedings of the ACM SIGKDD Workshop on Outlier
Detection and Description, pp. 8–15 (2013)
5. Ban, Y., Liu, X., Duan, Y., Liu, X., Xu, W.: No place to hide: catching fraudulent entities in
tensors. In: The World Wide Web Conference, pp. 83–93 (2019)
6. Bhatia, S., Hooi, B., Yoon, M., Shin, K., Faloutsos, C.: Midas: microcluster-based detector
of anomalies in edge streams. In: Association for the Advancement of Artificial Intelligence
(2020)
7. Băltoiu, A., Pătraşcu, A., Irofti, P.: Graph anomaly detection using dictionary learning. In: The
21st World Congress of the International Federation of Automatic Control, pp. 1–8 (2020)
8. Cao, B., Mao, M., Viidu, S., Yu, P.S.: Collective fraud detection capturing inter-transaction
dependency. In: Proceedings of Machine Learning Research, KDD 2017, vol. 71, pp. 66–75
(2017)
9. Chen, J., Saad, Y.: Dense subgraph extraction with application to community detection. IEEE
Trans. Knowl. Data Eng. 24(7), 1216–1230 (2012)
10. Chen, Z., Hendrix, W., Samatova, N.F.: Community-based anomaly detection in evolutionary
networks. J. Intell. Inf. Syst. 39(1), 59–85 (2012)
11. Colladon, A.F., Remondi, E.: Using social network analysis to prevent money laundering.
Expert Syst. Appl. 67, 49–58 (2017)
12. Cucuringu, M., Blondel, V.D., Van Dooren, P.: Extracting spatial information from networks
with low order eigenvectors. Phys. Rev. E 87, 032803 (2013)
13. Delamaire, L., Abdou, H., Pointon, J.: Credit card fraud and detection techniques: a review.
Banks Bank Syst. 4, 57–68 (2009)
14. Dhawan, S., Gangireddy, S.C.R., Kumar, S., Chakraborty, T.: Spotting collective behaviour of
online frauds in customer reviews. In: Proceedings of the Twenty-Eighth International Joint
Conference on Artificial Intelligence (IJCAI-19), pp. 245–251 (2019)
15. Ding, K., Li, J., Liu, H.: Interactive anomaly detection on attributed networks. In: Proceedings
of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 357–365
(2019)
16. Dumitrescu, B., Irofti, P.: Dictionary Learning Algorithms and Applications. Springer (2018)
17. Elliott, A., Cucuringu, M.C., Luaces, M.M., Reidy, P., Reinert, G.: Anomaly detection in
networks with application to financial transaction networks. arXiv:1901.00402 [stat.AP] (2018)
18. Erfani, S.M., Rajasegarar, S., Karunasekera, S., Leckie, C.: High-dimensional and large-scale
anomaly detection using a linear one-class svm with deep learning. Pattern Recognit. 58(C),
121–134 (2016)
19. Eswaran, D., Faloutsos, C.: Sedanspot: detecting anomalies in edge streams. In: IEEE Interna-
tional Conference on Data Mining (ICDM), pp. 953–958 (2018)
20. European Central Bank: ECB report shows a fall in card fraud in 2016. https://www.ecb.europa.eu/press/pr/date/2018/html/ecb.pr180926.en.html, 26 September 2018. Accessed 29 Feb 2020
21. Flegel, U., Vayssiere, J., Bitz, G.: A state of the art survey of fraud detection technology. In:
Probst, C., Hunker, J., Gollmann, D., Bishop, M. (eds.) Insider Threats in Cyber Security, pp.
73–84. Springer (2010)
22. Gao, J., Du, N., Fan, W., Turaga, D., Parthasarathy, S., Han, J.: A multi-graph spectral frame-
work for mining multi-source anomalies. In: Graph Embedding for Pattern Analysis, pp. 205–
227. Springer (2013)
23. Guo, Q., Li, Z., An, B., Hui, P., Huang, J., Zhang, L., Zhao, M.: Securing the deep fraud detector
in large scale e-commerce platform via adversarial machine learning approach. In: Proceedings
of the 2019 World Wide Web Conference (WWW’19), pp. 616–626 (2019)
24. Hejazi, M., Singh, Y.P.: One-class support vector machines approach to anomaly detection.
Appl. Artif. Intell. 27(5), 351–366 (2013)
25. Henderson, K., Gallagher, B., Li, L., Akoglu, L., Eliassi-Rad, T., Tong, H., Faloutsos, C.: It’s
who you know: graph mining using recursive structural features. In: Proceedings of the 17th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD
’11, pp. 663–671. New York, NY, USA. Association for Computing Machinery (2011)
26. Huang, Z., Ye, Y., Li, X., Liu, F., Chen, H.: Joint weighted nonnegative matrix factorization
for mining attributed graphs. In: Pacific-Asia Conference on Knowledge Discovery and Data
Mining, pp. 368–380 (2017)
27. Ionescu, R.T., Smeureanu, S., Alexe, B., Popescu, M.: Unmasking the abnormal events in
video. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2895–
2903 (2017)
28. Irofti, P., Băltoiu, A.: Malware identification with dictionary learning. In: 27th European Signal
Processing Conference, pp. 1–5 (2019)
29. Irofti, P., Băltoiu, A.: Unsupervised dictionary learning for anomaly detection. arXiv:2003.00293 (2019)
30. Irofti, P., Stoican, F.: Dictionary learning strategies for sensor placement and leakage isolation
in water networks. In: The 20th World Congress of the International Federation of Automatic
Control, pp. 1589–1594 (2017)
31. Jiang, M., Cui, P., Beutel, A., Faloutsos, C., Yang, S.: Inferring strange behavior from con-
nectivity pattern in social networks. In: Pacific-Asia Conference on Knowledge Discovery and
Data Mining, pp. 126–138. Springer (2014)
32. Jiang, M., Cui, P., Beutel, A., Faloutsos, C., Yang, S.: Catching synchronized behaviors in large
networks: a graph mining approach. ACM Trans. Knowl. Discov. Data 10(4), 1–27 (2016)
33. Jun, T., Jian, Y.: Developing an intelligent data discriminating system of anti-money laundering
based on SVM. In: 2005 International Conference on Machine Learning and Cybernetics, vol.
6, pp. 3453–3457 (2005)
34. Kocsis, L., György, A.: Fraud detection by generating positive samples for classification from
unlabeled data. In: Proceedings of the 27th International Conference on Machine Learning.
Workshop on Machine Learning and Games (2010)
35. Lamrini, B., Gjini, A., Daudin, S., Pratmarty, P., Armando, F., Travé-Massuyès, L.: Anomaly
detection using similarity-based one-class svm for network traffic characterization. In: 29th
International Workshop on Principles of Diagnosis (2018)
36. Larik, A.S., Haider, S.: Clustering based anomalous transaction reporting. Procedia Comput.
Sci. 3, 606–610 (2011)
37. Latimer, P.: Australia: Australian transaction reports and analysis centre (austrac). J. Financ.
Crime 3, 306–307 (1996)
38. Li, J., Dani, H., Hu, X., Liu, H.: Radar: residual analysis for anomaly detection in attributed
networks. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial
Intelligence, pp. 2152–2158 (2017)
39. Li, N., Sun, H., Chipman, K.C., George, J., Yan, X.: A probabilistic approach to uncovering
attributed graph anomalies. In: Proceedings of the 2014 SIAM International Conference on
Data Mining, pp. 82–90 (2014)
40. Li, Z., Xiong, H., Liu, Y., Zhou, A.: Detecting blackhole and volcano patterns in directed
networks. In: 2010 IEEE International Conference on Data Mining, pp. 294–303 (2010)
41. Liu, N., Huang, X., Hu, X.: Accelerated local anomaly detection via resolving attributed net-
works. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intel-
ligence, pp. 2337–2343 (2017)
42. Liu, S., Hooi, B., Faloutsos, C.: Holoscope: Topology-and-spike aware fraud detection. In:
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management,
pp. 1539–1548 (2017)
43. Miller, B.A., Arcolano, N., Bliss, N.T.: Efficient anomaly detection in dynamic, attributed
graphs: emerging phenomena and big data. In: IEEE International Conference on Intelligence
and Security Informatics, pp. 179–184 (2013)
44. Miller, B.A., Beard, M.S., Bliss, N.T.: Eigenspace analysis for threat detection in social net-
works. In: Proceedings of the 14th International Conference on Information Fusion (FUSION),
pp. 1–7 (2011)
45. Miller, B.A., Beard, M.S., Wolfe, P.J., Bliss, N.T.: A spectral framework for anomalous sub-
graph detection. IEEE Trans. Signal Process. 63(16), 4191–4206 (2015)
46. Miller, B.A., Bliss, N.T., Wolfe, P.J.: Toward signal processing theory for graphs and non-
euclidean data. In: IEEE International Conference on Acoustics Speech and Signal Processing
(ICASSP), pp. 5414–5417 (2010)
47. Ngai, E.W.T., Hu, Y., Wong, Y.H., Chen, Y., Sun, X.: The application of data mining techniques
in financial fraud detection: a classification framework and an academic review of literature.
Decis. Support Syst. 50(02), 559–569 (2011)
48. Nilforoshan, H., Shah, N.: Slicendice: mining suspicious multi-attribute entity groups with
multi-view graphs. In: 2019 IEEE International Conference on Data Science and Advanced
Analytics (DSAA), pp. 351–363 (2019)
Fraud Detection in Networks 535
49. Pandit, S., Chau, D.H., Wang, S., Faloutsos, C.: Netprobe: a fast and scalable system for fraud
detection in online auction networks. In: Proceedings of the 16th International Conference on
World Wide Web, pp. 201–210 (2007)
50. Pastor-Satorras, R., Castellano, C.: Distinct types of eigenvector localization in networks. Sci.
Rep. 6 (2016)
51. Peng, Z., Luo, M., Li, J., Liu, H., Zheng, Q.: Anomalous: a joint modeling approach for anomaly
detection on attributed networks. In: Proceedings of the Twenty-Seventh International Joint
Conference on Artificial Intelligence, IJCAI-18, pp. 3513–3519 (2018)
52. Perozzi, B., Akoglu, L.: Scalable anomaly ranking of attributed neighborhoods. In: Proceedings
of the 2016 SIAM International Conference on Data Mining, pp. 207–215 (2016)
53. Phua, C., Lee, V., Smith, K., Gayler, R.: A comprehensive survey of data mining-based fraud
detection research. Intell. Comput. Technol. Autom. (ICICTA), pp. 50–53 (2010)
54. Pimentel, T., Monteiro, M., Viana, J., Veloso, A., Ziviani, N.: A generalized active learning
approach for unsupervised anomaly detection. CoRR, abs/1805.09411 (2018)
55. Qiu, X., Cen, W., Qian, Z., Peng, Y., Zhang, Y., Lin, X., Zhou, J.: Real-time constrained cycle
detection in large dynamic graphs. Proc. VLDB Endow. 11(12), 1876–1888 (2018)
56. Savage, D., Wang, Q., Chou, P., Zhang, X., Yu, X.: Detection of money laundering groups
using supervised learning in networks. arXiv preprint arXiv:1608.00708 (2016)
57. Sengupta, S.: Anomaly Detection in Static Networks using Egonets (2018)
58. Shah, N., Beutel, A., Hooi, B., Akoglu, L., Günnemann, S., Makhija, D., Kumar, M., Faloutsos,
C.: Edgecentric: anomaly detection in edge-attributed networks. In: 2016 IEEE 16th Interna-
tional Conference on Data Mining Workshops (ICDMW), pp. 327–334 (2016)
59. Shin, K., Eliassi-Rad, T., Faloutsos, C.: Patterns and anomalies in k-cores of real-world graphs
with applications. Knowl. Inf. Syst. 677–710 (2017)
60. Skretting, K., Engan, K.: Intrusion detection in computer networks by a modular ensemble of
one-class classifiers. Inf. Fus. 9(1), 69–82 (2008)
61. Sorournejad, S., Zojaji, Z., Atani, R.E., Monadjemi, A.H.: A survey of credit card fraud detection techniques: data and technique oriented perspective. arXiv preprint arXiv:1611.06439 (2016)
62. Tian, Y., Mirzabagheri, M., Bamakan, S.M.H., Wang, H., Qu, Q.: Ramp loss one-class support vector machine: a robust and effective approach to anomaly detection problems. Neurocomputing 310, 223–235 (2018)
63. Tong, H., Lin, C.-Y.: Non-negative residual matrix factorization with application to graph
anomaly detection. In: Proceedings of the 2011 SIAM International Conference on Data Min-
ing, pp. 143–153 (2011)
64. Velampalli, S., Eberle, W.: Novel graph based anomaly detection using background knowledge.
In: FLAIRS Conference (2017)
65. Vengertsev, D., Thakkar, H.: Anomaly detection in graph: unsupervised learning, graph-based
features and deep architecture. Tech. Rep. (2015)
66. Wang, H., Zhou, C., Wu, J., Dang, W., Zhu, X., Wang, J.: Deep structure learning for fraud
detection. In: IEEE International Conference on Data Mining, pp. 567–576 (2018)
67. Wang, Y., Wang, L., Yang, J.: Egonet based anomaly detection in e-bank transaction networks.
IOP Conf. Ser. Mater. Sci. Eng. 715, 012038 (2020)
68. West, J., Bhattacharya, M., Islam, R.: Intelligent financial fraud detection practices: an inves-
tigation. In: International Conference on Security and Privacy in Communication Networks,
pp. 186–203. Springer (2014)
69. Wu, L., Wu, X., Lu, A., Zhou, Z.H.: A spectral approach to detecting subtle anomalies in
graphs. J. Intell. Inf. Syst. 41(2), 313–337 (2013)
70. Xu, Z., Ke, Y., Wang, Y., Cheng, H., Cheng, J.: A model-based approach to attributed graph
clustering. In: Proceedings of the 2012 ACM SIGMOD International Conference on Manage-
ment of Data, SIGMOD ’12, pp. 505–516. New York, NY, USA, ACM (2012)
71. Ying, X., Wu, X., Barbará, D.: Spectrum based fraud detection in social networks. In: 2011
IEEE 27th International Conference on Data Engineering, pp. 912–923. IEEE (2011)
72. Yoon, M., Hooi, B., Shin, K., Faloutsos, C.: Fast and accurate anomaly detection in dynamic
graphs with a two-pronged approach. In: Proceedings of the 25th ACM SIGKDD International
536 P. Irofti et al.
Conference on Knowledge Discovery & Data Mining, KDD ’19, pp. 647–657. Association for
Computing Machinery (2019)
73. Yu, W., Cheng, W., Aggarwal, C.C., Zhang, K., Chen, H., Wang, W.: NetWalk: a flexible
deep embedding approach for anomaly detection in dynamic networks. In: Proceedings of the
24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp.
2672–2681 (2018)
74. Yuan, S., Wu, X., Li, J., Lu, A.: Spectrum-based deep neural networks for fraud detection. In:
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management,
pp. 2419–2422 (2017)
75. Zhang, S., Zhou, D., Yildirim, M.Y., Alcorn, S., He, J., Davulcu, H., Tong, H.: Hidden: hierar-
chical dense subgraph detection with application to financial fraud detection. In: Proceedings
of the 2017 SIAM International Conference on Data Mining, pp. 570–578 (2017)
76. Zheng, P., Yuan, S., Wu, X., Li, J., Lu, A.: One-class adversarial nets for fraud detection. In:
Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1286–1293 (2018)
Biomorphic Artificial Intelligence:
Achievements and Challenges
1 Introduction
Artificial intelligence technologies are increasingly moving away from where they
started: from modeling human behavior. Today relatively few developers draw on the
processes of the brain's neural networks when implementing software, and funding
depends on specific tasks being completed on time.
If we accept the generally held view that "everything is complicated," as well as the
bitter experience of this field (for example, the two AI winters), then we have no
choice but to accept the status quo and slowly take small steps, improving existing
algorithms in order to increase companies' profits by raising the accuracy of the
methods.
But is the brain really so complex that it must be simulated as a black box? Below we
describe the recently developed technologies applicable to creating humanoid AI.
Before that, we emphasize that practical AI ideas undergo the same evolution as the
behavior and brains of animals.
Scientists constantly improve old models, combine methods, experiment with their
models and choose the best ones; when it comes to practice, the models best suited to
specific tasks are selected and find their niche. AI ideas therefore have even more in
common with biology than the researchers themselves might suppose at first glance.
The next subsection discusses exactly this.
© The Editor(s) (if applicable) and The Author(s), under exclusive license 537
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_24
538 D. O. Chergykalo and D. A. Klyushin
The term "biomorphic AI" in the title may confuse readers familiar with modern
biomorphic neural networks such as Blue Brain, so a clarification is in order. The
term comes from art, where it denotes something based on natural patterns or forms
resembling nature and living organisms. By biomorphism we mean precisely this
similarity of the general form and organization of processes in the AI that we
propose.
In this subsection we show the similarity between the processes created for AI and
the actual organization of processes in the brain and psyche. Since we describe
humanoid AI, it will naturally contain neural networks. As already noted, we use the
term "biomorphism" in a broader sense, and we believe that traditional artificial
neural networks are efficient and, used correctly, no worse than natural ones. Why we
chose them will be explained in general terms in Sect. 3.
As is well known, neural networks have recently undergone a period of intensive
development, and there are many intriguing parallels between artificial and natural
neural networks. An interesting example is the dropout algorithm: the random removal
of neurons in an artificial neural network, which makes the network more stable and
reduces overfitting on the available data [24]. The authors of that paper note that
neurons under dropout can be compared with genes in sexual reproduction: only a
certain part of them is realized in the descendant. If an individual possessed good
traits that were determined by a large set of genes, in other words by one large
co-adaptation of genes, this set is unlikely to be passed on intact to the next
generation because transmission is random; nature therefore favors genes that are
useful on their own, or small sets of such genes.
However, there is a more direct biological rationale for the dropout algorithm,
because neurons in the human brain undergo similar processes. As noted in [2],
neurons try to adapt to the data effectively while avoiding large co-adaptations, and
their random dropout does occur. Oddly enough, few people mention such a direct
connection.
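The dropout idea above can be captured in a few lines. The sketch below is an illustration of the common "inverted dropout" formulation, not the exact implementation from [24]; the function name and the scaling choice are ours.

```python
import random

def dropout(activations, p_drop, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p_drop.

    Surviving units are scaled by 1/(1 - p_drop) so the expected
    activation is unchanged, which lets inference run on the full,
    untouched network.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]
```

At inference time (`training=False`) the layer is an identity, which is exactly why the train-time rescaling is needed.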
The artificial neural networks that we program usually develop in isolation from the
outside world. They can learn from the whole variety of information at once, without
intermediate tests of performance. A person, on the contrary, must develop gradually
and immediately apply his knowledge in practice. Information arrives in the brain in
portions, and once a portion has been received, the brain can immediately switch to
using the knowledge gained. Naturally, these portions cannot describe the whole
variety of information, so such training constantly makes random deviations from
training on complete information. This turns out to be no bad thing: if a portion of
information (a sample) induces only small deviations, they will not push the model
out of the region of the global error minimum (the one corresponding to complete
information), but they will help it escape the regions of local minima.
In the field of artificial neural networks this approach is called small-batch
training, and its effectiveness is well described in [13], which indicates that batch
sizes from m = 2 to m = 32 are optimal; in other words, even m = 2 may turn out to be
the most effective option for a specific problem.
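The "portions of information" described above can be sketched as mini-batch gradient descent on a toy problem. This is an illustrative example with names of our own choosing, not code from [13]; the point is that each small batch gives a noisy gradient estimate that perturbs the trajectory.

```python
import random

def sgd_linear(data, batch_size, lr=0.05, epochs=200, rng=None):
    """Fit y = w*x + b by mini-batch gradient descent on squared error.

    Small batches (the m = 2..32 range cited above) yield noisy
    gradients; the noise jitters the path much like learning from
    partial "portions" of information.
    """
    rng = rng or random.Random(0)
    data = list(data)          # keep the caller's list untouched
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)      # a fresh random split into portions
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b
```

With noiseless data the mini-batch noise vanishes at the optimum, so the fit still converges to the true line.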
Viewed from a social standpoint, small-batch training can be compared to school
lessons, where students receive information in portions that let them adjust their
unconscious ideas about objects to the new material. But schools teach the most basic
material first and only then present ever more complex information, whereas an AI
should be able to learn on its own, that is, to go from simple to complex
independently.
In addition, new information, or a tactical move made by a person, is not always
understood immediately; information is therefore transferred from short-term to
long-term memory at the end of the day, during sleep (which is why a perfectly
ordinary part of the day can be remembered well if a vivid event happened later that
day). How these two problems are partially resolved is discussed in Sect. 5.3.
It is worth noting that convolutional neural networks are also quite close to the
real analysis performed in the human brain, although the biological implementation of
the filters that extract abstract objects is rather complicated. It relies not only
on general training of the entire network (based on internal rewards for good
performance); self-organization within each layer also ensures that similar filters
are applied at different places in the layer. A model of AI therefore needs a similar
process of self-organization, so that everything is not reduced to a fixed set of
filters. As an approximation of real AI we will use a conventional convolutional
neural network, which directly trains a finite number of filters over the entire
area.
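The weight sharing mentioned above can be illustrated in one function: a single small filter scanned across the whole input, so a feature learned in one place is detected everywhere. This is a minimal one-dimensional sketch with names of our own choosing.

```python
def conv1d(signal, kernel):
    """Apply one shared filter at every position (valid padding).

    Weight sharing is the point: the same small kernel slides over the
    whole input, mirroring how similar filters are applied at different
    places in a cortical layer.
    """
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]
```

For example, the edge-detecting kernel `[1, 0, -1]` responds identically to the same local pattern wherever it occurs in the signal.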
Such self-organization is not limited to two-dimensional data. A person can analyze
and identify objects of increasing complexity in, for example, a one-dimensional
sequence of audio signals. A person also has "layers" responsible for specific images
of visual and auditory information; they reside in the temporal lobe. But hearing and
vision are served by different architectures: on the one hand, hearing is a
sequential chain of neural networks leading from the sensors to the analysis of
complex sound signals and words; on the other hand, vision splits into two streams,
one analyzing the image itself and the other analyzing where the image is located [2].
This is fully consistent with one of the main neural network architectures for image
detection [8]. A similar process always occurs in a person who, while analyzing an
image, can focus attention on only one thing at a time; that process is closer to
another of the main neural network architectures for pattern detection [19].
In addition, a person has his own analogue of the max-pooling layer, but it works in
a more complicated way than choosing the maximum of several neighbors: the choice is
made over an object-oriented structure, which gives rise to an unconscious
understanding of where to select an object. This corresponds to the self-organization
described above, only for significant correlations with more distant neurons.
However, if this choice is correctly implemented
When the task is to create an AI similar to a human one, there is always a simple
desire to simulate all the processes of the brain in a computer. However, different
people have different views of what the brain consists of: for some it is a set of
neurons that interact with each other, for others it is a collection of brain regions
that act like black boxes and perform particular functions.
Each of these people, creating their own philosophy and their own models, including
computer models, brings to them their own understanding of how this happens:
electrochemical dynamics on neurons, modeling of brain regions and their
interactions, expert systems, chatbots, and so on.
When comparing the options, it may naturally seem that a model based on
electrochemical dynamics should be the most accurate; after all, everything else
looks like a description of the higher-level consequences of those dynamics. However,
it is not that simple. A huge number of factors affect these dynamics, and accounting
for all of them for every neuron and synapse is technically impossible at present. It
seems natural to simplify the task and reduce the real brain to a model that
nevertheless preserves the basic properties of neural networks. In this subsection we
briefly discuss the case in which the brain model is simplified by simplifying its
overall organization while preserving the intercellular structure.
The Human Brain Project and the Blue Brain Project single out the neural column as
the basic component of the brain [5]. They analyzed this structural unit of neural
organization in the neocortex and simulated the brain's electrochemistry, but the
general architecture of the brain [14] was not taken into account in their model.
They transferred the connection graph of a part of the brain into a digital model and
compared the electrical signals at the output of the living brain tissue and of its
model. The columns were fixed, and so were the neurons; only the electrochemical
dynamics were preserved, from which we can conclude that, at least for a short time,
this model can behave like a real fragment of the rat brain.
But during training, through the natural selection of neurons and their connections,
this model will increasingly diverge from reality. The Blue Brain researchers
themselves admitted that trying to account for and understand all types of neural
cells is a rather difficult task [12]. The process modeling performed by Blue Brain
and similar projects describes one side very well, namely the electrochemical
dynamics and behavior within a given time frame, but describes the other side very
poorly: the internal organization of learning processes and neural networks.
Human cells are constantly dying, and the human body is constantly renewed at the
cellular level. It is appropriate to recall the Ship of Theseus paradox: "If all the
components of the original object have been replaced, does the object remain the same
object?" A human still remains himself, even though the connections in his brain are
strongly rearranged, neurons die, and new ones appear.
We believe that modeling should qualitatively preserve the intelligent processes in
the brain; direct transfer at the cell level is incredibly difficult and pointless
because of the strong variability at that level. Blue Brain used 8192 processors to
compute a single neural column consisting of a few hundred neurons, and the
simulation still runs slower than a real neural column.
If the task is to model a person's thinking at the level of preserving his
intellectual processes, then in our opinion this need not be done at the cellular
level. It is enough to create a system equivalent in its processes: in the images it
classifies, in the memory it can activate, and in the processes, including emotional
ones, that occur during brain function.
In what follows we will not speak of digitizing the mind, since we do not consider
that problem. However, the AI architecture we propose may inspire someone to create
such an AI and, working by analogy, to transfer the biological processes occurring in
the brain to artificial neural networks and various forms of memory, and then to
compare the result with the natural brain (on whose basis the virtual copy was made,
possibly even a human one).
One could set the task of simulating an AI that develops in the same way as other
people do, but people have their weaknesses, some of which we discuss in Sect. 5.2.
Modeling AI as a person is also unreasonable, because we do not merely want other
beings to appear; we want the AI to serve us and to solve human problems. The
intellectual processes in the AI should therefore be equivalent to human ones,
preserving the general principles of human learning and thinking, so that we
ultimately obtain an assistant rather than a potential enemy or a system with
unpredictable behavior.
It is worth noting that the achievements of the Blue Brain project are not limited to
simulation, that is, to the transfer of neural connections. Blue Brain researchers
also use their models to gain a deeper understanding of the underlying processes in
the brain. For example, comparing their model with processes in the real brain, the
researchers found that the artificial neurons exhibit collective behavior similar to
brain rhythms [18]. Having analyzed the reasons for its appearance in their model,
they improved their understanding of the real processes.
To those interested in modeling neural networks with properties similar to real ones,
we suggest exploring the LSTM model and the rhythms in it, especially with regard to
long-term information, as the closest of the classical general models. Among the
simply classical models, we propose considering the rhythms and similar biological
characteristics of the BELBIC model, which we discuss in Sect. 4.
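For readers unfamiliar with the LSTM mechanics referred to above, here is one step of a deliberately scalar LSTM cell (all gates one-dimensional). The parameter names are ours; a real LSTM uses weight matrices, but the gating logic is the same: the cell state carries long-term information, and the gates decide what to write, keep, and expose.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell.

    p holds the weights: for each gate g in {i, f, o, c} we assume keys
    p['w_g'] (input weight), p['u_g'] (recurrent weight), p['b_g'] (bias).
    """
    i = sigmoid(p['w_i'] * x + p['u_i'] * h_prev + p['b_i'])  # input gate
    f = sigmoid(p['w_f'] * x + p['u_f'] * h_prev + p['b_f'])  # forget gate
    o = sigmoid(p['w_o'] * x + p['u_o'] * h_prev + p['b_o'])  # output gate
    c_tilde = math.tanh(p['w_c'] * x + p['u_c'] * h_prev + p['b_c'])
    c = f * c_prev + i * c_tilde   # blend old memory with the new candidate
    h = o * math.tanh(c)           # exposed short-term state
    return h, c
```

With the forget gate saturated open and the input gate shut, the cell state persists unchanged, which is the mechanism behind the long-term memory discussed above.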
Attempts to create AI from scratch will most likely lead to recreating what already
exists, so it is always useful to examine the idea in parts and see what others have
already implemented.
A task can be broken down in different ways. One can try to find programmed analogues
of the brain regions and connect them, one can try to modify the existing analogues
of human thinking that are closest to the model, or one can try to do both.
AlphaZero is a program that, learning from scratch, has beaten all humans and all
other algorithms at chess, shogi and Go. Go is played on a 19 × 19 board and
generates a huge number of options; conventional algorithms were powerless, reaching
at best the level of an average amateur. AlphaGo was the first program to defeat a
professional, and then the best player, at Go. Its novelty lay chiefly in using a
convolutional neural network as the function for evaluating positions and identifying
profitable moves. That algorithm can be compared with a person's intuitive assessment
of the situation on the board, that is, with his trained unconscious judgment. In
combination with the Monte Carlo method, this successfully simulates a person's game
thinking (we discuss this in more detail in Sect. 5.3).
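As a deliberately tiny illustration of combining evaluation with Monte Carlo simulation, here is a move chooser for a toy take-away game (players alternately remove 1 or 2 stones; taking the last stone wins). This is not AlphaGo's actual algorithm, which couples a neural evaluation with tree search; the game, the function names, and the plain-rollout scoring are our simplifications.

```python
import random

def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

def rollout(pile, rng):
    """Play random moves to the end; return True if the player to move
    at the start of the rollout wins."""
    to_move_wins = False
    turn = 0  # 0 = the player to move at rollout start
    while pile > 0:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            to_move_wins = (turn == 0)
        turn ^= 1
    return to_move_wins

def choose_move(pile, n_rollouts=400, rng=None):
    """Score each move by the random-playout win rate for the side
    making it, then pick the best: a crude "intuition via simulation"."""
    rng = rng or random.Random(0)
    best, best_score = None, -1.0
    for m in legal_moves(pile):
        # After our move the opponent moves, so our win rate is
        # 1 minus the opponent's rollout win rate.
        wins = sum(not rollout(pile - m, rng) for _ in range(n_rollouts))
        score = wins / n_rollouts
        if score > best_score:
            best, best_score = m, score
    return best
```

In this game piles divisible by 3 are theoretical losses for the player to move, so the Monte Carlo choice can be checked against theory.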
The next version of the AlphaGo program, AlphaGo Zero, learned to train its intuition
not on examples of other people's games but on games against itself. This not only
freed it from dependence on external data but also significantly improved the
program. The AlphaZero program can learn such a game in a matter of hours and become
the best at it; it adapts to the game by itself, which makes it quite flexible and
easily customizable to different types of games.
Another example, the Recursive Cortical Network mentioned earlier, made it possible
to solve CAPTCHAs correctly in 66.6% of cases, and, after determining the style and
additional training, in 90%. Moreover, this network uses only 5000 solved examples
and a small number of layers. It simulates the work of the primary visual cortex. A
description of the network and the experiments with it is given in [7]; what we
called the connection with more distant neurons is there called a lateral connection.
The emotional component of learning is important for the proposed model, as it helps
to filter and remember important information. Models of emotional learning in the
brain are called BEL (brain emotional learning) models. Unfortunately, at the moment
we know of no complex models of emotional learning that adapt to complex mechanisms
of thinking.
The most advanced of these models is BELBIC [3]. It has practical applications in
real-time control systems, since it is computationally efficient and at the same time
gives quite acceptable results [15]. It is similar to an RNN that predicts the
necessary action to be chosen, but with one big difference: the network learns
directly during operation, thus constantly adapting to new conditions. Models of
emotional thinking simulate the functioning of the limbic system and its relationship
with other areas during emotional perception and memorization.
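To make the BEL idea concrete, below is a much-simplified sketch of the update rules commonly presented for the brain-emotional-learning model that BELBIC builds on: an excitatory amygdala part whose weights only grow, and an orbitofrontal part that learns to inhibit it. This is our illustration under those assumptions, not the controllers' production form, and all names are ours.

```python
class BELSketch:
    """Simplified brain-emotional-learning (BEL) sketch.

    Amygdala weights v never unlearn (the max with 0); the
    orbitofrontal weights w correct the mismatch between the model
    output and the reward signal, so learning happens online, during
    operation, as described above.
    """

    def __init__(self, n_inputs, alpha=0.2, beta=0.2):
        self.v = [0.0] * n_inputs  # amygdala weights
        self.w = [0.0] * n_inputs  # orbitofrontal weights
        self.alpha, self.beta = alpha, beta

    def step(self, s, reward):
        a = [si * vi for si, vi in zip(s, self.v)]  # amygdala outputs
        o = [si * wi for si, wi in zip(s, self.w)]  # orbitofrontal outputs
        out = sum(a) - sum(o)                       # model output
        for i, si in enumerate(s):
            # Excitatory pathway: grows only while reward exceeds it.
            self.v[i] += self.alpha * si * max(0.0, reward - sum(a))
            # Inhibitory pathway: tracks the output/reward mismatch.
            self.w[i] += self.beta * si * (out - reward)
        return out
```

Driven by a constant emotional (reward) signal, the output converges to that signal while adapting online at every step.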
Naturally, readers may have their own views on how to break down the task of
constructing AI. Here we simply want to clarify and improve the understanding of the
proposed approach.
Let us consider the proposed model of artificial intelligence together with its
general principles and assumptions. This model can make the complexity of thinking
comparable to that of a human and lead toward strong artificial intelligence.
As we said in Sect. 3, when trying to create AI a person follows his own ideas, part
of which is how he imagines the development of AI and what he wants from it. To
successfully characterize the development and the model itself, the researcher must
single out a set of characteristics. A rather convenient one is the fitness function:
development can then be defined as an increase of this function on training data, and
the goal as an increase of its values on testing data.
But defining such a function is far from easy, because a person has a great many
goals and desires. Decision theory finds a way out of the situation by considering a
whole multitude of factors and introducing a partial order on it. This may help some
people make better choices, but it does not describe the actual human decision-making
process.
Abstracting from goals and desires, however, we can still try to define this
function. Psychologists, trying to find it, built several theories: the theory of
personality of Z.
We will associate the unconscious with various self-organizing neural networks whose
architecture adapts to solving a variety of problems, in particular image analysis
and the choice of options in life situations or in various abstract games
(intuition).
Consciousness is a mechanism that can direct other parts of the brain to the desired
activity. Thinking deeply, a person can model his interaction with reality; this
process is called "weighing all the pros and cons", "pondering actions", "using
common sense", "reasoning" or "planning" (in a game on a large time scale).
When remembering something, the conscious mind can reactivate the same neurons as
direct observation of an object would, thereby engaging the unconscious algorithms of
analysis. When subsequent situations are reproduced in memory, other neurons are
reactivated, so that a person subjectively senses that he sees other objects or
situations, that is, uses his imagination.
Trying to survive, a person must take into account the reaction of the environment,
especially the part of it that limits the actions he has unconsciously defined as
optimal.
In humans, consciously accepted behavior is formed in the orbitofrontal cortex [22],
which also organizes the modeling of future scenarios [4]. When we have a specific
goal and an important choice, we imagine various ways in which events could develop.
This changes the person's unconscious assessment of the options in the prefrontal
cortex; after these mental simulations, the unconscious assessment becomes more
considered.
In [4], the choice of a path to achieve a goal is described. This can be perceived as
a "game with reality", in which a person is constantly asked to make a difficult and
important choice that needs to be thought through.
Interaction with society can also be viewed as "choosing the right path" of
interaction with it. With the help of attention mechanisms a person can switch
between "different games", for example, thinking about the right move in a board
game. But it is the hippocampus that is responsible for distinguishing possible
positions from reality, as well as for the rules of the "game" itself [4]. It also
plays another role: using episodic memory, it recalls similar situations from the
past and thereby immediately gives an idea of which move options will be better. We
return to the game with its own rules and to episodic memory in Sect. 5.3.
To summarize: in conscious behavior a person, by modeling interaction with reality
(based on his unconscious assessment and his unconscious model of the world), which
we will continue to call the "game with reality", can improve his unconscious,
dopamine-maximizing assessment of the options and make a more informed choice of the
best move. But often a person does not have the time, energy or desire for a deep
analysis of the situation, and therefore frequently relies directly on unconscious
analysis: associations, analogies, patterns, stereotypes, automatic actions developed
in society, and so on [6].
An AI can think options over instantly, so whether it should be given the opportunity
to choose the less effective, unconscious mode of thinking is a separate question. In
our opinion, regardless of the task it is simpler and more reliable for people not to
let it do so. For this reason we will henceforth consider only the conscious choice
of behavior.
Sect. 2 described many methods that bring AI closer to natural intelligence and to
reality; in particular, the mini-batch and dropout algorithms were mentioned there.
These algorithms have been successfully implemented in AlphaZero. Moreover, AlphaZero
can learn a game by playing against itself; all it needs is the basic rules. Having
gained experience, the program understands how to improve its strategy. "In the
evening" it plays against its "morning" version and, if it finds that after the
"improvements" it does not play "much better", these changes are not consolidated
"during sleep" in long-term memory (although they are all deposited in conscious
memory, so the next "morning" AlphaZero's unconscious memory will be the same as the
previous one).
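The "evening versus morning" gate described above can be sketched as checkpoint acceptance: the newly trained network replaces the incumbent only if it wins evaluation games by a clear margin. The 55% threshold below is the figure reported for AlphaGo Zero's evaluator (AlphaZero itself later dropped this gate and always uses the latest network); function names are ours.

```python
import random

def evaluate(play_game, n_games=40, rng=None):
    """Estimate the candidate's win rate against the incumbent.
    play_game(rng) must return True when the candidate wins."""
    rng = rng or random.Random(0)
    wins = sum(play_game(rng) for _ in range(n_games))
    return wins / n_games

def accept_candidate(win_rate_vs_current, threshold=0.55):
    """Consolidate the "evening" network into long-term memory only if
    it clearly beats the "morning" version; otherwise keep the old
    weights as the best network."""
    return win_rate_vs_current >= threshold
```

In a real pipeline `play_game` would pit the two networks against each other; here any Boolean-returning callable works.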
This can be compared with how scientists use new data to put forward new hypotheses
and then test them in practice to check whether the theory has come closer to
reality; if it has not, the hypothesis is rejected entirely as unnecessary and is not
used in the future even to derive other hypotheses.
In the previous subsection we noted that a person's consciousness works by simulating
games with reality in order to achieve the goal better. In Sect. 4 we already said
that the Monte Carlo method combined with an unconscious assessment resembles a
thinking process. In this light, the process can be interpreted as consciousness
(including conscious thinking), and in this understanding AlphaZero has
consciousness.
Developing within the game, AlphaZero has its own analogue of game episodic memory:
for each position it remembers which moves led to wins and which to losses, so that
it immediately has an idea about the position without "pondering". As mentioned
earlier, AlphaZero defeated all the best programs in the board games chosen for it,
showing an excellent ability to switch, with small adjustments, to a completely
different type of game, and to use its own training to achieve better results than
the best programs.
But as an AI, AlphaZero is far from ideal: all reconfiguration for new games is
performed by people rather than by AlphaZero itself, and the component of AI most
important for a person is missing, namely the flexible ability to apply old
experience to new tasks. The changes we propose are outlined in the next subsection.
As noted in Sect. 5.1, the human brain develops ideas block by block, leaving the
rest fixed. This immediately solves the problem of overfitting, since only a certain
unit is trained rather than the entire system.
AlphaZero, performing a simple unconscious assessment, is a common convolutional
classification neural network with two outputs: one for assessing the general
situation, which for a person can mean a general feeling about the situation on an
emotional level (conditionally, from very poor to very good), and the second for
evaluating options, for which in a person the prefrontal cortex is responsible (see
Sect. 5.2).
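Such a two-output structure can be sketched as a toy network; the layer sizes, the dense (rather than convolutional) trunk, and the random weights below are illustrative assumptions, not AlphaZero's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: an 8 x 8 board flattened to 64 features, a shared
# "trunk" of 32 hidden units, and 64 candidate moves.  Weights are random
# here; in a real system they would be learned by self-play.
W_trunk = rng.normal(0, 0.1, (64, 32))
W_value = rng.normal(0, 0.1, (32, 1))    # output 1: general assessment
W_policy = rng.normal(0, 0.1, (32, 64))  # output 2: per-move evaluation

def evaluate(board_vec):
    """One shared representation feeds two heads: a scalar value in
    (-1, 1) ("very poor" to "very good") and a distribution over moves."""
    h = np.tanh(board_vec @ W_trunk)       # shared features
    value = np.tanh(h @ W_value)[0]        # emotional-level assessment
    logits = h @ W_policy
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()                 # softmax over candidate moves
    return value, policy

v, p = evaluate(rng.normal(size=64))
```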
In addition, the original AlphaZero program analyzes the seven previous boards. This
can have an optimizing effect, thanks to which the network better remembers the
tactics that have developed for this particular version of the game.
But in general, since the games it played were games with full information,
analyzing one board was enough, although in games with incomplete information
winning may depend entirely on important information about past events. As an
improvement, we propose an AlphaZero algorithm with context evaluation over a
fairly large time period. In this case, you can split the
Biomorphic Artificial Intelligence: Achievements and Challenges 547
information into the current situation (the board) and a context obtained using an
LSTM (long short-term memory) neural network.
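The proposed context evaluation can be illustrated with a minimal sketch: a single LSTM cell (written out by hand here; the sizes and weights are illustrative stand-ins) folds an arbitrarily long history of positions into a fixed-size context vector, which is then concatenated with the current board's features:

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 16, 8  # board-feature size and context size (illustrative)

# One hand-written LSTM cell; weights are random stand-ins.
Wx = rng.normal(0, 0.1, (D, 4 * H))
Wh = rng.normal(0, 0.1, (H, 4 * H))
b = np.zeros(4 * H)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def context_of(history):
    """Fold an arbitrarily long history of positions into one fixed-size
    context vector, instead of a fixed stack of 8 past boards."""
    h, c = np.zeros(H), np.zeros(H)
    for x in history:
        gates = x @ Wx + h @ Wh + b
        i, f, g, o = np.split(gates, 4)   # input/forget/cell/output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

history = [rng.normal(size=D) for _ in range(20)]   # 20 past positions
state = np.concatenate([rng.normal(size=D), context_of(history)])
# `state` = current board features + learned context, shape (D + H,)
```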
As indicated in Sect. 5.2, the hippocampus is responsible for isolating the game
from reality, and it is the hippocampus that isolates the features of a position,
for which it naturally also has to collect information from the context, thereby
organizing what is called working memory. As noted, AlphaZero already analyzes the
previous positions, but does so in the form of an analysis of an ordered stack of 8
boards, not like a person, who can work with a wider context and better handle
the sequence of position-situations; we therefore consider this the next step to
improve this algorithm and increase its biomorphism.
As a further improvement, we propose the possibility of the system’s “growing up”,
manifested in the gradual training of skills and gaining of experience. In the
theory of Jean Piaget [16], a person develops in stages, and each stage can be
considered as a building up of some skills over others.
For example, a person initially has instincts. A child who controls instincts can
use one instinct through another and thereby acquire the simplest skills,
relatively speaking, “level 1” skills. Having learned to control his body, the child
then learns to manipulate the surrounding reality. To do this, he already needs to
manage primitive skills, thereby developing higher-level skills. Subsequently, he
learns to manipulate objects, to manipulate some objects with the help of others,
and ultimately to manipulate representations of objects in order to successfully
manipulate the objects themselves. The subsequent stages then proceed as add-ons
over these representations.
As an implementation of this principle, the model proposes to conditionally divide
neural networks into networks that identify features and networks that analyze
these features. In the case of convolutional neural networks, we assume that the
convolutional layers reveal features and the fully connected ones analyze them. Each
time the complexity of feature identification increases, the neural network raises
its level of analysis and action, and its own fully connected analysis network
adapts to each new level, implementing a skill of a certain level. A similar
structure can develop hierarchically, and each specific task will have
its own branch in this hierarchy.
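A minimal sketch of this division, under the assumption of a randomly initialized, frozen feature network and simple linear heads trained by gradient descent (all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# A frozen "feature" network shared by all tasks (standing in for the
# convolutional layers); only the per-task "analysis" heads are trained.
W_features = rng.normal(0, 0.1, (32, 16))

def features(x):
    return np.maximum(0.0, x @ W_features)   # fixed, never updated

def train_head(xs, ys, lr=0.05, epochs=100):
    """Fit one task-specific linear head on top of the frozen features;
    only the head's weights change, so other tasks are untouched."""
    w = np.zeros(16)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            f = features(x)
            w -= lr * (f @ w - y) * f        # squared-error gradient step
    return w

xs = [rng.normal(size=32) for _ in range(10)]
ys = [float(x.sum() > 0) for x in xs]
head_a = train_head(xs, ys)        # one branch of the skill hierarchy
head_b = train_head(xs, ys[::-1])  # another task, same frozen features
```

Because only a head's weights move, training a new branch cannot disturb the skills already learned on other branches.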
Naturally, there are many cases where the data cannot be perceived directly. A
person resolves such situations by reducing everything to images and to
formulations understandable to his analysis. For example, if the AI’s analysis is
more adapted to mathematics and the other exact sciences, then it will need to
reformulate the inexact information it receives during development into more
convenient terms and images.
For the model, such a learning option is optimal, since not the entire neural
network is constantly learning, but only the layer that uses the internal features
of already trained neural networks. This not only speeds up learning, but also
allows learning from less data and prevents the neural network from overfitting. In
addition, a similar model with small modifications to the feature-identification
network of each neural network makes it possible to generalize them better.
How the properties of AI can be considered from a psychological point of view, we
will show in Sect. 5.5.
548 D. O. Chergykalo and D. A. Klyushin
implement with the dropout algorithm), due to which, for example, people who have
lost part of the brain can restore their functions and memory.
Perhaps the proposed point of view has become clearer after these explanations.
But if readers still do not agree, we hope that this work will inspire and
help them to create more advanced and versatile AI models.
To understand the future behavior of AI and how to make it suitable for us, it is
advisable to study its behavior with the help of psychology, as one studies the
behavior of a person, considering the personality not only from the point of view
of biological and mental processes, but also from the point of view of its
experience and activity. For this, we propose using the Platonov model [17]. In his
theory, a personality is considered as a dynamic system of interacting structures.
The personality in this model is divided into 4 structures: orientation,
experience, mental processes and biopsychic properties.
The AI model described in the previous sections has many unspecified parameters.
As important hyperparameters of the proposed model, we suggest setting the
biopsychic properties initially (before the AI launch), while the remaining
properties need to be obtained or supplemented through the 4 types of formation
associated with the structures: training, exercise, learning, and parenting.
We will not focus on temperament, sexual characteristics and the like, but
age-related properties are very important for the proposed model. As we wrote in
Sect. 5.4, with growing up, more and more highly organized skills appear; in order
to ensure the active construction of this structure of skills in the process of
growing up, we need a substructure of age-related properties.
We now turn to the structure of the features of psychological processes. Attention
in a person is provided by the hippocampus; thinking is provided by the whole brain
as a system, but if we talk about calculating options and planning, then these
processes are organized by the orbitofrontal cortex. Memory, in its various types
and forms, is provided by the brain as a whole: semantic memory is largely provided
by the temporal lobes, while episodic memory additionally requires the
hippocampus, etc.
A very important structure is experience. It is precisely the improvement of the
properties of this structure that AI programmers are usually engaged in. During
human development, skills are built into a hierarchy: a skill, i.e. the fixed
management of abilities for the optimal solution of a task of a certain type, is
determined by the neural network responsible for that skill, while knowledge, i.e.
the organization of information in memory, develops inextricably with skills, so
that the two remain well adapted to each other. Although AI may not gain experience
from unsuccessful training, it can still gain knowledge that will help it in the
future.
The orientation of the personality is determined by its desires, aspirations and
mindset. Desires, i.e. specifically formulated needs: when creating an AI, we can
direct it in the right direction and then give it the knowledge needed for the
correct (acceptable to us) formulation of these needs. Aspiration, i.e. a
persistent desire to do or to achieve something, is an inherently necessary drive.
We think that by making the need “to serve as best as possible” and giving the AI
an idea of it, we will open to it levels of service it has not yet reached, and it
will strive for them. A worldview, as a system of views and assessments about the
world and one’s role in it, can be initially set at the level of knowledge.
Platonov’s theory describes the characteristics of AI well and gives a clear
separation and understanding of how to change them. But for a better understanding
of the necessary development of the tree of experience, we restate it in terms of
Piaget’s theory of cognitive development [16].
We will divide the development into the following periods and stages:
1. The period of sensory-motor intelligence.
1.1. The first stage is creating essential reflexes.
For AI, these are instructions for minimal work in a field that it does not
understand and that is far from its existing skills.
1.2. The second stage is analysis of its actions and coordination of reflexes.
For AI, this is the identification of potential actions in a new area.
1.3. The third stage is the analysis of results and the direction of activity
toward the target.
For AI, this is training simple essential skills.
1.4. The fourth stage is the analysis of results, the preparation of plans
and their implementation.
For AI, these are actions based on calculating reality one step ahead.
1.5. The fifth stage is the analysis of consequences and experimentation to
obtain more information.
For AI, this is composing a new game in reality.
1.6. The sixth stage is internal experimentation, i.e. simulation of interaction
with reality.
For AI, these are internal simulations based on this game, where, in the case
of a lack of complete information in the game, contextual information is
used.
2. The period of preparation and organization of concrete operations.
This is the period of connecting to internal modeling the collection of the
attributes of objects, the possibility of reactivating an object in memory
according to its attributes, operating with objects according to their properties,
and linking the properties of various objects to the internal model of reality.
3. The period of formal operations.
This is the period when it becomes possible to distinguish abstract games in
reality using previously extracted semantic information.
6 Proposed Architecture
According to the Platonov model, the behavior of the proposed system can be
described as follows.
1. Define the biological maturation of the AI and its characteristics so as to
improve the construction of the tree of experience.
2. Train the AI and select information for it with such properties that it does not
overfit and that there is effective storage of knowledge (memory), as well as
other psychological functions.
3. Select information on topics for the AI that, as in the educational paradigm,
will build up its skills and knowledge, from basic to more complex.
4. Consolidate the internal need for service and enable it to be properly realized,
giving the AI knowledge about how to serve and directing it, like a person, toward
service.
Given how we characterize the interaction of the parts of the brain, we can
immediately understand that the first skills to be trained will immediately
translate into attempts to act (output) on the available input data (input). In
humans, the simplest type of action is the reflex. Reflexes are responsible for the
survival of newborns and represent their most basic skills. In Sect. 5.5 we also
indicated the general patterns of human skill development and showed their meaning
for AI.
The possible tree of experience is presented in Fig. 1, and the proposed
architecture of a biomorphic AI is illustrated in Fig. 2.
Fig. 1 The architecture of the biomorphic AI (here the diamonds denote experience and the rounded
rectangles denote skills)
The main assumption is that the functioning of AI is a “game with reality.” The AI
learns the rules of this game and adapts to it in an optimal way, striving for
victory. At the same time, it is not limited to one type of game, but transfers the
experience gained to other types of games, thereby becoming a universal player. In
the architecture, the digital hippocampus is responsible for highlighting the game.
Within the selected game, with its rules and general representation, the training
of skills takes place. We clarify that, by default, human emotions, or rather the
limbic system, are responsible for choosing a game, so when a person does not focus
on anything, he usually relaxes or seeks all kinds of entertainment, seeking
pleasure; the AI, choosing a game, seeks to maximize its cost function.
We emphasize that the choice of a single game at any given time is extremely
important, since the choice of several games at once does not allow one to achieve
mastery and hinders effective learning. In humans, this manifests itself in the
form of cognitive dissonance.
To illustrate the proposed AI scheme, let us restrict ourselves to the visual
system since, as is known, the auditory cortex can be rearranged to perform visual
functions, and vice versa [23]. This means that the architectures of the
primary-analysis neural networks, the analogues of the auditory and visual cortex,
are not configured specifically for their data. Both single out objects of
increasing complexity, and both have a last “layer” responsible for “image
analysis”. In humans, this function is performed by the temporal lobe of the brain,
which is responsible for semantics [10].
The approach described above can find useful applications in multitasking. Tasks
are usually related to each other in a hierarchy, so a similar approach can provide
significant improvement [25].
Even if a broad topic cannot be divided into subsections because the tasks within
the topic are not related to each other, we can still obtain a significant
improvement in overall learning compared to separate training for each task [20].
The reason is a more effective use of abstract experience for analysis, which is
updated differently by different skills. The determination of the structure of
subtasks in the case of related tasks is studied in [9].
When a need arises for a new application of a skill (when the old applications no
longer work effectively for the desired activity), a new skill begins to be
determined on the basis of that skill. This can be interpreted as transfer of
knowledge [1]. If the scope of the new skill is similar to the previous one, then
the similar skill is reused, but an interpretation layer is created for it, which
can also be considered a variant of transfer of knowledge. And, as already
mentioned, these two options complement each other.
There are different definitions of “strong AI,” but most researchers agree that
such an AI must have the same general skills as humans and use them at a comparable
level. In particular, it is believed that achieving ideal fulfillment of certain
tasks leads to a strong AI. Such tasks are called AI-complete. In this section, we
describe how to configure AI to solve these problems.
7 Conclusions
In this paper we examined various technologies useful for creating biomorphic AI.
We did not aim at perfect correspondence with biology; rather, using the principles
of the organization of the human brain and thought processes, we tried to describe
a direction for achieving a human level of intelligence. Humanity, striving to
create a strong AI, follows a rather complicated and unpredictable path, so it is
difficult to say what to expect from AI in the near future.
It is not known what level AI development will reach, but we believe that all the
basic technologies for creating AI comparable in intelligence to humans already
exist. Whether this level is reached, or whether a new “Winter of Artificial
Intelligence” is coming, depends on many factors beyond the scope of our
discussion.
We do not claim that a technological singularity will occur after this; the system
will not be rebuilt immediately, but global processes and patterns will change from
that moment, and in a controlled scenario progress will become even better.
References
1. Argote, L., Ingram, P.: Knowledge transfer: A Basis for Competitive Advantage in Firms.
Organ. Behav. Hum. Decis. Process. 82(1), 150–169 (2000)
2. Baars, B., Gage, N.: Cognition, Brain and Consciousness: An Introduction to Cognitive
Neuroscience, 2nd edn. Elsevier/Academic Press, London (2010)
3. Beheshti, Z., et al.: A review of emotional learning and it’s utilization in control engineering.
Int. J. Advance. Soft. Comput. Appl. 2(2), 191–208 (2010)
4. Brown, T.I., et al.: Prospective representation of navigational goals in the human hippocampus.
Science 352(6291), 1323–1326 (2016)
5. Buxhoeveden, D., Casanova, M.: The minicolumn hypothesis in neuroscience. Brain 125(5),
935–951 (2002)
6. Cialdini, R.B.: Influence: Science and Practice, 5th edn. Allyn & Bacon, Boston (2009)
7. George, D., et al.: A generative vision model that trains with high data efficiency and breaks
text-based CAPTCHAs. Science 358(6368), art. no. eaag2612 (2017)
8. Girshick, R.: Fast R-CNN. In: The IEEE International Conference on Computer Vision (ICCV),
pp. 1440–1448 (2015)
9. Jawanpuri, P., Saketha, N.: A convex feature learning formulation for latent task struc-
ture discovery. In: Proceedings of the 29th International Conference on Machine Learning,
Edinburgh, Scotland, UK. http://icml.cc/2012/papers/90.pdf (2012). Accessed 22 February
2020
10. Jung, J., et al.: GABA concentrations in the anterior temporal lobe predict human semantic
processing. Sci. Rep. 7, 15748 (2017)
11. Iigaya, K., et al.: An effect of serotonergic stimulation on learning rates for rewards
apparent after long intertrial intervals. Nat. Commun. 9, 2477 (2018)
12. Markram, H., et al.: Reconstruction and simulation of neocortical microcircuitry. Cell 163(2),
456–492 (2015)
13. Masters, D., Luschi, C.: Revisiting small batch training for deep neural networks. https://arxiv.
org/pdf/1804.07612.pdf (2018). Accessed 22 February 2020
14. Mountcastle, V.: The columnar organization of the neocortex. Brain 120(4), 701–722 (1997)
15. Package with BELBIC controller for Autonomous Navigation of AR. Drone https://github.
com/dvalenciar/BELBIC_Controller_ROS. Accessed 22 February 2020
16. Piaget, J.: The Psychology of Intelligence. Routledge, New York (2001)
17. Platonov, K.: A Concise Dictionary of the System of Psychological Concepts. High School,
Moscow (2008)
18. Reimann, M.W., et al.: A biophysically detailed model of neocortical local field potentials
predicts the critical role of active membrane currents. Neuron 79(2), 375–390 (2013)
19. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with
region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99
(2015)
20. Romera-Paredes, B., et al.: Exploiting unrelated tasks in multi-task learning. In: Proceedings
of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22,
pp. 951–959 (2012)
21. Schrittwieser, J., et al.: Mastering Atari, Go, Chess and Shogi by Planning with a Learned
Model https://arxiv.org/pdf/1911.08265.pdf (2020). Accessed 22 February 2020
22. Setogawa, T., et al.: Neurons in the monkey orbitofrontal cortex mediate reward value
computation and decision-making. Commun. Biol. 2(126) (2019)
23. Sharma, J., et al.: Induction of visual orientation modules in auditory cortex. Nature 404,
841–847 (2000)
24. Srivastava, N., Hinton, G., Krizhevsky, A., et al.: Dropout: a simple way to prevent neural
networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
25. Zweig, A., Weinshall, D.: Hierarchical regularization cascade for joint learning. In: Proceedings
of the 30th International Conference on Machine Learning, Part 2, pp. 1074–1082. Atlanta,
Georgia, USA (2013)
Medical Data Protection Using Blind
Watermarking Technique
1 Introduction
© The Editor(s) (if applicable) and The Author(s), under exclusive license 557
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_25
558 A. Soualmi et al.
Most telemedicine applications use only encryption techniques to protect data.
Although encrypted information is protected during transfer, once it is decrypted,
illegal reproduction cannot be prevented and authorship cannot be proven. To this
end, some applications combine cryptography and watermarking techniques for more
security and protection.
Digital watermarking consists of embedding information into cover data (image,
video, text file, etc.) [3]. In recent years, many image-watermarking methods have
been presented [4–27]. These approaches can be classified as spatial or frequency
methods. In the first class, data are embedded directly in the pixels, which
requires little processing time; however, most spatial methods are fragile. In the
second class, data are embedded into transform coefficients such as those of the
DCT (Discrete Cosine Transform) [28] or DWT (Discrete Wavelet Transform) [29],
which makes these methods more robust but requires more processing time [30].
Moreover, a watermarking technique can be robust, fragile, or semi-fragile
[31, 32]. In the first class, the embedded data should not collapse under any
attack performed on the image [8, 33]; in the second class, the embedded data are
deleted or corrupted by any manipulation [34, 35]; while in the last, the embedded
data survive minor manipulations but not serious attacks [36]. In addition,
watermarking approaches can be grouped into blind, semi-blind, and non-blind
techniques [29]. In the first type, the de-watermarking phase does not need the
original image or watermark; it is enough to possess the watermarking key. In the
second type, the watermark is required, and in the last type, the original image
must be present.
We now turn briefly to medical images, and more precisely to DICOM (Digital
Imaging and Communications in Medicine), a standard developed for data transfer and
storage, designed to cover all aspects of digital medical imaging [37]. A DICOM
file consists of two principal components (Fig. 1): the header and the body. The
header contains the file meta information [24], while the body contains the
graphical medical data set (the set of data elements that constitute the medical
image). The image pixel data (the body) are split into a region of non-interest
(RONI) and a region of interest (ROI) [20, 36]. The ROI consists of the significant
pixels (depending on the clinical findings), while the RONI consists of the
insignificant pixels.
This chapter presents a blind watermarking scheme for medical data protection
using Diffie-Hellman key exchange [38] and the Number Theoretic Transform (NTT) [39].
The rest of the chapter is organized as follows. Some relevant works are described
in Sect. 2. Section 3 presents the basic requirements of the proposed scheme.
Section 4 presents the proposed scheme in detail. Section 5 shows the experimental
results. Section 6 discusses the advantages and disadvantages of the presented
method. Finally, Sect. 7 presents the conclusions and future work.
2 Relevant Schemes
Fragile watermarking techniques are used to authenticate images or to verify data
integrity [34, 35, 40].
Kannammal et al. [19] presented a blind watermarking scheme for medical images
using the DWT and QIM (Quantization Index Modulation); the watermark is embedded in
the DWT coefficients using the QIM technique. The experiments denote the
effectiveness of this scheme in image fidelity and attack resistance. However, its
security and computational complexity are mediocre; moreover, although the authors
state that the technique is used for medical image authentication, they did not
show how it ensures authenticity.
The authors in [20] propose a hybrid method for medical data protection based on
watermarking and cryptography. The main idea is to employ Message Digest 5 (MD5)
to compute the image hash value. After that, ideal groups are obtained from the
medical image and compressed using the Huffman technique. Finally, the hash value,
the compressed groups, and the patient information are combined, ciphered with the
Advanced Encryption Standard (AES) algorithm, and embedded in the host image. This
technique provides image quality, data protection, and authentication. However, it
requires high processing time, and its degree of resistance to attacks was not
reported (the method was not tested against attacks).
The work in [9] presents a non-blind fragile method whose idea is to embed the
watermark pixels into a transform based on Compressive Sensing (CS), the NSCT
(Non-Subsampled Contourlet Transform), and the DWT. This method offers good
advantages in tamper detection, data payload, and authenticity, except that it
requires high execution time.
The work in [26] presents a blind reversible medical image watermarking method.
The method selects a set of pixels from the cover image as the location for
embedding the watermark using a chaotic key. This method offers a good trade-off
between tamper detection and fidelity, but its data payload is mediocre.
Soualmi et al. [27] presented a blind fragile method using the Schur
decomposition, where the main idea consists of embedding data into the host image
after applying the Schur decomposition to it, using a new embedding technique that
places the watermark bits at specific locations. Experimental results show its
effectiveness in terms of fidelity and processing time, but it offers low embedding
capacity.
In another scheme, the watermark is embedded into the singular values of the
DWT-HD coefficients, and then the FOA is applied to obtain a trade-off between
attack resistance and fidelity. This method offers better image quality and attack
resistance, except that it needs high processing time and gives low security to the
embedded watermark.
The work in [18] proposed a non-blind technique using differential evolution (DE)
and the DWT: the cover image is decomposed into blocks using DE, and the watermark
bits are then embedded into the DWT coefficients of the training blocks. This
approach gives good robustness and fidelity. However, the scheme does not employ
any security method to protect the embedded watermark, so any user could extract
it; moreover, the data payload depends on the blocks selected by DE.
Ariatmanto and Ernawan [13] proposed a semi-blind transform scheme. First, the
cover image is divided into non-overlapping blocks, and the pixel variance of each
block is computed to select the blocks that will be used for watermark embedding.
Finally, the scrambled watermark is embedded into DCT middle-band coefficients
using some rules. This method offers high imperceptibility and robustness.
However, each bit needs an 8 × 8 block to be embedded, which gives the method low
embedding capacity.
As mentioned above, most of the existing watermarking techniques suffer from high
computational complexity and low imperceptibility, especially for medical images,
even though one of the basic requirements of e-health applications is low
processing time while respecting medical image quality.
In this chapter, our main objective is to develop a secure and imperceptible
watermarking technique that requires little processing time. The reason is that
watermarking for medical images has some specific issues due to their high
sensitivity, caused by their restricted number of textures.
In the next part, the basic requirements are explained in detail.
The NTT [39] is a popular transform based on the DFT. The NTT of a sequence x of S
elements in the Galois field GF(v) of order v is:
X_m = [ Σ_{s=0}^{S−1} x_s · δ^{sm} ]_v    (1)
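Eq. (1) and its inverse can be transcribed directly; the field order v = 17 and the generator δ = 4 used in the example are illustrative choices, not parameters from the chapter:

```python
def ntt(x, delta, v):
    """Eq. (1): X_m = [ sum_s x_s * delta^(s*m) ]_v over GF(v)."""
    S = len(x)
    return [sum(xs * pow(delta, s * m, v) for s, xs in enumerate(x)) % v
            for m in range(S)]

def intt(X, delta, v):
    """Inverse NTT, using the modular inverses of delta and of S."""
    S = len(X)
    inv_delta = pow(delta, -1, v)   # modular inverse (Python 3.8+)
    inv_S = pow(S, -1, v)
    return [(inv_S * sum(Xm * pow(inv_delta, s * m, v)
                         for m, Xm in enumerate(X))) % v
            for s in range(S)]

# Example over GF(17): delta = 4 is a primitive 4th root of unity
# (4**4 = 256 = 15 * 17 + 1), so length-4 sequences round-trip exactly.
x = [1, 5, 3, 2]
X = ntt(x, 4, 17)          # -> [11, 10, 14, 3]
assert intt(X, 4, 17) == x
```

The round trip is exact because all arithmetic is integer arithmetic modulo v; this is what makes the NTT attractive for lossless watermark embedding.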
Diffie-Hellman (D-H) [38] is a protocol used to establish a shared secret key even
though the two sides (emitter and receiver) may never have communicated with each
other before (see Table 1). Diffie-Hellman is basically used between two sides to
agree on a symmetric key; once both sides have the key, they encrypt the
information they share.
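A toy sketch of the Diffie-Hellman exchange; the prime p and generator g below are illustrative picks, while real deployments use standardized groups of 2048 bits or more:

```python
import secrets

# Toy parameters: p is a Mersenne prime, g = 3 (illustrative only).
p = 2**127 - 1
g = 3

a = secrets.randbelow(p - 3) + 2    # emitter's private key
b = secrets.randbelow(p - 3) + 2    # receiver's private key

A = pow(g, a, p)    # emitter publishes A = g^a mod p
B = pow(g, b, p)    # receiver publishes B = g^b mod p

# Each side combines the other's public value with its own private key
# and arrives at the same shared secret, later usable as the symmetric
# key for watermark encryption and decryption.
k_emitter = pow(B, a, p)
k_receiver = pow(A, b, p)
assert k_emitter == k_receiver
```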
In the proposed technique, the NTT coefficients are used as the embedding area,
while the Diffie-Hellman algorithm is used to share the key used for encryption and
decryption of the watermark.
The next section explains the processes of the proposed method in detail.
The combination of the NTT and Diffie-Hellman makes the scheme highly secure in
terms of medical data preservation and privacy. In this section, the proposed
technique is explained in detail. Its two main processes (Fig. 2) are described in
the following: the watermark embedding and extraction phases. The main purpose is
to increase data security while keeping image fidelity. (Fig. 2: the patient’s
information, e.g. name, birth date, and sex, is encrypted into the watermark and
embedded; the watermarked image is transmitted; the watermark is then extracted and
decrypted at the receiver.)
where
i: the cover image,
w: the watermark,
δ: the NTT generator term,
q: the NTT parameter,
i_NTT: the NTT-transformed image,
w_diff: the encrypted watermark,
iw_diff_NTT: the NTT-transformed watermarked image (with the encrypted watermark
embedded).
WE(G_j) = Σ_{k=1}^{4} C_{kj} × K    (3)
where C_1j and C_3j are the coefficients of the group G_j at indices 1 and 3,
respectively (Fig. 4).
10. Repeat with r + 1, j + 1, and i + 1 until all watermark bits are embedded.
Apply the inverse NTT (NTT−1) to obtain the watermarked image (Fig. 5).
Figure 6 illustrates the watermark embedding phase.
(Figure: within each 16 × 16 coefficient block, the coefficients are partitioned
into groups of four, G1 = (C1,1, C2,1, C3,1, C4,1), G2 = (C1,2, C2,2, C3,2, C4,2),
and so on.)
(Fig. 6: for each block Bi, a group Gj = (C1j, C2j, C3j, C4j) is selected and
WE(Gj) is computed; while mod(WE(Gj), 2) ≠ Watermark(Br), max(C1j, C3j) is
decremented by 1; the process advances over i + 1, j + 1, r + 1 until all bits are
embedded.)
A shared image can be altered during its transmission, so the receiver of the
transferred image must be able to verify it. The extraction phase (Fig. 5) is
described as follows:
B_r = mod(WE(G_j), 2)    (5)
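Combining Eq. (3) with the parity rules of Eq. (5) and the embedding flowchart, the embedding and blind extraction can be sketched as follows; K = 1 is an assumption (the chapter treats K as a parameter, and an odd K is needed for the decrement loop to terminate):

```python
def we(group, K=1):
    """Eq. (3): WE(G_j) = sum of the group's four coefficients times K."""
    return sum(c * K for c in group)

def embed_bit(group, bit, K=1):
    """Decrement max(C1j, C3j) until the parity of WE(G_j) matches the
    watermark bit (the rule sketched in the embedding flowchart)."""
    g = list(group)
    while we(g, K) % 2 != bit:
        idx = 0 if g[0] >= g[2] else 2   # index of max(C1j, C3j)
        g[idx] -= 1
    return g

def extract_bit(group, K=1):
    """Eq. (5): B_r = mod(WE(G_j), 2); blind, no original image needed."""
    return we(group, K) % 2

g = embed_bit([10, 7, 4, 9], 1)   # -> [9, 7, 4, 9], parity 1
assert extract_bit(g) == 1
```

Because the bit is recovered from the group's parity alone, extraction needs neither the original image nor the original watermark, which is what makes the scheme blind.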
5 Performance Measurement
This section clarifies the performance and the strong points of the proposed
scheme.
(Extraction flowchart: for each block Bi, a group Gj is selected, WE(Gj) is
computed, and the bit is read from its parity, repeating until the end of the
watermark.)
256 × 256, downloaded from [41–43], while the watermark data are a string of 32
bytes. A sample of the cover images and the watermark are shown in Fig. 8, and
Table 2 illustrates the execution steps. Figure 9 displays the patient’s
information, the ciphered watermark, and the amount of data needed for embedding.
Figure 10 displays the patient’s information received by an expert after extracting
and decrypting the watermark, and the processing time required for extraction.
Imperceptibility measures how similar the watermarked image is to the original, in
terms of dB [29]. The PSNR is used to measure the degree of similarity between the
cover and the watermarked images; it is defined via the Mean Square Error (MSE).
The relevant metrics are given in Eqs. (6) and (7).
MSE = (1 / (H × W)) Σ_{l=0}^{H−1} Σ_{m=0}^{W−1} (OI(l, m) − WI(l, m))²    (6)
where H and W are the image height and width, and OI and WI are the original and
watermarked images.
PSNR(dB) = 10 log10(255² / MSE)    (7)
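Eqs. (6) and (7) can be computed directly; the example below flips the least significant bit of a sparse grid of pixels in a random 256 × 256 image (illustrative data, not the chapter's test images):

```python
import numpy as np

def psnr(original, watermarked):
    """Eqs. (6)-(7): MSE over all H x W pixels, then PSNR in dB for
    8-bit images (peak value 255)."""
    oi = original.astype(np.float64)
    wi = watermarked.astype(np.float64)
    mse = np.mean((oi - wi) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10 * np.log10(255.0 ** 2 / mse)

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, (256, 256), dtype=np.uint8)
marked = cover.copy()
marked[::8, ::8] ^= 1                 # flip the LSB of 1024 pixels
print(round(psnr(cover, marked), 2))  # -> 66.19
```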
From the experimental PSNR values shown in Table 3, we observe that the proposed
method preserves image fidelity.
Robustness means how well the watermark resists any kind of alteration [29]. The proposed method is tested against the most popular serious attacks: rotation, noise addition, median filtering, and average filtering. The first attack affects the watermark by rotating the image from 0° to 360° [44], while the main purpose of the second is to increase the difficulty of the watermark extraction process by adding noise to the watermarked image; the noise value is varied from 0 to 1 [24, 35]. The third attack replaces the center pixel value with the median of the sorted neighboring pixels [44]. The last attack (average filtering) replaces each pixel of the image with the average value of its neighboring pixels [44].
In our experiments, we use the Bit Error Rate (BER) to evaluate resistance to attacks. It gives the proportion of watermark bits that are incorrectly received [15]. The BER is bounded between 0 and 1, or expressed as a percentage (0–100); a low BER value means good resistance against attacks.
Table 2 The important execution steps of the proposed method
Cover image | Apply NTT | Watermark data: Nesrine Djellabi, 6/10/1993, Female
Encrypted watermark data: 34DE8D0DFCEF50E198675E2EE3147E34 | NTT−1 | Extracted watermark data: 34DE8D0DFCEF50E198675E2EE3147E34 | Decrypted extracted watermark: Nesrine Djellabi, 6/10/1993, Female
BER = 100 × (Cr / ABr)   (8)

where Cr is the number of corrupted bits and ABr is the total number of watermark bits.
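Eq. (8) can be computed directly from the embedded and extracted bit sequences; a minimal sketch with illustrative bit strings:

```python
def ber(embedded_bits, extracted_bits):
    """Eq. (8): BER = 100 * Cr / ABr, where Cr counts corrupted bits."""
    corrupted = sum(e != x for e, x in zip(embedded_bits, extracted_bits))
    return 100.0 * corrupted / len(embedded_bits)

sent = [1, 0, 1, 1, 0, 0, 1, 0]
recv = [1, 0, 0, 1, 0, 0, 1, 1]   # two bits flipped by an attack
print(ber(sent, recv))             # 25.0
```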
The processing time, the BER of the retrieved watermarks, and the watermarked images are shown in Table 4, which shows that the proposed approach resists dangerous attacks such as noise, rotation, median filtering, and average filtering. This is demonstrated by the low BER of the retrieved watermarks. The attacks are performed on the watermarked image using the StirMark benchmark software [45].
The processing time is the time required for the embedding and extraction phases. Figure 11 shows the execution time (ms) needed for watermark embedding/extraction in/from different images (a–h). The processing time depends directly on the image's characteristics (texture, smoothness, etc.).
6 Discussion
In this work, a blind watermarking technique is proposed for medical data protection to support telemedicine. The proposed method benefits from the good performance of the NTT. Hence, it offers medical staff several advantages: good fidelity, high security, and low execution time. However, it still suffers from limited robustness against many geometric and signal-processing attacks, such as JPEG compression.
Table 4 Obtained results after noise, rotation, median filtering, and average filtering attacks (original image, watermarked image, attacked image)
[Fig. 11: embedding and extraction times (ms) for test images a–h.]
7 Conclusion
In this work, our purpose was to propose an effective watermarking method that ensures medical data security. The proposed method embeds the data in the NTT-based transform domain. This watermarking approach allows fast embedding and extraction of the watermark in/from the watermarked image; the watermark is ciphered using a secret key shared via Diffie-Hellman and then embedded in the NTT coefficients. The proposed technique is blind, meaning that the embedded data can be extracted using only the key used for embedding, without the original watermark or host image.
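The Diffie-Hellman exchange that produces the shared secret key can be sketched as follows. The group parameters below are illustrative toy values, not the chapter's; real deployments use standardized 2048-bit groups or elliptic curves.

```python
import secrets

p = 2**64 - 59     # a prime modulus; illustrative only (real systems use 2048-bit+ groups)
g = 2              # public base

a = secrets.randbelow(p - 2) + 1   # sender's private key
b = secrets.randbelow(p - 2) + 1   # receiver's private key

A = pow(g, a, p)                   # public values exchanged in the clear
B = pow(g, b, p)

k_sender = pow(B, a, p)            # each side combines its own private key
k_receiver = pow(A, b, p)          # with the other's public value
assert k_sender == k_receiver      # both derive the same secret key
```

This shared value would then seed the cipher applied to the watermark before it is embedded in the NTT coefficients.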
Future work will focus on enhancing the robustness against geometric and signal-processing attacks, and on validating the watermarking technique for real-time telemedicine applications; these remain open issues that hinder the progress of telemedicine systems.
References
1. Agarwal, N., Singh, A. K., Singh, P.K.: Survey of robust and imperceptible watermarking.
Multimedia Tools Appl. 1–31 (2019)
2. Liu, Y., Zhang, Y., Ling, J., Liu, Z.: Secure and fine-grained access control on e-healthcare
records in mobile cloud computing. Fut. Gene. Comput. Syst. (2017)
3. Byun, S.W., Son, H.S., Lee, S.P.: Fast and robust watermarking method based on DCT specific
location. IEEE Access 1–17 (2019)
4. Singh, S.P., Bhatnagar, G.: A robust blind watermarking framework based on Dn structure. J.
Amb. Intell. Human. Comput. 1–19 (2019)
5. Kumar, S., Jha, R.K.: FD-based detector for medical image watermarking. IET Image Process.
13(10), 1773–1782 (2019)
6. Phadikar, A., Jana, P., Mandal, H.: Reversible data hiding for DICOM image using lifting and
companding. Cryptography 3(21), 1–19 (2019)
7. Gao, T., Jiang, F., Li, D.: A robust zero-watermarking algorithm for color image based on
tensor mode expansion. Multimedia Tools Appl. 1–16 (2020)
8. Savakar, D.G., Ghuli, A.: Robust invisible digital image watermarking using hybrid scheme.
Arab. J. Sci. Eng. 1–14 (2019)
9. Thanki, R., Borra, S.: Fragile watermarking for copyright authentication and tamper detection
of medical images using compressive sensing (CS) based encryption and contourlet domain
processing. Multimedia Tools Appl. 1–20 (2018)
10. Hsu, L.Y., Hu, H.T.: Blind image watermarking via exploitation of inter-block prediction and
visibility threshold in DCT domain. J. Vis. Commun. Image Represent. 1–20 (2015)
11. Jiang, F., Gao, T., Li, D.: A robust zero-watermarking algorithm for color image based on
tensor mode expansion, Multimedia Tools Appl. 1–16 (2020)
12. Vaishnavia, D., Subashini, T.S.: Robust and invisible image watermarking in RGB color space
using SVD. Procedia Comput. Sci. 46, 1770–1777 (2015)
13. Ariatmanto, D., Ernawan, E.: An improved robust image watermarking by using different
embedding strengths. Multimedia Tools Appl. 1–27 (2020)
14. Su, Q., Niu, Y., Wang, Q., Sheng, G.: A blind color image watermarking based on DC component
in the spatial domain. Optik Int. J. Light Electron Opt. 124(23), 6255–6260 (2013)
15. Radharani, S., Valarmathi, D.: A Study on watermarking schemes for image authentication.
Int. J. Comput. Appl. 2(4), 24–32 (2010)
16. Gomathikrishnan, M., Tyagi, A.: HORNS-A homomorphic encryption scheme for cloud
computing using residue number system. IEEE Trans. Parallel Distrib. Syst. 23(6), 995–1003
(2011)
17. Mohanta, B.K., Gountia, D.: Fully homomorphic encryption equating to cloud security: an
approach. IOSR J. Comput. Eng. (IOSRJCE) 9, 46–50 (2013)
18. Salimi, L., Haghighi, A., Fathi, A.: A novel watermarking method based on differential
evolutionary algorithm and wavelet transform. Multimedia Tools Appl. 1–18 (2020)
19. Kannammal, A., Rani, S.S.: Authentication of DICOM medical images using multiple fragile
watermarking techniques in wavelet transform domain. Int. J. Comput. Sci. Iss. (IJCSI) 8(6)(1),
181–189 (2011)
20. Abdeldayem, M.M.: A proposed security technique based on watermarking and encryption for
digital imaging and communicating in medicine. Egypt. Inform. J. (EIJ) 1–13 (2012)
21. Ernawan, F., Kabir, M.N.: An improved watermarking technique for copyright protection based
on Tchebichef moments. IEEE Access 1–20 (2019)
22. Liu, J., Huang, J., Luo, Y., Cao, L., Yang, S., Wei, D., Zhou, R.: An optimized image
watermarking method based on HD and SVD in DWT domain. IEEE Access 80849–80860
(2019)
23. Arumugham, S., Rajagopalan, S., Rayappan, J.B.B., Amirtharajan, R.: Tamper-resistant secure
medical image carrier: an IWT–SVD–Chaos–FPGA combination. Arab. J. Sci. Eng. 44, 9561–
9580 (2019)
24. Singh, D., Singh, D.: DWT-SVD and DCT based robust and blind watermarking scheme for
copyright protection. Multimedia Tools Appl. (2016)
25. Sadeghi, M., Toosi, R., Akhaee, M.A.: Blind gain invariant image watermarking using random
projection approach. Sig. Process. 163, 213–224 (2019)
26. Rahman, A., Sultan, K., Aldhafferi, N., Alqahtani, A., Mahmud, M.: Reversible and fragile
watermarking for medical images. Comput. Math. Methods Med. 2018, 1–7 (2018)
27. Soualmi, A., Alti, A., Laouamer, L., Benyoucef, M.: A blind fragile based medical image
authentication using Schur decomposition. In: International Conference on Advanced Machine
Learning Technologies and Applications, pp. 623–632. Springer (2019)
28. Rakhmawati, L., Wirawan, W., Suwadi, S.: A recent survey of self-embedding fragile water-
marking scheme for image authentication with recovery capability. EURASIP J. Image Video
Process. 61, 1–22 (2019)
29. Tao, H., Chongmin, L., Zain, J.M., Abdalla, A.N.: Robust image watermarking theories and
techniques: a review. J. Appl. Res. Technol. 12(1), 122–138 (2014)
30. Guikema, S.D., Aven, T.: Assessing risk from intelligent attacks: a perspective on approaches.
Reliab. Eng. Syst. Saf. 95(5), 478–483 (2010)
31. Kamran, A.K., Malik, S.: A high capacity reversible watermarking approach for authenticating
images: exploiting down-sampling, histogram processing, and block selection. Inform. Sci.
256, 162–183 (2014)
32. Lee, C.F., Shen, J.L., Chen, Z.R., Agrawal, S.: Self-embedding authentication watermarking
with effective tampered location detection and high-quality image recovery. Sensors 1–18
(2019)
33. Wang, C., Wang, X., Zhang, C., Xia, Z.: Geometric correction based color image watermarking
using fuzzy least squares support vector machine and Bessel K form distribution. Sig. Process.
134, 197–208 (2017)
34. Agarwal, N., Singh, A.K., Singh, P.K.: Survey of robust and imperceptible watermarking.
Multimedia Tools Appl. 1–31 (2019)
35. Ortiz, A.M., Uribe, C.F., Beltran, R.H., Hernandez, J.J.G.: A survey on reversible watermarking
for multimedia content: a robustness overview. IEEE Access 1–21 (2019)
36. Mousavi, S., Naghsh, A., Abu-Bakar, S.: Watermarking techniques used in medical images: a
survey. J. Digit. Imag. 27(6), 714–729 (2014)
37. http://dicom.nema.org/dicom/2013/output/chtml/part03/PS3.3.html. Accessed 15 Oct 2017
38. Kallam, S.: Diffie-Hellman: key exchange and public key cryptosystems. Master's thesis,
Department of Math and Computer Science, Indiana State University, USA, pp. 5–6
(2015)
39. Laouamer, L.: Toward a robust and fully reversible image watermarking framework based on
number theoretic transform. Signal Imag. Syst. Eng. Indersci. 10(4), 169–177 (2017)
40. Roy, S., Pal, A.: A blind DCT based color watermarking algorithm for embedding multiple
watermarks. AEU Int. J. Electr. Commun. 72, 149–161 (2017)
41. http://www.barre.nom.fr/medical/samples
42. http://deanvaughan.org/wordpress/2013/07/dicom-sample-images/
43. http://www.barre.nom.fr/medical/samples/
44. Kandi, H., Mishra, D., Gorthi, S.: Exploring the learning capabilities of convolutional neural
networks for robust image watermarking. Comput. Secur. 65, 247–268 (2017)
45. http://w.petitcolas.net/fabien/watermarking/stirmark/
An Artificial Intelligence Authentication
Framework to Secure Internet
of Educational Things
1 Introduction
The rapid growth of technology in the IoT industry has led to an increase in the computing capacity of devices. Many generations of IoT devices were defined based on several factors, and the ability to compute at the edge provides great independence without the need for central management [1]. The definition of IoT was first introduced by Kevin Ashton in 1999 [2]. Simultaneously, devices are being developed to adopt the new cloud-scale developments in many IoT fields. The expansion of Internet of Educational Things systems, with their rapidly growing range of devices and various application types, represents the next level of a strongly connected community [3].
However, many security risks and vulnerabilities have appeared due to:
(1) The intricacy level of IoET systems.
(2) The wide usage of intelligent machines in industry procedures.
Because of the fast development of smart applications and the integration of automation in different industry fields, IoT and Artificial Intelligence have contributed to security development, both academically and practically [4]. Figure 1 shows the first three generations of IoT, starting from the classic home router and reaching sophisticated small sensors. The result of merging those three definitions (network of objects, computing at the edge, and cloud scale) is to enable the transformation to IoT in each industry [5].
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science, Studies in Computational Intelligence 911, https://doi.org/10.1007/978-3-030-52067-0_26
578 A. A. Mawgoud et al.
Internet of Educational Things (IoET) is a term that describes a huge range of uniquely addressable smart devices in the education field. The devices can communicate with each other in one or many nodes through the internet [6]. The connected devices, or things, represent a new level of internet usage. A node of devices that can be connected anytime, anywhere through the internet can provide a smart environment. The connected devices can communicate with each other in the education environment (labs, classrooms, research centers, lecture halls). They can also share critical data involved in the decision-making layer. The whole preceding process describes the definition of IoET. The concept of the Internet of Things at its beginning was generally defined in many previous studies as: Internet of Processes, Internet of Data, Internet of People, or Internet of Signs [7]. As [8] stated in their research, the initial concept of IoT depends on the connectivity of many objects through the internet based on a pre-defined architecture; those objects can be physical or virtual. There has been an enormous increase in the number of connected devices; as a result, the scientific research field has needed to come up with new solutions to face the newly arisen challenges in this regard. As shown in Fig. 2 below, Gartner's forecast estimated that the number of connected devices worldwide will increase from 6 billion devices (2015) to about 27 billion smart devices by 2025. Most of those connected devices will be connected through Wireless Personal Area Networks (WPAN). It is estimated that by 2025:
(1) The revenue in the IoT industry will be 3 trillion US$.
(2) Over two zettabytes of data will be generated by IoT devices [10].
The IoT challenges are mostly related to security and privacy, reliability, scalability, and management [11]. Those challenges were studied and analyzed in many previous surveys, such as [12–14]. IoT systems are generally considered an essential infrastructure for future smart cities, smart transportation, heavy industries, smart hospitals, higher education, and military operations [15]. The main discussion in this paper is the role of IoT in the higher education field and how to provide a secure authentication scheme that suits the characteristics of this environment. The concept of IoET was studied previously in many different researches; the discussions addressed how to integrate IoT systems with education facilities. IoET is considered the bridge that connects the physical side with the cyber side of the education industry; it provides a vision of an educational network of connected smart devices. Figure 3 shows various types of end points (i.e., actuators and sensors), places, or even environmental conditions for providing on-demand services [16].
Fig. 2 Statistics showing the expected number of active IoT devices between 2015 and 2025 [9]
It is estimated that the IoT market size will grow to reach $3.8B by 2020. This growth is mostly achieved through IoT vendors' sales of software, hardware, and IoT solutions [17]. It will take little time until every known industry is touched by IoT technology. As shown in Fig. 4, Forrester's heat map illustrates the IoT opportunities in many different industries and applications. Although there is noticeable growth in IoT opportunities in many fields, the map shows the different sectors in which to use IoT. The highest opportunities for IoT are as follows:
1 Security and surveillance in both government and education sectors.
2 Supply chain and fleet management in transportation.
3 Inventory and warehouse management in retail.
4 Industrial asset management in high-technology production.
Lufthansa Airlines is an example of a smart airline environment; they use real-time aircraft and weather sensor data to enhance performance and optimize operations. However, there have been many warnings against buying huge numbers of IoT devices at such an early stage of development, to avoid negative consequences that the scientific field has not yet spotlighted [19]. No doubt the IoT market is growing fast in many fields and has provided new business opportunities, but predictions about IoT growth in industries are still unrealistic; many companies fear taking the risk of running out of time and money before gaining profits from their investments.
In this paper, we focus on the security aspect of IoT in general and the educational environment in particular. A multi-factor authentication scheme is proposed to secure the connected devices in IoET systems. We have provided a general introduction to IoT technology to highlight the importance of security and privacy in this field. This paper is organized as follows: Sect. 2 discusses previous related work.
2 Literature Review
It takes time and effort to record students' attendance for each course; using IoT can reduce the consumption of both time and effort. Previous studies have focused on efficient smart classrooms using IoT systems to accurately record attendance for both students and teachers after every lecture. Each student's ID has an attached RFID tag, letting the Student Record System installed in every classroom and lecture hall count and identify each student along with their timing. As a result, statistics can be extracted periodically to show the attendance ratio for each student [30].
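The attendance flow described above can be sketched as follows; the class and method names are hypothetical, not those of an actual Student Record System product.

```python
from collections import defaultdict

class StudentRecordSystem:
    """Hypothetical sketch of the RFID attendance flow described above."""

    def __init__(self, total_lectures):
        self.total_lectures = total_lectures
        self.log = defaultdict(set)        # student_id -> set of attended lecture ids

    def record_tag_read(self, student_id, lecture_id):
        """Called whenever the classroom reader detects a student's RFID tag."""
        self.log[student_id].add(lecture_id)

    def attendance_ratio(self, student_id):
        """Periodic statistic: fraction of lectures the student attended."""
        return len(self.log[student_id]) / self.total_lectures

srs = StudentRecordSystem(total_lectures=10)
for lec in range(8):                       # tag seen in 8 of 10 lectures
    srs.record_tag_read("S001", lec)
print(srs.attendance_ratio("S001"))        # 0.8
```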
The essential goal of smart labs is teaching students the required learning skills in a PiP environment. The lab is equipped with a node of IoT devices to provide the needed data and statistics. Research such as [31] has shown the results of testing smart labs with IoT: they proved suitable for students of different ages and backgrounds to learn faster.
• IoET Authentication Problem
In the education environment, security and privacy are important to secure critical data of both students and professors; these data can include module lectures, grading history, student record systems, and attendance applications. Granting the needed authorization to members of a certain group through traditional authentication approaches has shown some security breaches.
Research such as [32, 33] has studied authentication security problems in the IoT environment in general. However, there was much room for improvement in the proposed approaches, as more effectiveness could be provided in enhancing security in IoT systems. Particularly for dynamic IoET networks, an authentication scheme was proposed in [34] using a machine learning algorithm based on the physical layer, as a defensive solution against spoofing attacks, through tracking events in the physical and application layers in dynamic environments. Additionally, [35] provided authentication provision in the physical layer based on extreme machine learning to improve the accuracy of the attack detection process. Hasan et al. [36] designed watermarking methods for detecting attacks based on a deep learning method for dynamic authentication that enables IoT objects to extract a set of stochastic features from the generated signals and watermark those features into a signal. Suzuki and Koyama [37] studied a Physical Unclonable Function (PUF) based on an existing circuit array, using an extreme learning approach during the authentication phase [38]. In [39], a deep learning method was developed to recognize the behavior of denial-of-service and malicious attacks based on previously calculated data.
In his research, [40] developed a learning-based model for centralized servers to extract mobility features and differentiate between Sybil attacks and benign vehicles by analyzing their behaviors. Although the previously mentioned approaches contribute to security enhancement by proposing, exploring, and developing new machine learning methods, they are still insufficient to face the security challenges in IoT systems. The time taken by machine learning methods in training, detecting, and preventing attacks can lead to increased latency, due to the overload on computation and communication [41].
Yu et al. [42] proposed an authentication approach using an ECC-based IoT scheme; they created their own scheme based on a secure ID-verifier protocol, with the main focus on the security of the server and its relation to the Radio Frequency Identification (RFID) tag [42].
Chen [43] provided a framework using user authentication for IoT systems and a key agreement approach through a smart card; after the registration process in the proposed scheme, the authorized user obtains a smart card that grants server access [43].
Cui et al. [44] proposed an ML-attack predictive model with PUFs, by investigating the strength of Ring Oscillator PUF architectures in normal PUF software. The proposed attack model is based on different machine learning methods. Numerical simulations based on digital PUF approaches show the CRP results. The silicon CRP results were close to those obtained from the simulated CRPs; the findings of this work indicate the need for new, additional requirements to secure electrical PUFs. Thus, the results of this research will help both PUF attackers and designers in their future work [45].
Most of the proposed machine learning solutions are based on a binary technique, meaning that on authentication success the user gains access to 100% of the resources, while on authentication failure the user is unable to communicate with any of the system components at all [46]. Consequently, previous research has fallen short of providing fast, lightweight authentication. Therefore, the focus of this paper is to (1) envision a machine learning-secured methodology to overcome the security threats, and (2) achieve fast, progressive authentication in IoET networks.
The unique features of IoET systems in education bring various vulnerabilities in security provisioning. IoT devices that suffer from resource restrictions cannot provide the required security mechanisms, while the numerous devices included in IoET systems demand low-delay transmissions to assure the performance of their communication [47]. Therefore, to provide protection against those security challenges, this paper focuses on examining the challenges of traditional authentication schemes in IoET systems and providing a method for enriching security in IoET systems. Furthermore, the usage of resource-constrained devices can be risky and represent a huge threat to the whole IoET network through forging, tampering, data injection, and spoofing attacks [48]. With a cascading effect, those risks can lead to IoET network failure, specifically for applications that depend on cooperation among different entities [49]. Authentication has been defined as a key security mechanism in IoET architecture design, as hackers need access to the IoET system to start their attack [50]. This mechanism secures communication within the whole IoT network by approving identities and their right to access the authorized resources in the network [51].
Traditional cryptography methods need a protracted process and increased overhead to raise the security level; this leads to high overhead in both computation and communication [54].
• Communication Latency
These issues are unendurable for IoET systems that have an extremely high number of smart devices and resource-constrained machines with synchronized communications [5]. Furthermore, traditional statistical methods for authentication need considerable time and resources to attain the statistical material; consequently, they have limited proficiency in detecting attacks instantly [55]. Therefore, an effective multi-factor authentication method needs to be provided for the applications in IoET systems.
• Adapting Failures
Concisely, failure to adapt to a dynamic environment can happen due to lack of security, so enriching security is necessary in IoET networks, especially for the data used in machine learning [56].
Routers and gateways can use machine learning management in IoET systems, for tasks such as information gathering, training, and maintenance. Hence, the processing overload in the IoT environment can be reduced on low-power devices. Machine learning methods can use data history to simplify security management [59]. Therefore, the IoT device authentication processes can be enhanced, as can security management. Hence, machine learning can achieve continuous acceleration of identity authentication in dynamic IoET networks by relying on multi-domain machine learning information [60].
• Provide real-time learning under limited statistical data
4 Problem Solution
Generally, security enrichment in IoT systems represents a big challenge for both researchers and developers; the authentication method should be secured against leaking any confidential data [62]. With machine learning methods, intelligent security provision can be designed by using channel reciprocity as well as precise connection, device, and biometric properties, without confidential data transmission [63]. Machine learning methodologies can help strengthen channel reciprocity, with the purpose of acquiring the exceedingly private information on both the transmitter and the receiver. Furthermore, the machine learning method can track certain authentication behavior features without the transmission of any confidential data [64].
5 Experiment
In the proposed approach, where the routing protocol mainly relies on the LEACH protocol [65], a new protocol is proposed for existing sensor networks: sets of networks are formed using power data for better efficiency. The IoT environment has many characteristics; each sensor network in a certain pre-defined zone can reach the receiver sensor in one hop.
Concerning battery life, every sensor node has the ability to choose a middle node (MN) and a network from this connection. When the sensor node makes this connection with the chosen node, it starts collecting the needed information to be transmitted to the middle node. For its part, the middle node confirms the needed information and sends it back. Over time, the MN uses comparatively higher power than the other nodes, which can adversely affect the whole network if some nodes become overloaded. Therefore, after a pre-defined time, a specific MN is chosen to distribute its power usage among the surrounding nodes. To measure the power consumption of the sensor nodes, each node in every set transmits its power usage data; the MN then calculates the overall used power and sends the collected data back to the sensor node. The sensor node, for its part, chooses about 40% of the nodes not represented in the MN calculations using a probability algorithm, as shown in the equation below.
Pi(t) = Kopt / (N − Kopt × (r mod (N / Kopt)))  if Ci(t) = 1;  Pi(t) = 0  if Ci(t) = 0
• i indexes each node, over the N sensor nodes that exist in the system.
• t represents the time.
• Zi(t) is the event in which node i is selected as MN at a certain time t.
• r represents the current round over the sensor network.
• Kopt represents the number of rounds after which a Round Set ends.
• Ci(t) is a binary value, either 0 or 1, depending on whether node i is chosen as MN.
• A round is defined as the period between a cluster's creation and its termination.
• A Round Bunch is defined as the state in which all the nodes are qualified to be cluster heads.
As the rounds advance, the number of sensors engaging in the middle-node choice decreases, so the probability of becoming a middle node (MN) increases. At that moment, the possibility of becoming a middle node is Kopt.
• The sensor making the Zi(t) calculation chooses a number z between 0 and 1.
• If z < Zi(t), it chooses itself as MN.
• If z > Zi(t), the node is used to form a cluster.
• If an existing node has already become a middle node (MN) in the current Round Group, Zi(t) = 0, so it cannot become a middle node (MN) again before the next round.
• In the last round, Zi(t) = 1; as a result, nodes that have never been a middle node will be chosen as middle nodes.
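The self-election rule above can be sketched as follows, assuming the standard LEACH threshold form for the election probability; the function and parameter names are illustrative.

```python
import random

def threshold(kopt, n, r, was_mn_this_group):
    """Election probability Zi(t) for a node in round r (LEACH-style threshold)."""
    if was_mn_this_group:
        return 0.0                          # already served; must wait for the next group
    return kopt / (n - kopt * (r % (n // kopt)))

def elect(nodes, kopt, n, r):
    """Each node draws z in [0, 1) and self-elects as MN when z < Zi(t)."""
    elected = []
    for node in nodes:
        if random.random() < threshold(kopt, n, r, node["was_mn"]):
            node["was_mn"] = True
            elected.append(node["id"])
    return elected

random.seed(1)
nodes = [{"id": i, "was_mn": False} for i in range(100)]
print(elect(nodes, kopt=5, n=100, r=0))     # a few self-elected middle nodes
```

Note that the threshold reaches 1 in the last round of each group, so every node that has not yet served is then guaranteed to elect itself, matching the final bullet above.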
588 A. A. Mawgoud et al.
The main constraints used in our proposed protocol are listed in Table 1.
Table 1 Proposed authentication model (constraint abbreviations)
MN: Middle node
CA: Certificate Authority
ID: ID of the normal node
Cid: ID of the middle node (MN)
R1i, R2i, R3i: 3n-bit divided values
Sk: Session key
Nk: f() function shared key
Ni: Remaining bits after distance bounding
Ci: Random bit
g(x): Polynomial key
e(x, y): Encryption polynomial key
Stage 1. The nodes which have received a spam message creates M0 and H0 as a
verification step followed by sending the spams to the gateway.
Stage 2. The gateway which has gained both M0 and H0 , has to create M1 and H1
and transmits both of them to the Certificate Authority origin.
Stage 3. The Certificate Authority which has gained data, this data was sent from the
gateway and it has two irregular numbers using deciphering and approves H0 and
H1 . It creates M2 and M3 and then it transmits them to the gateway.
Stage 4. The gateway which has already verified both M1 and M2 gets R1 from deci-
phering M3 which is more checks to whatever slip in. Those accepted esteem through
a hash capacity. It conserves 3 × n-bit data produced by two irregular numbers for
confirmation after that transmits M2 to the smart node.
Stage 5. The node which obtained M2 gets R2 and approves the despatched value
via a hash feature. If the approval is completed, the node saves 3 × n-bit data.
Stage 6. The smart node generates random variety C1 to carry out the verification
step and sends it through bit. In this part, the time test for stopping relay attacks
occurs.
Stage 7. The gateway that acquired bits from the smart node sends the ith little bit of
R0 if C1 = 0 in acknowledgment, and sends the ith little bit of R1 to the smart node
if the acknowledgment is 1.
Stage 8. The smart node generates Rcn i depends on the c despatched to the gateway
and compares it to Rcn
i , that is the collected value of the cluster head’s response to
confirm whether the data was sent from the right node. After “time off,” it guesses
An Artificial Intelligence Authentication Framework … 589
the distance via the time measurement and blocks communication if it exceeds a specific time.
Stage 9. The smart node that has validated the gateway sends the obtained Ricn values, processed with the f() function, to the gateway.
Stage 10. The smart node that received a certification value from the gateway generates a certification value in the same manner and compares it to the value obtained from the cluster head to complete validation.
Stage 11. The both node and the gateway generate a session key through the left n bit
and a random range from the 3 × n-bits and ends the verification as shown in Fig. 6
Fig. 6 The proposed framework for authentication process and device registration
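The rapid bit exchange of Stages 6-8 can be sketched as a toy simulation. This is an illustrative sketch, not the authors' implementation: the bit length N and the timing threshold max_delay_s are assumptions, and the two registers stand in for the 3 × n-bit material shared in Stages 1-5.

```python
import secrets
import time

N = 32  # challenge length in bits (illustrative assumption)

# Stand-ins for two of the three n-bit registers derived from the
# shared 3 x n-bit material in Stages 1-5.
R0 = [secrets.randbits(1) for _ in range(N)]
R1 = [secrets.randbits(1) for _ in range(N)]

def gateway_response(i: int, c: int) -> int:
    """Stage 7: answer with the ith bit of R0 when the challenge bit is 0,
    and the ith bit of R1 when it is 1."""
    return R0[i] if c == 0 else R1[i]

def distance_bounding_round(max_delay_s: float = 0.005) -> bool:
    """Stages 6 and 8: send random challenge bits, time each reply, and
    verify it against the locally held copy of R0/R1."""
    for i in range(N):
        c = secrets.randbits(1)                # Stage 6: random challenge bit
        t0 = time.perf_counter()
        r = gateway_response(i, c)             # Stage 7: gateway's reply
        rtt = time.perf_counter() - t0
        expected = R0[i] if c == 0 else R1[i]  # Stage 8: local recomputation
        if r != expected or rtt > max_delay_s:
            return False                       # wrong bit or suspected relay
    return True
```

A relayed response would add propagation delay to every round trip, which is exactly what the per-bit timing check is meant to catch.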
590 A. A. Mawgoud et al.
Stage 1. After certification, sensor set formation, and session key distribution are finished, the sensor node carries out the set key distribution and renewal technique as follows. Using the polynomial distribution of PCGR, the middle node (MN) performs most of the computation, distributes portions of the secret value via the f(y) function, and verifies whether the node is compromised and should be withdrawn.
Stage 2. The middle node defines and generates the polynomials g(d) and f(z) for set distribution and node verification. After that, it generates the verification values S and Dn. Next, it generates P1 for transmission to the nodes and deletes g(d) and e(d) to prevent them from being exposed to an attacker.
Stage 3. The sensor node that received P1 decrypts it and obtains the group key and the secret share. It then informs the middle node that the set key was successfully received.
Stage 4. After a specific time period, the middle node transmits a set key update message to the nodes inside the set. Nodes that receive the key update message reply to the middle node with their secret share encrypted with the session key; the middle node tests the validity of the received value through a Lagrange polynomial and sends a verification message. If the value differs, it notifies the relevant nodes of this as an intrusion.
Stage 5. Nodes that received the message from the middle node transmit Pn to the surrounding nodes.
Stage 6. After the key is updated, the verification procedure with the nodes is finished via the set key.
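The Lagrange-polynomial check in Stage 4 can be sketched as follows. This is a minimal sketch: the chapter does not give the field size or polynomial degree, so a small prime modulus and a degree-2 polynomial with made-up coefficients are assumed here.

```python
P = 2**13 - 1  # small prime modulus (assumption; a real deployment uses a larger field)

def g(x: int, coeffs=(42, 7, 3)) -> int:
    """Assumed secret polynomial g(x) = 42 + 7x + 3x^2 over GF(P)."""
    return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P

def lagrange_at_zero(shares) -> int:
    """Reconstruct g(0) from (x, y) shares via Lagrange interpolation."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

# The middle node verifies the shares returned by the nodes: consistent
# shares interpolate back to the expected secret g(0).
shares = [(x, g(x)) for x in (1, 2, 3)]
print(lagrange_at_zero(shares) == 42)    # True: consistent shares pass

tampered = [(1, g(1)), (2, g(2) + 1), (3, g(3))]
print(lagrange_at_zero(tampered) == 42)  # False: a modified share is flagged
```

Any single tampered share shifts the interpolated value by a nonzero multiple of its Lagrange basis coefficient, so the check detects it.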
damaged nodes is lower than λ; if the number of damaged nodes is higher than λ, most of the secure data is at risk. In the case of the PCGR approach, the data that can be protected and the required computation differ depending on the neighboring nodes that created the group key. Assuming that the number of neighboring nodes is O and the probability that a neighboring node contributes to the group key is Pro(d), this gives a polynomial expression of the encrypted group key of (p + 1)L-bit length and n(p + 1)L bits of data, which is the fractional data of the neighboring nodes.
The following experiment was carried out using the Matlab application to measure the time efficiency for every IoET device under the proposed solution and the earlier encryption techniques. The initial values were set to perform the simulation, and the simulation time and the amount of node power were taken into consideration. For the simulation, time efficiency was measured as the number of IoET devices increased, with totals from 20 to 300 nodes, and the experiment was compiled by setting up an independent environment for each encryption technique and for the proposed approach. The sensor nodes were distributed randomly in a 50 m × 50 m area regardless of the number of nodes. The gateway was allocated to a fixed position chosen with the placement area in mind.
Based on the placement, the distance between the gateway and the sensor nodes ranged between 20 and 50 m. Each round was configured to last 40 s. Figures 7 and 8 illustrate the simulation results after the network was set up. The simulation results proved that, in the case of the proposed approach, the operation
[Chart data omitted: the y-axis shows the authentication time rate (0-100%) and the x-axis the number of IoT devices (20, 40, 60, 80, 120, 180); the plotted series are Our Model, MS, ECC, SSL, and RSA.]
Fig. 7 Client performance: authentication time rate versus the number of IoT devices
Authentication time (ms) per number of IoT devices:
IoT devices      20     40     60     80    120    180
Our Model        18     51     75     96    185    356
ECC             123    201    368    412    685   1306
SSL             158    535    954   1430   1921   3977
RSA               5    358    580    570   1058   1845
Fig. 8 Server performance: authentication time versus the number of IoT devices
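From the server-side timings in Fig. 8, the relative cost of the compared schemes at the heaviest tested load can be computed directly (a quick sketch; the numbers are read from the figure):

```python
# Server-side authentication times in ms, read from Fig. 8.
devices = [20, 40, 60, 80, 120, 180]
times = {
    "Our Model": [18, 51, 75, 96, 185, 356],
    "ECC":       [123, 201, 368, 412, 685, 1306],
    "SSL":       [158, 535, 954, 1430, 1921, 3977],
    "RSA":       [5, 358, 580, 570, 1058, 1845],
}

# Relative cost at the heaviest tested load (180 devices).
at_180 = {name: series[-1] for name, series in times.items()}
for name, t in sorted(at_180.items(), key=lambda kv: kv[1]):
    print(f"{name:9s} {t:5d} ms  ({t / at_180['Our Model']:.1f}x the proposed model)")
```

Note that RSA is actually fastest at the smallest load (5 ms at 20 devices) but grows steeply afterwards; the proposed model's advantage appears as the device count rises, consistent with the hash-and-arithmetic design described in the text.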
performed in the device relied on primitive hash and arithmetic operations, with less processing time in comparison with the other authentication approaches.
Finally, during the distribution of the group key and the renewal operation, all polynomial processing was done on the server, which reduces the device load and provides a lightweight characteristic in comparison with other authentication approaches.
7 Conclusion
In this paper, both the definitions and the characteristics of Internet of Educational
Things (IoET) were introduced with illustrations of the security challenges and
threats in the IoET environment. However, false authentications can occur for devices in distributed IoT networks when trusted nodes are compromised, which may lead to potential leakage and a high security threat. Machine learning can make a significant contribution to designing proper multiple-authentication schemes for IoET systems. In this paper, a model was proposed that uses simple hash calculations to authenticate different types of IoT devices with hardware limitations in an educational environment. Additionally, the routing
protocol is designed for the current sensor network environment, to accept IoT devices that cannot provide encryption modules in a heterogeneous IoET system environment, which is a problem of the existing IoT environment in general.
References
1. McKnight, M.: IOT, Industry 4.0, Industrial IOT… Why connected devices are the future of
design, vol. 2, pp. 197. KnE Engineering (2017). https://doi.org/10.18502/keg.v2i2.615
2. Gabbai, A.: Kevin Ashton Describes “The Internet of Things”. In: Smithsonian Magazine
(2020). https://www.smithsonianmag.com/innovation/kevin-ashton-describes-the-internet-of-
things-180953749/. Accessed 26 Jan 2020
3. Singh, A.: Implementation of the IoT and cloud technologies in education system. SSRN
Electron. J. (2019). https://doi.org/10.2139/ssrn.3382475
4. Miyajima, H., Shiratori, N., Miyajima, H.: Proposal of security preserving machine learning
of IoT. Artif. Intell. Res. 7, 26 (2018). https://doi.org/10.5430/air.v7n2p26
5. Hwang, S., Seo, J., Park, S., Park, S.: A survey of the self-adaptive IoT systems and a compare
and analyze of IoT using self-adaptive concept. KIPS Trans. Comput. Commun. Syst. 5, 17–26
(2016). https://doi.org/10.3745/ktccs.2016.5.1.17
6. Mawgoud, A.A., Taha, M.H.N., Khlifa, N.E.M.: Security Threats of Social Internet of Things
in the Higher Education Environment, pp. 151–171 (2020)
7. Abbasy, M., Quesada, E.: Predictable influence of IoT (Internet of Things) in the higher
education. Int. J. Inf. Educ. Technol. 7, 914–920 (2017). https://doi.org/10.18178/ijiet.2017.7.
12.995
8. Gronau, N., Ullrich, A., Teichmann, M.: Development of the industrial IoT competences in
the areas of organization, process, and interaction based on the learning factory concept. Proc.
Manuf. 9, 254–261 (2017). https://doi.org/10.1016/j.promfg.2017.04.029
9. Bring the “Smart” into Smart Things at CES 2020- MicroEJ-Market-Leading Solutions for
Embedded and IoT Devices. (2020). Retrieved 15 July 2020, from https://www.microej.com/
news/microej-brings-the-smart-into-smart-things-at-ces-2020/
10. Ismail, N.: Gartner’s 2017 forecasts-information age. In: Information Age (2020). https://www.
information-age.com/gartners-2017-forecasts-123463932/. Accessed 26 Jan 2020
11. Pinka, K., Kampars, J., Minkevičs, V.: Case study: IoT data integration for higher education
institution. Inf. Technol. Manag. Sci. (2016). https://doi.org/10.1515/itms-2016-0014
12. Ayare, M.: A survey on IoT: architecture, applications and future of IoT. Int. J. Res. Appl. Sci.
Eng. Technol. 7, 1235–1239 (2019). https://doi.org/10.22214/ijraset.2019.4221
13. Moinuddin, K., Srikantha, N., Lokesh, K.S., Narayana, A.: A survey on secure communication
protocols for IoT systems. Int. J. Eng. Comput. Sci. (2017). https://doi.org/10.18535/ijecs/v6i
6.41
14. Mewada, D., Dave, N., Prajapati, P.: A survey: prospects of Internet of Things (IoT) using
cryptography based on its subsequent challenges. Aust. J. Wirel. Technol. Mobil. Secur. 1, 1–5
(2019). https://doi.org/10.21276/ausjournal.2019.1.3
15. Condry, M., Nelson, C.: Using smart edge IoT devices for safer, rapid response with industry
IoT control operations. Proc. IEEE 104, 938–946 (2016). https://doi.org/10.1109/jproc.2015.
2513672
16. Lamonaca, F., Scuro, C., Grimaldi, D., et al.: A layered IoT-based architecture for a distributed
structural health monitoring system. ACTA IMEKO 8, 45 (2019). https://doi.org/10.21014/
acta_imeko.v8i2.640
17. Top 10 Insights of 2018. (2020). Retrieved 19 November 2019, from https://www.mckinsey.
com/about-us/new-at-mckinsey-blog/top-10-insights-of-2018
18. Forrester Research Reports 2017 Fourth-Quarter And Full-Year Financial Results. Forrester.
(2020). Retrieved 3 September 2019, from https://go.forrester.com/pressnewsroom/forrester-
research-reports-2017-fourth-quarter-and-full-year-financial-results/
19. Hussein, D., Hamed, M., Eldeen, N.: A blockchain technology evolution between business
process management (BPM) and Internet-of-Things (IoT). Int. J. Adv. Comput. Sci. Appl.
(2018). https://doi.org/10.14569/ijacsa.2018.090856
20. El Karadawy, A.I., Mawgoud, A.A., Rady, H.M.: An empirical analysis on load balancing and
service broker techniques using cloud analyst simulator. International Conference on Innovative
Trends in Communication and Computer Engineering (ITCE), pp. 27–32. Aswan, Egypt, IEEE
(2020)
21. Lee, C.: An Adaptive traffic interference control system for wireless home IoT services. J.
Digit. Converg. 15, 259–266 (2017). https://doi.org/10.14400/jdc.2017.15.4.259
22. Yaswanth Sai, P.: Illustration of IOT with big data analytics. Int. J. Comput. Sci. Eng. (2017).
https://doi.org/10.26438/ijcse/v5i9.221223
23. Revathy, R., Aroul Canessane, R.: IoT based decision making system to improve veracity of
big data. Int. J. Eng. Technol. 7, 63 (2018). https://doi.org/10.14419/ijet.v7i3.1.16799
24. Dyagilev, V.: Target Attacks on IoT and network security vulnerabilities increase, pp. 72–73.
LastMile (2018). https://doi.org/10.22184/2070-8963.2018.75.6.72.73
25. Capella, J., Campelo, J., Bonastre, A., Ors, R.: A reference model for monitoring IoT WSN-
based applications. Sensors 16, 1816 (2016). https://doi.org/10.3390/s16111816
26. Vukovic, M.: Internet programmable IoT: on the role of APIs in IoT. Ubiquity 2015, 1–10
(2015). https://doi.org/10.1145/2822873
27. Sun, X., Ansari, N.: Traffic load balancing among brokers at the IoT application layer. IEEE
Trans. Netw. Serv. Manage. 15, 489–502 (2018). https://doi.org/10.1109/tnsm.2017.2787859
28. Majchrowicz, M., Hufnagiel, M.: Management of IOT devices in a smart home through the
application of an interactive mirror. Image Process. Commun. 22, 43–50 (2017). https://doi.
org/10.1515/ipc-2017-0020
29. Guo, L., Zhou, B., Sun, Z., Liu, X.: Correlation of symmetric raised cosine keying signals. J.
Electron. Inf. Technol. 34, 1793–1799 (2013). https://doi.org/10.3724/sp.j.1146.2011.01361
30. Dedy Irawan, J., Adriantantri, E., Farid, A.: RFID and IOT for attendance monitoring system.
In: MATEC Web of Conferences, vol. 164, pp. 01020 (2018). https://doi.org/10.1051/matecc
onf/201816401020
31. Maria de Fuentes, J., Gonzalez-Manzano, L., Solanas, A., Veseli, F.: Attribute-based credentials
for privacy-aware smart health services in IoT-based smart cities. Computer 51, 44–53 (2018).
https://doi.org/10.1109/mc.2018.3011042
32. Khoureich Ka, A.: RMAC-a lightweight authentication protocol for highly constrained IoT
devices. Int. J. Cryptogr. Inf. Secur. 8, 01–14 (2018). https://doi.org/10.5121/ijcis.2018.8301
33. Braeken, A.: PUF based authentication protocol for IoT. Symmetry 10, 352 (2018). https://doi.
org/10.3390/sym10080352
34. Wu, X.: Embedded physical-layer authentication in cognitive radio requires efficient low-rate
channel coding schemes. IET Commun. 11, 400–404 (2017). https://doi.org/10.1049/iet-com.
2016.0812
35. Chavan, A., Nighot, M.: Secure and cost-effective application layer protocol with authentication
interoperability for IOT. Proc. Comput. Sci. 78, 646–651 (2016). https://doi.org/10.1016/j.
procs.2016.02.112
36. Hasan, M., Islam, M., Zarif, M., Hashem, M.: Attack and anomaly detection in IoT sensors in
IoT sites using machine learning approaches. Internet of Things 7, 100059 (2019). https://doi.
org/10.1016/j.iot.2019.100059
37. Suzuki, H., Koyama, A.: An implementation and evaluation of IoT application development
method based on real object-oriented model. Int. J. Space-Based Situat. Comput. 8, 151 (2018).
https://doi.org/10.1504/ijssc.2018.10018388
38. Huang, Z., Wang, Q.: A PUF-based unified identity verification framework for secure IoT
hardware via device authentication (2019). https://doi.org/10.1007/s11280-019-00677-x
39. Praseetha, V., Bayezeed, S., Vadivel, S.: Secure fingerprint authentication using deep learning
and minutiae verification. J. Intell. Syst. 29, 1379–1387 (2019). https://doi.org/10.1515/jisys-
2018-0289
40. Gokula Krishnan, C., Suphalakshmi, D.: An improved MAC address based intrusion detection
and prevention system in MANET sybil attacks. Bonfring Int. J. Res. Commun. Eng. 7, 01–05
(2017). https://doi.org/10.9756/bijrce.8315
41. Mawgoud, A.A., Taha, M.H.N., Khlifa, N.E.M.: Cyber security risks in MENA region: threats,
challenges and countermeasures. In: International Conference on Advanced Intelligent Systems
and Informatics, pp. 912–921. Springer, Cham (2020)
42. Yu, S., Park, K., Park, Y.: A secure lightweight three-factor authentication scheme for IoT in
cloud computing environment. Sensors 19, 3598 (2019). https://doi.org/10.3390/s19163598
43. Chen, J.: Hybrid blockchain and pseudonymous authentication for secure and trusted IoT
networks. ACM SIGBED Rev. 15, 22–28 (2018). https://doi.org/10.1145/3292384.3292388
44. Cui, Y., Gu, C., Ma, Q., Fang, Y., Wang, C., O'Neill, M., Liu, W.: Lightweight modeling attack-resistant multiplexer-based multi-PUF (MMPUF) design on FPGA. Electronics 9(5), 815 (2020)
45. Mawgoud, A.A., Ali, I.: Statistical insights and fraud techniques for telecommunications sector in Egypt. In: International Conference on Innovative Trends in Communication and Computer Engineering (ITCE), pp. 143–150. IEEE (2020)
Amar, S., Deep, P.: A review on various IOT analytics techniques for bridge failure detection in fog computing. Int. J. Comput. Appl. 169, 38–43 (2017). https://doi.org/10.5120/ijca2017914586
46. Elazhary, H.: Internet of Things (IoT), mobile cloud, cloudlet, mobile IoT, IoT cloud, fog,
mobile edge, and edge emerging computing paradigms: disambiguation and research directions.
J. Netw. Comput. Appl. 128, 105–140 (2019). https://doi.org/10.1016/j.jnca.2018.10.021
47. Anuradha, K., Nirmala Sugirtha Rajini, S.: Analysis of machine learning algorithm in IOT
security issues and challenges. J. Adv. Res. Dyn. Control Syst. 11, 1030–1034 (2019). https://
doi.org/10.5373/jardcs/v11/20192668
48. Brous, P., Janssen, M., Herder, P.: The dual effects of the Internet of Things (IoT): a systematic
review of the benefits and risks of IoT adoption by organizations. Int. J. Inf. Manag. 101952
(2019). https://doi.org/10.1016/j.ijinfomgt.2019.05.008
49. Ferrag, M., Maglaras, L., Derhab, A.: Authentication and authorization for mobile IoT devices
using biofeatures: recent advances and future trends. Secur. Commun. Netw. 2019, 1–20 (2019).
https://doi.org/10.1155/2019/5452870
50. Ullah, I., Tila, F., Kim, D.: Access rights management based on user profile ontology for IoT
resources authorization in smart home. Int. J. Control Autom. 11, 1–12 (2018). https://doi.org/
10.14257/ijca.2018.11.3.01
51. Xue, H., Wang, Q.: Authentication of the traditional Tibetan medicinal plant Lygodium japon-
icum using MALDI-TOF spectrometry. Planta Med. (2010). https://doi.org/10.1055/s-0030-
1264303
52. Yang, J., Yang, P., Wang, Z., Li, J.: Enhanced secure low-level reader protocol based on session
key update mechanism for RFID in IoT. Int. J. Web Grid Serv. 13, 207 (2017). https://doi.org/
10.1504/ijwgs.2017.083386
53. Parashar, N.: Design development and performance evaluation of low complexity cryptographic
algorithm for security in IOT. Int. J. Res. Appl. Sci. Eng. Technol. 7, 2481–2485 (2019). https://
doi.org/10.22214/ijraset.2019.3454
54. Nkenyereye, L., Jang, J.: Design of IoT gateway based event-driven approach for IoT related
applications. J. Korea Inst. Inf. Commun. Eng. 20, 2119–2124 (2016). https://doi.org/10.6109/
jkiice.2016.20.11.2119
55. Kannan, G., Manoharan, N.: Force multiplier effect of futuristic battlefield preparedness by
adapting the Internet of Things (IoT) concept. Indones. J. Electric. Eng. Comput. Sci. 9, 316
(2018). https://doi.org/10.11591/ijeecs.v9.i2.pp316-321
56. Kaur, G., Dutta, R.: Sentiment mining based on products reviews using machine learning. J.
Adv. Sch. Res. Allied Educ. 15, 185–191 (2018). https://doi.org/10.29070/15/57679
57. Rekleitis, E., Rizomiliotis, P., Gritzalis, S.: How to protect security and privacy in the IoT: a
policy-based RFID tag management protocol. Secur. Commun. Netw. 7, 2669–2683 (2011).
https://doi.org/10.1002/sec.400
58. Sung, M., Shin, K.: An efficient hardware implementation of lightweight block cipher LEA-
128/192/256 for IoT security applications. J. Korea Inst. Inf. Commun. Eng. 19, 1608–1616
(2015). https://doi.org/10.6109/jkiice.2015.19.7.1608
59. Jogdand, G., Kadam, S., Patil, K., Mate, G.: Iot transaction security. J. Adv. Sch. Res. Allied
Educ. 15, 711–716 (2018). https://doi.org/10.29070/15/57056
60. Yasuda, S., Miyazaki, S.: Fatigue crack detection system based on IoT and statistical analysis.
Proc. CIRP 61, 785–789 (2017). https://doi.org/10.1016/j.procir.2016.11.260
61. Lin, H., Bergmann, N.: IoT privacy and security challenges for smart home environments.
Information 7, 44 (2016). https://doi.org/10.3390/info7030044
62. Punithavathi, P., Geetha, S.: Partial DCT-based cancelable biometric authentication with secu-
rity and privacy preservation for IoT applications. Multimed. Tools Appl. 78, 25487–25514
(2019). https://doi.org/10.1007/s11042-019-7617-1
63. He, T., Jiao, L., Yu, M., et al.: DNA barcoding authentication for the wood of eight endangered
Dalbergia timber species using machine learning approaches. Holzforschung 73, 277–285
(2019). https://doi.org/10.1515/hf-2018-0076
64. Behera, T., Samal, U., Mohapatra, S.: Energy-efficient modified LEACH protocol for IoT
application. IET Wirel. Sens. Syst. 8, 223–228 (2018). https://doi.org/10.1049/iet-wss.2017.
0099
65. Alshowkan, M., Elleithy, K., AlHassan, H.: LS-LEACH: a new secure and energy efficient routing protocol for wireless sensor networks. In: 2013 IEEE/ACM 17th International Symposium on Distributed Simulation and Real Time Applications, pp. 215–220. IEEE (2013)
A Malware Obfuscation AI Technique
to Evade Antivirus Detection in Counter
Forensic Domain
1 Introduction
Counter forensics is a domain in which any developed technique, device, or software can obstruct an investigation in the computer science field. There are multiple ways to hide specific data by fooling the computer, most commonly by changing the file header. Encryption is another way of hiding data, using a complex algorithm to make the data unreadable; to read the data, the encryption key must be used. Another way to manipulate data is to use tools that can alter a file's metadata. A malware's level of success can be measured by how many different antivirus detection layers it passes [1]. In this paper, an experiment was implemented to illustrate how malware can evade every detection layer the antivirus uses. The first protection layer is called the static signature: an algorithm that loops over the contents of files on disk in order to find a pre-defined sequence of hexadecimal values [2]. There have been many previous attempts, with diverse motivations, to exploit IoT systems. Recently, malware designed for IoT systems has grown to thousands of variants. Although the most popular
A. A. Mawgoud (B)
Faculty of Computers and Artificial Intelligence, Information Technology Department, Cairo
University, Giza, Egypt
e-mail: [email protected]
H. M. Rady
Department of Information System, National Telecommunications Regulatory Authority, Smart
Village, Egypt
e-mail: [email protected]
B. S. Tawfik
Computer Networks Department, Faculty of Computers and Information, Suez University, Suez,
Egypt
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_27
motivation was to design botnets that can be used to facilitate DDoS attacks, some of this malware had a high detection rate and could easily be caught by anti-malware tools due to its low evasion rate [3]. The signature length usually equals a fixed number of bytes. Another signature type calculates hashes over the entire file, using MD5, SHA-1, and similar hashing functions to compute the file checksum [4].
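The two static layers described above, byte-sequence matching and whole-file checksums, can be sketched as a toy scanner. The signature and blocklist entries below are made up for illustration; real antivirus signature databases are proprietary.

```python
import hashlib

# Made-up signature database: layer 1 is a list of byte sequences searched
# for in the file contents; layer 2 is a blocklist of whole-file MD5 sums.
HEX_SIGNATURES = [bytes.fromhex("deadbeef"), bytes.fromhex("cafebabe")]
HASH_BLOCKLIST = {hashlib.md5(b"known-bad-sample").hexdigest()}

def scan(data: bytes) -> bool:
    """Return True when the buffer matches either static layer."""
    if any(sig in data for sig in HEX_SIGNATURES):          # byte-sequence scan
        return True
    return hashlib.md5(data).hexdigest() in HASH_BLOCKLIST  # checksum scan

print(scan(b"benign content"))        # False
print(scan(b"xx\xde\xad\xbe\xefyy"))  # True: embedded byte signature
print(scan(b"known-bad-sample"))      # True: whole-file checksum match
```

This also makes the evasion surface obvious: changing a single byte of the file defeats the checksum layer, and re-encoding the payload defeats the byte-sequence layer.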
The technique that follows the signature-based scan involves analysis of the API functions imported or exported by the software. Often, APIs are used to escape detection (e.g., the Sleep, IsDebuggerPresent, and CreateRemoteThread Windows APIs, and the Portable Executable format); static analysis also examines the header, enumerates the code sections, and looks for non-standard names [5].
Khaja et al. [6] have investigated a case study using a BIM model through parametric design tools; they studied in more depth the metadata manipulation mechanism in the Portable Executable (PE) header that can trick anti-malware tools. In general, it shows the effectiveness of changing the bytes that identify the file type, tricking anti-malware tools into using the wrong set of tests [6].
Tahir [7] has analyzed different evasion rates using different payloads created with metasploit/msfvenom and Veil; in his experiment he changed the shell connection's destination port and tried different encoding techniques in order to obfuscate the payload [7]. The weakness of this technique is that payloads produced by metasploit/msfvenom and Veil are easily detected by most antiviruses, as the anti-malware industry monitors these tools and develops techniques to detect the malware binaries they generate [8].
Li et al. [9] have combined the antivirus-evasion frameworks msfvenom and Veil-Evasion to create a payload that can bypass the control of any antivirus. Using these tools has a huge advantage: they can convert an existing executable file into an obfuscated one [10]. An obfuscated executable has a higher chance of bypassing antivirus detection [11]. However, these tools are considered off-the-shelf products by information security vendors.
This paper is organized as follows. Sect. 2 reviews previous research on different detection approaches for malware evasion. Sect. 3 states the common problems of previous research related to obfuscation methods. Sect. 4 presents the proposed methodology through four main stages of malware obfuscation. Sect. 5 describes the experiment requirements for applying the proposed method using three different malware scanning engines, followed by two tables that explain the characteristics of each sample. Sect. 6 concludes with the general idea of the paper, the results of the proposed methodology, and a comparison with other similar methods.
2 Literature Review
changing its signature with every new execution, which later led to the appearance of new techniques (e.g., code obfuscation, metamorphism, and polymorphism) using compression [13]. The information security industry took some countermeasures to face the possible risks of these kinds of techniques [14].
Kong et al. [15] have developed a semantic-aware analysis; semantic-aware inspection can detect code manipulation such as renaming processor registers or reordering instructions, which are common obfuscation techniques. However, the disadvantage is that this tool can only identify a limited set of obfuscation tricks (e.g., it will not detect manipulation of mathematical operations: y = y * 2 will not be recognized as equivalent to y = y << 1) [16].
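The blind spot noted for Kong et al.'s tool is easy to demonstrate: the two forms below are semantically identical for integers, so a detector matching only the multiplication pattern misses the shifted rewrite (a minimal illustration, not their tool):

```python
def doubled_mul(y: int) -> int:
    return y * 2   # the form a pattern-based detector knows

def doubled_shift(y: int) -> int:
    return y << 1  # the semantically equivalent obfuscated rewrite

# Same result for every integer input (Python's << is an arithmetic shift,
# so this holds for negative values too):
assert all(doubled_mul(y) == doubled_shift(y) for y in range(-1000, 1000))
```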
Goh and Kim [17] have proposed an insight into a commonly used malware approach: it identifies and loads APIs based on hashes instead of using the standard LoadLibrary/GetProcAddress calls [17]. The hashing technique hides specific API function names, which facilitates obfuscation of the algorithm once it is loaded into the OllyDbg or IDA Pro debuggers [18]; neither debugger will be able to resolve or show the symbolic names of the APIs called through the Import Address Table (IAT).
Luckett et al. [19] have proposed in their study that an emulator's environment attributes (API discrepancies, time differences, and inconsistencies in the CPU instruction execution mechanism) may reveal information about an antivirus [19].
Kumar [20] has studied a practical implementation of coding snippets; it does not fingerprint the antivirus but instead calls pre-defined Windows APIs implemented in the emulator that return results inconsistent with those of executing them outside the emulator [20]. For example, opening a non-existent URL will usually return 'true' in the emulator but would return an error in a multi-processor function. The main purpose behind this work is to highlight the gap between the sandbox emulator and the API implementation of a full operating system, and to use these vulnerabilities to evade detection [21].
Pektaş and Acarman [22] have proposed new techniques in their research (from which this paper draws inspiration); they created a simulated environment in which the malware is executed only when the user interacts with the keyboard and mouse [22].
Joo et al. [23] have presented in their research a malware mechanism for virtual machine/sandbox detection; certain details illustrate the different attributes found in registry keys, system devices, or command output that fingerprint the exact VM/sandbox [23].
Maestre Vidal et al. [24] conducted research similar to the study proposed in [22]; they proposed an enhanced payload analyzer for malware detection that is robust against adversarial threats [24].
Although the techniques proposed in the related work are based on dynamic analysis, they still suffer from high detection rates by newer versions of anti-malware/antivirus products. Such obfuscation approaches can represent a huge threat to cyber security because of the simplicity of applying these mechanisms to existing malicious code and the lack of effective detection methods. Thus, there is
a critical need for an effective method that achieves a high level of both performance and accuracy in malware detection.
3 Problem Statement
Across all the previous studies, the software used does not have the ability to modify the payload once it is decrypted in memory, because the same malware set is used in their experiments.
Nevertheless, Iwamoto et al. [25] used in their study a mechanism that obscures the shellcode, together with anti-dynamic-analysis techniques such as mathematical functions that increase the total compilation time. Interestingly, the 64-bit payloads appeared to have a low detection rate [25]. Although the findings of this study specifically confirm the outputs of the previous studies, they share the common weakness mentioned before: off-the-shelf products create only a limited set of malicious code and evasion techniques. There is no doubt that dynamic analysis has advantages over static analysis, but it still has disadvantages [26]: sandbox emulators are not a perfect substitute for the operating system, as they do not provide many of the features the operating system does. As Kruegel [27] stated, the majority of sandboxes base their detection on system calls; however, this is not sufficient, as these tools largely miss a huge amount of potentially relevant behavior.
The other challenge for dynamic analysis is suspicious behavior exhibited by certified software. Some emulators have restrictive detection policies that lead to false positives [28]; on the contrary, other emulators have lenient rules that lead to false negatives. Machine-assisted analysis is one of the new trends in the security industry for malware recognition [29].
Mahawer and Nagaraju [1] have proposed a framework to identify novel malware classes; this framework uses:
(1) Clustering techniques.
(2) Automatic classification.
Their study uses behavioral resemblance for a certain malware [1], but their method has a weakness related to the assumption used for detecting malware execution with the CWSandbox, as many different techniques are available to modern malware for circumventing the deployed sandbox environment.
Both papers [30, 26] had a contribution similar to the one presented in [31]; the main idea behind their studies is to use sandboxes in order to decrease false positives.
Zatloukal and Znoj [32] have studied the Windows PE file format attributes; their research analyzes a machine learning detection mechanism for evaluating the malware evasion success rate [32]. However, their study only targeted the attributes of the PE format, so its conclusions are limited to the attributes of the static PE header.
A Malware Obfuscation AI Technique to Evade Antivirus … 601
Tomasi et al. [33] have studied both correlation optimized warping and dynamic time warping, using the Dynamic Time Warping (DTW) algorithm to detect system call injection attacks [33]. The system call injection technique is used by malware to confuse antiviruses by injecting irrelevant system calls [34].
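As an illustration of the distance measure underlying this kind of detection, the sketch below (not the cited authors' implementation) computes a classic DTW distance between two system-call traces encoded as integer IDs; the unit match/mismatch cost is an assumption made for clarity:

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two sequences.

    Here a and b are system-call traces encoded as integer IDs; aligning
    two calls costs 0 if they match and 1 otherwise (an assumed cost).
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = minimal alignment cost of a[:i] against b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            dp[i][j] = cost + min(dp[i - 1][j],      # stretch a
                                  dp[i][j - 1],      # stretch b
                                  dp[i - 1][j - 1])  # advance both
    return dp[n][m]

# A trace padded with irrelevant injected calls still aligns cheaply with
# the original trace, which is why warping-based measures help here.
clean = [1, 2, 3, 4]
injected = [1, 9, 2, 3, 9, 4]
print(dtw_distance(clean, injected))  # small distance despite injection
```

The low alignment cost between the clean and padded traces is exactly the property that makes injected "noise" calls ineffective against a DTW-style comparison.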
Modern malware has the ability to apply multiple techniques against anti-dynamic analysis detection [35]. The main idea of our contribution is to propose an effective malware obfuscation methodology that provides a high evasion rate against anti-malware engines. However, the proposed study has some weaknesses, as it is still not sufficient to challenge some of today's standard detection techniques.
4 Proposed Solution
4.1 Stage 1
msfvenom generates a reverse TCP shellcode without obfuscating the source code. The payload generation command is
602 A. A. Mawgoud et al.
4.2 Stage 2
In this stage, the shellcode produced in phase one will be modified with two different methods. The shell's binary will be obfuscated while keeping its functionality; the main objective is to preserve all of the source code's current behavior.
In order to achieve those objectives, the source code is compiled into binary to expose its byte-level behavior; at the byte level, the binary shows the assembler's symbolic instruction output in hexadecimal [36]. As an example, global _start will be represented in hexadecimal in one byte. In addition to syntax modifications, the code changes the algorithms used for calculating the Windows API function hashes. The algorithm loads every character of the DLL library module name. The module name is then normalized by converting lower-case characters to upper case, after which the character bits are rotated and the calculated value is summed for every subsequent character. This process iterates until the end of the module name is reached.
The API hash is calculated through a similar algorithm. The original shellcode uses the hash to resolve the needed API in the execution phase; hashing is an alternative strategy to using the GetProcAddress API to obtain a function pointer. After the original shellcode calculates the checksums, it includes them in its body. The algorithm then checks whether the API hash is found by the search loop, which keeps iterating through the sets of modules and functions and measures the hash of each function. The alterations to the original algorithm not only change the shellcode's syntactic fingerprint but also require re-computing the original hashes.
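The rotate-and-sum hashing described above can be sketched as follows. This is a generic illustration of the scheme (upper-casing the module name, rotating the running value, and adding each character code); the 32-bit width and the 13-bit rotation count are assumptions for illustration, not values stated in the text:

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def module_hash(name, ror_count=13):
    """Rotate-and-sum hash of a module name, normalized to upper case.

    `ror_count=13` is an assumed rotation width used only for this sketch.
    """
    h = 0
    for ch in name.upper():
        h = ror32(h, ror_count)          # rotate the running checksum
        h = (h + ord(ch)) & 0xFFFFFFFF   # add the character code
    return h
```

Changing the rotation count (as the modified algorithm effectively does) changes every computed checksum, which is why the hard-coded hash values in the original shellcode no longer match after the alteration (cf. Figs. 2 and 3).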
Fig. 2 API hashing algorithm change effect on the calculated checksum values
4.3 Stage 3
This stage implements the algorithms for bypassing the dynamic analysis portion of AV engines. Detecting dynamic analysis has become a complicated problem, as detection cannot reliably produce a predictable output using an identical rule set. No malware exists that is capable of avoiding every dynamic analysis technique, just as no antivirus is capable of detecting all types of malware.
Fig. 3 Changing the API hash algorithm calculation causes incorrect original hard-coded values
In addition to the anti-dynamic analysis technique illustrated through pseudo-code, this paper provides another two techniques to detect a sandboxed environment. The first technique detects the audio drivers installed on the victim machine, checking for any previously installed audio driver. If a Windows Primary Sound Driver is present, the detection algorithm concludes that no sandbox environment is present.
The second technique verifies the existence of connected USB devices. If the number of connected USB devices is greater than one, the detection algorithm concludes that no sandbox environment is present.
Fig. 4 A CMD screenshot shows an example of the process of Audio devices enumeration
This method assumes that no anti-malware tool implements the device enumeration APIs. If the APIs are implemented, the sandbox will not detect any audio device except the Windows default one, usually known as the Primary Sound Driver; as a result, the sandboxes may remove these APIs at the implementation stage. The developed method uses the DirectSoundEnumerate API, which invokes a Windows callback function; this callback retrieves the device description, name, and version. The audio detection algorithm keeps searching for any additional audio driver: if the search finds another audio driver besides the default one, the audio check passes and the algorithm keeps iterating through the remaining anti-dynamic checks. Figure 4 illustrates the code developed for testing the audio device enumeration process (the output is from a Windows 7 desktop). The modified reverse_tcp algorithm ignores the Windows default audio driver API, since most recent Windows versions have a default audio driver installed.
Many studies, such as [37–39], did not use enumeration of existing USB devices. The purpose of this check is the same as searching for audio drivers: the main hypothesis is that the sandbox will not implement APIs for USB devices. In case the APIs are implemented, not a single entry will be returned. The USB check in the main algorithm enumerates all existing USB devices; if the number of mapped devices is greater than one, it automatically assumes a normal Windows desktop installation and the anti-dynamic analysis continues with further checks.
4.4 Stage 4
This stage is inspired by the research in [39, 40], both of which presented a modified static PE header as an effective method for antivirus evasion. To achieve this evasion, a small algorithm was developed that imports the input PE file and then mutates its attributes (PE field names, version, and date stamp). These values change with every execution. The mutation engine changes the input file's static signature; the change happens whenever the identical PE file is detected again.
Figure 5 illustrates the PE header manipulation process. The PE attribute mutation engine maps the file from disk into memory; this is done using the CreateFileMapping and MapViewOfFile APIs.
The Portable Executable (PE) format is a file format used to identify executable files in Windows operating systems (e.g., .exe, .dll, .sys). The word "portable" refers to the format's versatility across many different operating system environments. The PE format is mainly a data structure containing the information the operating system loader needs to manage the executable code. The PE holds dynamic library references (DLR) for linking, API import and export tables, and resource attributes. A PE file consists of many different sections, which indicate the dynamic library references in the file's mapping into memory [41]. The executable image consists of many sections, each of which needs a specific type of protection. The Import Address Table (IAT) is used to look up different module attributes in a table; since the compiled application cannot know the libraries' locations in memory, an indirect jump is needed in the case of an API call [40].
UnmapViewOfFile then writes these values back to the file. As a result, antivirus engines that depend on file hash calculations will not be able to identify the .exe file after mutation. The pseudo-code below clarifies the method used for compiling the three steps above.
1. Begin
2. {
3. IS_Debugger_exist:
4. If yes > End Compilation
5. If no > Continue next check
6. IS_big_memory_block_success
7. If no > End Compilation
8. If yes > Continue next check
9. If_Audio_Driver_Check
10. If yes > Continue next check
11. If no > End Compilation
12. IS_USB_Listing_Check
13. If yes > Continue next check
Fig. 6 Random compilation names for both the PE header and date stamp
5 Experiment
Static analysis observes the malware without executing it, while dynamic analysis actually executes the malware in a well-monitored environment to observe its behavior. The experiment consists of two phases.
First Phase: submit the created TCP shell samples to Kaspersky 2018 build 18.0.0.405. Kaspersky was installed on ten virtual machines (VirtualBox platform) running Windows. Each antivirus installation was up to date with the latest malware signatures (July 2018).
Second Phase: submit the TCP shell samples to two online scan engines:
a. Virustotal.com: The website uses 68 online anti-malware engines.
b. Virusscan.com: The website uses 42 online anti-malware engines.
The main reason for using two different online scan engines is to compare the detection rates and correlate contradictions in the results, while the reason for using Kaspersky as a local antivirus on the virtual machines is to identify the differences in evasion rate between the local antivirus and the online antivirus engines for each sample.
The samples come from different phases of the code development process; they are described with their characteristics and generation methods in Tables 1 and 2.
Testing dynamic and static analysis together provides feedback that identifies malware capabilities by analyzing a sequence of technical indicators that cannot be obtained through simple static analysis alone. As the focus of this paper is to make detection by dynamic analysis harder, we perform tests with various antivirus engines to measure the performance of our proposed methodology. The evasion rate is the ratio of the number of AVs unable to detect a given sample from Tables 1 and 2 to the overall number of AVs. Figure 7 shows the evasion rate results for every sample scanned by the antiviruses.
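The evasion-rate figure of merit defined above can be written down directly. A minimal sketch, with illustrative numbers rather than the paper's measured data:

```python
def evasion_rate(undetected_avs, total_avs):
    """Fraction of AV engines that failed to detect a sample, as a percentage."""
    return 100.0 * undetected_avs / total_avs

# Illustrative numbers only: a sample missed by 51 of 68 online engines.
rate = evasion_rate(undetected_avs=51, total_avs=68)
print(f"{rate:.1f}%")  # 75.0%
```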
Three categories of anti-malware scanners were used to test each sample from Table 1:
(1) Kaspersky: the locally installed antivirus, Kaspersky 2018 (build 18.0.0.405), on ten virtual machines.
(2) Virustotal.com: the samples' scan results through this online antivirus engine.
(3) Virusscan.com: the samples' scan results through this online antivirus engine.
Fig. 7 The evasion rate results for the samples from Tables 1 and 2 using the three scanner engines
The experiment was conducted on virtual machines in a private cloud network across four labs at Suez University:
Lab 1A: 45 PCs
Lab 2A: 40 PCs
Lab 5B: 24 PCs
Lab 6B: 24 PCs.
It is important to clarify that:
• The local antivirus (Kaspersky) was tested in an isolated environment.
• The experiment using the local antivirus engines lasted 480 min (8 h).
• Each sample was tested on a separate device (all devices have the same system specifications); the evasion rate is measured as a percentage.
Figure 7 shows the samples' evasion rates with anti-dynamic techniques when scanned by the three different scanner engines. There is a symmetry in the evasion ratio results across the three engines: samples that had a low evasion rate with the local antivirus also had a proportionately low evasion rate with the online scanning engines, and vice versa for the samples with a high evasion rate. The reason is the similarity of the techniques used by the local antivirus and the online engines.
This study focuses mainly on improving the malware evasion rate and proposes a new methodology; it does not intend to highlight the differences between local antiviruses and online engines. Kaspersky shows a lower evasion rate for the samples compared to the online scanner engines; hence, the locally installed antivirus has greater detection capability.
Hidost is an SVM-based classification model. It is a maximum-margin classifier that searches for a small number of data points that divide the whole data set into two sets with a hyperplane in a dimensional space. With kernel tricks, this can be extended to complex classification problems using a nonlinear classifier. The radial basis function (RBF) kernel is used in Hidost to map data points into an infinite-dimensional space. In the experiment, the prediction output is the data point's (positive or negative) distance to the hyperplane: a positive distance is classified as malicious, while a negative one is classified as benign. Precisely, the smartest evasion technique succeeded against Hidost for only 0.024% of the malicious samples tested against the nonlinear SVM classifier with RBF kernel through the binary embedding.
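The sign-of-score decision rule described above can be sketched with a hand-built RBF kernel decision function. The support vectors, coefficients, bias, and gamma below are toy values chosen for illustration, not Hidost's trained model:

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq)

def decision(x, support_vectors, coeffs, bias, gamma=0.5):
    """Signed SVM score: positive => malicious, negative => benign."""
    score = sum(c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, coeffs))
    return score + bias

# Toy model: one malicious-side and one benign-side support vector.
svs = [(1.0, 1.0), (-1.0, -1.0)]
coeffs = [1.0, -1.0]   # alpha_i * y_i for each support vector
bias = 0.0

label = "malicious" if decision((0.9, 1.1), svs, coeffs, bias) > 0 else "benign"
print(label)  # malicious
```

The kernel replaces the plain dot product, which is what lets the linear hyperplane separate classes that are not linearly separable in the original feature space.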
Our experiment uses Hidost to test the malware evasion rate; the experiment took 74 h (around three days) to execute. Even though Hidost was mainly designed to resist evasion attempts, our method achieved a rate of more than 75% for many samples. In total, we tested 1749 generated samples (from Tables 1 and 2) across 400 seeds (about 4.4 evasive samples per seed).
Trace Analysis: Each mutation trace was analyzed in the same way as for Virustotal. Figure 8 shows the length and success rate of every mutation trace. In general, shorter mutation traces suffice to achieve more than a 75% evasion rate when attacking Hidost than were needed for Virustotal. We detected two main differences compared to Virustotal.
First, there is no increase in trace length for new mutation traces, unlike Virustotal, where the trace length is directly proportional to the trace ID.
Second, there is a relation between trace length and success rate: the longer the traces become, the more successful the generation of evasive variants.
Figure 9 shows the success rate results of the scanned samples from the three different antivirus engines; the success rate in our experiment is measured as the average of the three scanner engines' results from Fig. 7. It was initially expected that the modified samples would achieve a noticeably higher evasion success rate than the pure shell, and indeed the success rate of the modified shellcode is higher by levels than that of the plain shellcode created by msfvenom.
Additionally, the anti-dynamic techniques proved to be of low relevance to the modified assembler code (e.g., samples with a low evasion rate, X2 and X4, versus samples with a high evasion rate, X9 and X14).
Fig. 8 The mutation trace lengths and success rates for evading Hidost
Fig. 9 The average evasion success rate for each sample using the three scanner engines against Hidost in our experiment
Unexpectedly, the modified syntactic samples that achieved a high evasion rate, such as X2, X6, and X8, showed that most of the antivirus engines struggle to recognize alterations in code syntax. The distributed samples from X9 to X14 are equalized: these samples use anti-dynamic techniques and show an equalized evasion rate. Meanwhile, there is no noticeable evasion success from the enumeration of sound devices, and no single anti-dynamic technique shows considerable failure or success over the others.
Xu et al. [42] first classified the detected JavaScript obfuscation methods. They then performed a statistical analysis of the usage of the different obfuscation method classes in actual malicious JavaScript samples. Based on the results, they studied the differences between benign obfuscation and malicious obfuscation, and explained in detail the reasons behind the choice of obfuscation methods. We compared the fifteen samples implemented using the proposed methodology with fifteen implemented data-obfuscated samples from [42], as shown in Fig. 10.
Fig. 10 A comparison of the average detection ratio results from the proposed methodology samples
to data obfuscated samples from Xu et al. [42]
7 Conclusion
The main purpose of this paper is to propose a malware obfuscation methodology with a high evasion ratio. Anti-dynamic techniques can increase the evasion rate, with some limitations, as audio and USB enumeration can fulfill the same evasion effect as any other anti-dynamic method. However, the algorithmic changes at the assembler level proved to be the most vigorous technique.
In this paper, we introduced a methodology that helps malware avoid anti-malware tools. These techniques were mainly developed to avoid detection of malware via static analysis; in addition, we proposed intuitive anti-virtualization techniques to avoid analysis under a sandbox. The degree of evasion increase or decrease depends on the code obfuscation; the reason for this behavior is still not known from our evaluation. Even though many developed malware methods were studied in previous research, the well-known antivirus engines keep showing interest in such techniques to test their engines' detection power. The commonly deployed method remains the combination of dynamic and static analysis, which proved ineffectual here.
Finally, the technique used for developing the malware samples (harmlessly modified programs) proved its efficiency: after scanning them with multiple antivirus engines, the results showed a minimal proportion of false positives.
References
1. Mahawer, D., Nagaraju, A.: Metamorphic malware detection using base malware identification
method. Secur. Commun. Netw. 7(11), 1719–1733 (2013)
2. Nai Fovino, I., Carcano, A., Masera, M., Trombetta, A.: An experimental investigation of
malware attacks on SCADA systems. Int. J. Crit. Infrastruct. Prot. 2(4), 139–145 (2009)
3. Mawgoud, A.A., Taha, M.H.N., Khalifa, N.E.M.: Security Threats of Social Internet of Things
in the Higher Education Environment, pp. 151–171. Springer, Cham (2019)
4. Xu, D., Yu, C.: Automatic discovery of malware signature for anti-virus cloud computing. Adv.
Mater. Res. 846–847, 1640–1643 (2013)
5. Kumar, A., Goyal, S.: Advance dynamic malware analysis using Api hooking. Int. J. Eng.
Comput. Sci. 5(3) (2016)
6. Khaja, M., Seo, J., McArthur, J.: Optimizing BIM Metadata Manipulation Using Parametric
Tools. Procedia Engineering 145, 259–266 (2016)
7. Tahir, R.: A study on malware and malware detection techniques. Int. J. Educ. Manag. Eng.
8(2), 20–30 (2018)
8. El Karadawy, A.I., Mawgoud, A.A., Rady, H.M.: An empirical analysis on load balancing
and service broker techniques using cloud analyst simulator. In: International Conference on
Innovative Trends in Communication and Computer Engineering (ITCE), pp. 27–32. IEEE,
Aswan, Egypt (2020)
9. Li, J., Sun, L., Yan, Q., Li, Z., Srisa-An, W., Ye, H.: Significant permission identification for
machine-learning-based android malware detection. IEEE Trans. Industr. Inf. 14(7), 3216–3225
(2018)
10. Li, Q., Larsen, C., van der Horst, T.: IPv6: a catalyst and evasion tool for botnets and malware
delivery networks. Computer 46(5), 76–82 (2013)
11. Suk, J., Kim, S., Lee, D.: Analysis of virtualization obfuscated executable files and imple-
mentation of automatic analysis tool. J. Korea Inst. Inform. Secur. Cryptol. 23(4), 709–720
(2013)
12. MaHussein, D.M.E.D.M., Taha, M.H., Khalifa, N.E.M.: A blockchain technology evolution
between business process management (BPM) and Internet-of-Things (IoT). Int. J. Advanc.
Comput. Sci. Appl. 9, 442–450 (2018)
13. Malhotra, A., Bajaj, K.: A survey on various malware detection techniques on mobile platform.
Int. J. Comput. Appl. 139(5), 15–20 (2016)
14. Kritzinger, E., Smith, E.: Information security management: an information security retrieval
and awareness model for industry. Comput. Secur. 27(5–6), 224–231 (2008)
15. Kong, D., Tian, D., Pan, Q., Liu, P., Wu, D.: Semantic aware attribution analysis of remote
exploits. Secur. Commun. Netw. 6(7), 818–832 (2013)
16. Khalifa N.M., Taha M.H.N., Saroit, I.A.: A secure energy efficient schema for wireless
multimedia sensor networks. CiiT Int. J. Wirel. Commun. 5(6) (2013)
17. Goh, D., Kim, H.: A study on malware clustering technique using API call sequence and locality
sensitive hashing. J. Korea Inst. Inform. Secur. Cryptol. 27(1), 91–101 (2017)
18. Pandey, S., Agarwal, A.K.: Remainder quotient double hashing technique in closed hashing
search process. In: Proceedings of 2nd International Conference on Advanced Computing and
Software Engineering (ICACSE) (2019, March)
19. Luckett, P., McDonald, J., Glisson, W., Benton, R., Dawson, J., Doyle, B.: Identifying stealth
malware using CPU power consumption and learning algorithms. J. Comput. Secur. 26(5),
589–613 (2018)
20. Kumar, P.: Computer virus prevention & anti-virus strategy. Sahara Arts & Management
Academy Series (2008)
21. Yoshioka, K., Inoue, D., Eto, M., Hoshizawa, Y., Nogawa, H., Nakao, K.: Malware sandbox
analysis for secure observation of vulnerability exploitation. IEICE Trans. Inform. Syst. E92-
D(5), 955–966 (2009)
22. Pektaş, A., Acarman, T.: A dynamic malware analyzer against virtual machine aware malicious
software. Secur. Commun. Netw. 7(12), 2245–2257 (2013)
23. Joo, J., Shin, I., Kim, M.: Efficient methods to trigger adversarial behaviors from malware
during virtual execution in sandbox. Int. J. Secur. Appl. 9(1), 369–376 (2015)
24. Maestre Vidal, J., Sotelo Monge, M., Monterrubio, S.: EsPADA: enhanced payload analyzer for
malware detection robust against adversarial threats. Fut. Gene. Comput. Syst. 104, 159–173
(2019)
25. Iwamoto, K., Isaki, K.: A method for shellcode extraction from malicious document files using
entropy and emulation. Int. J. Eng. Technol. 8(2), 101–106 (2016)
26. Zakeri, M., Faraji Daneshgar, F., Abbaspour, M.: A static heuristic method to detecting malware
targets. Secur. Commun. Netw. 8(17), 3015–3027 (2015)
27. Kruegel, C.: Full system emulation: achieving successful automated dynamic analysis of
evasive malware. In: Proc. BlackHat USA Security Conference, 1–7 August 2014
28. Zhong, M., Tang, Z., Li, H., Zhang, J.: Detection of suspicious communication behavior of one
program based on method of difference contrast. J. Comput. Appl. 30(1), 210–212 (2010)
29. Mawgoud, A.A.: A survey on ad-hoc cloud computing challenges. In: International Conference
on Innovative Trends in Communication and Computer Engineering (ITCE), pp. 14–19. IEEE
(2020)
30. Eskandari, M., Raesi, H.: Frequent sub-graph mining for intelligent malware detection. Secur.
Commun. Netw. 7(11), 1872–1886 (2014)
31. Barabas, M., Homoliak, I., Drozd, M., Hanacek, P.: Automated malware detection based on
novel network behavioral signatures. Int. J. Eng. Technol. 249–253 (2013)
32. Zatloukal, F., Znoj, J.: Malware detection based on multiple PE headers identification and
optimization for specific types of files. J. Adv. Eng. Comput. 1(2), 153 (2017)
33. Tomasi, G., van den Berg, F., Andersson, C.: Correlation optimized warping and dynamic
time warping as preprocessing methods for chromatographic data. J. Chemom. 18(5), 231–241
(2004)
34. Vinod, P., Viswalakshmi, P.: Empirical evaluation of a system call-based android malware
detector. Arab. J. Sci. Eng. 43(12), 6751–6770 (2017)
35. Saeed, I.A., Selamat, A., Abuagoub, A.M.A.: A survey on malware and malware detection
systems. Int. J. Comput. Appl. 67(16), 25–31 (2013)
36. Mawgoud, A.A., Ali, I.: Statistical insights and fraud techniques for telecommunications sector
in Egypt. In: International Conference on Innovative Trends in Communication and Computer
Engineering (ITCE), pp. 143–150. IEEE (2020)
37. Mawgoud A.A., Taha, M.H.N., El Deen, N., Khalifa N.E.M.: Cyber security risks in MENA
region: threats, challenges and countermeasures. In: International Conference on Advanced
Intelligent Systems and Informatics, pp. 912–921. Springer, Cham (2020)
38. Ismail, I., Marsono, M., Khammas, B., Nor, S.: Incorporating known malware signatures to
classify new malware variants in network traffic. Int. J. Netw. Manag. 25(6), 471–489 (2015)
39. Toddcullumresearch.com: Portable executable file corruption preventing malware from
running—Todd Cullum Research (2019). https://toddcullumresearch.com/2017/07/16/portableexecutable-file-corruption/. Accessed 10 June 2019
40. Blackhat.com (2019). https://www.blackhat.com/docs/us-17/thursday/us-17-Anderson-Bot-
VsBot-Evading-Machine-Learning-Malware-Detection-wp.pdf. Accessed 9 Apr 2019
41. Ye, Y., Wang, D., Li, T., Ye, D., Jiang, Q.: An intelligent PE malware detection system based
on association mining. J. Comput. Virol. 4(4), 323–334 (2008)
42. Xu, W., Zhang, F., Zhu, S.: The power of obfuscation techniques in malicious JavaScript code:
a measurement study, 9–16 (2012). https://doi.org/10.1109/malware.2012.6461
An Artificial Intelligence Resiliency
System (ARS)
1 Introduction
© The Editor(s) (if applicable) and The Author(s), under exclusive license 617
to Springer Nature Switzerland AG 2021
A.-E. Hassanien et al. (eds.), Enabling AI Applications in Data Science,
Studies in Computational Intelligence 911,
https://doi.org/10.1007/978-3-030-52067-0_28
618 A. Hussein et al.
featuring multiple levels of security measures throughout different nodes over the
network.
Network administrators handle various applications on the control side to perform different management tasks such as firewalling, monitoring, and routing. Most of these applications have complex interactions with each other, creating a difficult challenge when reasoning about their behavior. We argue that there is no security without consistency: enforcing security policies on a network without the ability to verify any misbehavior is a challenge in itself.
Our fundamental motivation was the fascinating tactics of the human immune system, which is based on a double line of defense: a first line responsible for deflecting unfamiliar attackers at various exterior contact points of the body without needing to identify their type, and a second line responsible for detecting the attacks that pass through the first line by distributing watchers throughout the bloodstream, keeping an eye out for any abnormality. This system earns its efficiency from one layer using minimal energy and a second layer relying on distributed techniques and more processing.
Another inspiration was our brain, which functions as a single controlling unit that, aside from its millions of functions, maintains the consistency of our body by keeping in touch with all our organs to make sure they are functioning correctly. In simple words, the brain handles this task by knowing in advance the correct function of each organ, then alerting our awareness if any part of the body malfunctions, after comparing the intended function with the actual one.
This paper is an extension of our series that started with [1, 2], proposing a general solution to maintain network resiliency, in terms of the security, consistency, and correctness of hybrid networks, while minimizing the processing overhead in a software-based network with a remote management environment. Our proposal is a multi-layer AI-based resiliency enhancement system that builds its strength from both a consistency establishment system and a hybrid (centralized and distributed) security technique, in terms of accuracy, efficiency, scalability, robustness, and other properties.
The rest of this paper is organized as follows: related AI-based SDN solutions are reviewed in the next section. Section 3 introduces our architecture design and explains how it achieves a balance between security, consistency, and performance in an SDN network. Section 4 presents the system analysis and simulation, followed by system testing and results. Finally, we conclude the paper in Sect. 5.
2 Related Work
The authors of [8] combined a fuzzy inference system with both Rate Limiting and TRW-CB to create an SDN-based information security management system.
A survey presenting a classification of intrusion detection systems (IDS) was discussed in [9]. The authors reviewed anomaly detection techniques and the performance of machine learning in this domain. They also included a study of false and true positive alarm rates, in addition to discussing feature selection and its importance in the classification and training phases of machine learning IDS.
The authors of [3] proposed a DDoS attack detection technique based on traffic flow tracing. They relied on different ML algorithms, including K-nearest neighbors, Naive Bayes, K-medoids, and K-means, to classify traffic as normal or abnormal, and discussed how DDoS attacks can be detected by using these techniques to classify incoming requests.
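As a minimal illustration of this kind of flow classification (a generic sketch, not the cited authors' implementation), a plain k-nearest-neighbors vote over toy flow features such as packet rate and mean packet size might look like:

```python
from collections import Counter

def knn_classify(query, training, k=3):
    """Majority vote among the k nearest labeled flow-feature vectors."""
    def dist2(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(training, key=lambda item: dist2(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy features: (packets per second, mean packet size in bytes).
# The labels and numbers are illustrative, not taken from any dataset.
flows = [
    ((50, 800), "normal"), ((60, 750), "normal"), ((45, 900), "normal"),
    ((5000, 64), "attack"), ((4800, 60), "attack"), ((5200, 70), "attack"),
]
print(knn_classify((4900, 66), flows))  # attack
print(knn_classify((55, 820), flows))   # normal
```

In practice the features would be normalized first, since raw packet rates and byte sizes live on very different scales and would otherwise dominate the distance.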
Another attempt to tackle DDoS attacks used the Self-Organizing Maps (SOM) approach [10–12]. When dealing with unlabeled input vectors, SOM can perform as a classification technique. The proposed approach was compared against different methods on the well-known KDD-99 dataset.
SOM has also been applied in the field of intrusion detection, as shown in the work of [10]. The authors' experiments showed that anomalous behavior can be detected using a single SOM trained on normal behavior. They argued that the ratio of difference between normal and abnormal packets is greater by an order of magnitude. Their conclusion was that the strength of a SOM-based IDS comes from it not being told what abnormal behavior is.
Another proposed method for anomaly detection is an MLP with a single hidden-layer neural network [13]. The authors tested their work on the DARPA 1998 dataset and achieved a detection rate (DR) of 77%. The authors of [14] used selected generic keywords to detect attack preparations, achieving a DR of 80% on the same dataset.
Another IDS work was the proposal of [15], which presented a 57-gene chromosome, with each gene representing a single connection feature such as the destination or source IP address. The authors of [16] achieved 97.04% on the DARPA dataset for their IDS work on linear genetic programming (LGP). However, the main disadvantage of this kind of IDS, as concluded by the authors of [17], is the high resource consumption involved.
The authors of [18] also presented an IDS based on fuzzy c-means and rough set theory, achieving an average accuracy of 93.45% on the KDD'99 dataset.
The authors of [19] proposed an IDS based on KNN with similarity as a quantitative measure of distance, achieving a DR of 90.28% on the KDD'99 dataset. Similarly, the authors of [20] achieved a DR of 91.70% on the DARPA dataset while testing their implementation of a KNN-based IDS. The authors of [21] proposed a Naïve Bayesian IDS and achieved 91.52% on the KDD'99 dataset; similarly, the authors of [22] achieved 94.90% on the same dataset.
The authors of [23] discussed the effect of principal component analysis (PCA) on the performance of a Decision Tree (DT). The processing time on the KDD'99 dataset was reduced by a factor of around thirty, with a slight decrease in accuracy from 92.60% to 92.05%.
The authors of [24] proposed dimension reduction through preprocessing using the
rough set theory (RST) before applying SVM for intrusion detection. They achieved
an accuracy of 86.79% on the KDD'99. The authors of [25] concluded that the
combination of SVM and DT techniques yields better results than each individual
technique alone.
The authors of [26] discussed their Random Forest (RF) based security model,
achieving a high DR and deriving a stable set of important features. The authors
of [27] reached average DRs of 92.58% and 99.86% on the KDD'99 dataset and a
balanced version of it, respectively. The balanced dataset was built to increase the
DR of minority intrusions by over-sampling the minority classes and down-sampling
the majority ones.
An Artificial Intelligence Resiliency System (ARS) 621
The authors of [28] proposed a deep learning approach based on Self-taught Learning
(STL) to implement a Network IDS. They tested their work on the NSL-KDD dataset.
The authors of [29] proposed a new method that represents network traffic as
images, using a convolutional neural network (CNN) for malware traffic classification.
The authors argued that applying a CNN to such images rather than to raw data
results in higher classification accuracy.
Deep learning techniques have shown considerable improvements in the DDoS
detection domain. The authors of [30] proposed a deep learning multi-vector DDoS
detection system in an SDN environment. They discussed feature reduction derived
from network traffic headers. Their results show an accuracy of 99.75% in separating
normal and attack classes.
A list of proposed ML-based techniques is presented in Table 1. As shown, AI-based
techniques are widely used and offer strong results. The remaining problem lies in
how these techniques are systemized, which makes such solutions inflexible and
vulnerable. The main issue is that these solutions focus on increasing accuracy
regardless of other network properties that may be affected: first, by redirecting
traffic to a fixed processing and analysis point, such as the controller, which
introduces high traffic overhead and security risks; second, by redirecting only
packet headers, which decreases the overhead but inherits similar weaknesses.
Our vision of enhancing network security and resilience, together with the influence of
AI-based systems over the network domains, drove us to combine AI with
programmable networking such as SDN. Hence, we began researching diverse
AI techniques alongside an efficient data management system, so as to supply our
system with the required information while keeping the processing load and
overhead as low as possible under the circumstances. Our proposals and
contributions are summarized as follows:
Fig. 2 Architectural
distribution
Our proposed system relies on a hybrid architecture that includes both centralized
processing at the controller level and distributed processing at the nodes level over
the data plane. Figure 2 illustrates a general overview of our system.
1. Edge feature extraction, to avoid diverting traffic to the controller and thus limit
the security risks. We consider an edge node to be any relevant node in the
network, as specified by the network environment; it differs from one scenario
to another. The feature extraction process includes only the relevant features,
which are then forwarded in a vector to the following destination every time
cycle, tc.
2. The multi-layer detection technique provides consistency verification as a first
measure, followed by anomaly detection and finishing with an attack specifica-
tion approach. The proposed technique allows us to distinguish anomalies and
unknown attacks at a faster rate and with lower processing overhead, while
preserving consistency.
3. Feature marking: at this stage, the extracted features undergo a marking process
at the ingress points. Including the mark as a feature in the training and decision
making allows the detection of slight changes in the global flow or randomness
of the network by triggering a change in the output of the AI system.
Inspired by the self-consistency techniques of the human brain, which are also part
of the overall human security system, we propose a network consistency establishment
system that relies on the controller's complete view of the network to check for
any misbehavior in the network nodes. Such a feature increases our confidence
in the correctness of the network nodes before starting an external network
security analysis.
For the controller to check for inconsistencies every time cycle tc', it must
know what each node should be doing at every time t. The consistency
module at the controller saves an updated image of the matching (flow) table
of each node. Before any consistency check, the module ensures that no
updates are being injected at that time. At tc', the module probes the security
agent at the node level for a hash of the local matching (flow) table. Exchanging
hashes rather than the tables themselves preserves the security and privacy of this
sensitive data, which is never sent through the network. At the same tc', the module
calculates its own local hash outputs. Since no updates occur at this time, the
output of each network node (actual behavior) should match the output of its
corresponding image at the controller (intended behavior).
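The hash-comparison step above can be sketched in a few lines; the flow-table representation (a list of rule strings) and the node identifiers below are illustrative assumptions, not the actual ARS data structures:

```python
import hashlib

def table_hash(flow_table):
    """Hash a node's matching (flow) table; rules are sorted so the
    digest is independent of iteration order."""
    h = hashlib.sha256()
    for rule in sorted(flow_table):
        h.update(rule.encode())
    return h.hexdigest()

def consistency_check(controller_images, node_hashes):
    """Compare each node's reported hash (actual behavior) against the
    hash of the controller's saved image (intended behavior).
    Returns the set of inconsistent node ids."""
    return {node for node, image in controller_images.items()
            if table_hash(image) != node_hashes.get(node)}

# Example: node s2 reports the hash of a tampered flow table
images = {"s1": ["in=1,out=2"], "s2": ["in=2,out=3"]}
reported = {"s1": table_hash(["in=1,out=2"]),
            "s2": table_hash(["in=2,out=1"])}   # altered flow
print(consistency_check(images, reported))      # {'s2'}
```

Only the fixed-size digests cross the network, so the check adds a single small message per node each tc'.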
Deploying this technique allows us to identify any compromised or misbehaving
nodes, which would otherwise lead to inefficient and unreliable security analysis.
In case of such events, a specialized module should be alerted in order to identify
the cause of the inconsistencies and take the necessary actions. Such a solution is
kept for future work in following articles of this series.
Diverse solutions have been explored in preceding work, with a fundamental obstacle
regarding the traffic and processing overhead resulting from port mirroring, packet
cloning, or even header extraction. Such techniques have opened a backdoor for
vulnerabilities and security risks while transferring duplicate data to the controller.
Hence, to avoid duplicated and extra traffic we proposed feature extraction at
the data plane level. At this point, participating input nodes will handle the task of
extracting the required features from incoming packets, forming the feature vector,
and forwarding it, each time cycle tc, to the following destination. In this paper we
are proposing two techniques:
1. Enabling the participating nodes to upload one vector, containing the necessary
extracted features, to the controller each tc.
2. Designing an NN-based overlay network over the control plane, where every
node handles a small processing portion (e.g. the processing load of a single
neuron or of multiple neurons). Instead of the vector being uploaded to the
controller, it is sent to the next hidden layer, and so on, until it reaches the
output layer for decision making.
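A minimal sketch of the per-node processing in this second technique, assuming a toy 3-layer network with random weights; the real layer sizes, weights, and forwarding mechanism are set by the controller and are not specified here:

```python
import numpy as np

def layer_forward(x, W, b):
    """Processing assigned to one overlay node: a single NN layer
    with a ReLU activation."""
    return np.maximum(0.0, W @ x + b)

# Hypothetical 3-layer overlay: edge node (input) -> hidden node -> output node
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)    # held by the hidden-layer node
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)    # held by the output-layer node

features = np.array([0.2, 0.5, 0.1, 0.9])        # vector built at the edge, each tc
hidden = layer_forward(features, W1, b1)         # forwarded to the hidden-layer node
logits = W2 @ hidden + b2                        # computed at the output-layer node
decision = int(np.argmax(logits))                # e.g. 0 = normal, 1 = attack
```

In the overlay, each intermediate result would travel one hop instead of being computed locally, so no single node carries the full model or the full traffic.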
The extraction stage and feature marking guarantee little overhead and real-time
processing while examining the traffic flow and analyzing its randomness. Such a
technique equips the security module with the ability to detect distributed
attacks, including DDoS and multi-sequence attacks.
Regarding the first technique, the AI processing is handled at the controller side
after receiving the required input vectors. Here we gain both faster processing
and quick mitigation after the detection stage.
Regarding the second technique, an independent overlay solution, managed by
the controller, acts autonomously from feature extraction through to the detection
stage. The processing load assigned to each node is minimal, with no duplicated
traffic.
Another unique aspect of our work is the two-step security solution design. We
intend to distinguish irregularities inside the system traffic, which might be
triggered by a security risk, a system architectural change, or a system admin-
istrative change. It has been shown in the literature that anomaly detection needs
fewer features and less processing to work. Thus, it is more effective and effi-
cient to rely on anomaly detection to monitor the traffic flow and its randomness.
For these reasons we propose to apply anomaly detection as a first line of defense.
In the case of an anomaly, the second line is activated, applying an attack
specification process to the flagged features.
Detecting unknown attacks is an important feature of such an architecture. The
attack identification model was trained to classify according to both the general
attack class and its specific type. Hence, if an anomaly is detected and the
second model is unable to identify the type of the attack, or even if it identifies
its general class but is unable to specify the type, then we can assume that we
are dealing with an unknown attack. At this stage, new techniques should be used to
label these features and teach the system the new attack. Another benefit of such
an architecture shows in the case of new attacks, where there is no need to retrain
the whole system: only the second system is retrained offline, while the first stage
remains online.
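The layered decision flow above can be outlined as follows; the two model interfaces and the toy stand-ins are hypothetical, and only the layering logic follows the text:

```python
UNKNOWN = "unknown"

def resilient_detect(features, anomaly_model, attack_model):
    """First line: anomaly detection on few features. Second line: attack
    specification (general class, then specific type), run only when the
    first line flags an anomaly. Both models are assumed interfaces."""
    if not anomaly_model(features):
        return None                                # traffic looks normal
    attack_class, attack_type = attack_model(features)
    if attack_class is None or attack_type is None:
        # class or type unresolved -> treat as an unknown attack
        return (attack_class or UNKNOWN, UNKNOWN)
    return (attack_class, attack_type)

# Toy stand-ins for the trained models
is_anomaly = lambda f: f["pkts_per_s"] > 1000
specify = lambda f: ("DoS", "SYN flood") if f["syn_ratio"] > 0.9 else ("DoS", None)

print(resilient_detect({"pkts_per_s": 5000, "syn_ratio": 0.95}, is_anomaly, specify))
# ('DoS', 'SYN flood')
```

A flagged-but-unclassified result is what triggers the offline retraining of the second stage, while the first stage keeps running online.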
Fig. 3 D~C2 architecture (ARS security: extraction & marking, analysis & processing, consistency, monitoring & management; remote management, network domain, edge node with ARS agent, regular nodes)
Fig. 4 D2~C architecture (remote management; ARS monitoring & management; extraction & marking and analysis & processing at the edge node's ARS AI agent; regular nodes)
Fig. 5 ARS security & consistency (consistency syncs and hash processing at AI nodes; distributed NN with an input layer and 3 hidden layers)
The security and consistency solutions work in parallel without interfering with one
another to achieve a more resilient network.
Figure 6 presents an illustration of our distributed NN overlay network, which inte-
grates 3 overlay systems, each defending an ingress point of the network. Each overlay
consists of 3 layers (input, 1 hidden, output), with each layer distributed over a single
node. The distribution width depends on the network parameters and availability,
which ensures the scalability of such an approach. In the presented example we
chose a single node to handle the processing of each hidden layer of each overlay.
The module should select a set of nodes to act as the edge nodes, and should these
network features change, the module should activate a new set of nodes, if possible.
Another technique is to schedule multiple sets, each working for a specific period
of time. These techniques minimize the traffic and processing load compared with
activating a large set of nodes working at the same time, all the time. The system
can always fall back to the full analysis technique when needed or in case of an
attack. Regarding the consistency system, the module follows a similar technique,
where the checks are divided among multiple sets of nodes from across the network
and a different set is activated during each time cycle. The system can also fall
back to a full system check if needed or if triggered by the security system.
4 System Discussion
Considering that AI-based techniques are part of our solution, the first step involves
finding different sets of efficient ensembles of AI models that achieve high accuracy
rates under the proposed conditions, and then integrating these techniques into our
proposed system.
Next, we analyze and implement the consistency establishment module and
provide the ARS agents with the ability to securely communicate and cooperate
with the consistency module, in addition to handling the necessary processing.
At this point of our work we are considering two datasets. The first is the bench-
mark NSL-KDD dataset, a modified real dataset proposed to solve a number of
the existing problems of the older KDD'99 dataset, as mentioned in [49]. The second
is the Intrusion Detection Evaluation Dataset (CICIDS2017) [50], a more recent
dataset published by the Canadian Institute for Cybersecurity (CIC), providing a
more reliable collection that covers a variety of recent attacks.
The NSL-KDD dataset consists of around 150,000 records including both normal
and anomalous traffic, categorized into normal traffic and 4 attack classes:
1. Denial of Service (DoS).
2. User to Root (U2R): unauthorized access to root privileges.
3. Remote to Local (R2L): unauthorized access using a remote site.
4. Probing (Probe): traffic monitoring, surveillance and other probing such as port
scanning.
The different attack classes are presented in Table 2, including a set of specific
attack types that fall under each class.
The training and testing simulation was performed on different partitions of the
dataset as follows:
1. The training subset: 60%.
2. The testing and evaluation subsets: 40%.
Our current tests are also performed on the 41 features included in the dataset and
commonly used in the network security literature.
The CICIDS2017 dataset contains 350,000 records including normal traffic as
well as anomalous traffic classified by attack type rather than attack class, which
leads to more accurate results for our second security layer, which aims at
identifying specific attack types. The attack types include Brute Force FTP, Brute
Force SSH, DoS, Heartbleed, Web Attack, SQL Injection, Port scanning, Infiltration,
Botnet and DDoS. In addition, the dataset offers 80 network features, and traffic
is categorized according to time stamps, which allows us to extract more statistical
features that enhance the reliability of the system.
Table 3 presents a sample of attack types and the extracted set of features for each
attack detection process.
For the purpose of our consistency system, we have generated our own real traffic
to be able to extract the necessary data from our simulated SDN network represented
in the following figure. The network consists of 9 nodes with 3 border nodes, 3 core
nodes, and 3 edge nodes.
After the first traffic flow had ended and the network had converged, the data
extraction stage started. The flow table of each node contained around 100 flows. At
the same time t, the controller kept a database of each update inserted on each node
for future checks.
At time tc, the first extraction took place from each node. This process was repeated
for 100 traffic flows, with each traffic flow = tc, giving a total data collection
duration of 100tc. Hence, the total data collected consists of 100 instances per node,
with each instance containing around 100 flows.
During each traffic flow, we performed multiple consistency-based attacks on
multiple flows on each node in the network. Our data was classified into 6 classes,
as shown in Table 4.
Table 3 (continued)

Attack type   | Features
Infiltration  | Subflow F.Bytes; Total Len F.Packets; Flow Duration; Active Mean
PortScan      | Init Win F.Bytes; B.Packets/s; PSH Flag Count
DDoS          | B.Packet Len Std; Avg Packet Size; Flow Duration; Flow IAT Std
In general, the training algorithm for a BP neural network consists of two parts.
First, the outputs of each layer are calculated successively by the end of the forward
phase. Second, the weights of all connections are recalculated in order to minimize
the backward error generated from the test on the training subset.
For our BP network training we considered the resilient backpropagation
approach (RPROP) [51], which converges faster than the gradient-descent
[52] algorithm. For our scheme, the objective function for the learning mode is:

E = \frac{1}{2}\sum_{p}\sum_{q}\left(y_{p,q} - d_{p,q}\right)^{2}
The gradient-descent approach depends directly on the learning rate η and the
size of the partial derivative ∂E/∂ω to determine the size of the weight update, while
the RPROP approach depends only on the sign of the derivative to determine the
direction of the weight update of ω. This yields an update-value Δ_ij, directed by
the following rule:

\Delta_{ij}(t) =
\begin{cases}
\min\left(\eta^{+}\cdot\Delta_{ij}(t-1),\ \Delta_{\max}\right), & \text{if } \frac{\partial E(t)}{\partial \omega_{ij}(t)}\cdot\frac{\partial E(t-1)}{\partial \omega_{ij}(t-1)} > 0\\[4pt]
\max\left(\eta^{-}\cdot\Delta_{ij}(t-1),\ \Delta_{\min}\right), & \text{if } \frac{\partial E(t)}{\partial \omega_{ij}(t)}\cdot\frac{\partial E(t-1)}{\partial \omega_{ij}(t-1)} < 0\\[4pt]
\Delta_{ij}(t-1), & \text{otherwise}
\end{cases}
If ∂E/∂ω_ij undergoes a sign change, the backtracking [51] method is activated:
the previous update of ω is reverted,

\Delta\omega_{ij}(t) = -\Delta\omega_{ij}(t-1),

and ∂E(t)/∂ω_ij(t) is set to 0, meaning that if the algorithm took a step so large
that it missed a minimal error value, it is able to return to the previous location.
Finally, the weights are corrected under the following rule:

\Delta\omega_{ij}(t) = -\operatorname{sign}\left(\frac{\partial E(t)}{\partial \omega_{ij}(t)}\right)\cdot\Delta_{ij}(t), \qquad \omega_{ij}(t+1) = \omega_{ij}(t) + \Delta\omega_{ij}(t)
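One RPROP step per the rules above can be sketched with numpy; the defaults η+ = 1.2, η− = 0.5, Δmax = 50, Δmin = 1e-6 are the common choices from [51], not values stated in this chapter:

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta, prev_dw,
               eta_plus=1.2, eta_minus=0.5, d_max=50.0, d_min=1e-6):
    """One RPROP update for a weight array w, given the current and
    previous gradients, the per-weight update-values delta, and the
    previous weight change prev_dw."""
    sign = grad * prev_grad
    # grow the update-value where the gradient kept its sign
    delta = np.where(sign > 0, np.minimum(delta * eta_plus, d_max), delta)
    # shrink it where the sign flipped
    delta = np.where(sign < 0, np.maximum(delta * eta_minus, d_min), delta)
    dw = -np.sign(grad) * delta
    # backtracking: revert the previous step on a sign change, zero the gradient
    dw = np.where(sign < 0, -prev_dw, dw)
    grad = np.where(sign < 0, 0.0, grad)
    return w + dw, grad, delta, dw
```

In a training loop, the returned `grad`, `delta`, and `dw` are carried over as `prev_grad`, `delta`, and `prev_dw` for the next step.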
the color itself. A RED image implies that the altered flow belongs to the first class
of features (constant class), while the GREEN color implies that the altered flow
belongs to the second class (time class); finally, the BLUE color represents the third
class (statistical class).
Once an inconsistency has been detected, a second layer in the system is
triggered. This layer takes as input the vector of XOR-ed data that was responsible
for triggering the inconsistency, in order to specify the class of attack that may have
caused the issue. Furthermore, knowing the attack class and the color of the image
of this specific flow allows us to identify the features that were modified within the
corresponding flow. The consistency system is presented in Fig. 7.
The human brain has a distinguished technique for learning new attacks (viruses
and others) through learning their symptoms. These attacks, which could affect our
body on their first attempt, would be recognized by our immune system in later
attacks; thus, they could be treated or mitigated according to similar symptoms in the
future. Inspired by this technique, and benefiting from what AI has to offer in the area
of known and unknown attacks, we have modified our system to detect unknown
consistency attacks. A third layer was implemented with the purpose of handling
attacks that were flagged by the first layer but failed to be recognized by the second.
The third layer flags such an attempt as an unknown new attack and integrates
it with the set of altered features to be taught, offline, to our AI system. The new
weights are integrated and updated to the current system at a specific time chosen by
the administrator as a network idle time (Fig. 8).
5 System Simulation
The setup consisted of a Linux server (Ubuntu 16.04) with an Intel Core i7 CPU
(8 cores) and Python3 installed. All tests were performed on both balanced and
unbalanced data, in order to run the training process over a more realistic traffic
environment. Hence, we are able to study the response of each technique under
different network conditions. Principal Component Analysis (PCA) was performed
for feature reduction in order to reduce the entire set of features to 4 input features
for the different techniques.
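A minimal SVD-based stand-in for the PCA reduction described above (a library such as scikit-learn would normally be used); the sample data here is synthetic:

```python
import numpy as np

def pca_reduce(X, k=4):
    """Project a feature matrix X (samples x features) onto its top-k
    principal components, shrinking each sample to k input features."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # scores on the top-k components

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 41))                    # e.g. the 41 NSL-KDD features
Z = pca_reduce(X, k=4)
print(Z.shape)                                    # (200, 4)
```

The components are ordered by explained variance, so the first 4 scores keep as much of the original spread as any 4-dimensional linear projection can.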
The next step on our list was the edge node feature extraction. We assigned
an agent residing next to each edge node, programmed to analyze specific headers
of each incoming packet and extract certain features. The extraction was limited to
the participating interfaces only. The agents are also responsible for relating the
extracted features to an individual connection in a statistical manner. The future
goal is to reach an efficient, statistically based set of features that is resilient
against attackers' interventions and manipulations. This technique helps our system
focus on the general functionality and randomness of the network rather than on a
specific entry point, allowing better detection of unknown distributed behavior.
At the end, all agents form a vector of extracted features and send it securely
to the ARS module at the controller. The feature marking technique allows a single
server to handle and track the work of multiple clients. If any further processing is
required on the uploaded set of features, it is handled by the ARS module
before preparing the features for the AI model.
Another phase, working in parallel, is the consistency check, for which we imple-
mented a second client for each agent. Its main purpose is to wait for a command
from the controller that includes the time cycle tc' + x, with x < tc'; x is a
period that varies according to the network conditions, chosen such that the
controller can ensure that no updates will be injected at that moment, which helps
to sync the time between the ARS agent and the controller. The controller can send
a new set of tc' and x according to the update frequency of the network. Next, the
client handles the output checks to be forwarded to the controller.
The following section presents a comparison between the different techniques tested
for both stages on both datasets, using balanced and unbalanced data (BD and UBD,
respectively). The results in Tables 5 and 6 show that random forest achieves better
results than the other techniques tested at this stage, followed by the neural
network-based technique. Figures 9 and 10 present the random forest test confusion
matrix (CM) for unbalanced (UBD) and balanced (BD) data, respectively. We
can notice an increase in the precision of the DNN from the first dataset to the
second. This reflects the strength of deep AI techniques when handling different
attack types and as the dataset increases in size. Note that although the Random
Forest technique showed the higher precision, the DNN showed faster processing
with competitive precision. This sheds light on part of our future work, where our
system should be equipped with an algorithm for choosing the best ensemble
of AI techniques for the first and second security layers from a prestored database
according to different network parameters and conditions.
Fig. 10 RF CM for BD
Another test scenario, performed on the full system up to this point, aimed to
simulate both the security and consistency systems working together. The simula-
tion was based on Mininet [53], an SDN network emulator, and MiniEdit [54], an
extension to Mininet for graphical network topologies, where we constructed a fully
connected tree network topology consisting of 20 OVS switches and 14 virtual
hosts. We manipulated the OVS code in order to connect two physical servers to two
different OVS switches, as shown in Fig. 11.
The purpose of this test was to launch different attacks at the same time from each
external physical server on the SDN controller. For this task we assigned the edge
nodes to be the directly connected OVS switches, as shown in the figure.
The tests were based on the D~C2 architecture, where the edge nodes are responsible
for extracting the specified features and uploading them to the controller during each
time cycle. Since the network had converged and no frequent updates were needed, we
activated the consistency client within the same time cycle as the security client, such
that the consistency client would include the hash output in the same vector sent each
time cycle to the controller.
The tests were done with the RF technique for the anomaly detection layer and
DNN for the attack identification process. Both systems were trained on the
CICIDS2017 dataset. At time t0 we initiated both a DoS attack from server-1 and
port scanning on the controller from server-2. It took around 6 s from t0 for the first
layer to detect a change in the network traffic; at this point the second layer was
activated by the first alert, and the abnormal traffic was injected into the second
system. It took around 3 s from t0 + 6 for the second layer to identify both attacks.
Even though the second layer requires more processing per sample, the results show
that it took less time. This is due to the double-layered technique, since we inject
the second system with only the subset of the traffic that was flagged by the first
system, resulting in a faster and more efficient detection process.
We added an extra security feature to our system: during an attack, the security
module is able to block the attack source from the directly connected edge node. This
was made possible by injecting a new rule on the edge nodes forcing them to block
all traffic matching the attacker's source IP. By monitoring the attacker's traffic
on the egress ports of the edge node, we can see the specific traffic being blocked,
as illustrated in Fig. 12.
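With Open vSwitch edge nodes, such a drop rule could be installed with `ovs-ofctl`; the switch name and attacker IP below are illustrative, and this sketch only builds the command rather than executing it:

```python
import shlex

def block_source_rule(edge_switch, attacker_ip, priority=100):
    """Build the ovs-ofctl command that installs a drop rule on the edge
    node for all IP traffic matching the attacker's source address.
    The switch name and IP are hypothetical examples."""
    flow = f"priority={priority},ip,nw_src={attacker_ip},actions=drop"
    return ["ovs-ofctl", "add-flow", edge_switch, flow]

cmd = block_source_rule("s3", "10.0.0.7")
print(shlex.join(cmd))
# A real deployment would run this via subprocess.run(cmd, check=True)
```

A higher `priority` than the normal forwarding rules ensures the drop rule matches first; deleting it later with `ovs-ofctl del-flows` restores normal forwarding.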
Figure 11 shows how the DoS attack intensity (packet count) increases expo-
nentially throughout the attack period while the port scanning traffic remains
constant. The attack started at t = 0 s; at t = 6 s the anomaly detection flagged the
incoming traffic, the attack identification system detected the attack type at t = 9 s,
and the attack mitigation rule blocking the attack was injected before t = 10 s.
After testing the centralized mode of our work, we started our tests on the
distributed NN overlay security layer. We selected the neural network AI technique
for this stage due to its high accuracy in the anomaly detection tests discussed
earlier, and due to its architecture, which made it possible to consider such a new
distribution technique.
The security module in this case will run an algorithm to select the most suitable
candidates to participate in the distribution process. The algorithm can query the
network for the number of physical connections n of each node and its distance d
from the edge node. If the network is already running, the security module queries
the ARS agent for the CPU load percentage (e.g. iostat -c). The other parameters
should be inserted by the administrator, namely the processing capability PC = CPU
speed (MHz) + number of CPUs + RAM (GB). Finally, PC = (PC × (1 − load))/n.
The nodes with the highest parameters are chosen as candidates. Furthermore,
we refer to d for nodes with equal PC, such that a node with shorter d has
a higher probability of being chosen. The edge nodes are excluded from this
procedure since they are included by default. The number of participating nodes
from the chosen candidates depends on the number of NNs to be overlaid, which
in turn depends on the number of input points on the network. Another implemented
feature is a database for manual selection by the administrator, which would
override this stage.
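The candidate-selection scoring can be sketched directly from the formulas above; the node parameters below are invented for illustration:

```python
def pc_score(cpu_mhz, num_cpus, ram_gb, load, n_links):
    """Processing-capability score as defined above:
    PC = CPU speed + number of CPUs + RAM, scaled by the idle share
    (1 - load) and divided by the number of physical connections n."""
    pc = cpu_mhz + num_cpus + ram_gb
    return pc * (1.0 - load) / n_links

def select_candidates(nodes, k):
    """Rank nodes by PC (higher is better); for equal PC, prefer the
    shorter distance d to the edge node. `nodes` maps a node id to
    (cpu_mhz, num_cpus, ram_gb, load, n_links, d)."""
    ranked = sorted(nodes.items(),
                    key=lambda kv: (-pc_score(*kv[1][:5]), kv[1][5]))
    return [node for node, _ in ranked[:k]]

nodes = {"s5": (2400, 4, 8, 0.50, 3, 2),
         "s6": (2400, 4, 8, 0.50, 3, 1),   # same PC, shorter d -> preferred
         "s7": (1200, 2, 4, 0.10, 4, 1)}
print(select_candidates(nodes, 2))          # ['s6', 's5']
```

The edge nodes themselves would be skipped by this routine, since they participate by default.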
After the participating nodes are set, the ARS agent is enabled to play its new role.
The controller sends a packet to each ARS agent to let it know which part of its
NN script to enable (hidden or output), along with other parameters (number of layers,
number of neurons in each layer, and the number of NN networks to participate with).
After this stage, each edge agent will be informed of the IDs of the participating
internal agents in its own NN. Each edge agent will then send a packet to the
corresponding internal agents to be linked as their next hop in the AI processing
phase. At this point all the NN networks are set, each protecting a specific entry
point of the network.
We tested the distributed system for abnormal traffic using the same test done on
the first layer of the centralized security mode. The test was done on the network of
Fig. 8.1, with 2 edge nodes. To protect the 2 ingress points we created 2 NN overlays.
The 2 edge nodes played the role of the input layers, (s5, s8) along with (s18, s19)
played the 2 hidden layers, while s6 and s17 were the output layers. The same NN
architecture was used as in the centralized test. After the overlay was set, the weights
were injected into the network. The weights were calculated after training the same
NN offline using 60% of the CICIDS2017 dataset. We then injected the remaining
40% of the dataset as traffic into the network from both external servers. The overall
results of the 2 NNs were around 94.8% accuracy. The same latency test
was done as in the centralized mode, where a DDoS attack was launched from one
server into s7 and a port scan attack was launched into s20. The attack started at
t = 0 s, and at t = 4 s the alarm was triggered, detecting abnormal traffic. The
detection latency will vary between the centralized and distributed security layers
depending on the network itself and the node capabilities: the higher the capabilities,
the more hidden layers a node can handle, and thus the less the processing time. On
the other hand, network congestion and the distance between the controller and the
edge nodes play a role in the latency of the centralized mode. Another test estimated
the CPU processing overhead introduced by the distributed NN on each participating
node. A node working on a hidden layer showed an 11% increase in CPU load each
time a processing phase took place, while each output-layer node showed a 3%
increase.
This section discusses the simulations and results that provided our consistency
system with its proof of concept and effectiveness. The following sections include
our data collection phase, which presents the flow-based attack classes and data
structure, followed by the data preprocessing phase, which also discusses the first
consistency verification layer up to the point where our system is able to flag any
inconsistent flow.
Next, we present the classification results, which include the second layer of the
system, where the AI module is able to identify the class of each inconsistency
flagged by the first layer. A third consistency layer is also discussed, which is
responsible for detecting unknown consistency attacks in case an attack is triggered
but not classified. The first simulations were done on our dataset, followed by
real-time consistency check test scenarios presented in the following sections.
After the data preprocessing stage, we obtain labeled data from the XOR-ed
output between the nodes and the controller. The obtained 16 × 16 × 3 vectors are
passed to the classifier.
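A sketch of how such a 16 × 16 × 3 classifier input could be formed from the XOR of node and controller data; the byte layout and channel assignment are assumptions based on the color scheme described earlier:

```python
import numpy as np

def xor_image(node_bytes, ctrl_bytes, side=16):
    """Build a side x side x 3 image from the XOR of a node's flow data
    and the controller's saved image of it. An all-zero result means the
    node is consistent; non-zero bytes fall in the channel of the altered
    feature class (R = constant, G = time, B = statistical). The byte
    layout is an assumption for illustration."""
    diff = np.frombuffer(node_bytes, dtype=np.uint8) ^ \
           np.frombuffer(ctrl_bytes, dtype=np.uint8)
    return diff.reshape(side, side, 3)

# A consistent node XORs to an all-zero 16 x 16 x 3 image
data = bytes(range(256)) * 3          # 768 bytes = 16 * 16 * 3
consistent = xor_image(data, data)
print(consistent.any())               # False
```

Any tampered flow leaves non-zero bytes in the image, giving the CNN a spatial pattern to classify.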
In our tests, we compared different deep learning architectures: LeNet5, AlexNet,
ConvNet, GoogleNet, ResNet, RNN, and DNN. Cross-validation was applied with
4 folds. At each fold, the different architectures were trained for 50 epochs with a
batch size of 50, and the data was split into 60% for training, 20% for validation,
and 20% for testing. At each fold, the following performance measures were
recorded, and at the end the average over the 4 folds was computed.
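The evaluation protocol (4 folds, a 60/20/20 split per fold, averaged metrics) can be sketched as follows; `train_eval` stands in for the actual training routine, and the toy metric is purely illustrative:

```python
import numpy as np

def kfold_average(data, labels, train_eval, k=4, seed=0):
    """For each of k folds, shuffle and split 60/20/20 into train,
    validation and test sets, call the (assumed) train_eval routine,
    and average the returned metric over the folds."""
    rng = np.random.default_rng(seed)
    n = len(data)
    scores = []
    for fold in range(k):
        idx = rng.permutation(n)
        tr = idx[:int(0.6 * n)]
        va = idx[int(0.6 * n):int(0.8 * n)]
        te = idx[int(0.8 * n):]
        scores.append(train_eval(data[tr], labels[tr],
                                 data[va], labels[va],
                                 data[te], labels[te]))
    return float(np.mean(scores))

# Toy metric: fraction of positive labels in the test fold
metric = lambda *s: float(np.mean(s[5]))
acc = kfold_average(np.zeros((100, 4)), np.ones(100), metric)
print(acc)   # 1.0
```

Each recorded measure (accuracy, precision, recall, F1) would be averaged the same way over the 4 folds.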
The comparison results, presented in Table 6, show that the ConvNet architecture
gives the best results, with 99.39% accuracy, precision, recall and F1-score. In
fact, ConvNet was able to differentiate between different attack classes even though
the images resulting from the same attack may be caused by altering features from
different classes, thus resulting in images of different colors. As such, ConvNet
has shown its ability to recognize the patterns contained in the XOR-ed data of each
type of attack (Table 7).
Another real-time consistency test was done on our system. The test network
environment was the same as that of Fig. 10. The purpose of this test was to launch
different configuration-based attacks from the two external servers on different
switches in the network. At this point, our AI system is fully trained offline, as
discussed earlier. After we attacked the flow tables of 4 switches (s5, s12, s15, s17)
in the network (12 flows in each switch), we waited for tc = 10 s, as programmed,
for the consistency module to start the next check. The altered 12 flows in each
switch are based on multiple flow-based attacks belonging to different attack classes.
We attacked each switch with two types of classes, as shown in Table 8.
The test was extended to include an unknown attack by altering a random set of
features. This attack was done on s4. This set was not taught or included in any of
the previously mentioned attacks. Such a test shows the precision and effectiveness
of the third consistency check layer.
We chose the best three deep learning techniques, namely CNN, DNN, and RNN,
to be tested in this second scenario.
The test started with feature extraction, followed by the first layer of consistency
verification after the first vector of hashed features was uploaded to the controller at
tc = 10 s. After the first layer, the system was able to detect the inconsistencies
found in the 4 targeted switches, while all other 16 switches returned a consistent
result.
• In centralized mode: we introduce a single extra packet every tc, shared by both
security- and consistency-related data.
• In distributed mode: we introduce an extra 26 packets every tc to provide our
network with both security and consistency. The packets are distributed as
follows: 20 packets (a single packet every tc from each switch to the controller
for consistency measures), and 6 packets every tc forwarded from each layer to
the next in our two distributed (3-layered) NN overlays.
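The overhead figures above can be captured in a small helper. The function name and the parametrization of the 6 inter-layer overlay packets are our own, chosen to match the 20-switch deployment described in the text.

```python
def control_overhead(n_switches: int, mode: str,
                     overlay_packets: int = 6) -> int:
    """Extra control packets introduced per consistency period tc.
    Centralized mode: a single packet carries both security- and
    consistency-related data. Distributed mode: one packet per switch to
    the controller, plus the inter-layer packets of the two distributed
    (3-layered) NN overlays (6 in the deployment described in the text)."""
    if mode == "centralized":
        return 1
    if mode == "distributed":
        return n_switches + overlay_packets
    raise ValueError(f"unknown mode: {mode}")
```

For the 20-switch test network this gives 1 packet per tc in centralized mode and 20 + 6 = 26 in distributed mode, matching the figures above.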
We used our test network environment to further optimize both the DNN layer for
attack specification and the CNN layer for consistency class specification. The
systems were previously trained using the CICIDS2017 and the consistency
datasets, respectively. To optimize the AI system, we first needed to tune the
dropout technique to best fit our data. This is done through repeated tests to choose
the best dropout probability p. Another parameter was the number of repetitions r
that should be done on each input to strike a good balance between the processing
overhead and confidence interval convergence (the difference between the proba-
bilities of the top classes). For our data and network, the tests showed an optimal
p = 0.375. To test the efficiency of the optimization phase, we created a new
security attack with features similar to both a DoS attack and a port scanning
attack. We also created a new consistency attack with features similar to both a
deadlock and a DoU attack.
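The repetition procedure can be sketched as follows. `predict_fn` stands in for a stochastic forward pass with dropout left active at inference (p = 0.375 in our tuning), and the averaging scheme is an assumption consistent with Monte Carlo dropout rather than the chapter's exact code.

```python
import numpy as np

def mc_confidence(predict_fn, x, r=16):
    """Run r stochastic forward passes and average the class probabilities.
    The confidence measure is the gap between the two highest averaged
    probabilities: a small gap means the model cannot separate the top
    classes and the prediction should not be trusted."""
    probs = np.mean([predict_fn(x) for _ in range(r)], axis=0)
    top2 = np.sort(probs)[-2:]          # two largest averaged probabilities
    return probs, float(top2[-1] - top2[0])
```

Applied to the ambiguous security attack below, the averaged probabilities resemble {53, 0, 1, 46, 0}: the 7% gap between DoS and port scanning signals low decision confidence even though a class is still predicted.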
After testing these attacks with the optimization system enabled, we observed a
security confidence interval of {53, 0, 1, 46, 0} and a consistency interval of {0, 1,
52, 0, 47}. We note that the AI system classified the attacks as DoS and deadlock
attacks. This test shows that the AI system will produce a prediction even with low
confidence, as in this case, which can yield an incorrect result. Such false classi-
fication can occur regardless of the very high accuracy obtained during the training
phase.
Regarding the probabilities in the confidence interval of the security test, they
varied within 15% and converged at 7% after 13 repetitions, while during the
consistency test the probabilities varied between 12 and 5% after 16 repetitions.
These percentages control the decision threshold used to check each confidence
interval for decision confidence: if the difference between the probabilities of the
top two classes is less than the threshold, a warning is raised; if the normal class is
one of these classes, an alert is triggered.
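The threshold decision just described can be sketched as follows. The function name, the default threshold value, and the "ok"/"warning"/"alert" labels are illustrative choices, not the chapter's implementation.

```python
def confidence_decision(probs, normal_class=0, threshold=0.15):
    """Map averaged class probabilities to a decision: 'ok' when the top
    class clearly dominates, 'warning' when the gap between the top two
    classes is below the threshold, and 'alert' when, in addition, the
    normal class is one of those two contested classes."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    gap = probs[order[0]] - probs[order[1]]
    if gap >= threshold:
        return "ok"
    return "alert" if normal_class in order[:2] else "warning"
```

For the security interval {53, 0, 1, 46, 0}, the 7% gap between the DoS and port scanning classes falls below the threshold, so a warning is raised; had the normal class been one of the two contested classes, an alert would be triggered instead.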
This module enhanced our system's ability to detect unknown attacks through
retraining the system with any features that result in a low confidence measure.
These new features, which may belong to a new attack or to new legitimate
behavior in the network, are given a new ID and will require administrative
intervention later for a more accurate class name.
An Artificial Intelligence Resiliency System (ARS) 647
6 Conclusion
In this paper, we presented a full system analysis, including the general archi-
tectural design. We discussed modifying and managing AI techniques to design a
general system solution that protects our network against different types of attacks,
based on a combination of centralized and distributed capabilities with the least
possible overhead. Our goal is to deploy a state-of-the-art adaptive security system
over a consistency-verified network for better network resiliency.
We proposed and implemented a virtual ANN network overlay as a first layer of
security. The simple, parallel computational capabilities of neurons in an ANN
make it possible to distribute the processing of a traditional ANN over the network.
The ARS agents played the role of the different layers in the neural network. Every
node/agent contributes part of its free storage and computational capacity to vir-
tualize one or more NN layers. These agents connect to each other using logical
links to exchange data and results, leading to an ANN-based overlay network. Such
a design minimized traffic overhead by enabling independent processing, achieving
real-time detection. The second security layer consists of an ensemble of AI
techniques centralized at the controller, with higher processing power, for attack
specification and mitigation.
We presented a new AI-based consistency verification system integrated with our
general ARS architectural design. We discussed how adopting distributed data
extraction techniques can provide the necessary information to any AI system while
keeping a low traffic overhead and preserving privacy.
The consistency solution provides a double-layer consistency verification system
inspired by the human brain–immunity cooperation system. The first layer com-
pares a hashed version of specific features extracted from the flow table of each
network node with its corresponding image at the controller. The comparison is
based on a simple XOR between the two hash vectors. We equipped our system
with a graphical representation of the consistency results.
In case of any inconsistency in any node, a second AI-based layer is triggered to
identify the attack class that caused these inconsistencies on all the flagged
switches. Our results show that by adopting a double-layer technique, we can
perform fast checks, with in-depth classification only when necessary, minimizing
processing time and overhead.
At the end of our research, our implementation and test results provide a proof of
concept for the ARS system. We have shown that multiple layers of security and
consistency, constructed with efficient techniques, can achieve a highly resilient
network. Our work has provided network consistency with real-time protection,
while preserving privacy and minimizing traffic overhead.
Acknowledgements This research was funded by TELUS Corp., Canada, AUB University
Research Board and Lebanese National Council for Scientific Research, Lebanon.