
Received: 1 February 2019 Revised: 27 April 2019 Accepted: 18 May 2019

DOI: 10.1002/cpe.5400

RESEARCH ARTICLE

ANFIS and Deep Learning based missing sensor data prediction in IoT

Metehan Guzel1 Ibrahim Kok2 Diyar Akay3 Suat Ozdemir4

1 Department of Computer Engineering, Graduate School of Natural and Applied Sciences, Gazi University, Ankara, Turkey
2 Department of Computer Sciences, Informatics Institute, Gazi University, Ankara, Turkey
3 Department of Industrial Engineering, Faculty of Engineering, Gazi University, Ankara, Turkey
4 Department of Computer Engineering, Faculty of Engineering, Gazi University, Ankara, Turkey

Correspondence
Suat Ozdemir, Department of Computer Engineering, Faculty of Engineering, Gazi University, 06570 Ankara, Turkey.
Email: [email protected]

Funding information
The Scientific and Technological Research Council of Turkey (TUBITAK), Grant/Award Number: 118E212

Summary

Internet of Things (IoT) consists of billions of devices that generate big data which is characterized
by the large volume, velocity, and heterogeneity. In the heterogeneous IoT ecosystem, it is not
so surprising that these sensor-generated data are considered to be noisy, uncertain, erroneous,
and missing due to the lack of battery power, communication errors, and malfunctioning devices.
This paper presents Deep Learning (DL) and Adaptive-Network based Fuzzy Inference System
(ANFIS) based prediction models for missing sensor data problem in IoT ecosystem. First, we
build ANFIS based models and optimize their parameters. Then, we construct DL based models
by using Long Short Term Memory (LSTM) network structure and optimize its parameters by
applying the grid search method. Finally, we evaluate all the proposed models with Intel Berkeley
Lab dataset. Experimental results demonstrate that the proposed models can significantly
improve the prediction accuracy and may be promising for missing sensor data prediction.

KEYWORDS

adaptive-network based fuzzy inference system (ANFIS), Deep Learning, Internet of Things (IoT),
IoT data analysis, missing sensor data prediction

1 INTRODUCTION

The concept of the Internet of Things (IoT) refers to a sensor-rich world where physical objects in our environment are increasingly enriched
with computing, sensing, and communication capabilities. Sensor technology is one of the core enabling technologies in this world. Sensors
are utilized to collect large amounts of heterogeneous data for large scale IoT applications such as environmental monitoring, e-health,
intelligent transportation systems, military, smart agriculture,1 and industrial plant monitoring.2 Sensors and connected devices with diverse digital
technologies generate an excessive amount of data, which are multi-source, real-time, dynamic, sparse, highly heterogeneous, and semantically
rich. In large scale IoT platforms, due to the lack of battery power, communication errors, and malfunctioning devices, sensor-generated data
are considered to be inherently noisy, uncertain, erroneous, and missing.3 Therefore, data generation and quality become a critical issue in data
processing and analysis.
In this work, we address the problem of missing sensor data. This problem is very common in IoT for various reasons, such as unstable
network communication, synchronization issues, unreliable sensors, and other types of equipment failure. Eliminating missing data results in
loss of information and may lead to incorrect analytical results. Thus, the prediction and assessment of missing values become an imperative
task.4 Hence, there is still a need for novel prediction models to predict the missing data. In order to address the missing sensor data problem,
we propose two prediction models based on Deep Learning (DL) and Adaptive-Network based Fuzzy Inference System (ANFIS). We focus on
sensor-rich IoT applications, where our models learn how to infer the missing data optimally from the data of the other sensors.
ANFIS is a soft computing method that combines the advantages of Artificial Neural Networks (ANN) and Fuzzy Inference Systems (FIS).
ANFIS has high generalization ability supported by a fast and accurate learning phase.5 Based on this, we decided to solve the missing sensor
data prediction problem with ANFIS.
Recently, DL has attracted widespread interest in academic and industrial fields due to its state-of-the-art performance in many domains such as
computer vision, natural language processing, speech recognition, and visual object recognition.6 DL also shows good
potential for analyzing vast volumes of data and for discriminative tasks such as classification and prediction.7 We therefore also employ DL for the
missing data problem, owing to its predictive power on large-scale datasets. The key contributions of this paper are summarized as follows.
• We propose novel prediction models based on ANFIS and DL to solve the missing data problem in IoT.
• We conduct extensive experiments to validate the performance of the proposed prediction models.
The rest of this paper is organized as follows. Section 2 reviews related work on missing sensor data prediction. Section 3 describes
the dataset and the models. Sections 4 and 5 introduce the details of the proposed prediction models. Section 6 presents the
experimental results and a comparison of the prediction models. Finally, Section 7 concludes the paper.

Concurrency Computat Pract Exper. 2019;e5400. wileyonlinelibrary.com/journal/cpe © 2019 John Wiley & Sons, Ltd.
https://doi.org/10.1002/cpe.5400

2 RELATED WORK

The presence of missing/corrupted values in databases/datasets has been a major problem for decades. In addition, with newly emerged
concepts like Wireless Sensor Networks (WSN) and IoT, the total data generation speed has skyrocketed, while the quality/reliability of equipment
has gone down, which has caused more and more missing values. Therefore, numerous studies have been conducted to overcome this problem. In this
section, we briefly present the methods proposed to estimate missing data and give a general overview of the literature in this domain. It must
be noted that, in compliance with the scope of this paper, we exclude temporal and spatiotemporal estimation methods. We group the methods
into three categories, namely, Statistical Methods, Optimization Methods, and Machine Learning Methods.

2.1 Statistical methods


Statistical methods are the oldest set of methods used for estimation. Among them, regression is commonly used.
In the work of Qin et al,8 a regression model is proposed for imputation of missing values. Their method aims to minimize the RMSE
metric after every imputation by optimizing the regression model. In one aspect, random and stochastic approaches are utilized and compared to
the conventional deterministic approach. The second aspect of the work focuses on the use of a semi-parametric approach, which
is compared to non-parametric and linear approaches. The proposed semi-parametric regression model showed significant
improvement over conventional methods on both synthetic and real datasets.
Hidden Markov Model (HMM) is a stochastic finite state machine that is composed of four elements.9 These elements are (i) states, (ii) possible
observations, (iii) state transition probability matrix, and (iv) emission probability matrix.10 HMM has a number of advantages, but in the scope
of this paper, we focus on HMM's prediction capability. With its strong statistical foundation,11 HMM can detect patterns in time
series efficiently.9-11 In the work of Hassan and Nath,11 HMM is used to predict stock market prices. The proposed model takes four inputs,
which are the opening, closing, high, and low prices of the current day. Using these inputs, the model predicts the closing price of the next day. The results
acquired justify the use of HMM for time series prediction.11 In the work of Li et al,12 HMM is used to predict Virtual Machine (VM) failures in
cloud platforms. The proposed hybrid method, which is composed of the AdaBoost algorithm and HMM, can predict VM failures by observing
CPU, memory, network, and VM load.
Expectation maximization (EM) algorithm is another approach used for estimation. The EM algorithm expresses data points as a mixture of models.13
To optimize the models, the maximum likelihood method is used. The "models" of the EM algorithm can differ, but the most used model description
is based on the Gaussian distribution. In this approach, data points are expressed as a mixture of Gaussian models and likelihoods are calculated
accordingly. However, Gaussian mixture models suffer from the curse of dimensionality: a high number of features causes high computational
complexity. In some cases, a Gaussian model cannot even be fitted to the distribution of the data. Therefore, the use of mixture models requires an
approach to overcome this problem. In the work of Delalleau et al,14 an EM algorithm that features Gaussian models is used for dealing with
missing data. Their proposal uses a tree based approach to deal with the curse of dimensionality. The proposed method is composed of 5 steps as
follows. (i) Unique missing patterns in the dataset are identified. (ii) A graph representation of the missing patterns is fitted into a minimum spanning
tree. (iii) The minimum spanning tree is used to deduce the ordering of training samples. (iv) Mean values for the mixture models are calculated using k-Means
and the covariances of the models are initialized using the empirical covariances of each cluster. (v) Iteration is performed through the EM steps. In short, the
proposed approach optimizes the ordering of samples with a minimum spanning tree to reduce the computational cost and, for missing data estimation,
intra-cluster mean values are used. In the work of Eirola et al,15 an EM algorithm based on Gaussian mixture models is used to deal with the missing
data problem. The method is composed of 4 steps as follows: (i) fit a Gaussian mixture model by the EM algorithm; (ii) calculate the log-likelihood and the
Akaike information criterion (AICC); (iii) choose the model which minimizes AICC and calculate mean and variance values for each missing value using
the chosen model; and (iv) perform the task using the conditional means and variances calculated for the missing data. AICC is susceptible to high dimensional
data: with high dimensional datasets, the validity of AICC vanishes.15 To overcome this problem, a clustering based method, High Dimensional Data
Clustering (HDDC),16 is utilized.
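
The conditional mean and variance mentioned in step (iv) have a standard closed form for a single Gaussian component, sketched below. The notation is an assumption introduced here and is not taken from the cited works: the subscripts o and m mark the observed and missing parts of a sample x, and the mean vector and covariance matrix of the component are partitioned accordingly.

```latex
\mu = \begin{bmatrix} \mu_o \\ \mu_m \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \Sigma_{oo} & \Sigma_{om} \\ \Sigma_{mo} & \Sigma_{mm} \end{bmatrix}

\mathbb{E}[x_m \mid x_o] = \mu_m + \Sigma_{mo} \Sigma_{oo}^{-1} (x_o - \mu_o)

\operatorname{Cov}[x_m \mid x_o] = \Sigma_{mm} - \Sigma_{mo} \Sigma_{oo}^{-1} \Sigma_{om}
```

For a mixture, the conditional mean is the average of the per-component conditional means, weighted by each component's posterior probability given the observed part of the sample.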

2.2 Optimization methods


Interpreting missing data estimation as an optimization problem is an alternative and rather innovative approach. Optimization algorithms use
an evolution-like approach to reach a near-optimal solution in NP-hard problems where the search space is vast. Although the missing data
problem is not a native optimization problem, the ability of these algorithms to explore the search space quickly increases their applicability in the missing data
domain. Among optimization algorithms, genetic algorithm (GA), ant colony optimization (ACO), and particle swarm optimization (PSO) algorithms
are commonly used.
GA imitates the evolution of species.17 For a given problem, a population composed of candidate solutions is generated. Each solution is
evaluated according to a fitness function. Then, new populations are generated iteratively from the most successful individuals of the previous
population. By adding new random solutions and mutations at each generation, the problem of falling into local minima is avoided.
GA is one of the most frequently used algorithms in numerous domains and missing value estimation is not an exception. In the work of
García et al,18 GA is used to impute missing values. The proposed method handles the data as a matrix and aims to find the missing values in a way
that does not alter the statistical characteristics of the initial dataset. Suppose that X is the dataset matrix with missing values, Y is a matrix composed
of missing values, and the combination of X and Y, X̂, is the completed dataset. The method tries to find the Y that minimizes the difference of
statistics between X and X̂. The fitness function is constructed upon this criterion and candidate Y matrices are handled as individuals of the population.
The GA based approach is compared to EM and auxiliary regression based estimation models and surpasses these methods in terms of preserving
statistical variables. Furthermore, the GA based method is claimed to be more flexible and responsive.18 Another GA based approach is utilized in the
work of Lobato et al.19 The proposed MOGAImp is a multi-objective GA (MOGA), which is based on NSGA-II.20 The reason behind using a
MOGA is to be able to optimize missing value selections on different metrics. In MOGAImp, the metrics used are classification accuracy and RMSE.
The fitness function is constructed upon these two metrics. The complete set of missing values is encapsulated as a single individual of the population
and the phases of GA are performed.
ACO based methods are easily applicable if the data can be formulated as a graph problem.21 In line with this principle, in the work of Priya et al,21
the missing value estimation problem is formulated as a graph problem and an ACO based method is proposed. In the conversion to a graph, each covariate
is turned into a level composed of covariate values and the target (missing) attribute is turned into the final level. On this graph, an ACO based
method, namely, Dual Repopulated Bayesian ACO (DPBACO), is applied. The reason behind this selection is ACO's susceptibility to falling into
local minima. By duplicating the population (main and reserve populations) and crossing over individuals from the different populations in each iteration,
DPBACO increases variety and therefore overcomes the local minimum problem. Adding different Bayesian functions to the ant traversal makes the method applicable
to different data characteristics.
PSO22 is a stochastic swarm intelligence based optimization algorithm that is frequently utilized because of its simplicity, accuracy, and fast
convergence ability.17 In the work of Nekouie and Moattar,23 a PSO based hybrid missing value estimation method is used on breast cancer
diagnosis data. To overcome PSO's weakness of getting stuck in local optima, chaotic reduced adaptive PSO (CRAPSO) is employed. The
proposed method first generates a set of candidate values for imputing the missing ones using Bayesian networks. In the next step, a tensor is used for estimation.
Tensor-based estimation is performed by calculating the missing attribute as a linear function of the present attributes. However, in case of data
insufficiency, like other mean square error minimization based models, the tensor based estimation model suffers from accuracy loss. Therefore, an
automatic data generation phase that utilizes CRAPSO is placed before the tensor phase. The CRAPSO and tensor phases run iteratively until convergence
is achieved. After convergence, the acquired results are used for imputation.
In addition to stand-alone solutions, optimization methods are generally used for optimizing machine learning algorithms that are used for
missing data estimation. Research works that fall under this category are given in the next section.

2.3 Machine learning methods


Recently, machine learning (ML) methods have been gaining popularity for missing data estimation.24 Among ML methods, the k-Nearest Neighbor (k-NN)
algorithm is one of the most popular. Although k-NN is a classification algorithm, it is highly applicable to the missing data problem,
especially if data dimensionality is high and the number of observations is low.25 When a value is missing, a predetermined number (k)
of similar samples are selected and used for estimation. In one of the earlier research works, k-NNimpute is proposed and applied to
DNA microarrays.26 k-NNimpute treats every attribute equally. In the work of García-Laencina,27 attributes are weighted
based on their affinity with the class label. The proposed method improves on the accuracy of k-NNimpute.
Another weighted k-NN method is proposed in the work of Tutz and Ramzan,25 where a subset of attributes is selected and used for distance
calculations in a weighted manner.
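
The basic, unweighted idea behind k-NN imputation can be sketched as follows. This is a minimal illustration and not the k-NNimpute implementation of the cited works: the function name `knn_impute`, the use of `None` as the missing-value marker, the Euclidean distance, and the plain averaging of neighbours are all assumptions made for the example.

```python
import math

def knn_impute(rows, target, k=3):
    """Fill the missing entries (None) of rows[target] from its k nearest complete rows."""
    query = rows[target]
    observed = [i for i, v in enumerate(query) if v is not None]
    missing = [i for i, v in enumerate(query) if v is None]

    # Candidate neighbours: rows that have no missing entries at all.
    complete = [r for r in rows if all(v is not None for v in r)]

    # Euclidean distance, computed only on the attributes the query actually has.
    def dist(r):
        return math.sqrt(sum((r[i] - query[i]) ** 2 for i in observed))

    neighbours = sorted(complete, key=dist)[:k]

    # Every attribute is treated equally: the estimate is the plain neighbour mean.
    filled = list(query)
    for i in missing:
        filled[i] = sum(r[i] for r in neighbours) / len(neighbours)
    return filled

# Hypothetical toy data: the last row is missing its third attribute.
rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [5.0, 6.0, 50.0],
    [1.05, 2.05, None],
]
result = knn_impute(rows, target=3, k=2)
```

The weighted variants described above would replace the plain distance or the plain mean with attribute-weighted versions.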
ANN is a bio-inspired method that mimics the neuronal structure of the human brain. For the missing data problem, numerous methods featuring
different types of ANN have been proposed. An auto-associative NN (Neural Network) creates a bottleneck between the input and output layers and has
a remarkable ability to learn linear and non-linear relationships. In the works of Abdella and Marwala28 and Nelwamondo et al,29 auto-encoder
NNs are utilized for the missing data problem, and a GA is utilized to optimize the error function. In the work of Ravi and Krishna,30 Particle Swarm
Optimization is employed to optimize the hidden layer in an auto-associative NN. Another NN type used is the Extreme Learning Machine (ELM), a
feed-forward neural network that does not require updating of weights.31,32 Due to this characteristic, ELM has a significantly shorter training time
in comparison to networks using backpropagation. In the work of Sovilj et al,33 a hybrid method that utilizes Gaussian mixture models (EM
algorithm) and ELM is proposed. It is based on performing multiple imputations via EM algorithms, generating an ELM for each imputed dataset,
and combining all ELMs to perform a final estimation. A rather refined approach is proposed in the work of Laña et al,34 where a GA optimized
ELM is applied. Apart from those mentioned above, research works featuring multi-layer perceptron networks (MLP),35 self-organizing maps
(SOM),36,37 probabilistic NNs,38 and other types of NN are also present in the literature.
Clustering methodologies are another ML method used for estimation. Missing data can be estimated from the other data that share the same
cluster. Fuzzy c-Means (FCM) is a fuzzy clustering algorithm that allows overlapping between different clusters. In other works,39-41 FCM
based methods are used for the missing data problem.

3 DATASET AND MODEL DESCRIPTIONS

3.1 Dataset
In this paper, we used the Intel Berkeley Research Lab dataset, which is publicly available. This dataset was collected from 54 sensors, which
were deployed in the Intel research laboratory at Berkeley between February 28 and April 5, 2004. It contains 2.3 million sensor readings
with time-stamped topology information, humidity, temperature, light intensity, and voltage values in "date:yyyy-mm-dd, time:hh:mm:ss.xxx,
epoch:int, moteid:int, temperature:real, humidity:real, light:real, voltage:real" format. Herein, the temperature unit is degrees Celsius. Humidity
is temperature-corrected relative humidity, ranging from 0% to 100%. Light intensity is in lux and voltage is expressed in volts.42 The sensors and
sensor ids were arranged in the lab according to the diagram given in Figure 1.
In this study, humidity, temperature, and light intensity observations of the 19th, 20th, and 21st sensors are used for evaluating the proposed
prediction models. The reason behind this selection is the completeness of the mentioned nodes. Most of the nodes in the dataset have missing
or corrupted readings. The selected nodes have relatively higher data density, especially between the 29th of February and the 7th of March.
The eight-day period between these dates has 100% density for observations when sensor readings are grouped by 3-minute intervals. Another
reason for the selection of these nodes is the proximity of their locations. The selected nodes are adjacent to each other, which ensures a similar sensing
environment. This enables us to use the data in two different forms:

• Merging the nodes' readings and processing them as if all data came from a single node. This approach is utilized in the DL based estimation
model.
• Using readings from different nodes separately to evaluate a single model with three different data sources. This approach is utilized in the ANFIS
based estimation model.
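
The density figure quoted above can be read as the share of 3-minute intervals that contain at least one reading. The sketch below illustrates that reading of the term; the function name `interval_density` and the toy timestamps are hypothetical, and the exact grouping procedure used by the authors is not described in the text.

```python
from datetime import datetime, timedelta

def interval_density(timestamps, start, end, minutes=3):
    """Fraction of fixed-width intervals in [start, end) that hold at least one reading."""
    width = timedelta(minutes=minutes)
    n_bins = int((end - start) / width)
    filled = set()
    for ts in timestamps:
        if start <= ts < end:
            # Index of the interval this reading falls into.
            filled.add(int((ts - start) / width))
    return len(filled) / n_bins

# Hypothetical readings covering a 12-minute window (four 3-minute intervals).
start = datetime(2004, 2, 29, 0, 0, 0)
end = start + timedelta(minutes=12)
readings = [
    datetime(2004, 2, 29, 0, 0, 31),
    datetime(2004, 2, 29, 0, 1, 2),
    datetime(2004, 2, 29, 0, 4, 10),
    datetime(2004, 2, 29, 0, 10, 45),
]
density = interval_density(readings, start, end)
```

In this toy window the third interval holds no reading, so the density is below 100%; a node qualifies under the criterion above only when every interval is covered.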

3.2 Descriptive statistical analysis


In this section, we calculate the descriptive statistics and correlations for each sensor value type of the dataset. In the light of the descriptive
statistics given in Table 1, it has been observed that the sensor values have an unbalanced data characteristic and are not normally distributed.

FIGURE 1 Sensor locations and selected nodes in the Intel Berkeley Research Lab42

TABLE 1 Descriptive statistics of dataset features

Statistical Parameters   Temperature   Humidity   Light
Min                      15.270        14.430     0.460
Max                      37.660        51.720     1847.360
Mean                     23.216        35.481     323.887
Standard deviation       4.651         7.767      436.857

FIGURE 2 Spearman correlation matrices of sensor values

TABLE 2 Inputs and output of the models

Model Abbreviation   Input I       Input II   Output
Mdl1                 Humidity      Light      Temperature
Mdl2                 Temperature   Light      Humidity
Mdl3                 Temperature   Humidity   Light

Because the sensor values are not normally distributed, the Spearman correlation coefficient is calculated among them. The resulting correlation matrix
is given in Figure 2.
The correlation matrix is used to determine the relationship between the inputs and outputs of the proposed models and to interpret the overall results. The fact
that inputs represent the output well means that there is a high correlation between inputs and output, which will contribute positively to the
performance of the models. The proposed models produce high accuracy prediction results when the correlation between the sensor values is
high. The accuracy decreases as the correlation decreases. According to Figure 2, the correlation between input and output
sensor values is highest for Mdl1, followed by Mdl2 and then Mdl3.
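
The Spearman coefficient is the Pearson correlation computed on ranks rather than on raw values, which is why it does not require normally distributed data. The sketch below shows this under the usual convention that tied values share their average rank; the helper names and the sample readings are hypothetical and are not taken from the dataset.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank-transformed samples."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical readings: humidity falls monotonically as temperature rises.
temperature = [18.2, 19.5, 22.1, 25.3, 24.0]
humidity = [45.0, 43.2, 38.9, 30.1, 33.5]
rho = spearman(temperature, humidity)
```

Because the relationship in this toy sample is perfectly monotone and decreasing, the coefficient sits at the negative end of its range even though the relationship is not linear.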
Due to the unbalanced data characteristic, we also performed Min-Max Normalization to bring the sensor values to a common scale without
distorting differences in the ranges of values. These normalized values were used in the training and testing processes of both the DL and ANFIS
based models.
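
Min-Max Normalization maps each value v of a feature to (v - min) / (max - min), so that every feature lies in [0, 1]. The sketch below applies it to the temperature bounds and mean reported in Table 1; the function names are hypothetical, and the inverse mapping is included on the assumption that predictions need to be converted back to physical units.

```python
def min_max_scale(value, lo, hi):
    """Map a raw value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

def min_max_invert(scaled, lo, hi):
    """Map a normalized value back to the original units."""
    return scaled * (hi - lo) + lo

# Temperature bounds and mean taken from Table 1 (degrees Celsius).
temp_min, temp_max, temp_mean = 15.270, 37.660, 23.216

scaled_mean = min_max_scale(temp_mean, temp_min, temp_max)
restored_mean = min_max_invert(scaled_mean, temp_min, temp_max)
```

The minimum and maximum would normally be computed on the training data only and then reused for the test data, so that both are scaled identically.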

3.3 Inputs and outputs of the models


In this paper, we aim to construct models that are capable of estimating the value of a sensor using the values of the other two sensors at the same
time. The sensor values used are temperature, humidity, and light. We define the three models given in Table 2. The abbreviations defined in
Table 2 are used throughout the paper.
For a better understanding, a visualization of the models and the actions taken in case of a missing value is given in Figure 3. Figure 3
depicts a sensor value stream between times {t − 2} and {t + 4}. In the figure, Temp_x, Humid_x, and Light_x stand respectively for the temperature,
humidity, and light intensity observation at time (x). As seen from the figure, missing sensor value incidents occur at times {t − 1, t + 2, t + 4} and
each missing value is predicted from the other readings of the same time. A flowchart representation of the application of the proposed models on an IoT data
stream is also given in Figure 4.
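
The selection logic of Table 2 can be sketched as a small dispatcher that picks the model whose output is the missing value. This is an illustration only: the function `predict_missing`, the dictionary layout, and the stand-in models are hypothetical, and the handling of readings with more than one missing value is an assumption, since the text only covers the single-missing-value case.

```python
# Which model predicts which value, and from which two inputs (Table 2).
MODEL_FOR_OUTPUT = {
    "temperature": ("Mdl1", ("humidity", "light")),
    "humidity": ("Mdl2", ("temperature", "light")),
    "light": ("Mdl3", ("temperature", "humidity")),
}

def predict_missing(reading, models):
    """Return (completed reading, name of the model used or None)."""
    missing = [key for key in MODEL_FOR_OUTPUT if reading.get(key) is None]
    if not missing:
        return dict(reading), None
    if len(missing) > 1:
        # Assumption: each model needs both of its inputs to be present.
        raise ValueError("more than one value is missing at this time step")
    name, (first, second) = MODEL_FOR_OUTPUT[missing[0]]
    completed = dict(reading)
    completed[missing[0]] = models[name](reading[first], reading[second])
    return completed, name

# Stand-in models returning constants; trained ANFIS or DL models would go here.
models = {
    "Mdl1": lambda humidity, light: 21.5,
    "Mdl2": lambda temperature, light: 40.0,
    "Mdl3": lambda temperature, humidity: 300.0,
}
completed, used = predict_missing(
    {"temperature": None, "humidity": 38.2, "light": 410.0}, models
)
```

A complete reading passes through unchanged, which mirrors the rows of Figure 3 that have no related model.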

4 ADAPTIVE-NETWORK BASED FUZZY INFERENCE SYSTEM BASED MISSING VALUE PREDICTION

In this section, ANFIS,43 the fuzzy logic based method utilized for missing sensor data prediction, and its predecessor, FIS,44 are briefly explained.

4.1 Adaptive-network based fuzzy inference system


Estimation models based on mathematical/statistical methods perform remarkably well on mathematically formulable problems. However,
real-world problems, which are usually ill-defined and uncertain, do not fall under this category. For this type of problem, FIS is one of the
frequently used approaches. An FIS model employs a non-linear mapping from an input space to an output space using fuzzy rules.45 The rules used
by FIS are generated by humans.45 Therefore, the success of the model depends heavily on the expertise of the human expert who generates the
rules. There are two widely used FIS types in the literature, ie, Mamdani-type46 and Sugeno-type.44 In some cases, it might be quite
challenging to find suitable experts or create an accurate rule-base. Besides, even if a rule-base is acquired, tuning of the membership functions'
parameters is still a requirement. ANFIS is proposed to overcome this challenge. Proposed by Jang,43 the ANFIS method utilizes ANN to optimize FIS
parameters and only requires input-output tuples to create a rule-base.

Time   Temperature   Humidity      Light Density   Related Model   Action
t-2    Temp_{t-2}    Humid_{t-2}   Light_{t-2}     -
t-1    ???           Humid_{t-1}   Light_{t-1}     Mdl1            Humid_{t-1}, Light_{t-1} → Temp_{t-1}
t      Temp_{t}      Humid_{t}     Light_{t}       -
t+1    Temp_{t+1}    Humid_{t+1}   Light_{t+1}     -
t+2    Temp_{t+2}    ???           Light_{t+2}     Mdl2            Temp_{t+2}, Light_{t+2} → Humid_{t+2}
t+3    Temp_{t+3}    Humid_{t+3}   Light_{t+3}     -
t+4    Temp_{t+4}    Humid_{t+4}   ???             Mdl3            Temp_{t+4}, Humid_{t+4} → Light_{t+4}

FIGURE 3 Missing sensor value situations (left) and actions (right) taken in missing sensor value occurrences

FIGURE 4 Flowchart representation of proposed models application on IoT data streams

Takagi-Sugeno's method of reasoning44 is considered to be a good approximator.45 Fuzzy reasoning creates a rule for each if-then state. Each
of the fuzzy rules generated by Sugeno-type reasoning has a single output that is a linear combination of the inputs and a constant term. The output
of the system is calculated as a weighted average of the outputs of the rules. To present ANFIS, an FIS with two inputs, where each input is
assumed to have two fuzzy linguistic terms, is considered:
Rule 1: IF ($x = A_1$) AND ($y = B_1$) THEN $f_{11} = p_{11} x + q_{11} y + r_{11}$
Rule 2: IF ($x = A_1$) AND ($y = B_2$) THEN $f_{12} = p_{12} x + q_{12} y + r_{12}$
Rule 3: IF ($x = A_2$) AND ($y = B_1$) THEN $f_{21} = p_{21} x + q_{21} y + r_{21}$
Rule 4: IF ($x = A_2$) AND ($y = B_2$) THEN $f_{22} = p_{22} x + q_{22} y + r_{22}$
$\{p_{ij}, q_{ij}, r_{ij}\}$ are the parameters that are determined during the training phase of ANFIS and $\{A_i, B_j\}$ are fuzzy terms that are used for defining
data points.

FIGURE 5 Sample ANFIS architecture

Figure 5 shows an ANFIS structure which has two inputs (x, y) and an output (f). In the figure, circle nodes are fixed nodes that do not change
throughout the training phase, whereas square nodes are adaptive nodes that are calibrated through the training phase. An ANFIS consists
of 5 layers.
Layer 1: Every node in the first layer is adaptive and calculates the degree of membership for each input variable. For a 2-input model, the node
functions for each input are given as Equation (1) and Equation (2), ie,

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2 \tag{1}$$

$$O_{1,j} = \mu_{B_j}(y), \quad j = 1, 2. \tag{2}$$

In Equation (1) and Equation (2), $\mu_{A_i}$ and $\mu_{B_j}$ are the selected membership functions. These functions can be the Gaussian membership function
(given in Equation (3)), the generalized bell membership function (given in Equation (4)), or another one, ie,

$$\mu_{A_i}(x) = \exp\left[ -\left( \frac{x - c_i}{2 a_i} \right)^{2} \right] \tag{3}$$

$$\mu_{A_i}(x) = \frac{1}{1 + \left| \dfrac{x - c_i}{a_i} \right|^{2 b_i}}. \tag{4}$$

In Equation (3) and Equation (4), $\{a_i, b_i, c_i\}$ are the parameters of the membership function and can change the shape of the function. They are referred
to as premise parameters.

Layer 2: Every node in this layer is fixed and labeled with Π. Nodes in this layer multiply the incoming signals and send the product to the next
layer. The output of each node represents the firing strength of a rule. The output function of the nodes in this layer is given as Equation (5), ie,

$$O_{2,ij} = w_{ij} = \mu_{A_i}(x) \times \mu_{B_j}(y), \quad i, j = 1, 2. \tag{5}$$

Layer 3: Every node in this layer is fixed and labeled with N. Nodes in this layer normalize the firing strengths of the rules. Each node calculates
the ratio of its rule's firing strength to the sum of all rules' firing strengths using Equation (6).

$$O_{3,ij} = \bar{w}_{ij} = \frac{w_{ij}}{\sum_{k}\sum_{l} w_{kl}}, \quad i, j = 1, 2. \tag{6}$$

Layer 4: Every node in this layer is adaptive with a node function given as Equation (7), ie,

$$O_{4,ij} = \bar{w}_{ij} f_{ij} = \bar{w}_{ij} (p_{ij} x + q_{ij} y + r_{ij}), \quad i, j = 1, 2. \tag{7}$$

Here, $\bar{w}_{ij}$ is the output of the third layer. The variables $\{p_{ij}, q_{ij}, r_{ij}\}$ are referred to as consequent parameters.
Layer 5: The fifth layer is the output layer of the ANFIS structure and contains a single node that sums all signals from the fourth
layer. The node in this layer is labeled as Σ and performs the summation using Equation (8), ie,

$$\text{Output} = O_5 = \sum_{i}\sum_{j} \bar{w}_{ij} f_{ij}, \quad i, j = 1, 2. \tag{8}$$
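
The five layers can be traced in a few lines of code. The sketch below follows Equations (1) to (8) for the two-input, four-rule case with the Gaussian membership function as printed in Equation (3). It is an illustration under stated assumptions and not the authors' MATLAB model: the function names, the zero-based rule indices, and all parameter values are hypothetical.

```python
import math

def gaussian_mf(x, a, c):
    """Gaussian membership function in the form of Equation (3)."""
    return math.exp(-(((x - c) / (2 * a)) ** 2))

def anfis_forward(x, y, premise_x, premise_y, consequent):
    """One forward pass; returns the output and the normalized firing strengths."""
    # Layer 1: membership degrees of each input in each of its fuzzy terms.
    mu_a = [gaussian_mf(x, a, c) for a, c in premise_x]
    mu_b = [gaussian_mf(y, a, c) for a, c in premise_y]
    # Layer 2: firing strength of rule (i, j) is the product of its memberships.
    w = {(i, j): mu_a[i] * mu_b[j]
         for i in range(len(mu_a)) for j in range(len(mu_b))}
    # Layer 3: normalize the firing strengths so that they sum to one.
    total = sum(w.values())
    w_bar = {rule: strength / total for rule, strength in w.items()}
    # Layer 4: weight each rule's linear consequent p*x + q*y + r.
    layer4 = {rule: w_bar[rule] * (p * x + q * y + r)
              for rule, (p, q, r) in consequent.items()}
    # Layer 5: sum the weighted rule outputs.
    return sum(layer4.values()), w_bar

# Hypothetical premise parameters (a, c) for two terms per input on [0, 1].
premise_x = [(0.2, 0.25), (0.2, 0.75)]
premise_y = [(0.2, 0.25), (0.2, 0.75)]
# Hypothetical consequent parameters (p, q, r), one triple per rule.
consequent = {(0, 0): (0.5, 0.1, 0.0), (0, 1): (0.4, 0.2, 0.1),
              (1, 0): (0.3, 0.3, 0.2), (1, 1): (0.2, 0.4, 0.3)}

output, w_bar = anfis_forward(0.3, 0.6, premise_x, premise_y, consequent)
```

Since the normalized firing strengths sum to one, the output is a weighted average of the four rule consequents, which matches the description of Sugeno-type reasoning given earlier.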

As mentioned, the ANFIS structure has two adaptive layers, namely, the first and the fourth layers. Their ability to adapt stems from the parameters
of these layers, namely, the premise parameters of the first layer and the consequent parameters of the fourth layer. The training phase of ANFIS consists
of tuning the premise and consequent parameters. For this purpose, ANFIS utilizes a hybrid learning algorithm.43 This algorithm is composed
of two passes. A forward pass is used for tuning the consequent parameters and a backward pass is used for tuning the premise parameters. In
the forward pass, the premise parameters are fixed and signals proceed to layer four. In the fourth layer, the consequent parameters are determined by
using the least squares method. In the backward pass, the consequent parameters are fixed and error rates propagate back to the first layer. In the first
layer, the premise parameters are tuned based on the membership function using the Gradient Descent method.47
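
The reason least squares applies in the forward pass is that, with the premise parameters fixed, the output of Equation (8) is linear in the consequent parameters. A minimal sketch of both passes in batch form is given below. The symbols are assumptions introduced for this illustration: n indexes the training samples, t holds the target values, A is the matrix whose n-th row contains the terms $\bar{w}_{ij}^{(n)} x^{(n)}$, $\bar{w}_{ij}^{(n)} y^{(n)}$, and $\bar{w}_{ij}^{(n)}$ for every rule, θ stacks the consequent parameters, and η is a learning rate.

```latex
% Forward pass: premise parameters fixed, output linear in the consequent parameters.
f^{(n)} = \sum_{i}\sum_{j} \bar{w}_{ij}^{(n)} \left( p_{ij}\, x^{(n)} + q_{ij}\, y^{(n)} + r_{ij} \right)
\;\Longrightarrow\; A\,\theta = t,
\qquad \theta = [\,p_{11}, q_{11}, r_{11}, \dots, p_{22}, q_{22}, r_{22}\,]^{\top}

\hat{\theta} = \arg\min_{\theta} \lVert A\theta - t \rVert^{2} = (A^{\top} A)^{-1} A^{\top} t

% Backward pass: consequent parameters fixed, gradient step on each premise parameter.
E = \tfrac{1}{2} \sum_{n} \left( t^{(n)} - f^{(n)} \right)^{2},
\qquad a_i \leftarrow a_i - \eta\, \frac{\partial E}{\partial a_i}
```

The closed form assumes that $A^{\top} A$ is invertible; the same gradient update applies to the other premise parameters $b_i$ and $c_i$.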

4.2 ANFIS clustering


As mentioned earlier, the Sugeno-type ANFIS method utilizes a reasoning mechanism based on fuzzy IF-THEN rules. In FIS models, fuzzy terms are
used for defining variables and, therefore, for choosing rules. However, in ANFIS, there are no fuzzy terms generated by a human expert. Instead, clusters
are used. All input and output variables are clustered and these clusters are used for rule generation. Numerous clustering methods are present
in the literature. In this work, three of these methods are utilized, namely, Grid Partitioning (GP), Subtractive Clustering (SC), and Fuzzy C-Means
Clustering (FCM).
Based on Zadeh's proposal of fuzzy sets48 and linguistic terms,49-51 GP is a widely used clustering method for obtaining fuzzy inference
systems. In this method, input-output couplings create a surface and this surface is partitioned into grids based on the linguistic labels of the inputs. Each
grid cell represents a fuzzy inference area and creates a rule for ANFIS. It must also be noted that quantitative values can be clustered and treated
as linguistic terms.52 This flexibility allows GP to be used without prior knowledge or human expertise. The drawback of the GP method is its
sensitivity to the number of input variables: the number of rules increases exponentially as the number of input variables increases. Furthermore,
GP requires a large number of observations to perform well. GP is therefore suitable when the number of variables is low and the number
of observations is high.53
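
To make the exponential growth concrete, the sketch below assumes the usual grid-partitioning construction in which one rule is created for every combination of input membership functions, so the rule count is the product of the per-input membership function counts. The function name is hypothetical and is not part of the authors' implementation.

```python
from math import prod


def grid_partition_rule_count(mfs_per_input):
    """Number of rules when every combination of input membership
    functions forms one rule (assumed grid-partitioning construction)."""
    return prod(mfs_per_input)


# Two inputs with 2 and 3 membership functions
print(grid_partition_rule_count([2, 3]))
# Ten inputs with 3 membership functions each
print(grid_partition_rule_count([3] * 10))
```

With two inputs the rule base stays small, whereas ten inputs with only three membership functions each already require 3^10 rules.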
The SC method treats each data point as a potential cluster center and calculates a measure of its likelihood of being a center based on the density of surrounding data points. Data points that
have already formed a cluster are subtracted from the complete set of data points, and the remaining non-member data points are used for the generation of
new cluster centers. Unlike GP, SC can be used for models that have a high number of inputs because the number of rules of an SC based ANFIS
model is determined by the number of clusters.54 However, the number of observations is a problem for SC: a high number of observations causes high
computational complexity, which affects run-times. Furthermore, calibration of the cluster radius is crucial for the performance of SC. A small cluster
radius results in a high number of rules. On the other hand, a large radius results in highly general clusters, which cause unacceptable results.55
The FCM method was first introduced by Dunn56 and later improved by Bezdek.57 In FCM, each data point belongs to a fuzzy cluster with a
membership degree and can belong to multiple clusters with different membership grades.58,59 This approach removes sharp boundaries between
clusters.59 It must be noted that the FCM clustering method partitions data points into a pre-determined number of clusters. In this research, we used
the MATLAB implementation of FCM clustering, which determines the number of clusters using SC before performing FCM clustering.

4.3 ANFIS optimization


To reveal the true potential of ANFIS, optimization of the clustering methods is of crucial importance. Therefore, a set of experiments is first
conducted to optimize the parameters of the clustering methods. In our experimental method, all parameters are fixed to the default values of the clustering
method except the one being investigated. The investigated parameter is varied over a range of pre-determined values and tests
are performed for each value. The effect of each parameter on the clustering methods is examined for each model. The results of this examination are used to generate new
test scenarios that combine the high-performing parameter selections. The new test scenarios are then run and the best performing
parameter configurations are acquired.
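
The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' test harness: `evaluate` is a hypothetical callback that returns an error value for a configuration, `keep` (the number of shortlisted values per parameter) is an assumed setting, and the toy error surface merely stands in for a real train/test run.

```python
from itertools import product


def one_at_a_time_sweep(evaluate, defaults, candidates, keep=2):
    """Vary one parameter at a time around the defaults, then test
    combinations of the best `keep` values found for each parameter."""
    shortlist = {}
    for name, values in candidates.items():
        scored = []
        for value in values:
            config = dict(defaults, **{name: value})  # others stay at default
            scored.append((evaluate(config), value))
        scored.sort(key=lambda pair: pair[0])         # lower error is better
        shortlist[name] = [value for _, value in scored[:keep]]

    names = list(shortlist)
    best_config, best_error = None, float("inf")
    for combo in product(*(shortlist[n] for n in names)):
        config = dict(defaults, **dict(zip(names, combo)))
        error = evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


# Toy error surface standing in for a real experiment (illustrative only)
def toy_error(config):
    return (config["radius"] - 0.7) ** 2 + (config["squash"] - 0.9) ** 2


defaults = {"radius": 0.5, "squash": 1.25}
candidates = {"radius": [0.1, 0.3, 0.5, 0.7, 0.9],
              "squash": [0.3, 0.6, 0.9, 1.2, 1.5]}
best, err = one_at_a_time_sweep(toy_error, defaults, candidates)
print(best, err)
```

Sweeping one parameter at a time keeps the number of runs linear in the number of candidate values, and the second stage only explores combinations of values that already performed well individually.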

4.3.1 Grid partitioning parameter optimization


The GP method has three parameters, namely, “Input Membership Function Type” (IMFT), “Number of Membership Functions” (NMF), and “Output
Membership Function Type” (OMFT).

• IMFT is crucial for the performance of fuzzy sets. A membership function calculates a membership degree in [0,1] for each data point. In
this research, eight different membership functions (mf) are used: generalized bell-shaped mf, Gaussian curve mf, Gaussian combination mf,
triangular-shaped mf, trapezoid-shaped mf, difference between two sigmoidal mf, product of two sigmoidal mf, and pi-shaped mf. IMFT is
specified for each input variable.
• NMF specifies the number of membership functions for each input variable and directly affects the number of rules. In this research,
values between 2 and 10 are tested for all models. NMF is specified for each input variable.
• OMFT specifies the type of membership function for the output, which can be linear or constant.
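
As an illustration of what an input membership function computes, the sketch below implements two of the shapes listed above in plain Python. The function names echo the identifiers used for the membership function types in this paper, but the Python signatures, argument order, and example values are assumptions made here for illustration.

```python
import math


def gaussmf(x, sigma, center):
    """Gaussian membership function: exp(-(x - c)^2 / (2 * sigma^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def trimf(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b
    (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Membership degrees of a normalized reading (illustrative value)
x = 0.4
print(gaussmf(x, sigma=0.2, center=0.5))
print(trimf(x, a=0.0, b=0.5, c=1.0))
```

Both functions return 1 at their center or peak and decay towards 0 away from it; they differ in how smoothly the degree falls off.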

Best performing parameter configuration for GP is given in Table 3, where IMFT Input I, IMFT Input II, NMF Input I, NMF Input II, and OMFT
respectively stand for IMFT for the first input, IMFT for second input, NMF for first input, NMF for second input, and OMFT of ANFIS model.
TABLE 3 Best performing parameter configurations for GP based ANFIS
Model  IMFT Input I  IMFT Input II  NMF Input I  NMF Input II  OMFT
Mdl1 gaussmf trimf 2 3 linear
Mdl2 trimf trapmf 2 constant
Mdl3 gaussmf gaussmf 2 2 constant

TABLE 4 Best performing parameter configurations for SC based ANFIS
Model  CIR Input I  CIR Input II  CIR Output  SF    AR    RR
Mdl1   0.90         0.30          0.30        0.90  0.50  0.25
Mdl2   0.70         0.30          0.70        0.90  0.50  0.20
Mdl3   0.70         0.60          0.70        0.90  0.50  0.30

TABLE 5 Best performing parameter configurations for FCM based ANFIS
Model  CN  Expo  MNI  MI
Mdl1   3   1.2   15   1.00E-5
Mdl2   2   1.2   50   1.00E-5
Mdl3   3   1.2   10   1.00E-5

4.3.2 Subtractive clustering parameter optimization


The SC method has four parameters: Cluster Influence Range (CIR), Squash Factor (SF), Accept Ratio (AR), and Reject Ratio (RR).

• CIR is the influence range of clusters. The default value of CIR is 0.5. In this research, input and output CIR values are tested with values
between 0.1 and 0.9.
• SF is the factor used for scaling the influence range. The default value of SF is 1.25. In this research, SF is tested with values between 0.3
and 1.50.
• AR is used for the acceptance of new clusters. Values between 0.30 and 0.95 are used for tests, but no significant effect of AR is observed.
• RR is used for the rejection of new clusters. RR values between 0.05 and 0.30 are used for testing.

Best performing parameter configuration for SC is given in Table 4, where CIR Input I, CIR Input II, and CIR Output respectively stand for CIR
for first input, CIR for second input, and CIR for output.
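
How these four parameters interact can be sketched with a simplified subtractive clustering routine. This is an illustrative sketch in the spirit of the standard potential-based formulation, not the MATLAB routine used in the experiments. It assumes data scaled to [0, 1] and a single influence range for all dimensions (the experiments tune the range per variable); the function name, the default argument values, and the example points are hypothetical.

```python
import math


def subtractive_clustering(points, radius=0.5, squash=1.25,
                           accept_ratio=0.5, reject_ratio=0.15):
    """Simplified subtractive clustering on points scaled to [0, 1]."""
    alpha = 4.0 / radius ** 2                # neighbourhood used for potentials
    beta = 4.0 / (squash * radius) ** 2      # wider neighbourhood for subtraction

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Potential of each point: how densely it is surrounded by other points
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                 for p in points]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        ratio = peak / first_peak
        if ratio > accept_ratio:
            accept = True
        elif ratio < reject_ratio:
            break                            # remaining potential too low: stop
        else:
            # Grey zone: accept only if far enough from the existing centers
            d_min = min(math.sqrt(sqdist(points[k], c)) for c in centers)
            accept = d_min / radius + ratio >= 1.0
        if not accept:
            potential[k] = 0.0               # discard this candidate and retry
            continue
        center = points[k]
        centers.append(center)
        # Subtract the new center's influence from every point's potential
        potential = [value - peak * math.exp(-beta * sqdist(x, center))
                     for value, x in zip(potential, points)]
    return centers


# Two tight groups of normalized readings (made-up values)
points = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12),
          (0.90, 0.90), (0.88, 0.90), (0.90, 0.88)]
print(subtractive_clustering(points))
```

A smaller `radius` makes potentials more local and yields more centers, a larger `squash` suppresses candidates further away from an accepted center, and the two ratios decide how much remaining potential a candidate needs to be kept.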

4.3.3 Fuzzy C-means clustering parameter optimization


FCM clustering has four parameters, namely, Number of Clusters (NC; listed as CN in Table 5), Exponent (Expo), Maximum Number of Iterations (MNI), and Minimum
Improvement of Objective Function (MI). Among these parameters, MNI and MI are related to the termination of the clustering process. The process
terminates when the improvement between two iterations falls below MI or the number of iterations reaches MNI.

• NC specifies the number of clusters and therefore directly affects the number of generated rules. In the implementation, if not specified, NC is decided
by a subtractive clustering phase, which has a cluster range of 0.5. In our parameter tests, the default option and cluster numbers between 2 and
50 are tested.
• Expo controls the fuzzy overlap between clusters and has a default value of 2.0 in the MATLAB implementation of FCM. In this research, values
between 1.2 and 3.0 are tested.
• MNI is the maximum number of iterations of the clustering process. In this research, MNI values between 5 and 50 are tested.
• MI is the minimum improvement value that is used for termination of the algorithm. MI has a default value of 1.00E-5. The values
{1.00E-4, 1.00E-5, 1.00E-6} are tested.

Best performing parameter configuration for FCM is given in Table 5.
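
The role of the four FCM parameters can be seen in a basic fuzzy c-means loop. The sketch below is a plain-Python illustration of the standard algorithm, not the MATLAB routine used in the experiments. The function and argument names are hypothetical, and the initial centers are taken as evenly spaced samples of the data purely for reproducibility (random initialization is more common).

```python
def fuzzy_c_means(points, n_clusters, expo=2.0, max_iter=100,
                  min_improvement=1e-5):
    """Basic fuzzy c-means on a list of equal-length tuples.

    Returns the final centers and the memberships from the last iteration."""
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    step = max(1, len(points) // n_clusters)
    centers = [points[i * step] for i in range(n_clusters)]  # assumed init
    power = 1.0 / (expo - 1.0)
    dims = len(points[0])
    previous = float("inf")
    for _ in range(max_iter):                 # MNI bounds the loop
        # Membership of each point in each cluster (each row sums to 1)
        memberships = []
        for p in points:
            d2 = [max(sqdist(p, c), 1e-12) for c in centers]
            inv = [(1.0 / d) ** power for d in d2]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Objective: membership-weighted sum of squared distances
        objective = sum(u ** expo * sqdist(p, c)
                        for p, row in zip(points, memberships)
                        for u, c in zip(row, centers))
        # Move each center to the weighted mean of all points
        new_centers = []
        for j in range(n_clusters):
            w = [row[j] ** expo for row in memberships]
            new_centers.append(tuple(
                sum(wi * p[k] for wi, p in zip(w, points)) / sum(w)
                for k in range(dims)))
        centers = new_centers
        if abs(previous - objective) < min_improvement:  # MI stops early
            break
        previous = objective
    return centers, memberships


# Two tight groups of normalized readings (made-up values)
points = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12),
          (0.90, 0.90), (0.88, 0.90), (0.90, 0.88)]
centers, memberships = fuzzy_c_means(points, n_clusters=2, max_iter=50)
print(centers)
```

An exponent close to 1 makes the memberships nearly crisp, while larger exponents spread each point's membership over several clusters.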

5 DEEP LEARNING BASED MISSING VALUE PREDICTION

In this section, we introduce the proposed DL models and Long Short Term Memory (LSTM) network. Then, parameter optimization and training
processes are explained, respectively.

5.1 Deep Learning based prediction models


In this section, we develop three DL models based on the LSTM network structure. The proposed DL models aim to perform data analysis
operations quickly and effectively by estimating the sensor values that may be missing in real-time data analysis. Since there are three types
of sensor data in the data set, three different models are proposed, one for the estimation of each type. The relevant model runs and completes
the missing data when one of the readings is missing. The overall architecture of the proposed models with all cases is shown in Figure 6.
Accordingly, Mdl1, Mdl2, and Mdl3 are proposed for estimating the missing temperature, humidity, and light sensor values, respectively.
Herein, the models predict temperature, humidity, and light data by taking humidity-light, temperature-light, and temperature-humidity value
pairs as inputs. The input and output values of the models are given in Table 2. In addition, the hyper-parameter values and processes of the
models are given in detail in Section 5.3.

FIGURE 6 General architecture of Deep Learning models (Left), LSTM Memory Block (Right)

5.2 Long Short Term Memory


Long Short Term Memory (LSTM) is an extension of the Recurrent Neural Network (RNN). It was proposed by Hochreiter and Schmidhuber60 to overcome training problems such
as vanishing and exploding gradients. Thanks to advances in their architecture, RNNs have been found quite
successful in predicting sequential and time series data.6 In LSTM, a memory unit takes the place of each ordinary neuron in the hidden layer
of a standard RNN. The LSTM memory block shown in Figure 6 has an input gate, a forget gate, and an output gate, which regulate the flow of
information into and out of the cell.61,62 The equations for these gates and cell states are presented as follows:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)    (9)

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)    (10)

\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)    (11)

C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t    (12)

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)    (13)

h_t = o_t \ast \tanh(C_t),    (14)

where x_t and h_t are the input and output vectors at time t, \tilde{C}_t is the candidate cell state, C_{t-1} is the old cell state, C_t is the new cell state, and i_t, f_t, and o_t are the input, forget, and output
gates, respectively. W_c, W_i, W_f, and W_o are the weight matrices, \ast is the element-wise product, which operates on two vectors of the same size, and
b_c, b_i, b_f, and b_o are the bias vectors. \sigma(\cdot) represents the logistic sigmoid function, ie, \sigma(x) = 1/(1 + e^{-x}), and \tanh(\cdot) represents the hyperbolic tangent function.
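
To show how Equations (9) to (14) fit together in one time step, the following sketch evaluates a single memory-block update in plain Python. It is illustrative only and unrelated to the TensorFlow/Keras implementation used in this work. The helper names, the dictionary keys, and the all-zero example weights are assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def sigmoid(v):
    """Element-wise logistic sigmoid of a list."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, z, b):
    """Compute W . z + b for a matrix given as a list of rows."""
    return [sum(w * a for w, a in zip(row, z)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM memory-block update following Equations (9)-(14)."""
    z = h_prev + x_t                                  # concatenation [h_{t-1}, x_t]
    f_t = sigmoid(affine(params["W_f"], z, params["b_f"]))            # (9)
    i_t = sigmoid(affine(params["W_i"], z, params["b_i"]))            # (10)
    c_tilde = [math.tanh(a)
               for a in affine(params["W_c"], z, params["b_c"])]      # (11)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]          # (12)
    o_t = sigmoid(affine(params["W_o"], z, params["b_o"]))            # (13)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                # (14)
    return h_t, c_t


# One hidden unit, two inputs, all weights and biases zero (illustrative)
zero_row = [[0.0, 0.0, 0.0]]
params = {"W_f": zero_row, "W_i": zero_row, "W_c": zero_row, "W_o": zero_row,
          "b_f": [0.0], "b_i": [0.0], "b_c": [0.0], "b_o": [0.0]}
h_t, c_t = lstm_step([0.3, 0.7], [0.0], [1.0], params)
print(h_t, c_t)
```

With zero weights every gate equals 0.5 and the candidate state is 0, so the new cell state is simply half of the old one; this makes the role of the forget gate in Equation (12) directly visible.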

5.3 Parameter optimization and training process


Parameter optimization is one of the most important steps to get the most effective results from DL models. Moreover, it is important to use
descriptive statistics to determine the appropriate model parameter for the prediction of each sensor value. In particular, the network parameters
such as LSTM units, batch size, and layer size are carefully identified. Therefore, we applied the grid search method over different numbers of layers
(1, 2, 3), hidden units (50, 120, 240, 480), and batch sizes (60, 72, 240, 480, 1440) for preliminary model optimization. The obtained optimum
parameters are given in Table 6.
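
The size of this search space can be made explicit with a short sketch. The value lists are taken from the text above; the function names and the `evaluate` callback (which would train a model and return its validation error) are hypothetical and do not reproduce the authors' code.

```python
from itertools import product

layers = (1, 2, 3)
hidden_units = (50, 120, 240, 480)
batch_sizes = (60, 72, 240, 480, 1440)


def grid(layers, hidden_units, batch_sizes):
    """Yield every hyper-parameter combination as a dictionary."""
    for n_layers, units, batch in product(layers, hidden_units, batch_sizes):
        yield {"layers": n_layers, "units": units, "batch_size": batch}


def grid_search(evaluate, configs):
    """Return the configuration with the lowest error reported by evaluate."""
    return min(configs, key=evaluate)


configs = list(grid(layers, hidden_units, batch_sizes))
print(len(configs))  # 3 * 4 * 5 combinations
```

Each of these combinations requires a full training run per model, which is why the search is described as preliminary.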
TABLE 6 Hyper-parameters of the DL models
Model Name  Hidden Layers  LSTM Units (L1xL2)  Batch Size  Epochs
Mdl1        2              240x240             1440        200
Mdl2        2              240x240             1440        200
Mdl3        2              60x60               60          200

In the process of model construction and training, we use the TensorFlow63 and Keras64 frameworks as the computing environment. The Adaptive
Moment Estimation (Adam) optimizer, which computes individual adaptive learning rates for different parameters, is used to minimize the loss
function.65 A mini-batch strategy is utilized in our implementation to reduce loss fluctuation, so the gradients are calculated with respect to
mini-batches.

6 EXPERIMENTAL RESULTS

The performance of the proposed models is evaluated with the Root Mean Squared Error (RMSE) metric, given in Equation (15):

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{o,i} - y_{e,i})^2}.    (15)

Here, y_o, y_e, and N represent the observed sensor value, the estimated sensor value, and the total number of observations, respectively.
The RMSE metric measures the amount of error between the estimated value and the real observed value. It must be noted that the RMSE
metric changes depending on the value range of the variables. To obtain RMSE values that are comparable across variables, all experiments are conducted on
normalized data.
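
Equation (15) and the normalization step can be sketched as follows. The normalization scheme is not specified in the text, so min-max scaling to [0, 1] is an assumption made here, and the function names and sample readings are hypothetical.

```python
import math


def min_max_normalize(values):
    """Scale values linearly to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rmse(observed, estimated):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


# Made-up temperature readings, normalized before scoring
observed = min_max_normalize([18.0, 20.0, 22.0, 26.0])
estimated = [0.05, 0.20, 0.55, 0.95]
print(rmse(observed, estimated))
```

Because all variables are mapped to the same range before scoring, an RMSE of a given size means the same relative error for temperature, humidity, and light.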
To verify the prediction accuracy, we compare our models with SVM Regression (SVR) and Gaussian Kernel Regression (GKR), which are two
non-linear regression methods. For comparison, we performed experiments according to the inputs and outputs in Table 2. In these experiments,
each node is treated as a different data source. Each experiment is conducted on all three data sources using the 10-fold cross validation method,
which yields ten test results per data source (sensor) and thirty test results per test model. The error ratios of the ANFIS, DL, SVR, and GKR based
prediction methods presented throughout this section are the averages of these thirty results. Therefore, the results presented in this section
are generalized and the effect of data selection is minimized. Experiments with the ANFIS based methods, SVR, and GKR are performed in MATLAB 2018a.
The parameter configurations of SVR and GKR are the default parameters defined in MATLAB. The normalized RMSE values of all models are
presented in Table 7 and Figure 7.
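
The evaluation protocol (ten folds on each of three data sources, averaged over thirty results) can be sketched as below. This is a minimal illustration, not the authors' MATLAB procedure: the folds are contiguous and unshuffled (an assumption, since the text does not say how folds are drawn), and `evaluate` is a hypothetical callback that trains on the training indices and returns the test error.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size
    and yield (train_indices, test_indices) for each fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def averaged_error(evaluate, sources, k=10):
    """Average the test error over every fold of every data source.

    Returns the mean error and the number of results it was averaged over."""
    errors = [evaluate(source, train, test)
              for source in sources
              for train, test in k_fold_indices(len(source), k)]
    return sum(errors) / len(errors), len(errors)


# Three dummy data sources and a placeholder error function (illustrative)
sources = [list(range(40)), list(range(50)), list(range(60))]
mean_error, n_results = averaged_error(
    lambda source, train, test: len(test) / len(source), sources)
print(mean_error, n_results)
```

Every observation is used for testing exactly once per data source, so the averaged figure does not depend on a single favorable train/test split.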
In the case of Mdl1 and Mdl3, the proposed models have lower error ratios than the implemented non-linear regression methods (SVR and GKR).
Among the proposed methods, the DL based method demonstrates the best performance. In the case of Mdl2, the DL based method has the lowest error
ratio, but the error ratios of the ANFIS based methods do not differ significantly from those of the regression models. Overall, the proposed
models show improved results over SVR and GKR.
Among the models, Mdl1 seems to be the most predictable, which shows us that the relations of temperature-light and temperature-humidity
tuples are highly correlated. A similar trend is observed in Mdl2. However, the relations between humidity-temperature and humidity-light seem to
have a different characteristic. DL and ANFIS based methods show higher error ratios on Mdl2 compared to Mdl1, unlike SVR and
GKR, which perform better on Mdl2 than on Mdl1. Mdl3 is the least correlated relation: all methods perform worse on Mdl3 than on the other
models.
As seen in Table 7, the proposed ANFIS and DL based methods fall behind the regression methods in terms of training and testing time. The GKR method has the
lowest training time among all methods. The training times of SVR and GP based ANFIS are also below 1 second for a training set composed of 3970
observations. On the other hand, the training times of SC based ANFIS and FCM based ANFIS are relatively longer, but the extra time spent in
training is not reflected in the prediction results. Among the ANFIS based methods, all three have similar results but the GP based method has a
significantly lower training time. In the case of Mdl1 and Mdl3, GP based ANFIS outperforms GKR and SVR in terms of prediction accuracy
with a reasonable training time.
In DL, training time depends on a large number of factors such as network architecture, output channels, batch sizes, and other hyper-parameters.
Therefore, the training times of the DL based methods are higher than those of the other methods. According to the results in Table 7, the training
of Mdl1, Mdl2, and Mdl3 took 24.28, 26.63, and 320.82 seconds, respectively. In particular, with its lower batch size, Mdl3 has a higher
computational cost than Mdl1 and Mdl2, which causes a significantly higher training time. It must be noted that priorities must be set before
method selection. The acquired results show a trade-off between robustness and accuracy among the methods. If robustness is the most desirable
aspect, ANFIS based methods (in Mdl1 and Mdl3) and regression models are the right choices. However, if accuracy is the top priority, the DL based
method is the right selection on all three models.
In general, the proposed models can improve the prediction accuracy and stability of missing sensor data greatly and effectively. Therefore,
ANFIS and DL based models are promising choices for prediction models.
TABLE 7 Normalized RMSE metrics and training/testing times

Method      Mdl1                                Mdl2                                Mdl3
            Train   Test    Training  Testing   Train   Test    Training  Testing   Train   Test    Training  Testing
            RMSE    RMSE    Time      Time      RMSE    RMSE    Time      Time      RMSE    RMSE    Time      Time
DL          0.0659  0.0701  24.28     0.5168    0.0726  0.0880  26.63     0.5877    0.1101  0.1711  320.82    0.6237
GP-ANFIS    0.1487  0.0990  0.79      0.0086    0.1109  0.1086  0.37      0.0074    0.1096  0.1733  0.49      0.0063
SC-ANFIS    0.1581  0.0979  5.55      0.0081    0.1040  0.1089  5.69      0.0065    0.0905  0.1762  5.85      0.0064
FCM-ANFIS   0.1662  0.1031  1.65      0.0067    0.1106  0.1061  0.60      0.0055    0.0943  0.1751  1.56      0.0062
GKR         0.0907  0.1139  0.07      0.0007    0.1094  0.1079  0.06      0.0006    0.1690  0.1940  0.08      0.0007
SVR         0.1068  0.111   0.44      0.0014    0.1140  0.1055  0.41      0.0015    0.1797  0.1833  0.41      0.0012

FIGURE 7 Performance comparisons of all models (panels: Mdl1, Mdl2, Mdl3; methods: DL, GP-ANFIS, SC-ANFIS, FCM-ANFIS, GKR, SVR; vertical axis from 0.00 to 0.20)

7 CONCLUSION AND FUTURE WORK

Missing sensor values are a major problem for both IoT and WSN. In this work, we proposed two types of models to tackle this problem, namely, ANFIS
and DL based models. DL models have shown state-of-the-art performance in computer vision, natural language processing, and robotics. These
models offer promising solutions in many areas, including classification, prediction, and control problems. On the other hand, ANFIS
is used successfully in the control, modeling, and parameter estimation of complex systems due to its adaptation capability, nonlinear ability, and
rapid learning capacity. The motivation of this paper is to utilize their advantages in the IoT missing sensor data problem. For this purpose, firstly,
optimization processes are carried out for the proposed models to identify the optimal model parameters. Secondly, the models are constructed
using the obtained optimal parameters, and then training and testing procedures are performed. The results indicate that both DL and ANFIS methods
perform remarkably well in terms of normalized RMSE metrics compared to the selected non-linear regression models. Through comparisons with SVR
and GKR, our proposed models show their advantages in prediction accuracy. In particular, DL clearly outperforms the other methods.
Moreover, ANFIS based models work quite well for estimating missing values.
In this work, the use of different sensor data types to estimate a sensor value is investigated. Sensor readings from other sensor nodes and
previous readings from the same sensor are completely ignored. Nevertheless, even with this data ignored, the experiments showed that the proposed
methods perform remarkably well. Based on this, DL and ANFIS based methods deserve further investigation on IoT data analysis problems. Our
next work will incorporate previous readings and the readings of neighbor nodes into the estimation process in a spatiotemporal manner.

ORCID

Suat Ozdemir https://orcid.org/0000-0002-4588-4538

REFERENCES
1. AlZu'bi S, Hawashin B, Mujahed M, Jararweh Y, Gupta BB. An efficient employment of internet of multimedia things in smart and future agriculture.
Multimed Tools Appl. 2019:1-25.
2. Atzori L, Iera A, Morabito G. The internet of things: a survey. Computer Networks. 2010;54(15):2787-2805.
3. Karkouch A, Mousannif H, Al Moatassime H, Noel T. Data quality in internet of things: a state-of-the-art survey. J Netw Comput Appl. 2016;73:57-81.
4. Qin Y, Sheng QZ, Falkner NJG, Dustdar S, Wang H, Vasilakos AV. When things matter: a survey on data-centric internet of things. J Netw Comput
Appl. 2016;64:137-153. https://doi.org/10.1016/j.jnca.2015.12.016
5. Vural Y, Akay D, Pourkashanian M, Ingham DB. Modeling of an intermediate temperature solid oxide fuel cell using the adaptive neuro-fuzzy inference
system (ANFIS). J Fuel Cell Sci Technol. 2010;7(3):034501.
6. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444.
7. Kök I, Simsek MU, Özdemir S. A deep learning model for air quality prediction in smart cities. In: Proceedings of the IEEE International Conference on
Big Data (Big Data); 2017; Boston, MA.
8. Qin Y, Zhang S, Zhu X, Zhang J, Zhang C. Semi-parametric optimization for missing data imputation. Applied Intelligence. 2007;27(1):79-88.
9. AlZu'bi S, AlQatawneh S, ElBes M, Alsmirat M. Transferable HMM probability matrices in multi-orientation geometric medical volumes segmentation.
Concurrency Computat Pract Exper. e5214.
10. AlZu'bi S, Islam N, Abbod M. Enhanced hidden Markov models for accelerating medical volumes segmentation. In: Proceedings of the 2011 IEEE GCC
Conference and Exhibition (GCC); 2011; Dubai, UAE.
11. Hassan MR, Nath B. Stock market forecasting using hidden Markov model: a new approach. In: Proceedings of the 5th International Conference on
Intelligent Systems Design and Applications (ISDA'05); 2005; Warsaw, Poland.
12. Li Z, Liu L, Kong D. Virtual machine failure prediction method based on AdaBoost-hidden Markov model. In: Proceedings of the 2019 International
Conference on Intelligent Transportation, Big Data & Smart City (ICITBS); 2019; Changsha, China.
13. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B (Methodol). 1977;39(1):1-38.
14. Delalleau O, Courville A, Bengio Y. Efficient EM training of Gaussian mixtures with missing data. arXiv preprint arXiv:1209.0521. 2012.
15. Eirola E, Lendasse A, Vandewalle V, Biernacki C. Mixture of Gaussians for distance estimation with missing data. Neurocomputing. 2014;131:32-42.
16. Bouveyron C, Girard S, Schmid C. High-dimensional data clustering. Comput Stat Data Anal. 2007;52(1):502-519.
17. Elbes M, Alzubi S, Kanan T, Al-Fuqaha A, Hawashin B. A survey on particle swarm optimization with emphasis on engineering and network applications.
Evolutionary Intelligence. 2019;12(2):113-129.
18. García JCF, Kalenatic D, Bello CAL. Missing data imputation in multivariate data by evolutionary algorithms. Comput Hum Behav. 2011;27(5):1468-1474.
19. Lobato F, Sales C, Araujo I, et al. Multi-objective genetic algorithm for missing data imputation. Pattern Recognit Lett. 2015;68:126-131.
20. Deb K, Agrawal S, Pratap A, Meyarivan T. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In:
Proceedings of the International Conference on Parallel Problem Solving from Nature; 2000; Paris, France.
21. Priya RD, Sivaraj R, Priyaa NS. Heuristically repopulated bayesian ant colony optimization for treating missing values in large databases. Knowl Based
Syst. 2017;133:107-121.
22. Kennedy J. Particle swarm optimization. In: Encyclopedia of Machine Learning. New York, NY: Springer; 2010:760-766.
23. Nekouie A, Moattar MH. Missing value imputation for breast cancer diagnosis data using tensor factorization improved by enhanced reduced adaptive
particle swarm optimization. J King Saud Univ Comput Inf Sci. 2018.
24. Richman MB, Trafalis TB, Adrianto I. Missing data imputation through machine learning algorithms. In: Artificial Intelligence Methods in the Environmental
Sciences. Berlin, Germany: Springer; 2009:153-169.
25. Tutz G, Ramzan S. Improved methods for the imputation of missing data by nearest neighbor methods. Comput Stat Data Anal. 2015;90:84-99.
26. Troyanskaya O, Cantor M, Sherlock G, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520-525.
27. García-Laencina PJ, Sancho-Gómez J-L, Figueiras-Vidal AR, Verleysen M. K nearest neighbours with mutual information for simultaneous classification
and missing data imputation. Neurocomputing. 2009;72(7-9):1483-1493.
28. Abdella M, Marwala T. The use of genetic algorithms and neural networks to approximate missing data in database. In: Proceedings of the IEEE 3rd
International Conference on Computational Cybernetics (ICCC 2005); 2005; Mauritius.
29. Nelwamondo FV, Golding D, Marwala T. A dynamic programming approach to missing data estimation using neural networks. Information Sciences.
2013;237:49-58.
30. Ravi V, Krishna M. A new online data imputation method based on general regression auto associative neural network. Neurocomputing.
2014;138:106-113.
31. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of the 2004 IEEE
International Joint Conference on Neural Networks; 2004; Budapest, Hungary.
32. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and applications. Neurocomputing. 2006;70(1-3):489-501.
33. Sovilj D, Eirola E, Miche Y, et al. Extreme learning machine for missing data using multiple imputations. Neurocomputing. 2016;174:220-231.
34. Laña I, Olabarrieta II, Vélez M, Del Ser J. On the imputation of missing data for road traffic forecasting: new insights and novel techniques. Transp Res
C Emerg Technol. 2018;90:18-33.
35. Silva-Ramírez EL, Pino-Mejías R, López-Coello M, Cubiles-de-la-Vega M-D. Missing value imputation on missing completely at random data using
multilayer perceptrons. Neural Networks. 2011;24(1):121-129.
36. Folguera L, Zupan J, Cicerone D, Magallanes JF. Self-organizing maps for imputation of missing data in incomplete data matrices. Chemom Intell Lab
Syst. 2015;143:146-151.
37. Saitoh F. An ensemble model of self-organizing maps for imputation of missing values. In: Proceedings of the 2016 IEEE 9th International Workshop
on Computational Intelligence and Applications (IWCIA); 2016; Hiroshima, Japan.
38. Nishanth KJ, Ravi V. Probabilistic neural network based categorical data imputation. Neurocomputing. 2016;218:17-25.
39. Zhang L, Lu W, Liu X, Pedrycz W, Zhong C. Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values.
Knowl Based Syst. 2016;99:51-70.
40. Li T, Zhang L, Lu W, et al. Interval kernel Fuzzy C-Means clustering of incomplete data. Neurocomputing. 2017;237:316-331.
41. Sefidian AM, Daneshpour N. Missing value imputation using a novel grey based Fuzzy C-Means, mutual information based feature selection, and
regression model. Expert Syst Appl. 2019;115:68-94.
42. Madden S. Intel Berkeley research lab data. 2004.
43. Jang J-SR. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern. 1993;23(3):665-685.
44. Takagi T, Sugeno M. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern. 1985;SMC-15:116-132.
45. Akay D, Chen X, Barnes C, Henson B. ANFIS modeling for predicting affective responses to tactile textures. Hum Factors Ergon Manuf Serv Ind.
2012;22(3):269-281.
46. Mamdani EH, Assilian S. An experiment in linguistic synthesis with a fuzzy logic controller. Int J Man Mach Stud. 1975;7(1):1-13.
47. Werbos P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences [PhD dissertation]. Cambridge, MA: Harvard University;
1974.
48. Zadeh LA. Fuzzy sets. Inf Control. 1965;8(3):338-353. https://doi.org/10.1016/S0019-9958(65)90241-X
49. Zadeh LA. The concept of a linguistic variable and its application to approximate reasoning—I. Information Sciences. 1975;8(3):199-249.
50. Zadeh LA. The concept of a linguistic variable and its application to approximate reasoning—II. Information Sciences. 1975;8(4):301-357.
51. Zadeh LA. The concept of a linguistic variable and its application to approximate reasoning-III. Information Sciences. 1975;9(1):43-80.
52. Hu Y-C. Simple fuzzy grid partition for mining multiple-level fuzzy sequential patterns. Cybern Syst Int J. 2007;38(2):203-228.
53. Cobaner M. Evapotranspiration estimation by two different neuro-fuzzy inference systems. J Hydrol. 2011;398(3-4):292-302.
https://doi.org/10.1016/j.jhydrol.2010.12.030
54. Castellanos F, James N. Average hourly wind speed forecasting with ANFIS. In: Proceedings of the 11th Americas Conference on Wind Engineering
(ACWE); 2009; San Juan, Puerto Rico.
55. Moradi F, Bonakdari H, Kisi O, Ebtehaj I, Shiri J, Gharabaghi B. Abutment scour depth modeling using neuro-fuzzy-embedded techniques.
Mar Georesources Geotechnol. 2018;37(2):190-200.
56. Dunn JC. A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. 1973;3(3):32-57.
57. Bezdek JC, Ehrlich R, Full W. FCM: the Fuzzy C-Means clustering algorithm. Comput Geosci. 1984;10(2-3):191-203.
58. Fattahi H. Adaptive neuro fuzzy inference system based on fuzzy c–means clustering algorithm, a technique for estimation of TBM penetration rate.
Int J Optim Civ Eng. 2016;6(2):159-171.
59. Abdulshahed AM, Longstaff AP, Fletcher S. The application of ANFIS prediction models for thermal error compensation on CNC machine tools.
Appl Soft Comput. 2015;27:158-168. https://doi.org/10.1016/j.asoc.2014.11.012
60. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735-1780.
61. Lipton ZC, Berkowitz J, Elkan C. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019. 2015.
62. Wei D, Wang B, Lin G, et al. Research on unstructured text data mining and fault classification based on RNN-LSTM with malfunction inspection
report. Energies. 2017;10(3):406.
63. Abadi M, Barham P, Chen J, et al. Tensorflow: a system for large-scale machine learning. In: Proceedings of the 12th Usenix Symposium on Operating
Systems Design and Implementation (OSDI'16); 2016; Savannah, GA.
64. Chollet F. Keras. 2015.
65. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.

How to cite this article: Guzel M, Kok I, Akay D, Ozdemir S. ANFIS and Deep Learning based missing sensor data prediction in IoT.
Concurrency Computat Pract Exper. 2019;e5400. https://doi.org/10.1002/cpe.5400
