
Received: 1 February 2019 Revised: 27 April 2019 Accepted: 18 May 2019

DOI: 10.1002/cpe.5400

RESEARCH ARTICLE

ANFIS and Deep Learning based missing sensor data prediction in IoT

Metehan Guzel1 Ibrahim Kok2 Diyar Akay3 Suat Ozdemir4

1 Department of Computer Engineering, Graduate School of Natural and Applied Sciences, Gazi University, Ankara, Turkey
2 Department of Computer Sciences, Informatics Institute, Gazi University, Ankara, Turkey
3 Department of Industrial Engineering, Faculty of Engineering, Gazi University, Ankara, Turkey
4 Department of Computer Engineering, Faculty of Engineering, Gazi University, Ankara, Turkey

Correspondence
Suat Ozdemir, Department of Computer Engineering, Faculty of Engineering, Gazi University, 06570 Ankara, Turkey.
Email: [email protected]

Funding information
The Scientific and Technological Research Council of Turkey (TUBITAK), Grant/Award Number: 118E212

Summary

Internet of Things (IoT) consists of billions of devices that generate big data which is characterized by the large volume, velocity, and heterogeneity. In the heterogeneous IoT ecosystem, it is not so surprising that these sensor-generated data are considered to be noisy, uncertain, erroneous, and missing due to the lack of battery power, communication errors, and malfunctioning devices. This paper presents Deep Learning (DL) and Adaptive-Network based Fuzzy Inference System (ANFIS) based prediction models for missing sensor data problem in IoT ecosystem. First, we build ANFIS based models and optimize their parameters. Then, we construct DL based models by using Long Short Term Memory (LSTM) network structure and optimize its parameters by applying the grid search method. Finally, we evaluate all the proposed models with Intel Berkeley Lab dataset. Experimental results demonstrate that the proposed models can significantly improve the prediction accuracy and may be promising for missing sensor data prediction.

KEYWORDS

adaptive-network based fuzzy inference system (ANFIS), Deep Learning, Internet of Things (IoT), IoT data analysis, missing sensor data prediction
1 INTRODUCTION

The concept of the Internet of Things (IoT) refers to a sensor-rich world where physical objects in our environment are increasingly enriched
with computing, sensing, and communication capabilities. Sensor technology is one of the core enabling technologies in this world. Sensors
are utilized to collect large amounts of heterogeneous data for large-scale IoT applications such as environmental monitoring, e-health,
intelligent transportation systems, military, smart agriculture,1 and industrial plant monitoring.2 Sensors and connected devices with diverse digital
technologies generate an excessive amount of data, which are multi-source, real-time, dynamic, sparse, highly heterogeneous, and semantically
rich. In the large scale IoT platforms, due to the lack of battery power, communication errors, and malfunctioning devices, sensor-generated data
are considered to be inherently noisy, uncertain, erroneous, and missing.3 Therefore, data generation and quality become a critical issue in data
processing and analysis.
In this work, we address the problem of missing sensor data. This problem is very common in IoT for various reasons, such as unstable
network communication, synchronization issues, unreliable sensors, and other types of equipment failure. Eliminating missing data results in
loss of information and may lead to incorrect analytical results. Thus, the prediction and assessment of missing values become an imperative
task.4 Hence, there is still a need for novel prediction models to predict the missing data. In order to address the missing sensor data problem,
we propose two prediction models based on Deep Learning (DL) and Adaptive-Network based Fuzzy Inference System (ANFIS). We focus on
sensory-rich IoT applications, where our models learn how to infer the missing data from different sensors' data optimally.
ANFIS is a soft computing method that combines the advantages of Artificial Neural Networks (ANN) and Fuzzy Inference Systems (FIS).
ANFIS has high generalization ability supported with fast and accurate learning phase.5 Based on this, we decided to solve the missing sensor
data prediction problem with ANFIS.
Recently, DL has attracted widespread interest in academic and industrial fields due to its state-of-the-art performance in domains such as
computer vision, natural language processing, speech recognition, and visual object recognition.6 DL also shows good

Concurrency Computat Pract Exper. 2019;e5400. wileyonlinelibrary.com/journal/cpe © 2019 John Wiley & Sons, Ltd.
https://doi.org/10.1002/cpe.5400

potential for analyzing vast volumes of data and for discriminative tasks such as classification and prediction.7 We therefore also employ DL for the
missing data problem, owing to its predictive power on large-scale datasets. The key contributions of this paper are summarized as follows.
• We propose novel prediction models based on ANFIS and DL to solve the missing data problem in IoT.
• We conduct extensive experiments to validate the performance of the proposed prediction models.
The rest of this paper is organized as follows. Section 2 includes related works on the missing sensor data prediction. Section 3 explains
the dataset and model descriptions. Section 4 and Section 5 introduce the details of the proposed prediction models. Section 6 presents the
experimental results and comparison of prediction models. Finally, Section 7 concludes the paper.

2 RELATED WORK

The presence of missing/corrupted values in databases and datasets has been a major problem for decades. In addition, with newly emerged concepts like Wireless Sensor Networks (WSN) and IoT, the total data generation speed has skyrocketed while the quality and reliability of equipment have gone down, which has caused more and more missing values. Therefore, numerous studies have been conducted to overcome this problem. In this section, we briefly present the proposed methods to estimate missing data and give a general overview of the literature in this domain. It must
be noted that, in compliance with the scope of this paper, we exclude temporal and spatiotemporal estimation methods. We grouped methods
into three categories, namely, Statistical Methods, Optimization Methods, and Machine Learning Methods.

2.1 Statistical methods

Statistical methods are the oldest set of methods used for estimation. Regression methodology is commonly used among statistical methods.
In the work of Qin et al,8 a regression model is proposed for imputation of missing values. Their method aims to minimize the RMSE metric after every imputation by optimizing the regression model. In one aspect, random and stochastic approaches are utilized and compared to the conventional deterministic approach. The second aspect of the work focuses on the use of a semi-parametric approach, which is compared to non-parametric and linear approaches. The proposed semi-parametric regression model showed significant
improvement over conventional methods on both synthetic and real datasets.
Hidden Markov Model (HMM) is a stochastic finite state machine that is composed of four elements.9 These elements are (i) states, (ii) possible observations, (iii) a state transition probability matrix, and (iv) an emission probability matrix.10 HMM has a number of advantages, but in the scope of this paper, we focus on its prediction capability. With its strong statistical foundation,11 HMM can detect patterns in time series efficiently.9-11 In the work of Hassan and Nath,11 HMM is used to predict stock market prices. The proposed model takes four inputs, which are the opening, closing, high, and low prices of the current day. Using these inputs, the model predicts the closing price of the next day. The results acquired justify the use of HMM for time series prediction.11 In the work of Li et al,12 HMM is used to predict Virtual Machine (VM) failures in cloud platforms. The proposed hybrid method, which is composed of the AdaBoost algorithm and HMM, can predict VM failures by observing CPU, memory, network, and VM load.
The expectation maximization (EM) algorithm is another approach used for estimation. The EM algorithm expresses data points as a mixture of models.13 To optimize the models, the maximum likelihood method is used. The "models" of the EM algorithm can differ, but the most used model description is based on the Gaussian distribution. In this approach, data points are expressed as a mixture of Gaussian models and likelihoods are calculated accordingly. However, Gaussian mixture models suffer from the curse of dimensionality: a high number of features causes high computational complexity. In some cases, a Gaussian model cannot even be fitted to the distribution of the data. Therefore, the use of mixture models requires an approach to overcome this problem. In the work of Delalleau et al,14 an EM algorithm that features Gaussian models is used for dealing with missing data. Their proposal uses a tree based approach to deal with the curse of dimensionality. The proposed method is composed of 5 steps as follows. (i) Unique missing patterns in the dataset are identified. (ii) A graph representation of the missing patterns is fitted into a minimum spanning tree. (iii) The minimum spanning tree is used to deduce the ordering of training samples. (iv) Mean values for the mixture models are calculated using k-Means, and the covariances of the models are initialized using the empirical covariances of each cluster. (v) Iteration is performed through the EM steps. In short, the proposed approach optimizes the ordering of samples with a minimum spanning tree to reduce computational cost and, for missing data estimation, intra-cluster mean values are used. In the work of Eirola et al,15 an EM algorithm based on Gaussian mixture models is used to deal with the missing data problem. The method is composed of 4 steps as follows: (i) fit a Gaussian mixture model by the EM algorithm; (ii) calculate the log-likelihood and the Akaike information criterion (AICC); (iii) choose the model that minimizes AICC and calculate mean and variance values for each missing datum using the chosen model; and (iv) perform the task using the conditional means and variances calculated for the missing data. AICC is susceptible to high dimensional data: with high dimensional datasets, the validity of AICC vanishes.15 To overcome this problem, a clustering based method, High Dimensional Data Clustering (HDDC),16 is utilized.
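The conditional mean and variance mentioned in step (iii) can be illustrated for the simplest case. The following is a minimal sketch, assuming a single bivariate Gaussian component with known mean and covariance; a full mixture model would combine such estimates weighted by component responsibilities. The function name and the numbers are hypothetical and are not taken from the cited works.

```python
def conditional_mean_variance(mu, cov, observed_index, observed_value):
    """Conditional mean and variance of the missing coordinate of a
    bivariate Gaussian, given the value of the observed coordinate.

    mu  : [mu_0, mu_1]
    cov : [[s00, s01], [s10, s11]] (symmetric, positive definite)
    """
    o = observed_index
    m = 1 - o  # index of the missing coordinate
    gain = cov[m][o] / cov[o][o]
    mean = mu[m] + gain * (observed_value - mu[o])
    variance = cov[m][m] - gain * cov[o][m]
    return mean, variance


# Illustrative numbers only (assumed, not from the paper).
mean, var = conditional_mean_variance(
    mu=[20.0, 40.0],
    cov=[[4.0, -3.0], [-3.0, 9.0]],
    observed_index=0,
    observed_value=22.0,
)
```

The conditional variance is smaller than the marginal variance of the missing coordinate, which reflects the information gained from the observed one.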

2.2 Optimization methods

Interpreting missing data estimation as an optimization problem is an alternative and rather innovative approach. Optimization algorithms use an evolution-like approach to reach a near-optimal solution in NP-hard problems where the search space is vast. Although the missing data problem is not a native optimization problem, the ability of these algorithms to quickly explore the search space increases their applicability in the missing data domain. Among optimization algorithms, genetic algorithm (GA), ant colony optimization (ACO), and particle swarm optimization (PSO) algorithms are commonly used.
GA imitates the evolution of species.17 For a given problem, a population composed of candidate solutions is generated. Each solution is evaluated according to a fitness function. Then, new populations are generated iteratively from the most successful individuals of the previous population. By adding new random solutions and mutations at each generation, the problem of falling into local minima is avoided. GA is one of the most frequently used algorithms in numerous domains, and missing value estimation is no exception. In the work of García et al,18 GA is used to impute missing values. The proposed method handles the data as a matrix and aims to find missing values in a way that does not alter the statistical characteristics of the initial dataset. Suppose that X is the dataset matrix with missing values, Y is a matrix composed of the missing values, and X̂, the combination of X and Y, is the completed dataset. The method tries to find the Y that minimizes the difference of statistics between X and X̂. The fitness function is constructed upon this criterion, and candidate Y matrices are handled as individuals of the population. The GA based approach is compared to EM and auxiliary regression based estimation models and surpasses these methods in terms of preserving statistical variables. Furthermore, the GA based method is claimed to be more flexible and responsive.18 Another GA based approach is utilized in the work of Lobato et al.19 The proposed MOGAImp is a multi-objective GA (MOGA), which is based on NSGA-II.20 The reason behind using a MOGA is to be able to optimize missing value selections on different metrics. In MOGAImp, the metrics used are classification accuracy and RMSE, and the fitness function is constructed upon these two metrics. The complete set of missing values is encapsulated as a single individual of the population and the phases of GA are performed.
ACO based methods are easily applicable if the data can be formulated as a graph problem.21 In line with this principle, in the work of Priya et al,21 the missing value estimation problem is formulated as a graph problem and an ACO based method is proposed. In the conversion to a graph, each covariate is turned into a level composed of covariate values and the target (missing) attribute is turned into the final level. On this graph, an ACO based method, namely, Dual Repopulated Bayesian ACO (DPBACO), is applied. The reason behind this selection is ACO's susceptibility to falling into local minima. By duplicating the population (main and reserve populations) and crossing over individuals from different populations in each iteration, DPBACO increases variety and therefore overcomes the local minimum problem. Adding different Bayesian functions to ant traversal makes the method applicable to different data characteristics.
PSO22 is a stochastic swarm intelligence based optimization algorithm that is frequently utilized because of its simplicity, accuracy, and fast convergence ability.17 In the work of Nekouie and Moattar,23 a PSO based hybrid missing value estimation method is used on breast cancer diagnosis data. To overcome PSO's weakness of getting stuck in local optima, chaotic reduced adaptive PSO (CRAPSO) is employed. The proposed method first generates a set of values to impute the missing one using Bayesian networks. In the next step, a tensor is used for estimation. Tensor-based estimation is performed by calculating the missing attribute as a linear function of the present attributes. However, in case of data insufficiency, like other mean square error minimization based models, the tensor based estimation model suffers from accuracy loss. Therefore, an automatic data generation phase that utilizes CRAPSO is placed before the tensor phase. The CRAPSO and tensor phases run iteratively until convergence is achieved. After convergence, the acquired results are used for imputation.
In addition to stand-alone solutions, optimization methods are generally used for optimizing machine learning algorithms that are used for
missing data estimation. Research works that fall under this category are given in the next section.

2.3 Machine learning methods

Recently, machine learning (ML) methods have been gaining popularity for missing data estimation.24 Among ML methods, the k-Nearest Neighbor (k-NN) algorithm is one of the most popular. Although k-NN is a classification algorithm, it is highly applicable to the missing data problem, especially if the data dimensionality is high and the number of observations is low.25 When a value is missing, a predetermined number (k) of similar samples are selected and used for estimation. In one of the earlier research works, k-NNimpute is proposed and implemented on DNA microarrays.26 The proposed k-NNimpute treats every attribute equally. In the work of García-Laencina,27 a weighting approach is applied to attributes based on each attribute's affinity with the class label. The proposed method improves on k-NNimpute in terms of accuracy. Another weighted k-NN method is proposed in the work of Tutz and Ramzan,25 where a subset of attributes is selected and used for distance calculations in a weighted manner.
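The basic, unweighted form of k-NN imputation described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance over the observed attributes and a plain mean over the k nearest complete rows; it is not the k-NNimpute implementation from the cited work, and the function name and data are hypothetical.

```python
import math


def knn_impute(rows, target, k=3):
    """Fill the single missing entry (None) of `target` with the mean of
    that attribute over the k complete rows closest to `target`.
    Distance is Euclidean over the attributes observed in `target`."""
    missing = target.index(None)
    observed = [i for i, v in enumerate(target) if v is not None]

    def distance(row):
        return math.sqrt(sum((row[i] - target[i]) ** 2 for i in observed))

    neighbours = sorted(rows, key=distance)[:k]
    estimate = sum(row[missing] for row in neighbours) / len(neighbours)
    return [estimate if i == missing else v for i, v in enumerate(target)]


# Illustrative, already normalized rows (assumed data).
complete_rows = [
    [0.10, 0.20, 0.30],
    [0.12, 0.22, 0.33],
    [0.11, 0.19, 0.27],
    [0.90, 0.80, 0.70],
]
filled = knn_impute(complete_rows, [0.11, 0.21, None], k=3)
```

A weighted variant, as in the works cited above, would replace the plain mean and the unweighted distance with attribute-specific weights.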
ANN is a bio-inspired method that mimics the neuronal structure of the human brain. For the missing data problem, numerous studies featuring different types of ANN have been proposed. An auto-associative NN (Neural Network) creates a bottleneck between input and output layers and has a remarkable ability to learn linear and non-linear relationships. In the works of Abdella and Marwala28 and Nelwamondo et al,29 auto-encoder NNs are utilized for the missing data problem, and a GA is utilized to optimize the error function. In the work of Ravi and Krishna,30 Particle Swarm Optimization is employed to optimize the hidden layer in an auto-associative NN. Another NN type used is the Extreme Learning Machine (ELM), a feed-forward neural network that does not require updating of weights.31,32 Due to this characteristic, ELM has a significantly shorter training time in comparison to networks using backpropagation. In the work of Sovilj et al,33 a hybrid method that utilizes Gaussian mixture models (EM algorithm) and ELM is proposed. It is based on performing multiple imputations via EM algorithms, generating an ELM for each imputed dataset, and combining all ELMs to perform a final estimation. A rather refined approach is proposed in the work of Laña et al,34 where a GA optimized ELM is applied. Apart from those mentioned above, research works featuring multi-layer perceptron networks (MLP),35 self-organizing maps (SOM),36,37 probabilistic NNs,38 and other types of NN are also present in the literature.
Clustering methodologies are another class of ML methods used for estimation. Missing data can be estimated from the other data that share the same cluster. Fuzzy c-Means (FCM) is a fuzzy clustering algorithm that allows overlap between different clusters. In other works,39-41 FCM based methods are used for the missing data problem.

3 DATASET AND MODEL DESCRIPTIONS

3.1 Dataset
In this paper, we used the Intel Berkeley Research Lab dataset, which is publicly available. This dataset was collected from 54 sensors, which were deployed in the Intel research laboratory at Berkeley between February 28 and April 5, 2004. It contains 2.3 million sensor readings with time-stamped topology information, humidity, temperature, light intensity, and voltage values in the format "date:yyyy-mm-dd, time:hh:mm:ss.xxx, epoch:int, moteid:int, temperature:real, humidity:real, light:real, voltage:real". Herein, the temperature unit is degrees Celsius. Humidity is temperature-corrected relative humidity, ranging from 0% to 100%. Light intensity is in lux and voltage is expressed in volts.42 The sensors and sensor ids were arranged in the lab according to the diagram given in Figure 1.
In this study, humidity, temperature, and light intensity observations of sensors 19, 20, and 21 are used for evaluating the proposed prediction models. The reason behind this selection is the completeness of the mentioned nodes. Most of the nodes in the dataset have missing or corrupted readings. The selected nodes have relatively higher data density, especially between the 29th of February and the 7th of March. The eight-day period between these dates has 100% density for observations when sensor readings are grouped by 3-minute intervals. Another reason for the selection of these nodes is the proximity of their locations. The selected nodes are adjacent to each other, which ensures a similar sensing environment. This enables us to use the data in two different forms:

• Merging the nodes' readings together and processing them as if all data were coming from a single node. This approach is utilized in the DL based estimation model.
• Using readings from different nodes separately to evaluate a single model with three different data sources. This approach is utilized in the ANFIS based estimation model.
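The grouping of readings into 3-minute intervals and the resulting data density can be sketched as follows. This is a minimal illustration, assuming that readings falling in the same interval are averaged and that density is the fraction of intervals containing at least one reading; the text does not state the aggregation used, and the function names and sample values are hypothetical.

```python
from datetime import datetime

BIN_SECONDS = 180  # 3-minute intervals, as in the text


def bin_readings(readings, start):
    """Average readings per 3-minute bin.

    readings : iterable of (timestamp, value)
    start    : datetime marking the beginning of bin 0
    Returns {bin_index: mean_value}.
    """
    sums, counts = {}, {}
    for ts, value in readings:
        idx = int((ts - start).total_seconds() // BIN_SECONDS)
        sums[idx] = sums.get(idx, 0.0) + value
        counts[idx] = counts.get(idx, 0) + 1
    return {idx: sums[idx] / counts[idx] for idx in sums}


def density(binned, n_bins):
    """Fraction of the first n_bins intervals that contain a reading."""
    return sum(1 for i in range(n_bins) if i in binned) / n_bins


# Illustrative temperature readings (assumed values).
start = datetime(2004, 2, 29, 0, 0, 0)
sample = [
    (datetime(2004, 2, 29, 0, 0, 31), 19.0),
    (datetime(2004, 2, 29, 0, 1, 2), 21.0),
    (datetime(2004, 2, 29, 0, 3, 30), 20.0),
    (datetime(2004, 2, 29, 0, 9, 5), 22.0),
]
binned = bin_readings(sample, start)
coverage = density(binned, 4)
```

In this toy sample the third interval is empty, so the density over four intervals is below 100%; an empty interval of this kind is exactly the situation the prediction models are meant to fill.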

3.2 Descriptive statistical analysis

In this section, we calculate the descriptive statistics and correlation for each sensor value type of the dataset. The descriptive statistics given in Table 1 show that the sensor values have an unbalanced data characteristic and are not normally distributed.

FIGURE 1 Sensor locations and selected nodes in the Intel Berkeley Research Lab42

TABLE 1 Descriptive statistics of dataset features

Statistical Parameters   Temperature   Humidity   Light
Min                      15.270        14.430     0.460
Max                      37.660        51.720     1847.360
Mean                     23.216        35.481     323.887
Standard deviation       4.651         7.767      436.857

FIGURE 2 Spearman correlation matrices of sensor values

TABLE 2 Inputs and output of the models

Model Abbreviation   Input I       Input II   Output
Mdl1                 Humidity      Light      Temperature
Mdl2                 Temperature   Light      Humidity
Mdl3                 Temperature   Humidity   Light

Because the sensor values are not normally distributed, the Spearman correlation coefficient is calculated among them. The resulting correlation matrix is given in Figure 2.
The correlation matrix is used to determine the relationship between the inputs and outputs of the proposed models and to interpret the overall results. If the inputs represent the output well, there is a high correlation between inputs and output, which contributes positively to the performance of the models. The proposed models produce high accuracy prediction results when the correlation between the sensor nodes is high, and the accuracy decreases as the correlation weakens. According to Figure 2, the correlation between input and output sensor values is highest for Mdl1, followed by Mdl2 and Mdl3.
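The Spearman coefficient used here is the Pearson correlation computed on ranks, which is why it suits data that are not normally distributed. The following is a minimal sketch of that definition, with tied values sharing their average rank; the sample values are illustrative and are not taken from the dataset.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative readings (assumed): humidity falls as temperature rises.
temperature = [18.0, 19.5, 22.0, 25.0, 30.0]
humidity = [45.0, 44.0, 40.0, 33.0, 20.0]
rho = spearman(temperature, humidity)
```

Because only the ordering matters, any strictly monotone relationship yields a coefficient of magnitude 1, whether or not it is linear.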
Because of the unbalanced data characteristic, we also performed Min-Max Normalization to bring the sensor values to a common scale without distorting differences in the ranges of values. These normalized values were used in the training and testing processes of both the DL and ANFIS based models.
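Min-Max Normalization maps each feature linearly onto the interval [0, 1]. The sketch below uses the temperature minimum, maximum, and mean reported in Table 1; the inverse transform is an added assumption about how a normalized prediction would be mapped back to physical units, and the function names are hypothetical.

```python
def min_max_scale(value, lo, hi):
    """Map a value from [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)


def min_max_inverse(scaled, lo, hi):
    """Map a normalized value back to the original range."""
    return scaled * (hi - lo) + lo


# Minimum and maximum of the temperature feature as reported in Table 1.
T_MIN, T_MAX = 15.270, 37.660

# Normalize the mean temperature from Table 1, then map it back.
scaled = min_max_scale(23.216, T_MIN, T_MAX)
restored = min_max_inverse(scaled, T_MIN, T_MAX)
```

Applying the same transform per feature puts temperature, humidity, and light on a common scale even though light spans a far wider raw range.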

3.3 Inputs and outputs of the models

In this paper, we aimed to construct models that are capable of estimating the value of a sensor using the other two sensors' values at the same time instant. The sensor values used are temperature, humidity, and light. We defined the three models given in Table 2. The abbreviations defined in Table 2 are used throughout the paper.
For a better understanding, a visualization of the models and of the actions taken in case of a missing value is given in Figure 3. Figure 3 depicts a sensor value stream between times {t − 2} and {t + 4}. In the figure, Temp_x, Humid_x, and Light_x stand respectively for the temperature, humidity, and light intensity observations at time x. As seen from the figure, missing sensor value incidents occur at times {t − 1, t + 2, t + 4}, and each missing value is predicted from the other readings of the same time. A flowchart representation of the application of the proposed models to an IoT data stream is also given in Figure 4.
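The selection of a model according to which reading is missing can be sketched as a simple dispatch over the mapping of Table 2. This is a minimal illustration, assuming one observation per time step stored as a dictionary with `None` marking a missing value; the placeholder predictors stand in for trained ANFIS or LSTM models, and all names are hypothetical.

```python
MODELS = {
    # missing field: (model abbreviation, input fields), following Table 2
    "temperature": ("Mdl1", ("humidity", "light")),
    "humidity": ("Mdl2", ("temperature", "light")),
    "light": ("Mdl3", ("temperature", "humidity")),
}


def fill_missing(observation, predictors):
    """Return a copy of `observation` in which a single missing value (None)
    is replaced by the prediction of the matching model.

    predictors : {"Mdl1": callable, ...}, each taking the two input values.
    """
    missing = [k for k in MODELS if observation.get(k) is None]
    if not missing:
        return dict(observation)
    if len(missing) > 1:
        raise ValueError("each model needs the other two readings")
    name, inputs = MODELS[missing[0]]
    filled = dict(observation)
    filled[missing[0]] = predictors[name](*(observation[i] for i in inputs))
    return filled


# Placeholder predictors (assumed constants, not trained models).
dummy = {
    "Mdl1": lambda humidity, light: 0.5,
    "Mdl2": lambda temperature, light: 0.4,
    "Mdl3": lambda temperature, humidity: 0.3,
}
row = fill_missing({"temperature": 0.35, "humidity": None, "light": 0.2}, dummy)
```

The sketch rejects observations with more than one missing reading, since each model in Table 2 relies on the two remaining values of the same time step.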

4 ADAPTIVE-NETWORK BASED FUZZY INFERENCE SYSTEM BASED MISSING VALUE PREDICTION

In this section, the fuzzy logic based method utilized for missing sensor data prediction, ANFIS,43 and its predecessor, FIS,44 are briefly explained.

4.1 Adaptive-network based fuzzy inference system

Estimation models based on mathematical/statistical methods perform remarkably well on mathematically formulable problems. However, real-world problems, which are usually ill-defined and uncertain, do not fall under this category. For this type of problem, FIS is one of the frequently used approaches. An FIS model employs a non-linear mapping from an input space to an output space using fuzzy rules.45 The rules used by FIS are generated by humans.45 Therefore, the success of the model depends heavily on the expertise of the human who generates the rules. There are two widely used FIS types in the literature, ie, Mamdani-type46 and Sugeno-type.44 In some cases, it might be quite challenging to find suitable experts or create an accurate rule-base. Besides, even if a rule-base is acquired, tuning of the membership functions' parameters is still required. ANFIS was proposed to overcome this challenge. Proposed by Jang,43 the ANFIS method utilizes an ANN to optimize the FIS parameters and only requires input-output tuples to create a rule-base.
Time   Temperature   Humidity      Light Density   Related Model   Action
t-2    Temp_{t-2}    Humid_{t-2}   Light_{t-2}     -               -
t-1    ???           Humid_{t-1}   Light_{t-1}     Mdl1            Humid_{t-1}, Light_{t-1} → Temp_{t-1}
t      Temp_{t}      Humid_{t}     Light_{t}       -               -
t+1    Temp_{t+1}    Humid_{t+1}   Light_{t+1}     -               -
t+2    Temp_{t+2}    ???           Light_{t+2}     Mdl2            Temp_{t+2}, Light_{t+2} → Humid_{t+2}
t+3    Temp_{t+3}    Humid_{t+3}   Light_{t+3}     -               -
t+4    Temp_{t+4}    Humid_{t+4}   ???             Mdl3            Temp_{t+4}, Humid_{t+4} → Light_{t+4}

FIGURE 3 Missing sensor value situations (left) and actions (right) taken in missing sensor value occurrences

FIGURE 4 Flowchart representation of the application of the proposed models to IoT data streams

Takagi-Sugeno's method of reasoning44 is considered to be a good approximator.45 Fuzzy reasoning creates a rule for each if-then state. Each of the fuzzy rules generated by Sugeno-type reasoning has a single output that is a linear combination of the inputs and a constant term. The output of the system is calculated as a weighted average of the outputs of the rules. To present ANFIS, an FIS with two inputs, where each input is assumed to have two fuzzy linguistic terms, is considered:
Rule 1: IF (x = A_1) AND (y = B_1) THEN f_{11} = p_{11} x + q_{11} y + r_{11}
Rule 2: IF (x = A_1) AND (y = B_2) THEN f_{12} = p_{12} x + q_{12} y + r_{12}
Rule 3: IF (x = A_2) AND (y = B_1) THEN f_{21} = p_{21} x + q_{21} y + r_{21}
Rule 4: IF (x = A_2) AND (y = B_2) THEN f_{22} = p_{22} x + q_{22} y + r_{22}
{p_{ij}, q_{ij}, r_{ij}} are the parameters that are determined during the training phase of ANFIS and {A_i, B_j} are fuzzy terms that are used for defining data points.
FIGURE 5 Sample ANFIS architecture

Figure 5 shows an ANFIS structure that has two inputs (x, y) and one output (f). In the figure, circle nodes are fixed nodes that do not change throughout the training phase, whereas square nodes are adaptive nodes that are calibrated during the training phase. An ANFIS consists of 5 layers.
Layer 1: Every node in the first layer is adaptive and calculates the degree of membership for each input variable. For a 2-input model, the node functions for each input are given as Equation (1) and Equation (2), ie,

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2 \tag{1}$$

$$O_{1,j} = \mu_{B_j}(y), \quad j = 1, 2. \tag{2}$$

In Equation (1) and Equation (2), $\mu_{A_i}$ and $\mu_{B_j}$ are the selected membership functions. These functions can be the Gaussian membership function (given in Equation (3)), the generalized bell membership function (given in Equation (4)), or another one, ie,

$$\mu_{A_i}(x) = \exp\left[-\left(\frac{x - c_i}{2a_i}\right)^2\right] \tag{3}$$

$$\mu_{A_i}(x) = \frac{1}{1 + \left|\frac{x - c_i}{a_i}\right|^{2b_i}}. \tag{4}$$

In Equation (3) and Equation (4), $\{a_i, b_i, c_i\}$ are the parameters of the membership function and can change the shape of the function. They are referred to as premise parameters.
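The two membership functions of Equations (3) and (4) can be written directly as code. The following is a minimal sketch that follows the forms given in the text, namely exp(-((x - c) / (2a))^2) for the Gaussian function and 1 / (1 + |(x - c) / a|^(2b)) for the generalized bell function; the parameter values are illustrative assumptions.

```python
import math


def gaussian_mf(x, a, c):
    """Gaussian membership function in the form of Equation (3)."""
    return math.exp(-(((x - c) / (2 * a)) ** 2))


def generalized_bell_mf(x, a, b, c):
    """Generalized bell membership function in the form of Equation (4)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


# Both functions reach their maximum of 1 when x equals the centre c.
peak_gauss = gaussian_mf(0.5, a=0.2, c=0.5)
peak_bell = generalized_bell_mf(0.5, a=0.2, b=2, c=0.5)

# The bell function drops to 0.5 at a distance of a from the centre.
half_bell = generalized_bell_mf(0.7, a=0.2, b=2, c=0.5)
```

Here c sets the centre, a the width, and b (bell function only) the steepness of the shoulders, which is how the premise parameters change the shape of the function.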
Layer 2: Every node in this layer is fixed and labeled with ∏. Nodes in this layer multiply incoming signals and send the product to the next layer. The output of each node represents the firing strength of a rule. The output function of nodes in this layer is given as Equation (5), ie,

$$O_{2,ij} = w_{i,j} = \mu_{A_i}(x) \times \mu_{B_j}(y), \quad i, j = 1, 2. \tag{5}$$

Layer 3: Every node in this layer is fixed and labeled with N. Nodes in this layer normalize the firing strengths of the rules. Each node calculates the ratio of the firing strength of its rule to the sum of all rules' firing strengths using Equation (6).

$$O_{3,ij} = \bar{w}_{i,j} = \frac{w_{i,j}}{\sum_{i}\sum_{j} w_{i,j}}, \quad i, j = 1, 2. \tag{6}$$

Layer 4: Every node in this layer is adaptive with a node function given as Equation (7), ie,

$$O_{4,ij} = \bar{w}_{i,j} f_{i,j} = \bar{w}_{i,j}(p_{ij} x + q_{ij} y + r_{ij}), \quad i, j = 1, 2. \tag{7}$$

Here, $\bar{w}_{i,j}$ is the normalized firing strength output by the third layer. The variables $\{p_{ij}, q_{ij}, r_{ij}\}$ are referred to as consequent parameters.
Layer 5: The fifth layer is the output layer of the ANFIS structure and contains a single node that performs summation of all signals from the fourth layer. The node in this layer is labeled as ∑ and performs the summation using Equation (8), ie,

$$\text{Output} = O_5 = \sum_{i}\sum_{j} \bar{w}_{i,j} f_{i,j}, \quad i, j = 1, 2. \tag{8}$$
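The five layers can be traced end to end for the 2-input, 4-rule system considered above. The following is a minimal sketch of the forward computation, assuming Gaussian membership functions in the form of Equation (3); indices are zero-based in the code, and the premise and consequent values are illustrative assumptions rather than trained parameters.

```python
import math
from itertools import product


def gaussian_mf(x, a, c):
    """Gaussian membership function, exp(-((x - c) / (2a))^2)."""
    return math.exp(-(((x - c) / (2 * a)) ** 2))


def anfis_forward(x, y, premise_x, premise_y, consequent):
    """Forward pass of a 2-input Sugeno-type ANFIS.

    premise_x, premise_y : lists of (a, c) pairs for the terms A_i and B_j
    consequent           : {(i, j): (p, q, r)} for each rule
    Returns (output, normalized firing strengths).
    """
    # Layer 1: membership degrees of each input
    mu_a = [gaussian_mf(x, a, c) for a, c in premise_x]
    mu_b = [gaussian_mf(y, a, c) for a, c in premise_y]
    # Layer 2: firing strength of each rule (product of memberships)
    w = {(i, j): mu_a[i] * mu_b[j]
         for i, j in product(range(len(mu_a)), range(len(mu_b)))}
    # Layer 3: normalized firing strengths
    total = sum(w.values())
    w_bar = {rule: strength / total for rule, strength in w.items()}
    # Layer 4: weighted linear output of each rule
    rule_out = {rule: w_bar[rule] * (p * x + q * y + r)
                for rule, (p, q, r) in consequent.items()}
    # Layer 5: sum of all rule outputs
    return sum(rule_out.values()), w_bar


# Illustrative parameters (assumed): two terms per input, four rules.
premise = [(0.25, 0.0), (0.25, 1.0)]
rules = {(i, j): (1.0, 2.0, 0.5) for i in range(2) for j in range(2)}
output, weights = anfis_forward(0.3, 0.6, premise, premise, rules)
```

Because every rule in this toy configuration shares the same consequent, the output reduces to x + 2y + 0.5 regardless of the firing strengths, which gives a quick sanity check on the normalization in Layer 3.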

As mentioned, the ANFIS structure has two adaptive layers, namely, the first and the fourth layers. Their ability to adapt stems from the parameters of these layers, namely, the premise parameters of the first layer and the consequent parameters of the fourth layer. The training phase of ANFIS consists of tuning the premise and consequent parameters. For this purpose, ANFIS utilizes a hybrid learning algorithm.43 This algorithm is composed of two passes. A forward pass is used for tuning the consequent parameters and a backward pass is used for tuning the premise parameters. In the forward pass, the premise parameters are fixed and signals proceed to layer four. In the fourth layer, the consequent parameters are determined by using the least squares method. In the backward pass, the consequent parameters are fixed and error rates propagate back to the first layer. In the first layer, the premise parameters of the membership functions are tuned using the Gradient Descent method.47
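The forward pass works because, once the premise parameters are fixed, the network output is linear in the consequent parameters, so they can be found by least squares. The following is a minimal sketch of that idea, assuming a batch solve of the normal equations for brevity; this is not necessarily the estimator used in the paper or in any toolbox. The function names and the sample data are hypothetical.

```python
def solve_linear(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - known) / aug[i][i]
    return z


def fit_consequents(samples):
    """Least-squares estimate of the consequent parameters.

    samples : list of (w_bar, x, y, target), where w_bar holds the
              normalized firing strength of every rule for that sample.
    Returns one (p, q, r) tuple per rule.
    """
    # Each rule contributes the columns w_bar*x, w_bar*y, w_bar.
    A = [[w * v for w in w_bar for v in (x, y, 1.0)]
         for w_bar, x, y, _ in samples]
    t = [target for *_, target in samples]
    n = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)]
           for i in range(n)]
    Atb = [sum(row[i] * ti for row, ti in zip(A, t)) for i in range(n)]
    theta = solve_linear(AtA, Atb)
    return [tuple(theta[k:k + 3]) for k in range(0, n, 3)]


# Assumed toy data for two rules, generated from the planes
# f_1 = 2x - y + 0.5 and f_2 = x + y.
samples = [
    ((1.0, 0.0), 0.0, 0.0, 0.5),
    ((1.0, 0.0), 1.0, 0.0, 2.5),
    ((1.0, 0.0), 0.0, 1.0, -0.5),
    ((0.0, 1.0), 0.0, 0.0, 0.0),
    ((0.0, 1.0), 1.0, 0.0, 1.0),
    ((0.0, 1.0), 0.0, 1.0, 1.0),
    ((0.5, 0.5), 1.0, 1.0, 1.75),
]
fitted = fit_consequents(samples)
```

The backward pass is not shown; it would keep these consequent values fixed and adjust the membership function parameters by gradient descent on the prediction error.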

4.2 ANFIS clustering

As mentioned earlier, the Sugeno-type ANFIS method utilizes a reasoning mechanism based on fuzzy IF-THEN rules. In FIS models, fuzzy terms are used for defining variables and, therefore, for choosing rules. However, in ANFIS, there are no fuzzy terms generated by a human expert. Instead, clusters are used. All input and output variables are clustered and these clusters are used for rule generation. Numerous clustering methods are present in the literature. In this work, three of these methods are utilized, namely, Grid Partitioning (GP), Subtractive Clustering (SC), and Fuzzy C-Means Clustering (FCM).
Based on Zadeh's proposal of fuzzy sets48 and linguistic terms,49-51 GP is a widely used clustering method for obtaining fuzzy inference systems. In this method, input-output couplings create a surface, and this surface is partitioned into grids based on the linguistic labels of the inputs. Each grid cell represents a fuzzy inference area and creates a rule for ANFIS. It must also be noted that quantitative values can be clustered and treated as linguistic terms.52 That flexibility allows GP to be used without prior knowledge or human expertise. The drawback of the GP method is its sensitivity to the number of input variables: the number of rules increases exponentially as the number of input variables increases. Furthermore, GP requires a large number of observations to perform well. Hence, the GP model is suitable when the number of variables is low and the number of observations is high.53
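The exponential growth of the rule base under grid partitioning follows from taking one rule per combination of linguistic terms. The following is a minimal sketch of that enumeration; the term labels and the choice of three terms per input in the second example are illustrative assumptions.

```python
from itertools import product


def grid_partition_rules(terms_per_input):
    """Enumerate rule antecedents: one rule per combination of terms."""
    return list(product(*terms_per_input))


# Two inputs with two terms each give the four rules of the 2-input example.
rules_2 = grid_partition_rules([["A1", "A2"], ["B1", "B2"]])

# With three terms per input, the rule count is 3 ** n for n inputs.
rule_counts = {n: len(grid_partition_rules([["low", "mid", "high"]] * n))
               for n in (2, 4, 6, 8)}
```

With m terms per input and n inputs the rule count is m to the power n, which is why grid partitioning is practical for the two-input models used here but not for models with many inputs.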
The SC method treats each data point as a potential cluster center and calculates a measure of that possibility based on the density of surrounding data points. Data points that have already formed a cluster are subtracted from the complete set of data points, and the remaining non-cluster-member data points are used for the generation of new cluster centers. Unlike GP, SC can be used for models that have a high number of inputs because the number of rules of an SC based ANFIS model is determined by the number of clusters.54 However, the number of observations is a problem for SC: a high number of observations causes high computational complexity, which affects run-times. Furthermore, calibration of the cluster radius is crucial for the performance of SC. A small cluster radius results in a high number of rules. On the other hand, a big radius results in highly general clusters, which leads to unacceptable results.55
The FCM method was first introduced by Dunn56 and later improved by Bezdek.57 In FCM, each data point belongs to a fuzzy cluster with a
membership degree and can belong to multiple clusters with different membership grades.58,59 This approach removes sharp boundaries between
clusters.59 It must be noted that the FCM clustering method partitions data points into a pre-determined number of clusters. In this research, we
used the MATLAB implementation of FCM clustering, which determines the number of clusters using SC before performing FCM clustering.

4.3 ANFIS optimization

To reveal the true potential of ANFIS, optimization of the clustering methods is of crucial importance. Accordingly, a set of experiments is
first conducted to optimize the parameters of the clustering methods. In our experimental method, all parameters are fixed to the default values
of the clustering method except the one being investigated. The investigated parameter is varied over a range of pre-determined values, and tests
are performed for each value. The parameters' effects on the clustering methods are examined for each model. The results of this examination are
used to generate new test scenarios that combine high-performing parameter selections. These new test scenarios are then performed, and the best
performing parameter configurations are acquired.
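The first stage of this procedure, varying one parameter while the others stay at their defaults, can be sketched as follows. This is a minimal illustration under stated assumptions: `evaluate` stands in for a full train-and-test run, and `toy_error`, the parameter names, and the candidate values are hypothetical placeholders rather than the experiments reported here.

```python
def one_at_a_time_sweep(defaults, candidates, evaluate):
    """Vary one parameter at a time, keeping the others at their default values.

    Returns, for each parameter, the candidate value with the lowest error.
    """
    best = {}
    for name, values in candidates.items():
        scored = []
        for value in values:
            config = dict(defaults, **{name: value})
            scored.append((evaluate(config), value))
        best[name] = min(scored)[1]
    return best


# Hypothetical error surface standing in for a train-and-test run.
def toy_error(config):
    return (config["radius"] - 0.7) ** 2 + (config["squash"] - 0.9) ** 2


defaults = {"radius": 0.5, "squash": 1.25}
candidates = {
    "radius": [0.1, 0.3, 0.5, 0.7, 0.9],
    "squash": [0.3, 0.6, 0.9, 1.2, 1.5],
}
best = one_at_a_time_sweep(defaults, candidates, toy_error)
print(best)
```

The per-parameter winners returned here would then be combined into new test scenarios, as described above, since the best value of one parameter may shift once another parameter moves away from its default.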

4.3.1 Grid partitioning parameter optimization

GP method has three parameters, namely, ‘‘Input Membership Function Type’’ (IMFT), ‘‘Number of Membership Functions’’ (NMF), and ‘‘Output
Membership Function Type’’ (OMFT).

• IMFT is crucial for the performance of fuzzy sets. A membership function calculates a membership degree in [0,1] for each data point. In
this research, eight different membership functions (mf) are used: generalized bell-shaped mf, gaussian curve mf, gaussian combination mf,
triangular-shaped mf, trapezoid-shaped mf, difference between two sigmoidal mf, product of two sigmoidal mf, and pi-shaped mf. IMFT is
specified for each input parameter.
• NMF specifies the number of membership functions for each input variable and directly affects the number of rules. In this research, NMF
values between 2 and 10 are tested for all models. NMF is specified for each input parameter.
• OMFT specifies the type of membership function for the output, which can be linear or constant.
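To make the membership degree concrete, three of the listed membership function shapes are sketched below in plain Python. The function names follow the common MATLAB naming (gaussmf, trimf, gbellmf) for readability only; these are standalone illustrations with assumed parameter orderings, not calls into any toolbox.

```python
import math


def gaussmf(x, sigma, c):
    """Gaussian curve: peak of 1 at c, width controlled by sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def trimf(x, a, b, c):
    """Triangle with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def gbellmf(x, a, b, c):
    """Generalized bell: centre c, width a, slope controlled by b."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


# Membership degrees of a normalized reading of 0.25 under each shape.
x = 0.25
print(gaussmf(x, 0.2, 0.5), trimf(x, 0.0, 0.5, 1.0), gbellmf(x, 0.2, 2, 0.5))
```

Each function maps a crisp input to a degree in [0,1]; choosing a different shape per input changes how sharply neighbouring fuzzy sets overlap.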

Best performing parameter configuration for GP is given in Table 3, where IMFT Input I, IMFT Input II, NMF Input I, NMF Input II, and OMFT
respectively stand for IMFT for the first input, IMFT for second input, NMF for first input, NMF for second input, and OMFT of ANFIS model.

TABLE 3 Best performing parameter configurations for GP based ANFIS

Model   IMFT Input I   IMFT Input II   NMF (Input I, Input II)   OMFT
Mdl1    gaussmf        trimf           2, 3                      linear
Mdl2    trimf          trapmf          2                         constant
Mdl3    gaussmf        gaussmf         2, 2                      constant

TABLE 4 Best performing parameter configurations for SC based ANFIS

Model   CIR Input I   CIR Input II   CIR Output   SF     AR     RR
Mdl1    0.90          0.30           0.30         0.90   0.50   0.25
Mdl2    0.70          0.30           0.70         0.90   0.50   0.20
Mdl3    0.70          0.60           0.70         0.90   0.50   0.30

TABLE 5 Best performing parameter configurations for FCM based ANFIS

Model   CN   Expo   MNI   MI
Mdl1    3    1.2    15    1.00E-5
Mdl2    2    1.2    50    1.00E-5
Mdl3    3    1.2    10    1.00E-5

4.3.2 Subtractive clustering parameter optimization

SC method has four parameters. These are Cluster Influence Range (CIR), Squash Factor (SF), Accept Ratio (AR), and Reject Ratio (RR).

• CIR is the influence range of clusters. The default value of CIR is 0.5. In this research, input and output CIR values between 0.1 and 0.9
are tested.
• SF is the factor used for scaling the influence range. The default value of SF is 1.25. In this research, SF values between 0.3 and 1.50
are tested.
• AR is used for the acceptance of new clusters. Values between 0.30 and 0.95 are used for tests, but no significant effect of AR is observed.
• RR is used for the rejection of new clusters. RR values between 0.05 and 0.30 are used for testing.

Best performing parameter configuration for SC is given in Table 4, where CIR Input I, CIR Input II, and CIR Output respectively stand for CIR
for first input, CIR for second input, and CIR for output.
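The role of the four parameters can be seen in a compact sketch of subtractive clustering. The code below follows the commonly described density-potential formulation and is an illustration only: it is not the MATLAB implementation used in the experiments, it applies a single radius to all dimensions (whereas CIR is set per input and output in Table 4), and it assumes data normalized to the unit range. The default accept and reject ratios are illustrative choices.

```python
import math


def subtractive_clustering(points, radius=0.5, squash=1.25, accept=0.5, reject=0.15):
    """Return cluster centres chosen from `points` (tuples in the unit hypercube)."""
    alpha = 4.0 / radius ** 2               # neighbourhood used to measure density
    beta = 4.0 / (squash * radius) ** 2     # wider neighbourhood used for subtraction

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Potential of each point: high when many points lie within the influence range.
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points) for p in points]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        ratio = peak / first_peak
        if ratio < reject:
            break                            # remaining potential is too low: stop
        if centers and ratio <= accept:
            # Grey zone: accept only if the candidate is far from existing centres.
            d_min = min(math.sqrt(sqdist(points[k], c)) for c in centers)
            if d_min / radius + ratio < 1.0:
                potential[k] = 0.0
                continue
        centers.append(points[k])
        # Subtract the new centre's influence so nearby points lose potential.
        potential = [
            p - peak * math.exp(-beta * sqdist(x, points[k]))
            for p, x in zip(potential, points)
        ]
    return centers


data = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12),
        (0.90, 0.90), (0.88, 0.90), (0.90, 0.88)]
centers = subtractive_clustering(data)
print(centers)
```

A smaller radius narrows the neighbourhood and tends to yield more centres, and therefore more rules, which matches the sensitivity to the cluster radius noted in Section 4.2.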

4.3.3 Fuzzy C-means clustering parameter optimization

FCM clustering has four parameters, namely, Number of Clusters (NC, listed as CN in Table 5), Exponent (Expo), Maximum Number of Iterations
(MNI), and Minimum Improvement of Objective Function (MI). Among these parameters, MNI and MI are related to the termination of the clustering
process. The process terminates when the improvement between two iterations falls under MI or the number of iterations reaches MNI.

• NC specifies the number of clusters and therefore directly affects the number of generated rules. In the implementation, if not specified, NC
is decided by a subtractive clustering phase, which has a cluster range of 0.5. In our parameter tests, the default option and cluster numbers
between 2 and 50 are used for testing.
• Expo controls the fuzzy overlap between clusters and has a default value of 2.0 in the MATLAB implementation of FCM. In this research, values
between 1.2 and 3.0 are tested.
• MNI is the maximum number of iterations in the training phase. In this research, MNI values between 5 and 50 are tested.
• MI is the minimum improvement value that is used for termination of the algorithm. MI has a default value of 1.00E-5. The values
1.00E-4, 1.00E-5, and 1.00E-6 are tested.

Best performing parameter configuration for FCM is given in Table 5.
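How the four FCM parameters interact can be sketched with a small pure-Python implementation of the standard fuzzy c-means iteration. This is an illustrative sketch, not the MATLAB routine used in the experiments; the argument names `n_clusters`, `expo`, `max_iter`, and `min_improvement` are assumed stand-ins for NC, Expo, MNI, and MI, and the random initialization with a fixed seed is an assumption made for reproducibility.

```python
import random


def fuzzy_c_means(points, n_clusters, expo=2.0, max_iter=100,
                  min_improvement=1e-5, seed=0):
    """Return (centers, memberships) for points given as equal-length tuples."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    power = 1.0 / (expo - 1.0)
    previous = None
    centers = []
    for _ in range(max_iter):
        # Centres are means of the points weighted by membership ** expo.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** expo for i in range(n)]
            centers.append([
                sum(w[i] * points[i][d] for i in range(n)) / sum(w)
                for d in range(dim)
            ])
        # Squared distances, floored to avoid division by zero.
        d2 = [[max(sum((p[d] - c[d]) ** 2 for d in range(dim)), 1e-12)
               for c in centers] for p in points]
        objective = sum(u[i][j] ** expo * d2[i][j]
                        for i in range(n) for j in range(n_clusters))
        # Membership update: closer centres receive higher degrees.
        u = [[1.0 / sum((d2[i][j] / d2[i][k]) ** power for k in range(n_clusters))
              for j in range(n_clusters)] for i in range(n)]
        # Stop once the objective improves by less than min_improvement.
        if previous is not None and abs(previous - objective) < min_improvement:
            break
        previous = objective
    return centers, u


data = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
centers, memberships = fuzzy_c_means(data, n_clusters=2)
print(sorted(c[0] for c in centers))
```

Lowering `expo` toward 1 makes the memberships crisper, while larger values spread each point's membership across clusters; the loop ends on whichever of the two termination conditions is met first.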

5 DEEP LEARNING BASED MISSING VALUE PREDICTION

In this section, we introduce the proposed DL models and Long Short Term Memory (LSTM) network. Then, parameter optimization and training
processes are explained, respectively.

5.1 Deep Learning based prediction models

In this section, we develop three DL models based on the LSTM network structure. The proposed DL models aim to perform data analysis
operations quickly and effectively by estimating sensor values that may be missing in real-time data analysis. Since there are three types
of sensor data in the data set, three different models are proposed, one for the estimation of each type. When one of the readings is missing,
the relevant model runs and completes the missing data. The overall architecture of the proposed models with all cases is shown in Figure 6.
Accordingly, Mdl1, Mdl2, and Mdl3 are proposed for estimating the missing temperature, humidity, and light sensor values, respectively.
Herein, the models predict temperature, humidity, and light data by taking humidity-light, temperature-light, and temperature-humidity value
pairs as inputs. The input and output values of the models are given in Table 2. In addition, the hyper-parameter values and processes of the
models are given in detail in Section 5.3.

FIGURE 6 General architecture of Deep Learning models (Left), LSTM Memory Block (Right)
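The model selection described above amounts to a lookup from the missing quantity to a model and its two inputs. The sketch below is a hypothetical dispatch helper; the dictionary layout, the field names, and the use of `None` to mark a missing reading are assumptions for illustration.

```python
# Missing quantity -> (model name, input quantities), following the text above.
MODELS = {
    "temperature": ("Mdl1", ("humidity", "light")),
    "humidity": ("Mdl2", ("temperature", "light")),
    "light": ("Mdl3", ("temperature", "humidity")),
}


def select_model(reading):
    """Return (model name, input values) when exactly one field is missing, else None."""
    missing = [name for name in MODELS if reading.get(name) is None]
    if len(missing) != 1:
        return None
    model, inputs = MODELS[missing[0]]
    return model, tuple(reading[name] for name in inputs)


reading = {"temperature": None, "humidity": 40.0, "light": 120.0}
print(select_model(reading))  # ('Mdl1', (40.0, 120.0))
```

Readings with no missing field, or with more than one, fall outside the case handled by these models and return `None` in this sketch.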

5.2 Long Short Term Memory

Long Short Term Memory (LSTM) is an extension of Recurrent Neural Network (RNN). It has been proposed to overcome training problems such
as vanishing and exploding gradients by Hochreiter and Schmidhuber.60 Thanks to advances in their architecture, RNNs have been found quite
successful in predicting sequential and time series data.6 In LSTM, a memory unit takes the place of each ordinary neuron in the hidden layer
of standard RNN. The LSTM Memory block shown in Figure 6 has an input gate, a forget gate and an output gate which regulate the flow of
information in and out of the cell.61,62 The equations for these gates and cell states are presented as follows:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) (9)

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) (10)

\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) (11)

C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t (12)

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) (13)

h_t = o_t \ast \tanh(C_t), (14)

where x_t and h_t are the input and output vectors at time t, \tilde{C}_t is the candidate cell state, C_{t-1} is the old cell state, C_t is the
new cell state, and i_t, f_t, and o_t are the input, forget, and output gates, respectively. W_c, W_i, W_f, and W_o are the weight matrices,
\ast is the element-wise product operating on two vectors of the same size, and b_c, b_i, b_f, and b_o are the bias vectors. \sigma(\cdot)
represents the logistic sigmoid function, ie, \sigma(x) = 1/(1 + e^{-x}), and \tanh(\cdot) represents the hyperbolic tangent function.
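Equations (9) to (14) describe one time step of the memory block, which can be written out directly. The following is a minimal pure-Python sketch of that single step; the parameter dictionary keys and the helper names are assumptions made for illustration, and a practical implementation would rely on a numerical library instead.

```python
import math


def sigmoid(v):
    """Element-wise logistic sigmoid of a list of numbers."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, z, b):
    """Return W . z + b for a weight matrix W (list of rows), vector z, and bias b."""
    return [sum(w * a for w, a in zip(row, z)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new output h_t and cell state C_t."""
    z = h_prev + x_t                                            # [h_{t-1}, x_t]
    f_t = sigmoid(affine(params["W_f"], z, params["b_f"]))      # Eq. (9)
    i_t = sigmoid(affine(params["W_i"], z, params["b_i"]))      # Eq. (10)
    c_hat = [math.tanh(a)
             for a in affine(params["W_c"], z, params["b_c"])]  # Eq. (11)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]      # Eq. (12)
    o_t = sigmoid(affine(params["W_o"], z, params["b_o"]))      # Eq. (13)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]          # Eq. (14)
    return h_t, c_t


# Two hidden units and two inputs; all-zero weights make the gates equal to 0.5.
hidden, inputs = 2, 2
zero_params = {}
for gate in ("f", "i", "c", "o"):
    zero_params["W_" + gate] = [[0.0] * (hidden + inputs) for _ in range(hidden)]
    zero_params["b_" + gate] = [0.0] * hidden

h_t, c_t = lstm_step([0.4, 0.6], [0.0, 0.0], [1.0, -1.0], zero_params)
print(h_t, c_t)
```

With zero weights the forget gate halves the old cell state and the candidate state contributes nothing, which makes the step easy to check by hand.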

5.3 Parameter optimization and training process

Parameter optimization is one of the most important steps to get the most effective results from DL models. Moreover, it is important to use
descriptive statistics to determine the appropriate model parameter for the prediction of each sensor value. In particular, the network parameters
such as LSTM units, batch size, and layer size are carefully identified. Therefore, we applied grid search method under different number of layers
(1, 2, 3), hidden units (50, 120, 240, 480) and batch size (60, 72, 240, 480, 1440) for preliminary model optimization. The obtained optimum
parameters are given in Table 6.

TABLE 6 Hyper-parameters of the DL models

Model Name   Hidden Layers   LSTM Units (L1xL2)   Batch Size   Epochs
Mdl1         2               240x240              1440         200
Mdl2         2               240x240              1440         200
Mdl3         2               60x60                60           200
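The grid search over the preliminary ranges listed in this section can be sketched as an exhaustive enumeration. The dictionary keys and the `grid_search` helper are hypothetical, and `evaluate` stands in for training a model and returning its validation error.

```python
from itertools import product

layers = (1, 2, 3)
hidden_units = (50, 120, 240, 480)
batch_sizes = (60, 72, 240, 480, 1440)

# Every combination of the three preliminary ranges.
grid = [
    {"layers": n_layers, "units": units, "batch_size": batch}
    for n_layers, units, batch in product(layers, hidden_units, batch_sizes)
]
print(len(grid))  # 60


def grid_search(grid, evaluate):
    """Return the configuration with the lowest error according to `evaluate`."""
    return min(grid, key=evaluate)
```

Each of the 60 configurations requires a full training run, so the cost of the search scales with the product of the range sizes.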

In the process of model construction and training, we use the TensorFlow63 and Keras64 frameworks as the computing environment. The Adaptive
Moment Estimation (Adam) optimizer, which computes individual adaptive learning rates for different parameters, is used to minimize the loss
function.65 A mini-batch strategy is utilized in our implementation to reduce loss fluctuation, so the gradients are calculated with respect to
mini-batches.
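The combination of Adam and mini-batch gradients can be illustrated on a deliberately small problem. The sketch below fits a straight line with the standard Adam update rule in pure Python; it is not the TensorFlow/Keras training code of the proposed models, and the function name, learning rate, and toy data are assumptions chosen for illustration.

```python
import math
import random


def adam_fit_line(xs, ys, batch_size=8, epochs=200, lr=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Fit y = w * x + b by minimizing mean squared error with mini-batch Adam."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # [w, b]
    m = [0.0, 0.0]       # first-moment estimates, one per parameter
    v = [0.0, 0.0]       # second-moment estimates, one per parameter
    step = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [theta[0] * xs[i] + theta[1] - ys[i] for i in batch]
            # Gradient of the mean squared error over the mini-batch only.
            grad = [
                2.0 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch),
                2.0 * sum(errors) / len(batch),
            ]
            step += 1
            for k in range(2):
                m[k] = beta1 * m[k] + (1 - beta1) * grad[k]
                v[k] = beta2 * v[k] + (1 - beta2) * grad[k] ** 2
                m_hat = m[k] / (1 - beta1 ** step)   # bias correction
                v_hat = v[k] / (1 - beta2 ** step)
                # Each parameter gets its own effective learning rate.
                theta[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


xs = [(i - 16) / 16 for i in range(33)]
ys = [2.0 * x + 0.5 for x in xs]
w, b = adam_fit_line(xs, ys)
print(w, b)
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own adaptive step size, while averaging the gradient over a mini-batch smooths the update compared with single-sample steps.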

6 EXPERIMENTAL RESULTS

The performance of the proposed models is evaluated with the Root Mean Squared Error (RMSE) metric, given in Equation (15), ie,

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{o,i} - y_{e,i})^2}. (15)

Here, y_{o,i}, y_{e,i}, and N represent the i-th observed sensor value, the i-th estimated sensor value, and the total number of observations,
respectively.
RMSE metric is used for the measurement of error amount between the estimated value and real observed value. It must be noted that RMSE
metric changes depending on the value range of variables. To acquire RMSE metrics in a proportional manner, all experiments are conducted on
normalized data.
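The dependence of RMSE on the value range, and the effect of normalization, can be sketched as follows. Min-max scaling is an assumption here, since the normalization scheme is not specified above, and both function names are hypothetical.

```python
import math


def min_max_normalize(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rmse(observed, estimated):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


# The same relative error yields a 10x larger RMSE on a 10x larger scale.
small = rmse([1.0, 2.0, 3.0], [1.1, 2.1, 3.1])
large = rmse([10.0, 20.0, 30.0], [11.0, 21.0, 31.0])
print(small, large)
print(min_max_normalize([10.0, 20.0, 30.0]))
```

Bringing temperature, humidity, and light onto a common range before computing RMSE is what makes the error values of the three models comparable.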
To verify the prediction accuracy, we compare our models with SVM Regression (SVR) and Gaussian Kernel Regression (GKR), which are two
non-linear regression methods. For comparison, we performed experiments according to the inputs and outputs in Table 2. In these experiments,
each node is treated as a different data source. Each experiment is conducted on all three data sources using the 10-fold cross validation method,
which yields ten test results per data source (sensor) and thirty test results per test model. The error ratios of the ANFIS, DL, SVR, and GKR
based prediction methods presented throughout this section are the averages of these thirty results. Therefore, the results presented in this
section are generalized and the effect of data selection is minimized. Experiments with the ANFIS based methods, SVR, and GKR are performed in
MATLAB 2018a. The parameter configurations of SVR and GKR are the default parameters defined in MATLAB. The normalized RMSE values of all
models are presented in Table 7 and Figure 7.
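The averaging over thirty results can be sketched as a 10-fold split applied to each of the three data sources. The helpers below are hypothetical; contiguous, unshuffled folds are an assumption, and `fit` and `error` stand in for training a prediction model and scoring it on the held-out fold.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_error(sources, fit, error, k=10):
    """Average the test error over k folds of every data source.

    Returns (mean error, number of individual test results).
    """
    scores = []
    for xs, ys in sources:
        for train, test in k_fold_indices(len(xs), k):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            scores.append(error(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores), len(scores)


# Three toy data sources and a mean predictor as placeholders.
sources = [([0] * 20, [1.0] * 20) for _ in range(3)]
mean_fit = lambda xs, ys: sum(ys) / len(ys)
abs_error = lambda model, xs, ys: abs(model - ys[0])
print(cross_validated_error(sources, mean_fit, abs_error))  # (0.0, 30)
```

With three sources and ten folds each, the reported figure is the mean of thirty test results, as stated above.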
In case of Mdl1 and Mdl3, the proposed models have lower error ratios than implemented non-linear regression methods (SVR and GKR).
Among the proposed methods, DL based method demonstrates the best performance. In case of Mdl2, DL based method has the lowest error
ratio but error ratios of ANFIS based methods do not show any significant difference when compared to regression models. In total, proposed
models show improved results over SVR and GKR.
Among the models, Mdl1 seems to be the most predictable, which shows us that relations of temperature-light and temperature-humidity
tuples are highly correlated. A similar trend is observed in Mdl2. However, relations between humidity-temperature and humidity-light seem to
have a different characteristic: DL and ANFIS based methods show higher error ratios on Mdl2 than on Mdl1, whereas SVR and GKR perform better on
Mdl2 than on Mdl1. Mdl3 is the least correlated relation. All methods perform poorly on Mdl3 compared to the other models.
As seen in Table 7, the proposed ANFIS and DL based methods fall behind the regression methods in terms of training time. The GKR method has
the lowest training time among all methods. The training times of SVR and GP based ANFIS are also below 1 second for a training set composed of
3970 observations. On the other hand, the training times of SC based ANFIS and FCM based ANFIS are relatively longer, but the excess training
time is not reflected in the prediction results. Among the ANFIS based methods, all three variants produce similar results, but the GP based
method has a significantly lower training time. In the case of Mdl1 and Mdl3, GP based ANFIS outperforms GKR and SVR in terms of prediction
accuracy with a reasonable training time.
In DL, training time depends on a large number of factors such as network architecture, output channels, batch sizes, and other hyper-parameters.
As a result, the training times of the DL based methods are higher than those of the other methods. According to the results in Table 7, the
training of Mdl1, Mdl2, and Mdl3 took 24.28, 26.63, and 320.82 seconds, respectively. In particular, with its lower batch size, Mdl3 has higher
computational complexity than Mdl1 and Mdl2, which causes a significantly higher training time. It must be noted that priorities must be set
before method selection. The acquired results show a trade-off between robustness and accuracy among the methods. If robustness is the most
desirable aspect, ANFIS based methods (in Mdl1 and Mdl3) and regression models are the right choices. However, if accuracy is the top priority,
the DL based method is the right selection on all three models.
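The effect of batch size on training time can be made concrete by counting weight updates. The sketch below assumes that the 3970-observation training set mentioned above also applies to the DL models and that each epoch processes the whole set once; the function name is hypothetical.

```python
from math import ceil


def updates_per_training_run(n_train, batch_size, epochs):
    """Number of mini-batch weight updates over a full training run."""
    return ceil(n_train / batch_size) * epochs


print(updates_per_training_run(3970, 1440, 200))  # 600
print(updates_per_training_run(3970, 60, 200))    # 13400
```

Under these assumptions, the batch size of 60 used for Mdl3 leads to roughly 22 times as many updates as the batch size of 1440 used for Mdl1 and Mdl2, which is consistent with its much longer training time even though its layers are smaller.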
In general, the proposed models can improve the prediction accuracy and stability of missing sensor data greatly and effectively. Therefore,
ANFIS and DL based models are promising choices for prediction models.

TABLE 7 Normalized RMSE metrics and training/testing times

Mdl1
Method       Train RMSE   Test RMSE   Training Time   Testing Time
DL           0.0659       0.0701      24.28           0.5168
GP-ANFIS     0.1487       0.0990      0.79            0.0086
SC-ANFIS     0.1581       0.0979      5.55            0.0081
FCM-ANFIS    0.1662       0.1031      1.65            0.0067
GKR          0.0907       0.1139      0.07            0.0007
SVR          0.1068       0.111       0.44            0.0014

Mdl2
Method       Train RMSE   Test RMSE   Training Time   Testing Time
DL           0.0726       0.0880      26.63           0.5877
GP-ANFIS     0.1109       0.1086      0.37            0.0074
SC-ANFIS     0.1040       0.1089      5.69            0.0065
FCM-ANFIS    0.1106       0.1061      0.60            0.0055
GKR          0.1094       0.1079      0.06            0.0006
SVR          0.1140       0.1055      0.41            0.0015

Mdl3
Method       Train RMSE   Test RMSE   Training Time   Testing Time
DL           0.1101       0.1711      320.82          0.6237
GP-ANFIS     0.1096       0.1733      0.49            0.0063
SC-ANFIS     0.0905       0.1762      5.85            0.0064
FCM-ANFIS    0.0943       0.1751      1.56            0.0062
GKR          0.1690       0.1940      0.08            0.0007
SVR          0.1797       0.1833      0.41            0.0012

FIGURE 7 Performance comparisons of all models (panels: Mdl1, Mdl2, Mdl3; methods per panel: DL, GP ANFIS, SC ANFIS, FCM ANFIS, GKR, SVR;
vertical axis from 0.00 to 0.20)

7 CONCLUSION AND FUTURE WORK

Missing sensor values are a major problem for both IoT and WSN. In this work, we proposed two types of models to tackle this problem, namely,
ANFIS and DL based models. DL models have shown state-of-the-art performance in computer vision, natural language processing, and robotics.
These models offer potential solutions in many areas, including classification, prediction, and control problems. On the other hand, ANFIS
is used successfully in controlling, modeling, and parameter estimation of complex systems due to its adaptation capability, nonlinear ability,
and rapid learning capacity. The motivation of this paper is to utilize their advantages in the IoT missing sensor data problem. For this purpose,
firstly, optimization processes are carried out for the proposed models to identify the optimal model parameters. Secondly, the models are
constructed using the obtained optimal parameters, and then training and testing procedures are performed. The results indicate that both DL and
ANFIS methods perform remarkably well in terms of normalized RMSE metrics compared to the selected non-linear regression models. Through
comparisons with SVR and GKR, our proposed models show their advantages in prediction accuracy. In particular, DL clearly outperforms the other
methods. Moreover, ANFIS based models work quite well for estimating missing values.
In this work, the use of different sensor data types to estimate a sensor value is investigated. Sensor readings from other sensor nodes and
previous readings from the same sensor are completely ignored. Nevertheless, even without these data, the experiments showed that the proposed
methods perform remarkably well. Based on this, DL and ANFIS based methods deserve further investigation on IoT data analysis problems. Our
next work will incorporate previous readings and readings of neighbor nodes into the estimation process in a spatiotemporal manner.

ORCID

Suat Ozdemir https://orcid.org/0000-0002-4588-4538

REFERENCES
1. AlZu'bi S, Hawashin B, Mujahed M, Jararweh Y, Gupta BB. An efficient employment of internet of multimedia things in smart and future agriculture.
Multimed Tools Appl. 2019:1-25.
2. Atzori L, Iera A, Morabito G. The internet of things: a survey. Computer Networks. 2010;54(15):2787-2805.
3. Karkouch A, Mousannif H, Al Moatassime H, Noel T. Data quality in internet of things: a state-of-the-art survey. J Netw Comput Appl. 2016;73:57-81.
4. Qin Y, Sheng QZ, Falkner NJG, Dustdar S, Wang H, Vasilakos AV. When things matter: a survey on data-centric internet of things. J Netw Comput
Appl. 2016;64:137-153. https://doi.org/10.1016/j.jnca.2015.12.016
5. Vural Y, Akay D, Pourkashanian M, Ingham DB. Modeling of an intermediate temperature solid oxide fuel cell using the adaptive neuro-fuzzy inference
system (ANFIS). J Fuel Cell Sci Technol. 2010;7(3):034501.
6. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444.
7. Kök I, Simsek MU, Özdemir S. A deep learning model for air quality prediction in smart cities. In: Proceedings of the IEEE International Conference on
Big Data (Big Data); 2017; Boston, MA.
8. Qin Y, Zhang S, Zhu X, Zhang J, Zhang C. Semi-parametric optimization for missing data imputation. Applied Intelligence. 2007;27(1):79-88.
9. AlZu'bi S, AlQatawneh S, ElBes M, Alsmirat M. Transferable HMM probability matrices in multi-orientation geometric medical volumes segmentation.
Concurrency Computat Pract Exper. e5214.
10. AlZu'bi S, Islam N, Abbod M. Enhanced hidden Markov models for accelerating medical volumes segmentation. In: Proceedings of the 2011 IEEE GCC
Conference and Exhibition (GCC); 2011; Dubai, UAE.
11. Hassan MR, Nath B. Stock market forecasting using hidden Markov model: a new approach. In: Proceedings of the 5th International Conference on
Intelligent Systems Design and Applications (ISDA'05); 2005; Warsaw, Poland.
12. Li Z, Liu L, Kong D. Virtual machine failure prediction method based on AdaBoost-hidden Markov model. In: Proceedings of the 2019 International
Conference on Intelligent Transportation, Big Data & Smart City (ICITBS); 2019; Changsha, China.
13. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B (Methodol). 1977;39(1):1-38.
14. Delalleau O, Courville A, Bengio Y. Efficient em training of Gaussian mixtures with missing data. arXiv preprint arXiv:1209.0521. 2012.
15. Eirola E, Lendasse A, Vandewalle V, Biernacki C. Mixture of Gaussians for distance estimation with missing data. Neurocomputing. 2014;131:32-42.
16. Bouveyron C, Girard S, Schmid C. High-dimensional data clustering. Comput Stat Data Anal. 2007;52(1):502-519.
17. Elbes M, Alzubi S, Kanan T, Al-Fuqaha A, Hawashin B. A survey on particle swarm optimization with emphasis on engineering and network applications.
Evolutionary Intelligence. 2019;12(2):113-129.
18. García JCF, Kalenatic D, Bello CAL. Missing data imputation in multivariate data by evolutionary algorithms. Comput Hum Behav. 2011;27(5):1468-1474.
19. Lobato F, Sales C, Araujo I, et al. Multi-objective genetic algorithm for missing data imputation. Pattern Recognit Lett. 2015;68:126-131.
20. Deb K, Agrawal S, Pratap A, Meyarivan T. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In:
Proceedings of the International Conference on Parallel Problem Solving from Nature; 2000; Paris, France.
21. Priya RD, Sivaraj R, Priyaa NS. Heuristically repopulated bayesian ant colony optimization for treating missing values in large databases. Knowl Based
Syst. 2017;133:107-121.
22. Kennedy J. Particle swarm optimization. In: Encyclopedia of Machine Learning. New York, NY: Springer; 2010:760-766.
23. Nekouie A, Moattar MH. Missing value imputation for breast cancer diagnosis data using tensor factorization improved by enhanced reduced adaptive
particle swarm optimization. J King Saud Univ Comput Inf Sci. 2018.
24. Richman MB, Trafalis TB, Adrianto I. Missing data imputation through machine learning algorithms. In: Artificial Intelligence Methods in the Environmental
Sciences. Berlin, Germany: Springer; 2009:153-169.
25. Tutz G, Ramzan S. Improved methods for the imputation of missing data by nearest neighbor methods. Comput Stat Data Anal. 2015;90:84-99.
26. Troyanskaya O, Cantor M, Sherlock G, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520-525.
27. García-Laencina PJ, Sancho-Gómez J-L, Figueiras-Vidal AR, Verleysen M. K nearest neighbours with mutual information for simultaneous classification
and missing data imputation. Neurocomputing. 2009;72(7-9):1483-1493.
28. Abdella M, Marwala T. The use of genetic algorithms and neural networks to approximate missing data in database. In: Proceedings of the IEEE 3rd
International Conference on Computational Cybernetics (ICCC 2005); 2005; Mauritius.
29. Nelwamondo FV, Golding D, Marwala T. A dynamic programming approach to missing data estimation using neural networks. Information Sciences.
2013;237:49-58.
30. Ravi V, Krishna M. A new online data imputation method based on general regression auto associative neural network. Neurocomputing.
2014;138:106-113.
31. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of the 2004 IEEE
International Joint Conference on Neural Networks; 2004; Budapest, Hungary.
32. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and applications. Neurocomputing. 2006;70(1-3):489-501.
33. Sovilj D, Eirola E, Miche Y, et al. Extreme learning machine for missing data using multiple imputations. Neurocomputing. 2016;174:220-231.
34. Laña I, Olabarrieta II, Vélez M, Del Ser J. On the imputation of missing data for road traffic forecasting: new insights and novel techniques. Transp Res
C Emerg Technol. 2018;90:18-33.
35. Silva-Ramírez EL, Pino-Mejías R, López-Coello M, Cubiles-de-la- Vega M-D. Missing value imputation on missing completely at random data using
multilayer perceptrons. Neural Networks. 2011;24(1):121-129.
36. Folguera L, Zupan J, Cicerone D, Magallanes JF. Self-organizing maps for imputation of missing data in incomplete data matrices. Chemom Intell Lab
Syst. 2015;143:146-151.
37. Saitoh F. An ensemble model of self-organizing maps for imputation of missing values. In: Proceedings of the 2016 IEEE 9th International Workshop
on Computational Intelligence and Applications (IWCIA); 2016; Hiroshima, Japan.
38. Nishanth KJ, Ravi V. Probabilistic neural network based categorical data imputation. Neurocomputing. 2016;218:17-25.
39. Zhang L, Lu W, Liu X, Pedrycz W, Zhong C. Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values.
Knowl Based Syst. 2016;99:51-70.
40. Li T, Zhang L, Lu W, et al. Interval kernel Fuzzy C-Means clustering of incomplete data. Neurocomputing. 2017;237:316-331.
41. Sefidian AM, Daneshpour N. Missing value imputation using a novel grey based Fuzzy C-Means, mutual information based feature selection, and
regression model. Expert Syst Appl. 2019;115:68-94.
42. Madden S. Intel Berkeley research lab data. 2004.
43. Jang J-SR. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern. 1993;23(3):665-685.
44. Takagi T, Sugeno M. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern. 1985;SMC-15:116-132.
45. Akay D, Chen X, Barnes C, Henson B. ANFIS modeling for predicting affective responses to tactile textures. Hum Factors Ergon Manuf Serv Ind.
2012;22(3):269-281.
46. Mamdani EH, Assilian S. An experiment in linguistic synthesis with a fuzzy logic controller. Int J Man Mach Stud. 1975;7(1):1-13.
47. Werbos P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences [PhD dissertation]. Cambridge, MA: Harvard University;
1974.
48. Zadeh LA. Fuzzy sets. Inf Control. 1965;8(3):338-353. https://doi.org/10.1016/S0019-9958(65)90241-X
49. Zadeh LA. The concept of a linguistic variable and its application to approximate reasoning—I. Information Sciences. 1975;8(3):199-249.
50. Zadeh LA. The concept of a linguistic variable and its application to approximate reasoning—II. Information Sciences. 1975;8(4):301-357.
51. Zadeh LA. The concept of a linguistic variable and its application to approximate reasoning-III. Information Sciences. 1975;9(1):43-80.
52. Hu Y-C. Simple fuzzy grid partition for mining multiple-level fuzzy sequential patterns. Cybern Syst Int J. 2007;38(2):203-228.
53. Cobaner M. Evapotranspiration estimation by two different neuro-fuzzy inference systems. J Hydrol. 2011;398(3-4):292-302.
https://doi.org/10.1016/j.jhydrol.2010.12.030
54. Castellanos F, James N. Average hourly wind speed forecasting with ANFIS. In: Proceedings of the 11th Americas Conference on Wind Engineering
(ACWE); 2009; San Juan, Puerto Rico.
55. Moradi F, Bonakdari H, Kisi O, Ebtehaj I, Shiri J, Gharabaghi B. Abutment scour depth modeling using neuro-fuzzy-embedded techniques.
Mar Georesources Geotechnol. 2018;37(2):190-200.
56. Dunn JC. A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. 1973;3(3):32-57.
57. Bezdek JC, Ehrlich R, Full W. FCM: the Fuzzy C-Means clustering algorithm. Comput Geosci. 1984;10(2-3):191-203.
58. Fattahi H. Adaptive neuro fuzzy inference system based on fuzzy c–means clustering algorithm, a technique for estimation of TBM penetration rate.
Int J Optim Civ Eng. 2016;6(2):159-171.
59. Abdulshahed AM, Longstaff AP, Fletcher S. The application of ANFIS prediction models for thermal error compensation on CNC machine tools.
Appl Soft Comput. 2015;27:158-168. https://doi.org/10.1016/j.asoc.2014.11.012
60. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735-1780.
61. Lipton ZC, Berkowitz J, Elkan C. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019. 2015.
62. Wei D, Wang B, Lin G, et al. Research on unstructured text data mining and fault classification based on RNN-LSTM with malfunction inspection
report. Energies. 2017;10(3):406.
63. Abadi M, Barham P, Chen J, et al. Tensorflow: a system for large-scale machine learning. In: Proceedings of the 12th Usenix Symposium on Operating
Systems Design and Implementation (OSDI'16); 2016; Savannah, GA.
64. Chollet F. Keras. 2015.
65. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.

How to cite this article: Guzel M, Kok I, Akay D, Ozdemir S. ANFIS and Deep Learning based missing sensor data prediction in IoT.
Concurrency Computat Pract Exper. 2019;e5400. https://doi.org/10.1002/cpe.5400

From Everand
AI and IoT-based intelligent Health Care & Sanitation
PublishDrive
No ratings yet
Artificial Intelligence and Natural Algorithms
From Everand
Artificial Intelligence and Natural Algorithms
PublishDrive
No ratings yet
Automatic Target Recognition: Advances in Computer Vision Techniques for Target Recognition
From Everand
Automatic Target Recognition: Advances in Computer Vision Techniques for Target Recognition
Fouad Sabry
No ratings yet
Pragmatic Internet of Everything (IOE) for Smart Cities: 360-Degree Perspective
From Everand
Pragmatic Internet of Everything (IOE) for Smart Cities: 360-Degree Perspective
Satya Prakash Yadav
No ratings yet
Uncertainty Theories and Multisensor Data Fusion
From Everand
Uncertainty Theories and Multisensor Data Fusion
Alain Appriou
No ratings yet
Digital Technologies – an Overview of Concepts, Tools and Techniques Associated with it
From Everand
Digital Technologies – an Overview of Concepts, Tools and Techniques Associated with it
Editor IJSMI
No ratings yet
5 G Technologies
From Everand
5 G Technologies
Ajit Singh
5/5 (2)
Wireless Sensor Network: Smart Monitoring and Autonomous Data Collection for NextGen Robotics
From Everand
Wireless Sensor Network: Smart Monitoring and Autonomous Data Collection for NextGen Robotics
Fouad Sabry
No ratings yet
Industrial Internet of Things: An Introduction
From Everand
Industrial Internet of Things: An Introduction
Sunil Kumar
No ratings yet
Introduction to Quantum Computing & Machine Learning Technologies: 1, #1
From Everand
Introduction to Quantum Computing & Machine Learning Technologies: 1, #1
M. Sreedevi
No ratings yet
Computer Science: The Complete Guide to Principles and Informatics
From Everand
Computer Science: The Complete Guide to Principles and Informatics
Jonathan Rigdon
No ratings yet
Edge AI Solutions
From Everand
Edge AI Solutions
Kai Turing
No ratings yet
Connectivity Prediction in Mobile Ad Hoc Networks for Real-Time Control
From Everand
Connectivity Prediction in Mobile Ad Hoc Networks for Real-Time Control
Sebastian Thelen
5/5 (1)
Introduction to Computer Science Unlocking the World of Technology
From Everand
Introduction to Computer Science Unlocking the World of Technology
Benjamin F
No ratings yet
Edge Computing 101: Expert Techniques And Practical Applications
From Everand
Edge Computing 101: Expert Techniques And Practical Applications
Rob Botwright
No ratings yet
A.I. Cancer Timebomb
From Everand
A.I. Cancer Timebomb
charles r giardina
No ratings yet
Object Detection: Advances, Applications, and Algorithms
From Everand
Object Detection: Advances, Applications, and Algorithms
Fouad Sabry
No ratings yet
Product Failure Prediction With Missing Data Using Graph Neural Networks
No ratings yet
Product Failure Prediction With Missing Data Using Graph Neural Networks
10 pages
"Big Data Science" Basic Concepts and Applications
From Everand
"Big Data Science" Basic Concepts and Applications
Sukanta Bhattacharya
No ratings yet
Enhancing Tech Theory
From Everand
Enhancing Tech Theory
T. T. Samuels
No ratings yet
Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data
From Everand
Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data
Byron Ellis
No ratings yet
Introduction to computer science vol 1: Introduction to computer science vol 1, #1
From Everand
Introduction to computer science vol 1: Introduction to computer science vol 1, #1
Jm Alexander
No ratings yet
The Today and Future of WSN, AI, and IoT: A Compass and Torchbearer for the Technocrats
From Everand
The Today and Future of WSN, AI, and IoT: A Compass and Torchbearer for the Technocrats
Dr.Chandrakant
No ratings yet
Data Science
From Everand
Data Science
John D. Kelleher
3/5 (8)
Networking Programming with C++: Build Efficient Communication Systems
From Everand
Networking Programming with C++: Build Efficient Communication Systems
Robert Johnson
No ratings yet
Green Industrial Applications of Artificial Intelligence and Internet of Things
From Everand
Green Industrial Applications of Artificial Intelligence and Internet of Things
Biswadip Basu Mallik
No ratings yet
Turabieh2019 PDF
No ratings yet
Turabieh2019 PDF
10 pages
Multi-Attribute Missing Data Reconstruction Based On Adaptive Weighted Nuclear Norm Minimization in Iot
No ratings yet
Multi-Attribute Missing Data Reconstruction Based On Adaptive Weighted Nuclear Norm Minimization in Iot
13 pages
MULTI SENSOR FUSION
No ratings yet
MULTI SENSOR FUSION
19 pages
IMPClustering_Algorithms_based_Noise_Identification_from_Air_Pollution_Monitoring_Data
No ratings yet
IMPClustering_Algorithms_based_Noise_Identification_from_Air_Pollution_Monitoring_Data
6 pages
Label_Noise_Detection_in_IoT_Security_based_on_Decision_Tree_and_Active_Learning
No ratings yet
Label_Noise_Detection_in_IoT_Security_based_on_Decision_Tree_and_Active_Learning
8 pages
3Efficient_adaptive_noise_cancellation_techniques_i
No ratings yet
3Efficient_adaptive_noise_cancellation_techniques_i
6 pages
thesis-Choosing the Efficient Algorithm for Vertex Cover problem
No ratings yet
thesis-Choosing the Efficient Algorithm for Vertex Cover problem
56 pages
SYNOPSIS
No ratings yet
SYNOPSIS
17 pages
How to Create Datasets_ strategies and examples
No ratings yet
How to Create Datasets_ strategies and examples
18 pages
768538051e58eb1457c3d8f703aa640e62bc
No ratings yet
768538051e58eb1457c3d8f703aa640e62bc
14 pages
report
No ratings yet
report
5 pages
YOLO Algorithm_ Real-Time Object Detection from A to Z
No ratings yet
YOLO Algorithm_ Real-Time Object Detection from A to Z
26 pages
5046351cbf3275dcfa
No ratings yet
5046351cbf3275dcfa
8 pages
4796-22345-1-PB
No ratings yet
4796-22345-1-PB
7 pages
presentation
No ratings yet
presentation
2 pages
2022 - Bheem Et Al - Relationship Between Fish Size and Otolith Size of Four Deep-Sea Fishes From The Western - IJMS
No ratings yet
2022 - Bheem Et Al - Relationship Between Fish Size and Otolith Size of Four Deep-Sea Fishes From The Western - IJMS
6 pages
Gpa Salary
No ratings yet
Gpa Salary
14 pages
SSRN Id3799204
No ratings yet
SSRN Id3799204
20 pages
Corporate Entrepreneurship: Application of Moderator Method: BR Bhardwaj Sushil K Momaya
No ratings yet
Corporate Entrepreneurship: Application of Moderator Method: BR Bhardwaj Sushil K Momaya
13 pages
Sudeep Rath Pizza Hut Case Analysis
No ratings yet
Sudeep Rath Pizza Hut Case Analysis
3 pages
Statistics For Business and Economics
No ratings yet
Statistics For Business and Economics
7 pages
Mt551 Data Science Using R (End - sp23)
No ratings yet
Mt551 Data Science Using R (End - sp23)
1 page
LC01 - Introduction and Syllabus
100% (1)
LC01 - Introduction and Syllabus
22 pages
Section and Solution
No ratings yet
Section and Solution
4 pages
Bayesian Methods For Regression Models With Fat Data
No ratings yet
Bayesian Methods For Regression Models With Fat Data
51 pages
The Role of Mathematics in Machine Learning: March 2023
No ratings yet
The Role of Mathematics in Machine Learning: March 2023
13 pages
Business Statistics: Assignment
No ratings yet
Business Statistics: Assignment
3 pages
Seismic Risk and Vulnerability Models Considering Typical Urban Building Portfolios
No ratings yet
Seismic Risk and Vulnerability Models Considering Typical Urban Building Portfolios
36 pages
Download Complete Applied Ordinal Logistic Regression Using Stata by Xing Liu PDF for All Chapters
100% (3)
Download Complete Applied Ordinal Logistic Regression Using Stata by Xing Liu PDF for All Chapters
20 pages
Market Potential and Sales Forecasting
No ratings yet
Market Potential and Sales Forecasting
12 pages
Iso Dis 50006 PDF
100% (1)
Iso Dis 50006 PDF
54 pages
Data Mining Charu C Aggarwal download
100% (1)
Data Mining Charu C Aggarwal download
84 pages
Thank You For Taking The Week 3: Assignment 3. Week 3: Assignment 3
No ratings yet
Thank You For Taking The Week 3: Assignment 3. Week 3: Assignment 3
3 pages
Unit V Full
No ratings yet
Unit V Full
23 pages
Aggressive Driving and The Risk of Driving Toward To Drivers
No ratings yet
Aggressive Driving and The Risk of Driving Toward To Drivers
10 pages
Data Fix
No ratings yet
Data Fix
19 pages
Impact of Microfinance On Poverty Alleviation: The Study of District Bahawal Nagar, Punjab, Pakistan
No ratings yet
Impact of Microfinance On Poverty Alleviation: The Study of District Bahawal Nagar, Punjab, Pakistan
17 pages
Econometrics - Chapter 17 - Simultaneous Equations Models - Shalabh, IIT Kanpur
No ratings yet
Econometrics - Chapter 17 - Simultaneous Equations Models - Shalabh, IIT Kanpur
30 pages
Complete Download Beyond Multiple Linear Regression Applied Generalized Linear Models And Multilevel Models in R 1st Edition Paul Roback PDF All Chapters
No ratings yet
Complete Download Beyond Multiple Linear Regression Applied Generalized Linear Models And Multilevel Models in R 1st Edition Paul Roback PDF All Chapters
71 pages
ID Kualitas Pelayanan Jasa Keagenan Kapal D PDF
No ratings yet
ID Kualitas Pelayanan Jasa Keagenan Kapal D PDF
9 pages
Audit Committee Independence and Audit Quality of Nigeria Listed Deposit Money Banks
No ratings yet
Audit Committee Independence and Audit Quality of Nigeria Listed Deposit Money Banks
11 pages
Econometrics Pset 2
100% (1)
Econometrics Pset 2
4 pages
Approximation Methods For The Total Claim Amount in Collective Risk Modeling
No ratings yet
Approximation Methods For The Total Claim Amount in Collective Risk Modeling
98 pages
Panel Data Analysis
No ratings yet
Panel Data Analysis
8 pages
Serena Hotels
No ratings yet
Serena Hotels
51 pages

Received: 1 February 2019 Revised: 27 April 2019 Accepted: 18 May 2019

ANFIS and Deep Learning based missing sensor data prediction in IoT

Metehan Guzel1 Ibrahim Kok2 Diyar Akay3 Suat Ozdemir4

1 Department of Computer Engineering, Graduate School of Natural and Applied Sciences, Gazi University, Ankara, Turkey

Summary

Funding information

KEYWORDS

2.1 Statistical methods

2.2 Optimization methods

2.3 Machine learning methods

3 DATASET AND MODEL DESCRIPTIONS

3.2 Descriptive statistical analysis

FIGURE 1 Sensor locations and

TABLE 1 Descriptive statistics of dataset Features

FIGURE 2 Spearman correlation matrices of sensor values

3.3 Inputs and outputs of the models

4 ADAPTIVE-NETWORK BASED FUZZY INFERENCE SYSTEM BASED MISSING VALUE PREDICTION

4.1 Adaptive-network based fuzzy inference system

Time   Temp         Humid         Light         Model   Inputs > Output
t-2    Temp_{t-2}   Humid_{t-2}   Light_{t-2}   -
t-1    ???          Humid_{t-1}   Light_{t-1}   Mdl1    Humid_{t-1}, Light_{t-1} > Temp_{t-1}
t      Temp_{t}     Humid_{t}     Light_{t}     -
t+1    Temp_{t+1}   Humid_{t+1}   Light_{t+1}   -
t+2    Temp_{t+2}   ???           Light_{t+2}   Mdl2    Temp_{t+2}, Light_{t+2} > Humid_{t+2}
t+3    Temp_{t+3}   Humid_{t+3}   Light_{t+3}   -
t+4    Temp_{t+4}   Humid_{t+4}   ???           Mdl3    Temp_{t+4}, Humid_{t+4} > Light_{t+4}
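The rows above suggest a simple dispatch rule: whichever of the three features is missing ("???") determines which model is called, and the two remaining features are its inputs. A minimal Python sketch of that rule follows. The names `impute_reading`, `MODEL_FOR_MISSING`, and the lowercase feature keys are illustrative assumptions, and the models are passed in as plain callables rather than trained ANFIS or LSTM predictors.

```python
# Feature order assumed from the table columns: Temp, Humid, Light.
FEATURES = ("temp", "humid", "light")

# Which model predicts which missing feature (Mdl1 > Temp, Mdl2 > Humid, Mdl3 > Light).
MODEL_FOR_MISSING = {"temp": "Mdl1", "humid": "Mdl2", "light": "Mdl3"}


def impute_reading(reading, models):
    """Return a copy of `reading` with a single missing feature filled in.

    `reading` maps feature names to values, with None marking a missing value.
    `models` maps "Mdl1"/"Mdl2"/"Mdl3" to hypothetical callables that take the
    two available features (in FEATURES order) and return a prediction.
    """
    missing = [name for name in FEATURES if reading.get(name) is None]
    if not missing:
        return dict(reading)  # nothing to predict
    if len(missing) > 1:
        # The table only shows one missing feature per time step.
        raise ValueError("this sketch handles exactly one missing feature")

    target = missing[0]
    inputs = [reading[name] for name in FEATURES if name != target]
    filled = dict(reading)
    filled[target] = models[MODEL_FOR_MISSING[target]](*inputs)
    return filled
```

With stand-in models, calling `impute_reading({"temp": None, "humid": 40.0, "light": 100.0}, models)` routes the humidity and light values to `models["Mdl1"]` and stores its result under `"temp"`.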

FIGURE 4 Flowchart of the proposed models applied to IoT data streams

FIGURE 5 Sample ANFIS architecture

$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2$ (1)

$O_{1,j} = \mu_{B_j}(y), \quad j = 1, 2$ (2)

$O_{2,i,j} = w_{i,j} = \mu_{A_i}(x) \times \mu_{B_j}(y), \quad i, j = 1, 2$ (5)

$O_{4,i,j} = w_{i,j} f_{i,j} = w_{i,j} (p_{ij} x + q_{ij} y + r_{ij}), \quad i, j = 1, 2$ (7)
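The layer equations above can be traced in a short Python sketch for two inputs with two membership functions each. Only Equations (1), (2), (5), and (7) appear in this excerpt, so the rest is filled in with assumptions that follow the usual ANFIS formulation: a Gaussian membership function, a normalization of the firing strengths between layers 2 and 4, and a final summation layer. All function and parameter names are illustrative.

```python
import math


def gaussian_mf(value, center, sigma):
    """Gaussian membership degree (assumed shape; the excerpt does not give the MF)."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def anfis_forward(x, y, mf_a, mf_b, consequents):
    """One forward pass through a two-input, first-order Sugeno ANFIS.

    mf_a, mf_b  : lists of (center, sigma) pairs for the fuzzy sets A_i and B_j
    consequents : dict mapping zero-based (i, j) to the (p, q, r) of rule (i, j)
    """
    # Layer 1: membership degrees, Eqs. (1) and (2).
    mu_a = [gaussian_mf(x, c, s) for c, s in mf_a]
    mu_b = [gaussian_mf(y, c, s) for c, s in mf_b]

    # Layer 2: rule firing strengths w_ij as products, Eq. (5).
    w = {(i, j): mu_a[i] * mu_b[j]
         for i in range(len(mu_a)) for j in range(len(mu_b))}

    # Layer 3 (assumed): normalize firing strengths so they sum to 1.
    total = sum(w.values())
    w_norm = {key: value / total for key, value in w.items()}

    # Layer 4: weighted first-order consequents, Eq. (7),
    # here using the normalized strengths from the assumed layer 3.
    o4 = {key: w_norm[key] * (p * x + q * y + r)
          for key, (p, q, r) in consequents.items()}

    # Layer 5 (assumed): overall output is the sum of the rule outputs.
    return sum(o4.values())
```

Because the normalized strengths sum to 1, setting every rule's consequent to the same constant makes the network return that constant, which is a quick sanity check on the wiring.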

4.2 ANFIS clustering

4.3 ANFIS optimization

4.3.1 Grid partitioning parameter optimization

4.3.2 Subtractive clustering parameter optimization

4.3.3 Fuzzy C-means clustering parameter optimization

The best-performing parameter configuration for FCM is given in Table 5.

5 DEEP LEARNING BASED MISSING VALUE PREDICTION

5.1 Deep Learning based prediction models

5.2 Long Short Term Memory

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (9)

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ (10)

$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$ (11)

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ (13)
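The gate equations above can be combined into a single LSTM time step, sketched here in plain Python. Equations (9), (10), (11), and (13) are taken from the text. The cell-state and hidden-state updates are not shown in this excerpt, so the sketch assumes the standard LSTM form, $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ and $h_t = o_t \odot \tanh(C_t)$. The `params` dictionary layout and the helper names are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(weights, vector, bias):
    """Return W . v + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """Advance an LSTM cell by one time step.

    x_t, h_prev, c_prev are lists of floats; each weight matrix in `params`
    has one row per hidden unit and len(h_prev) + len(x_t) columns.
    """
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]

    f_t = [sigmoid(z) for z in affine(params["W_f"], concat, params["b_f"])]        # Eq. (9)
    i_t = [sigmoid(z) for z in affine(params["W_i"], concat, params["b_i"])]        # Eq. (10)
    c_tilde = [math.tanh(z) for z in affine(params["W_c"], concat, params["b_c"])]  # Eq. (11)
    o_t = [sigmoid(z) for z in affine(params["W_o"], concat, params["b_o"])]        # Eq. (13)

    # Assumed standard updates (not shown in this excerpt).
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

To process a sequence, call `lstm_step` once per reading and feed the returned `h_t` and `c_t` into the next call.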

5.3 Parameter optimization and training process

TABLE 7 Normalized RMSE metrics and training/testing times

[Figure 7: three panels, Mdl1, Mdl2, and Mdl3, each with a vertical axis from 0.00 to 0.20]

FIGURE 7 Performance comparisons of all models

7 CONCLUSION AND FUTURE WORK

Suat Ozdemir https://orcid.org/0000-0002-4588-4538
