ANFIS and Deep Learning based missing sensor data prediction in IoT
DOI: 10.1002/cpe.5400
RESEARCH ARTICLE
1 INTRODUCTION
The concept of the Internet of Things (IoT) refers to a sensor-rich world where physical objects in our environment are increasingly enriched
with computing, sensing, and communication capabilities. Sensor technology is one of the core enabling technologies in this world. Sensors
are utilized to collect the large amount of heterogeneous data for large scale IoT applications such as environmental monitoring, e-health,
intelligent transportation systems, military, smart agriculture,1 and industrial plant monitoring.2 Sensors and connected devices with diverse digital
technologies generate an excessive amount of data, which are multi-source, real-time, dynamic, sparse, highly heterogeneous, and semantically
rich. In the large scale IoT platforms, due to the lack of battery power, communication errors, and malfunctioning devices, sensor-generated data
are considered to be inherently noisy, uncertain, erroneous, and missing.3 Therefore, data generation and quality become a critical issue in data
processing and analysis.
In this work, we address the problem of missing sensor data. This problem is very common in IoT for various reasons, such as unstable
network communication, synchronization issues, unreliable sensors, and other types of equipment failure. Eliminating missing data results in
loss of information and may lead to incorrect analytical results. Thus, the prediction and assessment of missing values become an imperative
task.4 Hence, there is still a need for novel prediction models to predict the missing data. In order to address the missing sensor data problem,
we propose two prediction models based on Deep Learning (DL) and Adaptive-Network based Fuzzy Inference System (ANFIS). We focus on
sensory-rich IoT applications, where our models learn how to infer the missing data from different sensors' data optimally.
ANFIS is a soft computing method that combines the advantages of Artificial Neural Networks (ANN) and Fuzzy Inference Systems (FIS).
ANFIS has high generalization ability supported by a fast and accurate learning phase.5 Based on this, we decided to solve the missing sensor
data prediction problem with ANFIS.
Recently, DL has attracted widespread interest in academic and industrial fields due to its state-of-the-art performance in many domains such as
computer vision, natural language processing, speech recognition, and visual object recognition.6 DL also shows good
Concurrency Computat Pract Exper. 2019;e5400. wileyonlinelibrary.com/journal/cpe © 2019 John Wiley & Sons, Ltd. 1 of 15
https://doi.org/10.1002/cpe.5400
2 of 15 GUZEL ET AL.
potential for analyzing vast volumes of data and for discriminative tasks such as classification and prediction.7 We therefore also employ DL for the
missing data problem due to its predictive power on large-scale data sets. The key contributions of this paper are summarized as follows.
• We propose novel prediction models based on ANFIS and DL to solve the missing data problem in IoT.
• We conduct extensive experiments to validate the performance of the proposed prediction models.
The rest of this paper is organized as follows. Section 2 includes related works on the missing sensor data prediction. Section 3 explains
the dataset and model descriptions. Section 4 and Section 5 introduce the details of the proposed prediction models. Section 6 presents the
experimental results and comparison of prediction models. Finally, Section 7 concludes the paper.
2 RELATED WORK
The presence of missing/corrupted values in the databases/datasets has been a big problem for decades. In addition, with the newly emerged
concepts like Wireless Sensor Networks (WSN) and IoT, the speed of data generation has skyrocketed, while the quality and reliability of equipment
have gone down, causing more and more missing values. Therefore, numerous studies have been conducted to overcome this problem. In this
section, we briefly present the proposed methods to estimate missing data and show a general overview of the literature in this domain. It must
be noted that, in compliance with the scope of this paper, we exclude temporal and spatiotemporal estimation methods. We grouped methods
into three categories, namely, Statistical Methods, Optimization Methods, and Machine Learning Methods.
problem is not a native optimization problem, these algorithms' ability to quickly explore the search space increases their applicability in the missing data
domain. Among optimization algorithms, genetic algorithm (GA), ant colony optimization (ACO), and particle swarm optimization (PSO) algorithms
are commonly used.
GA imitates the evolution of species.17 For a given problem, a population composed of candidate solutions is generated. Each solution is
evaluated according to a fitness function. Then, new populations are generated iteratively from the most successful individuals of the previous
population. By adding new random solutions and mutations at each generation, the risk of getting trapped in a local minimum is reduced.
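The loop described above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function name `genetic_minimize`, the averaging crossover, the Gaussian mutation, and all default parameter values are assumptions chosen only to show selection, mutation, and the injection of new random solutions on a one-dimensional toy problem.

```python
import random


def genetic_minimize(fitness, low, high, pop_size=30, generations=60,
                     elite=10, n_random=3, sigma=0.3, seed=0):
    """Toy real-valued GA (illustrative assumption, not a cited method).

    Returns the best individual and the best fitness seen at each generation.
    """
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Evaluate and rank candidates: lower fitness is better here.
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        # The most successful individuals survive unchanged (elitism).
        parents = population[:elite]
        children = []
        while len(children) < pop_size - elite - n_random:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0            # crossover: average of two parents
            child += rng.gauss(0.0, sigma)   # mutation: small random perturbation
            children.append(min(high, max(low, child)))
        # Fresh random solutions help the search leave local minima.
        newcomers = [rng.uniform(low, high) for _ in range(n_random)]
        population = parents + children + newcomers
    population.sort(key=fitness)
    history.append(fitness(population[0]))
    return population[0], history


best, history = genetic_minimize(lambda v: (v - 3.0) ** 2, -10.0, 10.0)
```

Because the elite individuals are carried over unchanged, the best fitness in `history` never gets worse from one generation to the next.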
GA is one of the most frequently used algorithms in numerous domains and missing value estimation is not an exception. In the work of
García et al,18 GA is used to impute missing values. The proposed method handles the data as a matrix and aims to find missing values in a way
that does not alter the statistical characteristics of the initial dataset. Suppose that X is the dataset matrix with missing values, Y is a matrix composed
of the missing values, and X̂, the combination of X and Y, is the completed dataset. The method tries to find the Y that minimizes the difference of
statistics between X and X̂. The fitness function is constructed upon this criterion and candidate Y matrices are handled as individuals of the population.
The GA based approach is compared to EM and auxiliary regression based estimation models and surpasses them in terms of preserving
statistical properties. Furthermore, the GA based method is claimed to be more flexible and responsive.18 Another GA based approach is utilized in the
work of Lobato et al.19 The proposed MOGAImp is a multi-objective GA (MOGA), which is based on NSGA-II.20 The reason behind using a
MOGA is to be able to optimize missing value selections on different metrics. In MOGAImp, the metrics used are classification accuracy and RMSE.
The fitness function is constructed upon these two metrics. The complete set of missing values is encapsulated as a single individual of the population
and the phases of GA are performed.
ACO based methods are easily applicable if the data can be formulated as a graph problem.21 In line with this principle, in the work of Priya et al,21
the missing value estimation problem is formulated as a graph problem and an ACO based method is proposed. In the conversion to a graph, each covariate
is turned into a level composed of covariate values and the target (missing) attribute is turned into the final level. On this graph, an ACO based
method, namely, Dual Repopulated Bayesian ACO (DPBACO), is applied. The reason behind this selection is ACO's susceptibility to falling into
local minima. By duplicating the population (main and reserve populations) and crossing over individuals from the different populations in each iteration,
DPBACO increases variety and therefore overcomes the local minimum problem. Adding different Bayesian functions to the ant traversal makes the method
applicable to different data characteristics.
PSO22 is a stochastic swarm intelligence based optimization algorithm that is frequently utilized because of its simplicity, accuracy, and fast
convergence.17 In the work of Nekouie and Moattar,23 a PSO based hybrid missing value estimation method is used on breast cancer
diagnosis data. To overcome PSO's weakness of getting stuck in local optima, chaotic reduced adaptive PSO (CRAPSO) is employed. The
proposed method first generates a set of values to impute each missing one using Bayesian networks. In the next step, a tensor is used for estimation.
Tensor-based estimation is performed by calculating the missing attribute as a linear function of the present attributes. However, in case of data
insufficiency, like other mean square error minimization based models, the tensor based estimation model suffers from accuracy loss. Therefore, an
automatic data generation phase that utilizes CRAPSO is placed before the tensor phase. The CRAPSO and tensor phases run iteratively until convergence
is achieved. After convergence, the acquired results are used for imputation.
In addition to stand-alone solutions, optimization methods are generally used for optimizing machine learning algorithms that are used for
missing data estimation. Research works that fall under this category are given in the next section.
ELM is applied. Apart from those mentioned earlier, research works featuring multi-layer perceptron networks (MLP),35 self-organizing maps
(SOM),36,37 probabilistic NNs,38 and other types of NN are also present in the literature.
Clustering methodologies are another family of ML methods used for estimation. Missing data can be estimated from the other data that share the same
cluster. Fuzzy c-Means (FCM) is a fuzzy clustering algorithm that allows overlapping between different clusters. In other works,39-41 FCM
based methods are used for the missing data problem.
3.1 Dataset
In this paper, we used the Intel Berkeley Research Lab dataset, which is publicly available. This dataset was collected from 54 sensors, which
were deployed in the Intel research laboratory at Berkeley between February 28 and April 5, 2004. It contains 2.3 million sensor readings
with time-stamped topology information, humidity, temperature, light intensity, and voltage values in "date:yyyy-mm-dd, time:hh:mm:ss.xxx,
epoch:int, moteid:int, temperature:real, humidity:real, light:real, voltage:real" format. Herein, the temperature unit is degrees Celsius. Humidity
is temperature-corrected relative humidity, ranging from 0% to 100%. Light intensity is in lux and voltage is expressed in volts.42 The sensors and
sensor ids were arranged in the lab according to the diagram given in Figure 1.
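As an illustration of the record format listed above, the following sketch parses one reading. It assumes that the eight fields appear in the stated order and are separated by whitespace; the `Reading` class, the `parse_reading` helper, and the sample line are hypothetical and are not part of the dataset distribution.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Reading:
    timestamp: datetime
    epoch: int
    moteid: int
    temperature: float  # degrees Celsius
    humidity: float     # temperature-corrected relative humidity, 0-100%
    light: float        # lux
    voltage: float      # volts


def parse_reading(line: str) -> Optional[Reading]:
    """Parse one whitespace-separated record; return None if it is incomplete."""
    parts = line.split()
    if len(parts) != 8:
        return None  # truncated or corrupted line
    date, time, epoch, moteid, temp, hum, light, volt = parts
    # Fractional seconds may or may not be present in the time field.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in time else "%Y-%m-%d %H:%M:%S"
    try:
        return Reading(datetime.strptime(f"{date} {time}", fmt), int(epoch),
                       int(moteid), float(temp), float(hum), float(light),
                       float(volt))
    except ValueError:
        return None  # a field could not be converted


# Hypothetical sample line, used only for illustration.
sample = "2004-02-28 00:59:16.02785 3 19 19.9884 37.0933 45.08 2.69964"
reading = parse_reading(sample)
```

Returning `None` for incomplete lines keeps missing or corrupted readings explicit, so they can be counted rather than silently dropped.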
In this study, humidity, temperature, and light intensity observations of the 19th, 20th, and 21st sensors are used for evaluating the proposed
prediction models. The reason behind this selection is the completeness of the mentioned nodes. Most of the nodes in the dataset have missing
or corrupted readings. The selected nodes have relatively higher data density, especially between the 29th of February and the 7th of March.
The eight-day period between these dates has 100% density for observations when sensor readings are grouped into 3-minute intervals. Another
reason for the selection of these nodes is the proximity of their locations. The selected nodes are adjacent to each other, which ensures a similar sensing
environment. This enables us to use the data in two different forms:
• Merging the nodes' readings together and processing them as if all data came from a single node. This approach is utilized in the DL based estimation
model.
• Using readings from different nodes separately to evaluate a single model with three different data sources. This approach is utilized in the ANFIS
based estimation model.
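The notion of data density over 3-minute intervals can be sketched as follows. The text does not state how readings inside one interval are aggregated, so averaging them is an assumption, as are the helper names `bin_means` and `density`.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BIN = timedelta(minutes=3)


def bin_means(readings, start, end):
    """Group (timestamp, value) pairs into 3-minute bins over [start, end).

    Returns one entry per bin: the mean of its readings, or None if empty.
    Averaging inside a bin is an assumption made for this sketch.
    """
    n_bins = int((end - start) / BIN)
    buckets = defaultdict(list)
    for ts, value in readings:
        if start <= ts < end:
            buckets[int((ts - start) / BIN)].append(value)
    return [sum(buckets[i]) / len(buckets[i]) if i in buckets else None
            for i in range(n_bins)]


def density(binned):
    """Fraction of bins that contain at least one reading."""
    return sum(v is not None for v in binned) / len(binned)


start = datetime(2004, 2, 29)
end = start + timedelta(minutes=9)
readings = [
    (start + timedelta(seconds=30), 10.0),
    (start + timedelta(minutes=1), 20.0),
    (start + timedelta(minutes=7), 30.0),
]
binned = bin_means(readings, start, end)  # second bin has no reading
```

A density of 1.0 for a node over a period corresponds to the 100% density mentioned above.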
TABLE 2 Inputs and output of the models
Model Abbreviation   Input I       Input II   Output
Mdl1                 Humidity      Light      Temperature
Mdl2                 Temperature   Light      Humidity
Mdl3                 Temperature   Humidity   Light
Because the sensor values are not normally distributed, the Spearman correlation coefficient is calculated among the sensor values. The correlation
matrix obtained from the calculation is given in Figure 2.
The correlation matrix is used to determine the relationship between the inputs and outputs of the proposed models and to interpret the overall results.
Inputs that represent the output well have a high correlation with the output, which contributes positively to the
performance of the models. The proposed models produce high accuracy prediction results when the correlation between the sensor values is
high. The accuracy decreases as the correlation decreases. According to Figure 2, the correlation between input and output
sensor values is highest for Mdl1, followed by Mdl2 and Mdl3.
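For reference, the Spearman coefficient used here is the Pearson correlation of the rank-transformed values. The sketch below is a plain Python illustration with average ranks for ties; the function names are our own and no claim is made about the tool used to produce Figure 2.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0  # mean of the positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Since only ranks enter the computation, any monotonic relation between two sensor types yields a coefficient of magnitude 1, regardless of how the raw values are distributed.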
Due to an unbalanced data characteristic, we also performed Min-Max Normalization to change their values to a common scale, without
distorting differences in the ranges of values. These normalized values were used in the training and testing processes of both DL and ANFIS
based models.
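Min-Max Normalization can be sketched in a few lines. The target range [0, 1] and the handling of a constant series are assumptions of this sketch, since the text does not specify them.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1] (assumed target range)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no range information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def min_max_denormalize(scaled, lo, hi):
    """Map normalized values back to the original scale."""
    return [s * (hi - lo) + lo for s in scaled]


temperatures = [10.0, 15.0, 20.0]
scaled = min_max_normalize(temperatures)
restored = min_max_denormalize(scaled, 10.0, 20.0)
```

The inverse mapping is needed when a normalized model output has to be reported in the original unit of the sensor.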
In this section, the fuzzy logic based method utilized for missing sensor data prediction, ANFIS,43 and its predecessor, FIS,44 are briefly explained.
FIGURE 3 Missing sensor value situations (left) and actions (right) taken in missing sensor value occurrences
of the system is calculated as a weighted average of the output of each rule. To present ANFIS, an FIS with two inputs, where each input is
assumed to have two fuzzy linguistic terms, is considered:
Rule 1: IF (x = A1 ) AND (y = B1 ) THEN f11 = p11 x + q11 y + r11
Rule 2: IF (x = A1 ) AND (y = B2 ) THEN f12 = p12 x + q12 y + r12
Rule 3: IF (x = A2 ) AND (y = B1 ) THEN f21 = p21 x + q21 y + r21
Rule 4: IF (x = A2 ) AND (y = B2 ) THEN f22 = p22 x + q22 y + r22
{pij , qij , rij } are the parameters that are determined during the training phase of ANFIS and {Ai , Bj } are fuzzy terms that are used for defining
data points.
Figure 5 shows an ANFIS structure that has two inputs (x, y) and an output (f). In the figure, circle nodes are fixed nodes that do not change
throughout the training phase, whereas square nodes are adaptive nodes that are calibrated through the training phase. An ANFIS consists
of five layers.
Layer 1: Every node in the first layer is adaptive and calculates the degree of membership for each input variable. For a 2-input model, node
functions for each input are given as Equation (1) and Equation (2), ie,
In Equation (1) and Equation (2), 𝜇 Ai and 𝜇Bj are the selected membership functions. These functions can be Gaussian membership function
(given in Equation (3)), generalized bell membership function (given in Equation (4)), or another one, ie,
\mu_{A_i}(x) = \exp\left[-\left(\frac{x - c_i}{2a_i}\right)^{2}\right] \qquad (3)

\mu_{A_i}(x) = \frac{1}{1 + \left|\frac{x - c_i}{a_i}\right|^{2b_i}}. \qquad (4)
In Equation (3) and Equation (4), {ai , bi , ci } are the parameters of the membership function and determine the shape of the function. They are referred
to as premise parameters.
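The two membership functions can be written directly from Equation (3) and Equation (4). The sketch below follows the parameterization used in this paper, which may differ from the one used in a given fuzzy logic library; the function names are our own.

```python
import math


def gaussian_mf(x, a, c):
    """Gaussian membership function in the form of Equation (3)."""
    return math.exp(-((x - c) / (2.0 * a)) ** 2)


def bell_mf(x, a, b, c):
    """Generalized bell membership function in the form of Equation (4)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


# Both functions peak at the center c, where the membership degree is 1.
peak_gauss = gaussian_mf(0.5, a=0.2, c=0.5)
peak_bell = bell_mf(0.5, a=0.2, b=2.0, c=0.5)
```

In both forms, c sets the center and a controls the width; for the bell function, b controls the steepness of the shoulders.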
Layer 2: Every node in this layer is fixed and labeled with ∏. Nodes in this layer multiply incoming signals and send the product to the next
layer. The output of each node represents the firing strength of a rule. The output function of nodes in this layer is given as Equation (5), ie,
Layer 3: Every node in this layer is fixed and labeled with N. Nodes in this layer normalize the firing strengths of the rules. Each node calculates
the ratio of the firing strength of its rule to the sum of all rules' firing strengths using Equation (6).
O^{3}_{i,j} = \bar{w}_{i,j} = \frac{w_{i,j}}{\sum_{k}\sum_{l} w_{k,l}}, \quad i, j = 1, 2. \qquad (6)
Layer 4: Every node in this layer is adaptive with a node function given as Equation (7), ie,
Here, \bar{w}_{i,j} is the normalized firing strength produced by the third layer. Variables {pij , qij , rij } are referred to as consequent parameters.
Layer 5: The fifth layer is the output layer of the ANFIS structure and contains a single node that performs summation of all signals from the fourth
layer. The node in this layer is labeled as ∑ and performs summation using Equation (8), ie,
\text{Output} = O^{5} = \sum_{i}\sum_{j} \bar{w}_{i,j}\, f_{i,j}, \quad i, j = 1, 2. \qquad (8)
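The five layers can be traced in a short forward-pass sketch for the two-input, four-rule system introduced above. It assumes the usual first-order Sugeno form: Gaussian membership functions as in Equation (3), the product of membership degrees as firing strength, and linear rule outputs f = p x + q y + r. The function `anfis_forward` and all parameter values are hypothetical and only illustrate the data flow; training is not included.

```python
import math
from itertools import product


def gaussian_mf(x, a, c):
    """Gaussian membership function in the form of Equation (3)."""
    return math.exp(-((x - c) / (2.0 * a)) ** 2)


def anfis_forward(x, y, premise_x, premise_y, consequent):
    """Forward pass of a first-order Sugeno ANFIS with two inputs.

    premise_x, premise_y: lists of (a, c) pairs, one per fuzzy term.
    consequent: dict mapping rule (i, j) to its (p, q, r) parameters.
    """
    # Layer 1: membership degrees of each input in each fuzzy term.
    mu_a = [gaussian_mf(x, a, c) for a, c in premise_x]
    mu_b = [gaussian_mf(y, a, c) for a, c in premise_y]
    # Layer 2: firing strength of each rule (product of memberships).
    rules = list(product(range(len(mu_a)), range(len(mu_b))))
    w = {(i, j): mu_a[i] * mu_b[j] for i, j in rules}
    # Layer 3: normalized firing strengths.
    total = sum(w.values())
    w_bar = {rule: w[rule] / total for rule in rules}
    # Layer 4: each rule output weighted by its normalized firing strength.
    weighted = {}
    for rule in rules:
        p, q, r = consequent[rule]
        weighted[rule] = w_bar[rule] * (p * x + q * y + r)
    # Layer 5: sum of all weighted rule outputs.
    return sum(weighted.values())


# Hypothetical parameters: two terms per input, centered at 0 and 1.
terms = [(0.2, 0.0), (0.2, 1.0)]
params = {(i, j): (1.0, 1.0, 0.0) for i in range(2) for j in range(2)}
output = anfis_forward(0.3, 0.6, terms, terms, params)
```

Because the normalized firing strengths sum to one, giving every rule the same consequent f = x + y makes the network return x + y, which is a convenient sanity check.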
As mentioned, the ANFIS structure has two adaptive layers, namely, the first and the fourth layers. Their ability to adapt stems from the parameters
of these layers, namely, the premise parameters of the first layer and the consequent parameters of the fourth layer. The training phase of ANFIS consists
of tuning the premise and consequent parameters. For this purpose, ANFIS utilizes a hybrid learning algorithm.43 This algorithm is composed
of two passes. A forward pass is used for tuning the consequent parameters and a backward pass is used for tuning the premise parameters. In
the forward pass, the premise parameters are fixed and signals proceed to layer four. In the fourth layer, the consequent parameters are determined by
using the least squares method. In the backward pass, the consequent parameters are fixed and error rates propagate back to the first layer. In the first
layer, the premise parameters are tuned based on the membership function using the gradient descent method.47
• IMFT is crucial for the performance of fuzzy sets. A membership function calculates a membership degree in [0,1] for data points. In
this research, eight different membership functions (mf) are used: generalized bell-shaped mf, Gaussian curve mf, Gaussian combination mf,
triangular-shaped mf, trapezoid-shaped mf, difference between two sigmoidal mf, product of two sigmoidal mf, and pi-shaped mf. IMFT is
specified for each input parameter.
• NMF specifies the number of membership functions for each input variable and directly affects the number of rules. In this research, values
between {2} and {10} are tested for all models. NMF is specified for each input parameter.
• OMFT specifies the type of membership function for the output, which can be linear or constant.
The best performing parameter configuration for GP is given in Table 3, where IMFT Input I, IMFT Input II, NMF Input I, NMF Input II, and OMFT
respectively stand for IMFT for the first input, IMFT for the second input, NMF for the first input, NMF for the second input, and OMFT of the ANFIS model.
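To give a sense of the size of this search space, the sketch below enumerates every combination of the three GP parameters. Whether the search in this work was exhaustive is not stated, so the full grid is an assumption; the MATLAB-style function names in `IMFT` are listed only as labels for the eight membership function types.

```python
from itertools import product

# Labels for the eight input membership function types.
IMFT = ["gbellmf", "gaussmf", "gauss2mf", "trimf",
        "trapmf", "dsigmf", "psigmf", "pimf"]
NMF = range(2, 11)              # 2 to 10 membership functions per input
OMFT = ["linear", "constant"]   # output membership function type


def gp_configurations():
    """Yield every candidate configuration of a two-input GP based ANFIS."""
    for imft1, imft2, nmf1, nmf2, omft in product(IMFT, IMFT, NMF, NMF, OMFT):
        yield {
            "imft": (imft1, imft2),
            "nmf": (nmf1, nmf2),
            "omft": omft,
            # Grid partitioning creates one rule per combination of terms.
            "rules": nmf1 * nmf2,
        }


configs = list(gp_configurations())
largest_rule_base = max(c["rules"] for c in configs)
```

The rule count grows with the product of the NMF values, which is why small NMF values keep the model compact.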
TABLE 3 Best performing parameter configurations for GP based ANFIS
Model  IMFT Input I  IMFT Input II  NMF Input I  NMF Input II  OMFT
Mdl1 gaussmf trimf 2 3 linear
Mdl2 trimf trapmf 2 constant
Mdl3 gaussmf gaussmf 2 2 constant
TABLE 4 Best performing parameter configurations for SC based ANFIS
Model  CIR Input I  CIR Input II  CIR Output  SF    AR    RR
Mdl1   0.90         0.30          0.30        0.90  0.50  0.25
Mdl2   0.70         0.30          0.70        0.90  0.50  0.20
Mdl3   0.70         0.60          0.70        0.90  0.50  0.30
TABLE 5 Best performing parameter configurations for FCM based ANFIS
Model  CN  Expo  MNI  MI
Mdl1   3   1.2   15   1.00E-5
Mdl2   2   1.2   50   1.00E-5
Mdl3   3   1.2   10   1.00E-5
• CIR is the influence range of clusters. The default value of CIR is {0.5}. In this research, input and output CIR values are tested with values
between {0.1} and {0.9}.
• SF is the factor used for scaling the influence range. The default value of SF is {1.25}. In this research, SF is tested with values between {0.3}
and {1.50}.
• AR is used for acceptance of new clusters. Values between {0.30} and {0.95} are used for tests, but no significant effect of AR is observed.
• RR is used for rejection of new clusters. RR values between {0.05} and {0.30} are used for testing.
Best performing parameter configuration for SC is given in Table 4, where CIR Input I, CIR Input II, and CIR Output respectively stand for CIR
for first input, CIR for second input, and CIR for output.
• NC specifies the number of clusters and therefore directly affects the number of generated rules. In the implementation, if not specified, NC is decided
by a subtractive clustering phase, which has a cluster range of {0.5}. In our parameter tests, the default option and cluster numbers between {2} and
{50} are used for testing.
• Expo controls the fuzzy overlapping between clusters and has a default value of {2.0} in the MATLAB implementation of SC. In this research, values
between {1.2} and {3.0} are tested.
• MNI is the number of iterations in the training phase. In this research, MNI values between {5} and {50} are tested.
• MI is the minimum improvement value that is used for termination of the algorithm. MI has a default value of {1.00E-5}. Values
{1.00E-4,1.00E-5,1.00E-6} are tested.
In this section, we introduce the proposed DL models and Long Short Term Memory (LSTM) network. Then, parameter optimization and training
processes are explained, respectively.
FIGURE 6 General architecture of Deep Learning models (Left), LSTM Memory Block (Right)
of sensor data in the data set, three different models are proposed for the estimation of each type. The relevant model will work and complete
the missing data in case one of the readings is missing. The overall architecture of the proposed models with all cases is shown in Figure 6.
Accordingly, Mdl1, Mdl2, and Mdl3 are proposed for estimating the missing temperature, humidity, and light sensor values, respectively.
Herein, the models predict temperature, humidity, and light data by taking humidity-light, temperature-light, and temperature-humidity value
pairs as inputs. The input and output values of the models are given in Table 2. In addition, the hyper-parameter values and processes of the
models are given in detail in Section 5.3.
Ct = ft ∗ Ct−1 + it ∗ C̃ t (12)
ht = ot ∗ tanh Ct , (14)
where xt and ht are the input and output vectors at time t, C̃t is the candidate cell state, Ct−1 is the old cell state, Ct is the new cell state, and it , ft , and ot
are the input, forget, and output gates, respectively. Wc , Wi , Wf , Wo are the input weight matrices, ∗ is the element-wise product, which operates on two
vectors of the same size, and bc , bi , bf , bo are the bias vectors. 𝜎(·) represents the logistic sigmoid function, ie, 𝜎(x) = 1∕(1 + e^{−x}), and tanh(·)
represents the hyperbolic tangent function.
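A single step of an LSTM memory block can be sketched in plain Python to make the role of Equation (12) and Equation (14) concrete. The gate computations assume the common formulation in which each gate applies its weight matrix to the concatenation of the previous output and the current input; that form, the `params` dictionary layout, and the function names are assumptions of this sketch rather than details taken from the models in this paper.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, z, b):
    """Compute W z + b for a matrix given as a list of rows."""
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; returns the new output h_t and cell state C_t."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t] (assumed gate input)
    f = [sigmoid(v) for v in affine(params["Wf"], z, params["bf"])]  # forget gate
    i = [sigmoid(v) for v in affine(params["Wi"], z, params["bi"])]  # input gate
    o = [sigmoid(v) for v in affine(params["Wo"], z, params["bo"])]  # output gate
    c_tilde = [math.tanh(v) for v in affine(params["Wc"], z, params["bc"])]
    # Equation (12): keep part of the old state and add the gated candidate.
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    # Equation (14): the output is the gated, squashed cell state.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# Hypothetical block with one hidden unit and two inputs, all weights zero.
zeros = {name: [[0.0, 0.0, 0.0]] for name in ("Wf", "Wi", "Wo", "Wc")}
zeros.update({name: [0.0] for name in ("bf", "bi", "bo", "bc")})
h, c = lstm_step([0.4, 0.7], [0.0], [2.0], zeros)
```

With all weights at zero, every gate equals 0.5 and the candidate state is 0, so the step simply halves the previous cell state, which makes the effect of the forget gate easy to see.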
TABLE 6 Hyper-parameters of the DL models
Model Name  Hidden Layers  LSTM Units (L1xL2)  Batch Size  Epochs
Mdl1        2              240x240             1440        200
Mdl2        2              240x240             1440        200
Mdl3        2              60x60               60          200
In the process of model construction and training, we use the TensorFlow63 and Keras64 frameworks as the computing environment. The Adaptive
Moment Estimation (Adam) optimizer, which computes individual adaptive learning rates for different parameters, is used to minimize the loss
function.65 A mini-batch strategy is utilized in our implementation to reduce loss fluctuation, so the gradients are calculated with respect to
mini-batches.
6 EXPERIMENTAL RESULTS
The performance of the proposed models is evaluated with the Root Mean Squared Error (RMSE) metric, given in Equation (15), ie,
\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} (y_o - y_e)^{2} \,/\, N}. \qquad (15)
Here, yo , ye , and N represent the observed sensor value, the estimated sensor value, and the total number of observations, respectively.
RMSE metric is used for the measurement of error amount between the estimated value and real observed value. It must be noted that RMSE
metric changes depending on the value range of variables. To acquire RMSE metrics in a proportional manner, all experiments are conducted on
normalized data.
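Equation (15) translates directly into code. The sketch below is a plain Python illustration, with the function name `rmse` chosen by us; it expects observed and estimated values that are already normalized to the same scale.

```python
import math


def rmse(observed, estimated):
    """Root Mean Squared Error between observed and estimated values."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(yo - ye) ** 2 for yo, ye in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values on a normalized [0, 1] scale.
error = rmse([0.20, 0.40, 0.60], [0.25, 0.35, 0.60])
```

On data normalized to [0, 1], the RMSE is also bounded by 1, which makes errors of different sensor types directly comparable.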
To verify the prediction accuracy, we compare our models with SVM Regression (SVR) and Gaussian Kernel Regression (GKR), which are two
non-linear regression methods. For comparison, we performed experiments according to the inputs and outputs in Table 2. In these experiments,
each node is treated as a different data source. Each experiment is conducted on all three data sources using the 10-fold cross validation method,
which yields ten test results per data source (sensor) and thirty test results per model. The error ratios of the ANFIS, DL, SVR, and GKR based
prediction methods that we present throughout this section are the averages of these thirty results. Therefore, the results presented in this section
are generalized and the effects of data selection are minimized. Experiments of the ANFIS based methods, SVR, and GKR are performed on MATLAB 2018a.
The parameter configurations of SVR and GKR are the default parameters defined in MATLAB. The normalized RMSE values of all models are
presented in Table 7 and Figure 7.
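The 10-fold protocol can be sketched as an index generator. Shuffling before splitting and the fixed seed are assumptions of this sketch; the text does not state how the folds were formed. The helper name `k_fold_indices` is our own.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # assumed: shuffle before splitting
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Ten folds per data source and three data sources give thirty test results.
splits_per_source = list(k_fold_indices(n=25, k=10))
total_results = 3 * len(splits_per_source)
```

Every observation appears in exactly one test fold, so averaging the per-fold errors uses each reading once for testing.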
In case of Mdl1 and Mdl3, the proposed models have lower error ratios than implemented non-linear regression methods (SVR and GKR).
Among the proposed methods, DL based method demonstrates the best performance. In case of Mdl2, DL based method has the lowest error
ratio but error ratios of ANFIS based methods do not show any significant difference when compared to regression models. In total, proposed
models show improved results over SVR and GKR.
Among the models, Mdl1 seems to be the most predictable, which shows us that relations of temperature-light and temperature-humidity
tuples are highly correlated. A similar trend is observed in Mdl2. However, relations between humidity-temperature and humidity-light seem to
have a different characteristic. DL and ANFIS based methods show higher error ratios on Mdl2 compared to Mdl1, unlike SVR and GKR. SVR and
GKR perform better on Mdl2 compared to Mdl1. Mdl3 is the least correlated relation. All methods perform worse on Mdl3 than on the other
models.
As seen in Table 7, the proposed ANFIS and DL based methods fall behind the regression methods in terms of training time. The GKR method has the
lowest training time among all methods. The training times of SVR and GP based ANFIS are also below 1 second for a training set composed of 3970
observations. On the other hand, the training times of SC based ANFIS and FCM based ANFIS are relatively longer, but the extra time spent in
training is not reflected in the prediction results. Among the ANFIS based methods, all three have similar results, but the GP based method has
a significantly lower training time. In the case of Mdl1 and Mdl3, GP based ANFIS outperforms GKR and SVR in terms of prediction accuracy
with a reasonable training time.
In DL, training time depends on a large number of factors such as network architecture, output channels, batch sizes, and other hyper-parameters.
Therefore, the training times of DL based methods are higher compared to the other methods. In light of the results in Table 7, the training
periods of Mdl1, Mdl2, and Mdl3 lasted for 24.28, 26.63, and 320.82 seconds, respectively. In particular, with lower batch size, Mdl3 has higher
computational complexity than Mdl1 and Mdl2, which causes significantly higher training time. It must be noted that priorities must be set before
method selection. Acquired results show a trade-off between robustness and accuracy among methods. If robustness is the most desirable
aspect, ANFIS based methods (in Mdl1 and Mdl3) and regression models are the right choices. However, if accuracy is the top priority, DL based
method is the right selection on all three models.
In general, the proposed models can improve the prediction accuracy and stability of missing sensor data greatly and effectively. Therefore,
ANFIS and DL based models are promising choices for prediction models.
DL 0.0659 0.0701 24.28 0.5168 0.0726 0.0880 26.63 0.5877 0.1101 0.1711 320.82 0.6237
GP-ANFIS 0.1487 0.0990 0.79 0.0086 0.1109 0.1086 0.37 0.0074 0.1096 0.1733 0.49 0.0063
SC-ANFIS 0.1581 0.0979 5.55 0.0081 0.1040 0.1089 5.69 0.0065 0.0905 0.1762 5.85 0.0064
FCM-ANFIS 0.1662 0.1031 1.65 0.0067 0.1106 0.1061 0.60 0.0055 0.0943 0.1751 1.56 0.0062
GKR 0.0907 0.1139 0.07 0.0007 0.1094 0.1079 0.06 0.0006 0.1690 0.1940 0.08 0.0007
SVR 0.1068 0.111 0.44 0.0014 0.1140 0.1055 0.41 0.0015 0.1797 0.1833 0.41 0.0012
Missing sensor values are a major problem for both IoT and WSN. In this work, we proposed two models to tackle this problem, namely, ANFIS
and DL based models. DL models have shown state-of-the-art performance in computer vision, natural language processing, and robotics. These
models offer promising solutions in many areas including classification, prediction, and control problems. On the other hand, ANFIS
is used successfully in controlling, modeling, and parameter estimation of complex systems due to its adaptation capability, nonlinear ability, and
rapid learning capacity. The motivation of this paper is to utilize their advantages in the IoT missing sensor data problem. For this purpose, firstly,
optimization processes are carried out for the proposed models to identify the optimal model parameters. Secondly, the models are constructed
by using the obtained optimal parameters, and then training and test procedures are performed. The results indicate that both DL and ANFIS methods
perform well in terms of normalized RMSE compared to the selected non-linear regression models. Through comparisons with SVR
and GKR, our proposed models show their advantages in prediction accuracy. In particular, DL outperforms the other methods.
Moreover, ANFIS based models work quite well for estimating missing values.
In this work, the use of different sensor data types to estimate a sensor value is investigated. Sensor readings from other sensor nodes and
previous readings from the same sensor are completely ignored. Nevertheless, even with this data ignored, the experiments showed that the proposed
methods perform remarkably well. Based on this, DL and ANFIS based methods deserve further investigation on IoT data analysis problems. Our
next work will incorporate previous readings and readings of neighbor nodes into the estimation process in a spatiotemporal manner.
REFERENCES
1. AlZu'bi S, Hawashin B, Mujahed M, Jararweh Y, Gupta BB. An efficient employment of internet of multimedia things in smart and future agriculture.
Multimed Tools Appl. 2019:1-25.
2. Atzori L, Iera A, Morabito G. The internet of things: a survey. Computer Networks. 2010;54(15):2787-2805.
3. Karkouch A, Mousannif H, Al Moatassime H, Noel T. Data quality in internet of things: a state-of-the-art survey. J Netw Comput Appl. 2016;73:57-81.
4. Qin Y, Sheng QZ, Falkner NJG, Dustdar S, Wang H, Vasilakos AV. When things matter: a survey on data-centric internet of things. J Netw Comput
Appl. 2016;64:137-153. https://doi.org/10.1016/j.jnca.2015.12.016
5. Vural Y, Akay D, Pourkashanian M, Ingham DB. Modeling of an intermediate temperature solid oxide fuel cell using the adaptive neuro-fuzzy inference
system (ANFIS). J Fuel Cell Sci Technol. 2010;7(3):034501.
6. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444.
7. Kök I, Simsek MU, Özdemir S. A deep learning model for air quality prediction in smart cities. In: Proceedings of the IEEE International Conference on
Big Data (Big Data); 2017; Boston, MA.
8. Qin Y, Zhang S, Zhu X, Zhang J, Zhang C. Semi-parametric optimization for missing data imputation. Applied Intelligence. 2007;27(1):79-88.
9. AlZu'bi S, AlQatawneh S, ElBes M, Alsmirat M. Transferable HMM probability matrices in multi-orientation geometric medical volumes segmentation.
Concurrency Computat Pract Exper. e5214.
10. AlZu'bi S, Islam N, Abbod M. Enhanced hidden Markov models for accelerating medical volumes segmentation. In: Proceedings of the 2011 IEEE GCC
Conference and Exhibition (GCC); 2011; Dubai, UAE.
11. Hassan MR, Nath B. Stock market forecasting using hidden Markov model: a new approach. In: Proceedings of the 5th International Conference on
Intelligent Systems Design and Applications (ISDA'05); 2005; Warsaw, Poland.
12. Li Z, Liu L, Kong D. Virtual machine failure prediction method based on AdaBoost-hidden Markov model. In: Proceedings of the 2019 International
Conference on Intelligent Transportation, Big Data & Smart City (ICITBS); 2019; Changsha, China.
13. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B (Methodol). 1977;39(1):1-38.
14. Delalleau O, Courville A, Bengio Y. Efficient em training of Gaussian mixtures with missing data. arXiv preprint arXiv:1209.0521. 2012.
15. Eirola E, Lendasse A, Vandewalle V, Biernacki C. Mixture of Gaussians for distance estimation with missing data. Neurocomputing. 2014;131:32-42.
16. Bouveyron C, Girard S, Schmid C. High-dimensional data clustering. Comput Stat Data Anal. 2007;52(1):502-519.
17. Elbes M, Alzubi S, Kanan T, Al-Fuqaha A, Hawashin B. A survey on particle swarm optimization with emphasis on engineering and network applications.
Evolutionary Intelligence. 2019;12(2):113-129.
18. García JCF, Kalenatic D, Bello CAL. Missing data imputation in multivariate data by evolutionary algorithms. Comput Hum Behav. 2011;27(5):1468-1474.
19. Lobato F, Sales C, Araujo I, et al. Multi-objective genetic algorithm for missing data imputation. Pattern Recognit Lett. 2015;68:126-131.
20. Deb K, Agrawal S, Pratap A, Meyarivan T. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In:
Proceedings of the International Conference on Parallel Problem Solving from Nature; 2000; Paris, France.
21. Priya RD, Sivaraj R, Priyaa NS. Heuristically repopulated Bayesian ant colony optimization for treating missing values in large databases. Knowl Based
Syst. 2017;133:107-121.
22. Kennedy J. Particle swarm optimization. In: Encyclopedia of Machine Learning. New York, NY: Springer; 2010:760-766.
23. Nekouie A, Moattar MH. Missing value imputation for breast cancer diagnosis data using tensor factorization improved by enhanced reduced adaptive
particle swarm optimization. J King Saud Univ Comput Inf Sci. 2018.
24. Richman MB, Trafalis TB, Adrianto I. Missing data imputation through machine learning algorithms. In: Artificial Intelligence Methods in the Environmental
Sciences. Berlin, Germany: Springer; 2009:153-169.
25. Tutz G, Ramzan S. Improved methods for the imputation of missing data by nearest neighbor methods. Comput Stat Data Anal. 2015;90:84-99.
26. Troyanskaya O, Cantor M, Sherlock G, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520-525.
27. García-Laencina PJ, Sancho-Gómez J-L, Figueiras-Vidal AR, Verleysen M. K nearest neighbours with mutual information for simultaneous classification
and missing data imputation. Neurocomputing. 2009;72(7-9):1483-1493.
28. Abdella M, Marwala T. The use of genetic algorithms and neural networks to approximate missing data in database. In: Proceedings of the IEEE 3rd
International Conference on Computational Cybernetics (ICCC 2005); 2005; Mauritius.
29. Nelwamondo FV, Golding D, Marwala T. A dynamic programming approach to missing data estimation using neural networks. Information Sciences.
2013;237:49-58.
30. Ravi V, Krishna M. A new online data imputation method based on general regression auto associative neural network. Neurocomputing.
2014;138:106-113.
31. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of the 2004 IEEE
International Joint Conference on Neural Networks; 2004; Budapest, Hungary.
32. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and applications. Neurocomputing. 2006;70(1-3):489-501.
33. Sovilj D, Eirola E, Miche Y, et al. Extreme learning machine for missing data using multiple imputations. Neurocomputing. 2016;174:220-231.
34. Laña I, Olabarrieta II, Vélez M, Del Ser J. On the imputation of missing data for road traffic forecasting: new insights and novel techniques. Transp Res
C Emerg Technol. 2018;90:18-33.
35. Silva-Ramírez EL, Pino-Mejías R, López-Coello M, Cubiles-de-la-Vega M-D. Missing value imputation on missing completely at random data using
multilayer perceptrons. Neural Networks. 2011;24(1):121-129.
36. Folguera L, Zupan J, Cicerone D, Magallanes JF. Self-organizing maps for imputation of missing data in incomplete data matrices. Chemom Intell Lab
Syst. 2015;143:146-151.
37. Saitoh F. An ensemble model of self-organizing maps for imputation of missing values. In: Proceedings of the 2016 IEEE 9th International Workshop
on Computational Intelligence and Applications (IWCIA); 2016; Hiroshima, Japan.
38. Nishanth KJ, Ravi V. Probabilistic neural network based categorical data imputation. Neurocomputing. 2016;218:17-25.
39. Zhang L, Lu W, Liu X, Pedrycz W, Zhong C. Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values.
Knowl Based Syst. 2016;99:51-70.
40. Li T, Zhang L, Lu W, et al. Interval kernel Fuzzy C-Means clustering of incomplete data. Neurocomputing. 2017;237:316-331.
41. Sefidian AM, Daneshpour N. Missing value imputation using a novel grey based Fuzzy C-Means, mutual information based feature selection, and
regression model. Expert Syst Appl. 2019;115:68-94.
42. Madden S. Intel Berkeley research lab data. 2004.
43. Jang J-SR. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern. 1993;23(3):665-685.
44. Takagi T, Sugeno M. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern. 1985;SMC-15:116-132.
45. Akay D, Chen X, Barnes C, Henson B. ANFIS modeling for predicting affective responses to tactile textures. Hum Factors Ergon Manuf Serv Ind.
2012;22(3):269-281.
46. Mamdani EH, Assilian S. An experiment in linguistic synthesis with a fuzzy logic controller. Int J Man Mach Stud. 1975;7(1):1-13.
47. Werbos P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences [PhD dissertation]. Cambridge, MA: Harvard University;
1974.
48. Zadeh LA. Fuzzy sets. Inf Control. 1965;8(3):338-353. https://doi.org/10.1016/S0019-9958(65)90241-X
49. Zadeh LA. The concept of a linguistic variable and its application to approximate reasoning—I. Information Sciences. 1975;8(3):199-249.
50. Zadeh LA. The concept of a linguistic variable and its application to approximate reasoning—II. Information Sciences. 1975;8(4):301-357.
51. Zadeh LA. The concept of a linguistic variable and its application to approximate reasoning-III. Information Sciences. 1975;9(1):43-80.
52. Hu Y-C. Simple fuzzy grid partition for mining multiple-level fuzzy sequential patterns. Cybern Syst Int J. 2007;38(2):203-228.
53. Cobaner M. Evapotranspiration estimation by two different neuro-fuzzy inference systems. J Hydrol. 2011;398(3-4):292-302.
https://doi.org/10.1016/j.jhydrol.2010.12.030
54. Castellanos F, James N. Average hourly wind speed forecasting with ANFIS. In: Proceedings of the 11th Americas Conference on Wind Engineering
(ACWE); 2009; San Juan, Puerto Rico.
55. Moradi F, Bonakdari H, Kisi O, Ebtehaj I, Shiri J, Gharabaghi B. Abutment scour depth modeling using neuro-fuzzy-embedded techniques.
Mar Georesources Geotechnol. 2018;37(2):190-200.
56. Dunn JC. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern. 1973;3(3):32-57.
57. Bezdek JC, Ehrlich R, Full W. FCM: the Fuzzy C-Means clustering algorithm. Comput Geosci. 1984;10(2-3):191-203.
58. Fattahi H. Adaptive neuro fuzzy inference system based on fuzzy c-means clustering algorithm, a technique for estimation of TBM penetration rate.
Int J Optim Civ Eng. 2016;6(2):159-171.
59. Abdulshahed AM, Longstaff AP, Fletcher S. The application of ANFIS prediction models for thermal error compensation on CNC machine tools.
Appl Soft Comput. 2015;27:158-168. https://doi.org/10.1016/j.asoc.2014.11.012
60. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735-1780.
61. Lipton ZC, Berkowitz J, Elkan C. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019. 2015.
62. Wei D, Wang B, Lin G, et al. Research on unstructured text data mining and fault classification based on RNN-LSTM with malfunction inspection
report. Energies. 2017;10(3):406.
63. Abadi M, Barham P, Chen J, et al. Tensorflow: a system for large-scale machine learning. In: Proceedings of the 12th Usenix Symposium on Operating
Systems Design and Implementation (OSDI'16); 2016; Savannah, GA.
64. Chollet F. Keras. 2015.
65. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
How to cite this article: Guzel M, Kok I, Akay D, Ozdemir S. ANFIS and Deep Learning based missing sensor data prediction in IoT.
Concurrency Computat Pract Exper. 2019;e5400. https://doi.org/10.1002/cpe.5400