A Novel Machine Learning Algorithm For Spammer Identification in Industrial Mobile Cloud Computing
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2018.2799907, IEEE Transactions on Industrial Informatics
Tie Qiu and Keqiu Li are with the School of Computer Science and Technology, Tianjin University, Tianjin 300050, China.
Heyuan Wang and Baochao Chen are with the School of Software, Dalian University of Technology, Dalian 116620, China.
Huansheng Ning is with the School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China.
Arun Kumar Sangaiah is with the School of Computer Science and Engineering, Vellore Institute of Technology (VIT), Vellore 632014, India.
1551-3203 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Abstract—An industrial mobile network is crucial for industrial production in the Internet of Things. It guarantees the normal function of machines and the normalization of industrial production. However, this characteristic can be utilized by spammers to attack others and influence industrial production. Users who only share spam, such as links to viruses and advertisements, are called spammers. With the growth of mobile network membership, spammers have organized into groups for the purpose of benefit maximization, which has caused confusion and heavy losses to industrial production. It is difficult to distinguish spammers from normal users owing to the characteristics of multidimensional data. To address this problem, this paper proposes a Spammer Identification scheme based on Gaussian Mixture Model (SIGMM) that utilizes machine learning for industrial mobile networks. It provides intelligent identification of spammers without relying on flexible and unreliable relationships. SIGMM combines the presentation of data, where each user node is classified into one class in the construction process of the model. We validate SIGMM by comparing it with the reality mining algorithm and hybrid FCM clustering algorithm using a mobile network dataset from a cloud server. Simulation results show that SIGMM outperforms these previous schemes in terms of recall, precision, and time complexity.

Index Terms—Industrial mobile network, Internet of Things, spammers, intelligent identification, machine learning

I. INTRODUCTION

The Internet of Things (IoT) [1] is an important component of the new generation of information technology. It is widely used in many fields such as industrial control, cyber-physical systems, and military investigation through the techniques of intelligent perception, identification technology, and pervasive computing [2]. The basic idea of IoT is to understand and measure the environment through the interconnections of the objects around people [3]. Its foundation is the internet and terminals, which provide communication between objects [4]. It connects humans and objects, objects with objects, provides remote control, and controls intelligent networks in new ways through enabling technologies [5].

An important branch of IoT is control, including human to object control and human control of machines, which is an important foundation for achieving intelligence [6]. This is particularly important in modern industrial production. The mobile network becomes a target of spammers due to its importance in industrial production control. Spam is one of the most common forms of attack in mobile networks. Spammers pretend to be normal users and only send spam [7, 8], and these are the users we aim to detect. A serious problem caused by spam is that links leading to viruses are selected by mistake and then users' personal information is stolen, or production control is interfered with. These malicious nodes communicate with each other and spammers hide in them as shown in Fig. 1. The net structure with character graphics on the left side represents mobile cloud computing, and the right side describes the behavior data of each user. The red character graphics represent spammers, and the green ones represent normal users.

Proposals from industry and academia discuss solutions for shielding against spam. Classification based on machine learning is a learning process for mapping data samples into two classes. However, it has limitations. One is data imbalance: unlabeled data are present in a much larger amount than labeled data, which hinders direct model construction. Another limitation is multidimensional data: too many features can lead to overfitting. Hence intelligent feature selection is necessary. In this paper, we first investigate the characteristics of spammers and normal users in an industrial mobile network. Then, the SIGMM model is proposed and developed based on the Gaussian Mixture Model, which focuses on spammer identification. The paper contains the following three main contributions.

• Based on the Gaussian Mixture Model, we propose a recognition process named the SIGMM model for classification without relying on users' unreliable relationships.
• SIGMM can label data automatically, which increases the precision of the model by expanding the training set. It labels large amounts of unlabeled data based on a few labeled data and solves the problem of the imbalance between labeled data and unlabeled data.
• We use an industrial mobile network dataset from a cloud server to perform simulations. The results show that SIGMM performs better than two other models in terms of identifying spammers and reducing time complexity.

The rest of this paper is organized as follows. Related work is introduced in Section II. In Section III, preliminaries are
discussed. In Section IV, we provide a detailed description of the proposed SIGMM model. Simulations are performed to present SIGMM's performance in identifying spammers and we compare SIGMM with the reality mining algorithm (RMA) and hybrid FCM clustering algorithm (HFCM) in Section V. We implement spammer identification on the industrial mobile data to verify our proposed algorithm. Finally, the conclusions and future work are presented in Section VI.

II. RELATED WORK

For existing algorithms, there are three types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.

1) Supervised Learning: The main goal of supervised learning is to learn a model from labeled training data that allows us to make predictions about unseen or future data [9]. Supervised refers to a set of samples where the desired output labels are already known [10]. In the Spammer Detection algorithm based on Logistic Regression [11], a spammer classifier is built for an online network with some features as inputs, and the algorithm output is 1 if a spammer is suspected. The model is trained on a large training set; however, collecting labeled data is rather difficult because of the recent emphasis on the secrecy of user data.

2) Unsupervised Learning: Using unsupervised learning techniques, we are able to explore the structure of our data to extract meaningful information without the guidance of a known outcome variable or reward function [12]. A clustering algorithm is the main algorithm for unsupervised learning [13]. Clustering is a technique that allows us to find groups of similar members [14]. In the RMA based on K-means in [15], the algorithm proposes a silhouette function which accepts the number of clusters as a parameter to judge the accuracy of clustering. Then it uses a matrix of means to record the mean silhouette values for each value of k and finally determines the best value of k. But the clustering result depends on the k centroids, so extra time must be spent determining the value of k. Furthermore, experimental results are unstable: several experiments with the same k produce different results. In [16], a prediction model based on Big Data analysis using a hybrid FCM clustering algorithm (HFCM) is proposed. It works by repeating arithmetic operations to minimize the objective function and update the membership function, which is very time-consuming. Its experimental performance depends on the volume of the dataset being small; otherwise its accuracy is sharply reduced.

3) Reinforcement learning: Reinforcement learning algorithms are very suitable for learning to control an agent by allowing it to interact with an environment [17]. The goal is to choose an action to maximize an expected long-term reward. In [18], a reinforcement learning based search (RLS) algorithm is proposed. It applies Q-Learning by choosing a policy which is the best selection for a specific user. But it starts from a random user and explores the network through friendship relationships, which restricts the scope of the exploration and leads to decreased detection efficiency.

Existing methods [19, 20] mainly depend on the relationships among users. However, owing to the development of intelligent recommendation mechanisms, user associations are not based on their true preferences or intentions. In the case where the users are not very clearly defined, a user might follow or be followed by an automatic link. Therefore many users represent fuzzy relationships. In order to solve the above problems, we propose SIGMM for industrial mobile networks.

III. PRELIMINARIES

In order to learn the data construction and rules, and owing to the limited access to original data, we preprocess the data after extracting any available original data in an industrial mobile network.

A. Data description

Our data contain the following contents: the user's ID, the relationship with other users, the time-stamped post record, and the activity in the past three months. From the post record, we calculate the frequency of posting, the proportion of posts with URL or @, and the average similarity among the user's posts. The activity reflects whether the account is normal or not. It indicates the frequency of following others, which is necessary because spammers tend to follow others all the time.

B. Feature scaling

The data we obtained have the following two constraints. First, the labeled data are far fewer than the unlabeled data, which severely decreases the precision of training. Second, there is large data noise that may cause incorrect factors in
C. Feature grouping
By utilizing standardization and the Pearson correlation
coefficient, the features remain simple. The multidimensional
feature is divided into three parts to indicate three user
perspectives, which are basic features, content features and
network features.
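The standardization and Pearson correlation steps mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy data, and the selection threshold of 0.3 are hypothetical and do not come from the paper.

```python
import numpy as np

def standardize(X):
    """Scale each feature column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

def pearson_with_label(X, y):
    """Pearson correlation coefficient between each feature column and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # a constant column gets correlation 0
    return (Xc * yc[:, None]).sum(axis=0) / denom

def select_features(X, y, threshold=0.3):
    """Keep features whose |r| with the label reaches the (assumed) threshold."""
    r = pearson_with_label(standardize(X), y)
    return np.flatnonzero(np.abs(r) >= threshold), r

# Toy data: column 0 depends on the label, column 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50, dtype=float)
informative = 2.0 * y + rng.normal(0.0, 0.5, 100)
noise = rng.normal(0.0, 1.0, 100)
X = np.column_stack([informative, noise])

kept, r = select_features(X, y)
print("correlations:", np.round(r, 2), "kept columns:", kept)
```

Features with a weak correlation to the label, such as accountage (0.11) in Table I, would be dropped by a rule of this kind, which keeps the feature set small.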
TABLE I: Pearson correlation coefficients.
F1      fans    follow  post   fre_follow  accountage  fans/follow
label   -0.72   0.57    0.37   0.58        0.11        0.14

F2      fre_post  similarity  url_portion  @number  forwardnumber
label   0.68      0.91        0.88         0.15     0.08

Fig. 3: Basic features after PCA. (b) Spammers' basic features after PCA.

• The basic features include the number of fans, following users, posts, and the frequency of following. Previous studies show that spammers tend to follow a large number of users, and their fans are rare. The proportion of fans to following is particularly low. These characteristics reflect whether the user is normal or not. Spammers' frequency
in following others is much higher than that of normal users.
• The content features mainly reflect the characteristics of the information sent by a user in the most recent three months. The user's activity can be analyzed by the content features.
• The network feature mainly describes user characteristics in an industrial mobile network. The number and proportion of following each other represent the degree of intimacy between users. Spammers usually follow a large number of normal users to attack. Therefore, their proportion of following each other is lower than that of normal users.

We perform data visualization to evaluate the characteristics of users. PCA is utilized for data compression [21] and to find the directions of maximum variance in high-dimensional data and project these onto a new subspace with fewer dimensions. The principal components of the new subspace can be interpreted as the directions of maximum variance given the constraint that the new feature axes are orthogonal to each other, as illustrated in Fig. 2. Here, x1 and x2 are the original feature axes, and PC1 and PC2 are the principal components.

For the basic features, PCA is utilized for dimensionality reduction. We construct a d×4 dimensional transformation matrix W to map a sample of four-dimensional features into one-dimensional features. Fig. 3 shows the training dataset after dimensionality reduction. The data of the two classes each have a peak and a slope, and are approximately subject to Gaussian distributions with different parameters.

IV. SIGMM MECHANISM

The SIGMM mechanism fits the behavior data of normal users and spammers, where the behavior data of normal users and spammers form a mixed random sample. The SIGMM mechanism learns the parameters of the two distributions (normal users and spammers) to obtain the classification model.

A. Parameter estimation based on Expectation-Maximization

The data are approximately subject to the Gaussian distribution. The mean and variance must be estimated for initializing the model. According to the probability density p(x|θ), we independently extract some samples to constitute the training sample set X. Parameter θ represents the mean and variance of the dataset, and is estimated through the sample set X. Consider X = {x_1, x_2, ..., x_n} as a set of extracted samples, where x_i represents the i-th user data and n represents the number of samples. Because they are independent, the probability that x_i and x_j are extracted simultaneously is p(x_i|θ) p(x_j|θ). Similarly, the probability that n samples are extracted simultaneously is the product of their respective probabilities, as shown in Eq. (3).

\[
L(\theta) = L(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} p(x_i; \theta) \tag{3}
\]

L(θ) is called the likelihood function related to the sample set X and parameter θ. θ̂ is the value of θ that maximizes the likelihood function L(θ). When θ = θ̂, the probability of simultaneously obtaining n samples is the largest. θ̂ is called the maximum likelihood estimator, which is defined in Eq. (4).

\[
\hat{\theta} = \arg\max_{\theta} L(\theta) \tag{4}
\]

For parameter estimation, the goal is to determine a parameter that maximizes the likelihood function. The relationship between parameter θ and the likelihood function L(θ) can be expressed as Eq. (5).

\[
\begin{cases}
L(\theta) = \prod_{i} P(x_i; \theta) \\
\theta = [\mu, \sigma]
\end{cases} \tag{5}
\]

It can be seen that the likelihood function is a continuous multiplication. We take the logarithm of L(θ) and transform it to a continuous addition format using Eq. (6).

\[
\begin{aligned}
\ln L(\theta_1) &= \ln \prod_{i=1}^{k} P(x_i; \theta_1) \\
&= \ln\left[\left(\frac{1}{\sqrt{2\pi}\,\sigma_1}\right)^{k} \exp\left(-\frac{\sum_{i=1}^{k}(x_i-\mu_1)^2}{2\sigma_1^2}\right)\right] \\
&= -\left[k \ln\left(\sqrt{2\pi}\,\sigma_1\right) + \frac{\sum_{i=1}^{k}(x_i-\mu_1)^2}{2\sigma_1^2}\right]
\end{aligned} \tag{6}
\]

To obtain the maximum value of ln L(θ), we set its partial derivatives to zero, as in Eq. (7).

\[
\begin{cases}
\dfrac{\partial \ln L(\theta_1)}{\partial \mu_1} = \dfrac{\partial}{\partial \mu_1}\left[-k \ln\left(\sqrt{2\pi}\,\sigma_1\right) - \dfrac{\sum_{i=1}^{k}(x_i-\mu_1)^2}{2\sigma_1^2}\right] = 0 \\[2ex]
\dfrac{\partial \ln L(\theta_1)}{\partial \sigma_1} = \dfrac{\partial}{\partial \sigma_1}\left[-k \ln\left(\sqrt{2\pi}\,\sigma_1\right) - \dfrac{\sum_{i=1}^{k}(x_i-\mu_1)^2}{2\sigma_1^2}\right] = 0
\end{cases} \tag{7}
\]

Eq. (7) can be simplified to Eq. (8).

\[
\begin{cases}
\dfrac{\sum_{i=1}^{k}(x_i-\mu_1)}{\sigma_1^2} = 0 \\[2ex]
-\dfrac{k}{\sigma_1} + \dfrac{\sum_{i=1}^{k}(x_i-\mu_1)^2}{\sigma_1^3} = 0
\end{cases} \tag{8}
\]

The solution of Eq. (7) is shown in Eq. (9).

\[
\begin{cases}
\mu_1 = \dfrac{1}{k}\sum_{i=1}^{k} x_i \\[2ex]
\sigma_1 = \sqrt{\dfrac{\sum_{i=1}^{k}(x_i-\mu_1)^2}{k}}
\end{cases} \tag{9}
\]

TABLE II: Variables definition.
Symbol    Description
θ         θ = [µ, σ]; µ is the mean of samples, σ is the variance of samples
L         The labeled dataset
U         The unlabeled dataset
lh_s      The likelihood function of spammers' set
lh_n      The likelihood function of normal users' set
Model_s   The model of spammers' distribution
Model_n   The model of normal users' distribution
ell_n     The ellipsoid that represents Model_n
ell_s     The ellipsoid that represents Model_s
d_n       The Euclidean distance between one user point and ell_n
d_s       The Euclidean distance between one user point and ell_s
p_1       Probability of normal user
p_2       Probability of spammer
λ         One user point
lbl       User's label

Algorithm 1 describes the process of maximizing the likelihood function, from which the model's parameters are calculated from the samples. All variables are shown in Table II. The samples
can be scanned by the two loops, with the goal of establishing the maximum likelihood function. We scan all samples of the two classes respectively to establish the two maximum likelihood functions (Lines 2 to 7). Next, we maximize the likelihood functions by taking partial derivatives (Lines 8 and 9). Finally, we obtain the optimal solutions θ1 and θ2. The time consumption of this process mainly involves establishing likelihood functions and calculating likelihood functions. The number of users determines the complexity of Algorithm 1, so its complexity is O(n).

\[
r(i, k) = \frac{\pi_k \, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(x_i \mid \mu_j, \Sigma_j)} \tag{12}
\]
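The per-class maximum likelihood estimation described for Algorithm 1 and the responsibility of Eq. (12) can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: it assumes one-dimensional features (for example after PCA) so that the covariance Σ_k reduces to a scalar σ_k, and the function names, toy samples, and equal mixing weights are hypothetical.

```python
import numpy as np

def fit_gaussian_mle(x):
    """Closed-form maximum likelihood estimates for a 1-D Gaussian."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = np.sqrt(((x - mu) ** 2).mean())  # MLE divides by k, not k - 1
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def responsibilities(x, weights, mus, sigmas):
    """r[i, k] = pi_k N(x_i | mu_k, sigma_k) / sum_j pi_j N(x_i | mu_j, sigma_j)."""
    x = np.asarray(x, dtype=float)[:, None]            # shape (n, 1)
    dens = gaussian_pdf(x, np.asarray(mus), np.asarray(sigmas))  # shape (n, K)
    weighted = np.asarray(weights) * dens
    return weighted / weighted.sum(axis=1, keepdims=True)

# Hypothetical labeled samples for the two classes.
normal = [0.1, 0.3, 0.2, 0.4]
spam = [2.0, 2.2, 1.8, 2.1]
mu_n, sigma_n = fit_gaussian_mle(normal)
mu_s, sigma_s = fit_gaussian_mle(spam)

# Responsibilities of the two components for two new points (equal weights assumed).
r = responsibilities([0.2, 2.0], [0.5, 0.5], [mu_n, mu_s], [sigma_n, sigma_s])
print(np.round(r, 3))
```

Each estimate requires a single pass over the samples of its class, which is consistent with the O(n) complexity stated for Algorithm 1.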
parts, parts L1 and L2. These parts have no intersection with each other (Line 2). One is the training set for generating initial parameters while the other is the test set. Algorithm 1 is called to construct an initial model (Lines 3 to 9). By calling Algorithm 2, unlabeled data in the remaining dataset are labeled (Lines 10 to 18). Line 19 is an important process in semi-supervised training. It adds the recently labeled data to the training set and updates the model using Algorithm 1 (Lines 20 to 26). Thus, the complexity of Algorithm 3 is O(n).

... probability density distribution of samples, each Gaussian model represents a class. By matching samples with several Gaussian models to obtain probabilities, the class with the largest probability is chosen as the classification result.

With continuous iteration, the ellipsoid's center as well as its radius change slowly, as shown in Fig. 7(a), and converge to a stable position. The following conclusions can be obtained from the figure. (1) The data of the two classifications are clearly separated. (2) The radius of the green ellipsoid representing normal users is larger than that of the red one, because the number of normal users is large. Their behavior data are not similar to each other and deviate from the center. The radius of the red ellipsoid is smaller than that of the green one because spammer behaviors for attacking others are similar. (3) The radius and location of the two ellipsoids vary only slightly. Therefore, the model is stable.
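The labeling loop described above (fit an initial model on labeled data, label unlabeled points with the more probable class, add them to the training set, and refit) can be sketched as follows. This is a simplified illustration under stated assumptions rather than the paper's Algorithms 2 and 3: it uses one-dimensional features, compares log densities with equal class priors instead of distances to ellipsoids, and the function names, batch size, and synthetic data are hypothetical.

```python
import numpy as np

def fit(x):
    """Maximum likelihood mean and standard deviation of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = max(np.sqrt(((x - mu) ** 2).mean()), 1e-6)  # floor avoids a zero width
    return mu, sigma

def log_density(x, mu, sigma):
    """Log of the Gaussian density N(mu, sigma^2) at x."""
    return -np.log(np.sqrt(2 * np.pi) * sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def self_train(normal_seed, spam_seed, unlabeled, batch=10):
    """Label unlabeled points in batches, refitting both models after each batch."""
    normal, spam = list(normal_seed), list(spam_seed)
    labels = []
    for start in range(0, len(unlabeled), batch):
        model_n, model_s = fit(normal), fit(spam)  # refit on the grown training set
        for x in unlabeled[start:start + batch]:
            is_spam = log_density(x, *model_s) > log_density(x, *model_n)
            labels.append(1 if is_spam else 0)
            (spam if is_spam else normal).append(x)  # newly labeled point joins training
    return np.array(labels), fit(normal), fit(spam)

# Synthetic data: normal users around 0, spammers around 6 (assumed values).
rng = np.random.default_rng(1)
normal_seed = rng.normal(0.0, 1.0, 20)
spam_seed = rng.normal(6.0, 0.5, 20)
unlabeled = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(6.0, 0.5, 30)])
truth = np.array([0] * 30 + [1] * 30)
perm = rng.permutation(60)
unlabeled, truth = unlabeled[perm], truth[perm]

labels, model_n, model_s = self_train(normal_seed, spam_seed, unlabeled)
accuracy = (labels == truth).mean()
print("accuracy:", accuracy, "normal model:", model_n, "spammer model:", model_s)
```

In this sketch the spammer model ends up with the smaller width, mirroring the observation that spammer behaviors are more similar to each other than those of normal users.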
A. Simulation Setup

From Fig. 8(b) we can see that the recall of the SIGMM model is slightly higher than that of the other two schemes. Moreover, HFCM and RMA are influenced by the initial values of the algorithms, so their two polylines fluctuate to a certain degree.

Fig. 8: Performance of SIGMM. (c) Runtime of the three schemes.

B. Time complexity

When the size of the data is very large, the complexity of an algorithm is also an important metric to judge its performance. Thus, the three schemes are compared in terms of time complexity. The SIGMM model is established by the process of prediction and updating. The number of user samples is n, and the labeled training set contains approximately 0.4n samples. The labeled set is randomly divided into two parts, each of approximately 0.2n samples.

Therefore, the complexity of the algorithm is determined by the following factors. (1) Experiments show that the initial values have a significant influence on the accuracy of the model. We iterate k times to select the best model; calculating the initial value is O(1) and testing is O(n), so the total complexity of this step is approximately O(n). (2) Calculating each sample's distance to the two distributions is approximately O(1). Adjusting the parameters over a total of 0.6n samples has a complexity of O(n). So theoretically, the overall complexity is approximately O(n).

The algorithm complexity of HFCM comes from the establishment of the fuzzy matrix and updating of membership. For n samples, d dimension features, k iterations, and C clusters, the complexity is O(n2k).

Similarly, the complexity of RMA comes from computing the mean of all dimensions, cluster centers, and the samples in each cluster. The complexity is then approximately O(n).

Fig. 8(c) shows a comparison of the time complexity for the three schemes with a dataset of 1,000 users.
The X-axis represents the amount of data in Fig. 8(c), and the Y-axis represents running time. With the increase of the data, the difference in running times among the three algorithms gradually increases, and the SIGMM model is significantly better than the other two for larger datasets in terms of running time.

VI. CONCLUSION

In order to solve the malicious attack problem in industrial mobile networks and reduce the computational complexity of using large cloud server datasets, this paper proposes SIGMM, a spammer identification model based on the Gaussian Mixture Model. We extract features related to labels from originally labeled data in a given dataset containing both labeled and unlabeled data, and visualize the data to add labels to the unlabeled data.

According to the characteristics of data presentation, each user data belongs to one distribution. Multidimensional features are divided into three groups, and SIGMM separates the two distributions based on these features. Finally, we performed simulations to evaluate the performance of SIGMM. The results show that even if the relationships among users are not taken into account, it can implement classification.

Our work is based on binary classification, whereas in large networks, the types of users are varied and complex. Our future work will extend the categories of users to multi-classifications such as celebrity, advertiser, hacker, etc.

ACKNOWLEDGMENT

This work is supported by National Natural Science Foundation of China (Grant Nos. 61672131 and 61672379), the National Key Research and Development Program of China (Grant No. 2016YFB1000205) and the State Key Program of National Natural Science Foundation of China (Grant No. 61432002).

REFERENCES

[1] J. Miranda, N. Makitalo, J. Garcia-Alonso, J. Berrocal, T. Mikkonen, C. Canal, and J. M. Murillo, “From the internet of things to the internet of people,” IEEE Internet Computing, vol. 19, no. 2, pp. 40–47, 2015.
[2] T. Qiu, A. Zhao, F. Xia, W. Si, and D. O. Wu, “Rose: Robustness strategy for scale-free wireless sensor networks,” IEEE/ACM Transactions on Networking, vol. 25, no. 5, pp. 2944–2959, 2017.
[3] L. Yao, Q. Z. Sheng, and S. Dustdar, “Web-based management of the internet of things,” IEEE Internet Computing, vol. 19, no. 4, pp. 60–67, 2015.
[4] T. Qiu, R. Qiao, and D. O. Wu, “Eabs: An event-aware backpressure scheduling scheme for emergency internet of things,” IEEE Transactions on Mobile Computing, vol. 17, no. 1, pp. 72–84, 2017.
[5] T. Qiu, K. Zheng, H. Song, M. Han, and B. Kantarci, “A local-optimization emergency scheduling scheme with self-recovery for smart grid,” IEEE Transactions on Industrial Informatics, vol. 13, no. 6, pp. 3195–3205, 2017.
[6] S. Lu, V. H. Nascimento, J. Sun, and Z. Wang, “Sparsity-aware adaptive link combination approach over distributed networks,” Electronics Letters, vol. 50, no. 18, pp. 1285–1287, 2014.
[7] E. Tan, L. Guo, S. Chen, X. Zhang, and Y. Zhao, “Spammer behavior analysis and detection in user generated content on social networks,” in IEEE International Conference on Distributed Computing Systems, May 16-18, 2012, pp. 305–314.
[8] P. Heymann, G. Koutrika, and H. Garcia-Molina, “Fighting spam on social web sites,” IEEE Internet Computing, vol. 11, no. 6, pp. 36–45, 2007.
[9] M. Al Hasan, V. Chaoji, S. Salem, and M. Zaki, “Link prediction using supervised learning,” in Proc. of SDM Workshop on Link Analysis, Counterterrorism and Security, Apr. 26-28, 2006, pp. 798–805.
[10] F. Pedregosa, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, and J. Vanderplas, “Scikit-learn: Machine learning in python,” Journal of Machine Learning Research, vol. 12, no. 10, pp. 2825–2830, 2013.
[11] X. Zhu, Y. Nie, S. Jin, A. Li, and Y. Jia, “Spammer detection on online social networks based on logistic regression,” in International Conference on Web-Age Information Management, Aug. 11-13, 2015, pp. 29–40.
[12] H. Jia, Y. M. Cheung, and J. Liu, “A new distance metric for unsupervised learning of categorical data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 5, pp. 1065–1079, 2016.
[13] S. D. Xenaki, K. D. Koutroumbas, and A. A. Rontogiannis, “A novel adaptive possibilistic clustering algorithm,” IEEE Transactions on Fuzzy Systems, vol. 24, no. 4, pp. 791–810, 2015.
[14] K. V. Laerhoven, “Combining the self-organizing map and k-means clustering for on-line classification of sensor data,” in International Conference on Artificial Neural Networks, Aug. 21-25, 2001, pp. 464–469.
[15] X. Yang, Y. Wang, D. Wu, and A. Ma, “K-means based clustering on mobile usage for social network analysis purpose,” in International Conference on Advanced Information Management and Service, Nov. 30-Dec. 2, 2010, pp. 223–228.
[16] S. Yang, J. Kim, and M. Chung, “A prediction model based on big data analysis using hybrid fcm clustering,” in International Journal of Internet Technology and Secured Transactions, Dec. 14-16, 2015, pp. 337–339.
[17] M. A. Wiering and H. V. Hasselt, “Ensemble algorithms in reinforcement learning,” IEEE Transactions on Systems Man and Cybernetics Part B Cybernetics, vol. 38, no. 4, pp. 930–936, 2008.
[18] F. Peyravi, V. Derhami, and A. Latif, “Reinforcement learning based search (rls) algorithm in social networks,” in International Symposium on Artificial Intelligence and Signal Processing, Mar. 3-5, 2015, pp. 206–210.
[19] S. Bhagat, G. Cormode, and S. Muthukrishnan, “Node classification in social networks,” Computer Science, vol. 16, no. 3, pp. 115–148, 2011.
[20] T. Kajdanowicz and P. Doskocz, “Label-dependent feature extraction in social networks for node classification,” in International Conference on Social Informatics, Oct. 27-29, 2010, pp. 89–102.
[21] S. Lu and Z. Wang, “Accelerated algorithms for eigenvalue decomposition with application to spectral clustering,” in The 49th Asilomar Conference on Signals, Systems and Computers (Asilomar), Pacific Grove, CA, USA, 2015, pp. 355–359.
[22] T. Chen, C. Huang, E. Chang, and J. Wang, “Automatic accent identification using gaussian mixture models,” in Automatic Speech Recognition and Understanding, 2001. ASRU ’01. IEEE Workshop on, Dec. 9-13, 2001, pp. 343–346.
[23] A. Rasmus, M. Berglund, M. Honkala, H. Valpola, and T. Raiko, “Semi-supervised learning with ladder networks,” in Advances in Neural Information Processing Systems, Dec. 7-12, 2015, pp. 3546–3554.
[24] P. K. Mallapragada, R. Jin, A. K. Jain, and Y. Liu, “Semiboost: Boosting for semi-supervised learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 11, pp. 2000–2014, 2009.
[25] S. J. Roberts, D. Husmeier, I. Rezek, and W. Penny, “Bayesian approaches to gaussian mixture modeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1133–1142, 1998.

Keqiu Li received the bachelors and masters degrees from the Department of Applied Mathematics at the Dalian University of Technology in 1994 and 1997, respectively. He received the Ph.D. degree from the Graduate School of Information Science, Japan Advanced Institute of Science and Technology in 2005. He also has two-year postdoctoral experience in the University of Tokyo, Japan. He is currently a professor with the School of Computer Science and Technology, Tianjin University, China. He has published more than 100 technical papers, such as IEEE TPDS, ACM TOIT, and ACM TOMCCAP. He is an Associate Editor of IEEE TPDS and IEEE TC. He is a senior member of IEEE. His research interests include internet technology, data center networks, cloud computing and wireless networks.

Huansheng Ning received a B.S. degree from Anhui University in 1996 and Ph.D. degree in Beihang University in 2001. Now, he is a professor and vice dean of School of Computer and Communication Engineering, University of Science and Technology Beijing, China. His current research focuses on Internet of Things, cyber-physical modeling. He is the founder of Cyberspace and Cybermatics and Cyberspace International Science and Technology Cooperation Base. He serves as an associate editor of IEEE System Journal and IEEE Internet of Things Journal. He is the Co-Chair of IEEE Systems, Man, and Cybernetics Society Technical Committee on Cybermatics. He has hosted the 2013 World Cybermatics Congress (WCC2013/iThings2013/CPSCom2013/Greencom2013), and the 2015 Smart World Congress (SmartWorld2015/UIC2015/ATC2015/ScalCom2015/CBDCom2015/IoP2015) as the joint executive chair. He gained the IEEE Computer Society Meritorious Service Award in 2013, IEEE Computer Society Golden Core Award in 2014.
Tie Qiu (M’12-SM’16) received the M.Sc. and Ph.D. degrees in computer science from Dalian University of Technology in 2005 and 2012, respectively. He is currently a Full Professor at the School of Computer Science and Technology, Tianjin University. Prior to this position, he was an assistant professor (2008) and an associate professor (2013) at the School of Software, Dalian University of Technology. He was a visiting professor in electrical and computer engineering at Iowa State University, U.S. (Jan. 2014 to Jan. 2015). He serves as an Associate Editor of IEEE Access Journal, Computers and Electrical Engineering (Elsevier journal), and Human-centric Computing and Information Sciences (Springer journal), an Editorial Board Member of Ad Hoc Networks (Elsevier journal) and International Journal on AdHoc Networking Systems, and a Guest Editor of Future Generation Computer Systems (Elsevier journal). He serves as General Chair, PC Chair, Workshop Chair, Publicity Chair, Publication Chair, or TPC Member of a number of conferences. He has authored or co-authored 8 books and over 100 scientific papers in international journals and conference proceedings, such as IEEE ToN, TMC, TII, TIP, IEEE Communications, IEEE Systems Journal, IEEE IoT Journal, and Computer Networks. He has contributed to the development of 4 copyrighted software systems and holds 15 patents. He is a senior member of the China Computer Federation (CCF) and a Senior Member of IEEE and ACM.

Arun Kumar Sangaiah received his Ph.D. from VIT University and Master of Engineering from Anna University, in 2007 and 2014, respectively. He is currently an Associate Professor at the School of Computing Science and Engineering, VIT University, Vellore, India. He was a visiting professor at the School of Computer Engineering at Nanhai Dongruan Information Technology Institute in China (Sept. 2016 to Jan. 2017). He has published more than 130 scientific papers in high-standard SCI journals such as IEEE-TII, IEEE-Communication Magazine, IEEE Systems, IEEE-IoT, IEEE TSC, and IEEE ETC. In addition, he has authored or edited over 8 books (Elsevier, Springer, Wiley, Taylor and Francis) and 50 journal special issues for venues such as IEEE-Communication Magazine, IEEE-IoT, and IEEE Consumer Electronics Magazine. His areas of interest include software engineering, computational intelligence, wireless networks, bio-informatics, and embedded systems. He has also registered one Indian patent in the area of computational intelligence. Prof. Sangaiah also serves as an Editorial Board Member/Associate Editor of various international SCI journals.
Heyuan Wang received the B.E. degree in mathematics from Changchun University of Science and Technology (CUST), China, in 2015. She is a master's student in the School of Software Engineering, Dalian University of Technology (DUT), China. She is a member of the Smart Cyber-Physical Systems Laboratory (SmartCPS Lab). She is an excellent graduate student of CUST and has been awarded several scholarships for academic excellence. Her research interests cover machine learning and the Internet of Things.

Baochao Chen received the B.E. degree from Dalian University of Technology, China, in 2016. He is currently a master's student in the School of Software, Dalian University of Technology (DUT), China. He is an excellent graduate student of DUT and has been awarded several scholarships for schoolwork excellence and spiritual civilization. He has participated in science and technology competitions many times and achieved good results, for example, third prize in the "Citi Cup" financial innovation application contest. His research interests cover embedded systems, the large-scale Internet of Things, distributed computing, and the Internet of Vehicles.