A Survey of Deep Learning
Abstract—Deep learning has exploded in the public consciousness, primarily as predictive and analytical products suffuse our world, in the form of numerous human-centered smart-world systems, including targeted advertisements, natural language assistants and interpreters, prototype self-driving vehicle systems, etc. Yet to most, the underlying mechanisms that enable such human-centered smart products remain obscure. In contrast, researchers across disciplines have been incorporating deep learning into their research to solve problems that could not have been approached before. In this paper, we seek to provide a thorough investigation of deep learning in its applications and mechanisms. Specifically, as a categorical collection of state-of-the-art in deep learning research, we hope to provide a broad reference for those seeking a primer on deep learning and its various implementations, platforms, algorithms, and uses in a variety of smart-world systems. Furthermore, we hope to outline recent key advancements in the technology, and provide insight into areas, in which deep learning can improve investigation, as well as highlight new areas of research that have yet to see the application of deep learning, but could nonetheless benefit immensely. We hope this survey provides a valuable reference for new deep learning practitioners, as well as those seeking to innovate in the application of deep learning.

Index Terms—Human-centered Smart Systems, Deep Learning, Platform, Neural Networks, Emergent Applications, Internet of Things, Cyber-Physical Systems, Survey, Networking and Security.

Corresponding Author: Prof. Wei Yu (Email: [email protected]).

I. INTRODUCTION

Along with Big Data and Analytics [149], [79], Cloud/Edge Computing-based Big Computing [155], [120], and the Internet of Things (IoT)/Cyber-Physical Systems (CPS) [125], [143], [148], [82], [80], [161], [127], [147], the topic of Deep Learning has come to dominate industry and research spheres for the development of a variety of smart-world systems, and for good reason. Deep learning has shown significant potential in approximating and reducing large, complex datasets into highly accurate predictive and transformational output, greatly facilitating human-centered smart systems [25], [98]. In contrast to complex hard-coded programs developed for a sole inflexible task, deep learning architectures can be applied to all types of data, be they visual, audio, numerical, text, or some combination. In addition, advanced deep learning platforms are becoming ever more sophisticated, often open source and available for widespread use. Furthermore, major companies, including Google, Microsoft, Amazon, Apple, etc., are heavily investing in deep learning technologies to supply hardware and software innovations that can further improve deep learning performance, which can be used for next generation smart-world products [4].

Though regression analysis and auto-encoding are not new topics in the field of machine learning, deep learning implementations can provide higher accuracy and better predictive performance, and are more flexible and configurable. As one of the largest areas of deep learning applications, supervised learning tasks for classification have far outstripped even human abilities in areas like handwriting and image recognition [99], [73]. In addition, unsupervised learning on datasets without any particular labels has shown the potential for the extraction of unforeseen analytical and commercial value in the form of clustering and statistical analysis. Potentially the most interesting yet, reinforcement learning provides the potential for deep learning without human supervision, through feedback from a connected environment. This type of deep learning has been heavily applied to the field of robotics and computer vision [19].

With the unceasing growth of IoT and smart-world systems driven by the advance of CPS, in which all devices are network connected and able to communicate sensed data and monitor physical objects, larger and larger datasets are becoming available for the application of deep learning, poised to materially impact our daily lives [161], [82], [143], [91], [144], [81], [36], [89]. For example, smart transportation systems will interconnect self-driving vehicles and infrastructure networks to revolutionize daily mass transit, virtually eliminating collisions and enabling secondary electrical grid storage. Smart cities shall enable the optimization of resource management via command and control in nearly all domains, from electricity, communications, and other utilities, to construction, transportation, and emergency response. Smart wearables and tele-health devices collecting diagnostic data may reveal trends that could prolong human life through disease and pattern discovery, creating a research population of unimaginable scale. Smartphones have afforded the massive creation of rich textual, audio, and visual data from various social media applications and embedded sensors, and likewise massive location and population movement data via embedded GPS modules. It is clear that all of these applications, alone or in combination, generate unprecedented Big Data. As a solution to the processing, dimensionality reduction, compression, and extraction of such Big Data, deep learning provides the most
immediately relevant and appropriate tools, enabling the rapid analysis of complex data that spans a variety of modalities.

The primary contributions of this paper are summarized below:

• We provide an overview of deep learning technologies, commenting briefly on the history of machine learning and distinguishing deep learning from constituent shallow learning techniques. We introduce a definition of deep learning, describe the basic operations of neural networks, and further describe some basic types of neural networks applied generally.

• We categorize deep learning by learning mechanism (e.g., supervised, unsupervised, and reinforcement learning). Each learning mechanism is presented, along with the subcategories of output tasks, and common algorithmic examples.

• We provide brief descriptions of prominent deep learning platforms, commenting on their intended applications, utility, and some implementation specifics. We note useful properties of extensibility and interoperability, and highlight benchmarked platform comparisons.

• We provide a thorough and detailed investigation into the numerous areas where deep learning has been broadly applied. In particular, we demonstrate that deep learning has advanced the state-of-the-art in image and video processing, audio processing, text analysis and natural language processing, autonomous systems and robotics, medical diagnostics, computational biology, physical sciences, finance and economics, and cyber security, among others. Furthermore, we note advances in algorithmic and architectural mechanisms in deep learning research.

• Having reviewed areas of deep learning advancement, we provide insights into areas where deep learning has not been applied, or has been applied minimally. Of prime importance, deep learning acceleration and optimization will hasten the realization of in-device IoT and mobile deep learning. In addition, distributed deep learning for IoT and CPS must implement schemes to operate under an edge computing paradigm to better serve constrained devices. Applications of deep learning for network operation, management, design, and control remain relatively unexplored, yet technologies are advancing to allow inference on continuous high-throughput streams. Finally, the security of implemented deep learning networks and models, and specifically their resilience to attacks, is a critical issue given the rapid adoption of deep learning technologies coupled with prominent examples of their subversion.

In addition, compared to other survey works on the topic of deep learning [109], [67], [98], [68], [19], [38], our work takes a broad view of all fields/applications to which deep learning has been applied, and their contributions to the study and improvement of deep learning. Particularly, other works focus primarily on the advances and needs of a single learning mechanism or modality [109], [19], or towards improvements in a single application [38], [67], [68], [98]. Instead, in this paper, we primarily focus on a sweeping evaluation of deep learning applications and mechanisms to illuminate areas, in which deep learning has yet to make significant contributions.

The remainder of this paper is as follows. In Section II, we provide a brief overview of deep learning. In Section III, we categorize deep learning objectives, mechanisms, and algorithmic approaches. In Section IV, we outline various common platforms for implementing deep learning architectures. In Section V, we present a broad review of the applications of deep learning. In Section VI, we highlight areas of future deep learning application and research. Finally, in Section VII, we provide concluding remarks.

II. OVERVIEW OF DEEP LEARNING

Machine learning incorporates a vast array of algorithmic implementations, not all of which can be classified as deep learning. For example, singular algorithms, including statistical mechanisms like Bayesian algorithms, function approximation such as linear and logistic regression, or decision trees, while powerful, are limited in their application and ability to learn massively complex data representations. Deep learning has developed from cognitive and information theories, seeking to imitate the learning process of human neurons and create complex interconnected neuronal structures. As one of the key concepts of computing neurons and the neural model, the ability for a generic neuron to be applied to any type of data and learn indiscriminately is a powerful concept [99]. In essence, there is no singular structure for each application, but instead a generally applicable model for all applications.

Inherent to the process of machine learning are the concepts of training (iterative improvement in learning) and inference (the extraction of output of a trained model from some practical input). In training a model, a volume of data is split into training and testing sets, and likely a validation set as well. A machine learning algorithm is given the training data to learn some representation of, which could be in the form of a function approximation of the given feature distributions or as a set of decisions based on the contributions of each feature, among others. The validation set would also be used during training, but as a method to validate the effectiveness of the training process, the result of which is applied to tune learning parameters of the algorithm and improve the final accuracy. The test set would then provide a previously unobserved set of data to determine the final accuracy of the trained model, and is generally the source of the reported accuracy scores and other effectiveness metrics. The term inference, then, can be considered as the process of inputting a data item into a trained and implemented machine learning model and getting an inferred output.
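As a minimal illustration of this train/validate/test workflow (not drawn from any of the surveyed works), the following Python sketch splits a labeled dataset three ways with scikit-learn, tunes a single learning parameter on the validation set, and reports accuracy only on the held-out test set; the synthetic dataset and the choice of a small multi-layer perceptron are assumptions made purely for illustration.

# Illustrative sketch of the training/validation/test split and inference
# described above; the dataset and model choices are hypothetical.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hold out a test set, then carve a validation set out of the remainder.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# Use the validation set to tune a learning parameter (here, hidden layer width).
best_model, best_val_acc = None, 0.0
for width in (16, 64, 256):
    model = MLPClassifier(hidden_layer_sizes=(width,), max_iter=500, random_state=0)
    model.fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    if val_acc > best_val_acc:
        best_model, best_val_acc = model, val_acc

# The test set provides the previously unobserved data used to report accuracy.
print("test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))

# Inference is then a single forward pass on a new data item.
print("inferred label:", best_model.predict(X_test[:1])[0])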
With the advancement of computing technologies, the implementation of large collections of neurons was possible, giving rise to neural networks. Indeed, though neural networks are becoming commonplace, they are actually an old technology [44] that fell out of favor because of complexity and computing deficiencies. Nonetheless, this has clearly changed, thanks in no small part to the applications at which neural networks have excelled. Examples include winning the ImageNet object recognition competition [73], in which neural networks can
exceed even human accuracy, or beating humans in the game of Go without having received any direct input or game sessions against human players [123].

By definition, deep learning is the application of multi-neuron, multi-layer neural networks to perform learning tasks, including regression, classification, clustering, auto-encoding, and others. Conceptually, the most basic computational neuron, the sigmoid neuron, can be considered as a single logistic node (though there are many other algorithms that can be implemented as activation functions). Each neuron is connected to the input ahead of it, and a loss function is used to update the weights of the neuron and optimize the logistic fit to the incoming data. As part of a neural network layer, multiple parallel neurons initiated with different weights learn on the same input data simultaneously. In the application of multiple layers of multiple nodes, each node learns from all the outputs of the previous layers, stepwise reducing the approximation of the original input data to provide an output representation set. Thus, the complexity of multiple interconnected neurons is evident.
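The basic computation described above can be made concrete with a short NumPy sketch: a single sigmoid neuron is a weighted sum passed through the logistic function, and a fully-connected layer is simply many such neurons applied to the same input in parallel. The weights, input size, and two-layer shape below are arbitrary illustrative assumptions, not a prescribed architecture.

import numpy as np

def sigmoid(z):
    # Logistic activation: squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x, W, b):
    # Each row of W holds the weights of one neuron; all neurons see the same input x.
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                     # raw input features

# Two stacked fully-connected layers: each layer learns from all outputs of the previous one.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

h = dense_layer(x, W1, b1)                 # hidden representation
y = dense_layer(h, W2, b2)                 # output representation set
print(y)

# Training would compare y to a target with a loss function and update W1, b1, W2, b2
# (e.g., by gradient descent); that update step is omitted in this sketch.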
The general structure of deep neural networks is shown in Fig. 1 and Fig. 2, in which an input layer representing the raw input data (blue) feeds into multiple hidden layers of varying types (yellow), finally exiting as some output (white), which can represent regressive values, classification values, etc. Many types of neurons can be implemented, and likewise there are many types of layers depending on the necessary or desired function thereof. The most basic layer is a fully-connected layer, in which all neurons are fully connected to all input, as demonstrated in Fig. 1a. In contrast, to reduce over-fitting, some connections are removed, usually in a random manner by some percentage. This type of layer is called a dropout layer, as demonstrated in the two hidden layers of Fig. 1b. Fig. 2 represents a convolutional neural network (CNN), as opposed to a more general recursive neural network (RNN). In RNNs, some form of optimization is used to recursively update the weights of the neurons based on the loss function results after each learning step. In CNNs, alternating convolution and pooling layers are added prior to fully-connected or dropout layers. Convolutional layers are used to filter large multidimensional matrices, such as the Red, Green, and Blue channels of a 2-dimensional image, into feature maps. The pooling layer then spatially reduces the size of the feature map into a smaller and more manageable matrix. In essence, the convolution layers reduce the complexity of the image by some filter (identity, edge detection, sharpening, etc.), and the pooling layers reduce the size of each filtered result. Notice that multiple filters are typically applied to extract parallel and complementary features [131]. In addition, not all networks have this progressively reductive layer shape. Stacked autoencoders (SAEs) [38], for example, typically have an hourglass shape, first reducing dimensionality, and then expanding back to a larger feature set. Similarly, generative adversarial networks (GANs) [45] are composed of generator and discriminator networks, where the output of the generator network is typically the same feature set as the input, and the final layers may be deconvolutional, complementing the convolutional layers of the input.
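The alternating convolution/pooling structure and the fully-connected and dropout layers just described can be sketched in a few lines of Keras; the filter counts, kernel sizes, dropout rate, and the assumed 32x32 RGB input with 10 output classes are illustrative assumptions rather than recommended settings.

# Illustrative Keras sketch of the layer types discussed above
# (assumed 32x32 RGB input and 10 classes; all sizes are arbitrary).
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    # Convolutional layers filter the multi-channel image into feature maps.
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    # Pooling spatially reduces each feature map to a smaller matrix.
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # Fully-connected layers consume the flattened feature maps.
    Flatten(),
    Dense(128, activation='relu'),
    # Dropout randomly removes connections to reduce over-fitting.
    Dropout(0.5),
    Dense(10, activation='softmax'),
])
model.summary()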
III. CATEGORIZATION OF DEEP LEARNING

We now provide a categorical review of deep learning architectures by learning mechanism and learning output task, and provide brief descriptions of the many algorithmic implementations of each. The primary learning mechanisms are supervised learning, unsupervised learning, and reinforcement learning. In general, learning mechanisms are classified by the type of input data that they operate upon. Output tasks include classification, regression, dimensionality reduction, clustering, and density estimation [38].

A. Supervised Learning

Supervised learning is so named because of the requirement that the data investigated be clearly labeled, and thus the result of the output can be supervised, or classified as correct or incorrect. In particular, supervised learning is used as a predictive mechanism, in which a portion of the data is learned upon (otherwise known as the training set), another portion is used to validate the trained model (cross-validation), and the remainder is used to determine the accuracy and effectiveness in prediction. Though accuracy is an important metric, other statistical mechanisms, such as precision, recall, and F1 score, are used to assess the ability of a trained model to generalize to new data. The two primary learning tasks in supervised learning are classification and regression.

Classification: In classification, the output of the learning task will be one of a finite set of classes. This can take the form of binary classification of only two classes (0 or 1), multi-class classification resulting in one class out of a set of three or more total classes (red, green, blue, etc.), multi-label classification, where objects can belong to multiple binary classes (red or not red, and car or not car), and even all pairs classification, in which every class in a finite set is directly compared to every other class in a binary way [103]. In all pairs classification, comparing red, green, and blue, the resulting output would be the tests red vs. green, red vs. blue, and green vs. blue. Examples of deep learning applications of classification include binary output in malware detection (Malicious and Benign) [48], as well as non-binary classification of handwritten numbers, as in the MNIST dataset [99].
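In deep learning implementations, these classification variants typically differ only in the output layer and loss function; the Keras fragments below sketch the conventional pairings. The 64-unit feature layer, 20-dimensional input, and class counts are placeholder assumptions for illustration.

# Conventional output-layer/loss pairings for the classification tasks above
# (feature extractor, input size, and class counts are placeholder assumptions).
from keras.models import Sequential
from keras.layers import Dense

def head(units, activation, loss):
    model = Sequential([Dense(64, activation='relu', input_shape=(20,)),
                        Dense(units, activation=activation)])
    model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])
    return model

binary     = head(1, 'sigmoid', 'binary_crossentropy')        # two classes (0 or 1)
multiclass = head(3, 'softmax', 'categorical_crossentropy')   # one of {red, green, blue}
multilabel = head(3, 'sigmoid', 'binary_crossentropy')        # independent yes/no per label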
Regression: In contrast to classification, the output of regression learning is one or more continuous-valued numbers. Regression analysis is a convenient mechanism to provide scored labels equivalent to multi-label classification, where each item of a set has a probability of belonging (i.e., 0.997 red, 0.320 green, 0.008 blue). Regression has been applied in various areas, including monocular image object recognition for outdoor localization [97], among others.

B. Unsupervised Learning

In unsupervised learning, datasets provided as input for machine learning are not labeled in any way that determines a correct or incorrect result. Instead, the result may achieve some broader desired goal, be judged on the ability to find something that is easily human-discernible, or provide a complex application of a statistical function to extract an intended
value. For instance, clustering algorithms may cluster data into hard or fuzzy groups as desired, but, without an appropriate visual representation, it may be difficult to tell whether the clustering was indeed appropriate. Similarly, density estimation provides just that, an estimation, which may or may not be appropriate to the dataset, and auto-encoders can reduce and encode data efficiently, to be used for compression or dimensionality reduction. Nonetheless, the ability to extract the compressed representation accurately may still need to be tested to determine the appropriateness of the implementation [38].

Fig. 1. Representative neural networks, where (a) is fully connected, and (b) includes dropout.

Dimensionality Reduction: Dimensionality reduction can be carried out in various ways, including different forms of component and discriminant analysis. As an example, auto-encoders can transform input data into a reduced or encoded output for the purposes of data compression or storage space reduction. Examples of dimensionality reduction include the reduction of sequential data, such as video frames, to reduce noisy or redundant data while maintaining important features of the original data [126], or the use of deep belief networks to reduce dimensionality of hyperspectral (400-2500 nm) images of landscapes to determine plant life content [18].
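A minimal Keras autoencoder sketch of the hourglass shape mentioned earlier makes this concrete: the encoder compresses the input to a low-dimensional code usable for compression or dimensionality reduction, and the decoder expands it back. The 64-dimensional input, 8-dimensional code, and random data are assumptions made for illustration.

# Minimal autoencoder sketch for dimensionality reduction (sizes are illustrative).
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(64,))
code = Dense(8, activation='relu')(inputs)       # compressed representation
outputs = Dense(64, activation='linear')(code)   # reconstruction of the input

autoencoder = Model(inputs, outputs)
encoder = Model(inputs, code)                    # reusable for dimensionality reduction
autoencoder.compile(optimizer='adam', loss='mse')

X = np.random.rand(1000, 64)
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)   # unsupervised: target is the input itself
reduced = encoder.predict(X)                                # 1000 x 8 encoded features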
Clustering: Clustering algorithms are used to statistically group data. Generally speaking, this occurs through the alternating selection of cluster centroids and cluster membership. For example, k-means and fuzzy c-means clustering utilize the least mean square error of the distances between clusters and centroids [28], [26]. In the latter, fuzziness allows data membership in multiple cluster centroids, making the edges of the clusters “fuzzy”. Other clustering algorithms utilize the Gaussian Mixture Model (GMM), or other statistical and probabilistic mechanisms, instead of Euclidean Distance, as a means to make cluster selection [111], [150]. In addition, deep neural network architectures can provide deep learning implementations for cluster analysis [38]. Examples include the use of Self-Organizing Feature Maps (SOFMs) to satisfy real-time image registration [42], and the TSK_DBN fuzzy learning network that combines the Takagi-Sugeno-Kang (TSK) fuzzy system with a Deep Belief Network (DBN) [163], among others.
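The alternating centroid/membership update described above is available directly in scikit-learn; the sketch below clusters synthetic two-dimensional points with k-means (the data and the choice of three clusters are assumptions), while a fuzzy variant would replace the hard labels with per-cluster membership degrees.

# k-means sketch of the alternating centroid/membership selection described above.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # unlabeled points
km = KMeans(n_clusters=3, random_state=0).fit(X)

print(km.cluster_centers_)   # learned centroids
print(km.labels_[:10])       # hard cluster membership for the first few points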
Density Estimation: Density estimation, in general, is the statistical extraction or approximation of features of a data distribution, such as the extraction of densities of subgroups of data to evaluate correlations, or the approximation of the data distribution as a whole. Examples of density estimation include the estimation of power spectral density for noise reduction in binaural assisted listening devices [92], and intersection vehicle traffic density estimation utilizing CNNs on heterogeneous distributed video [152].
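As a small, non-deep illustration of the task itself, the following scikit-learn sketch fits a kernel density estimate to one-dimensional samples and evaluates the approximated distribution at new points; the bimodal synthetic data, Gaussian kernel, and bandwidth are assumptions chosen only for illustration.

# Kernel density estimation sketch: approximate a data distribution and query it.
import numpy as np
from sklearn.neighbors import KernelDensity

samples = np.concatenate([np.random.normal(0, 1, 500),
                          np.random.normal(5, 0.5, 500)]).reshape(-1, 1)

kde = KernelDensity(kernel='gaussian', bandwidth=0.4).fit(samples)
grid = np.linspace(-3, 8, 5).reshape(-1, 1)
print(np.exp(kde.score_samples(grid)))   # estimated density at the grid points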
C. Reinforcement Learning

Reinforcement learning can be considered as an intermediate between supervised and unsupervised learning, because, though data is not explicitly labeled, a reward is supplied upon the execution of an action. More specifically, the learning architecture in reinforcement learning interacts with the environment directly, such that a change in the environment returns a specific reward. The goal of the reinforcement learning system is to maximize the reward of every state transition by learning the best actions to take at each given state. This is embodied by the perception-action-learning loop, as demonstrated in Fig. 3. This loop can occur for infinite time, or can be applied in sessions, to learn to maximize the outcome
Fig. 3. Reinforcement Learning Model.

Policy Search: Policy search can be carried out by gradient-based (via backpropagation) or gradient-free (evolutionary) methods, to directly search for an optimal policy. These typically output parameters for a probability distribution, either for continuous or discrete actions, resulting in a stochastic policy [19]. Though prior implementations of Google’s AlphaGo program, which were the first to beat a professional human player without handicap [122], were a hybrid of policy search and value function approaches, the most recent implementation, AlphaGo Zero, is entirely policy search-based, learned without any human input, and significantly outperforms the prior implementations.

Value Function: Value function methods operate by estimating the expected return of being in a given state, attempting to select an optimal policy, which chooses the action that maximizes the expected value given all actions for a given state. The policy can be improved by iterative evaluation and update of the value function estimate. The state-action value function, otherwise known as the quality function, is the source of Q-learning [19], [21]. An alternative to the quality function, the advantage function represents relative state-action values, as opposed to absolute state-action values [19]. As a seminal work on the application of Q-learning and Deep Q-Networks (DQN), Mnih et al. [93] implemented a DQN to play 49 different Atari 2600 videogames, observing four frames as environment data, extracting the game score as reward, with controller and button combinations encoded as actions. Their DQN implementation outperformed human users in the majority of games, as well as outperforming the best linear learners handily.
IV. DEEP LEARNING PLATFORMS

In this section, we provide an overview of popular open-source deep learning platforms. This list is not exhaustive, but is meant to provide a reference for deep learning practitioners.

B. DeepLearning4J

Deep Learning for Java (DL4J) is a robust, open-source distributed deep learning framework for the JVM created by Skymind [5], which has been contributed to the Eclipse Foundation and their Java ecosystem. DL4J is designed to be commercial-grade as well as open source, supporting Java and Scala APIs, operating in distributed environments, such as integrating with Apache Hadoop and Spark, and can import models from other deep learning frameworks (TensorFlow, Caffe, Theano) [6]. It also includes implementations of restricted Boltzmann machines, deep belief networks, deep stacked autoencoders, recursive neural networks, and more, which would need to be built from the ground up or through example code in many other platforms.

C. Theano

Theano is a highly popular deep learning platform designed primarily by academics which, unfortunately, is no longer supported after release 1.0.0 (November, 2017). Initiated in 2007, Theano is a Python library designed for performing mathematical operations on multi-dimensional arrays and to optimize code compilation [10], primarily for scientific research applications. More specifically, Theano was designed to surpass other Python libraries, like NumPy, in execution speed and stability optimizations, and computing symbolic graphs. Theano supports tensor operations, GPU computation, runs on Python 2 and 3, and supports parallelism via BLAS and SIMD support.
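To illustrate the symbolic-graph style that Theano popularized, the fragment below declares symbolic variables, composes an expression, lets Theano derive its gradient symbolically, and compiles the graph into an optimized callable; the tiny squared-error expression and input values are assumptions chosen for illustration.

# Minimal Theano sketch: build a symbolic graph, take a symbolic gradient,
# and compile it into an optimized callable (expression chosen for illustration).
import theano
import theano.tensor as T

x = T.dvector('x')                     # symbolic multi-dimensional input
w = T.dvector('w')
loss = ((x * w).sum() - 1.0) ** 2      # symbolic expression, not yet computed

grad_w = T.grad(loss, w)               # symbolic differentiation
f = theano.function([x, w], [loss, grad_w])   # graph compilation (optionally targeting a GPU)

print(f([1.0, 2.0, 3.0], [0.1, 0.2, 0.3]))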
D. Torch

Torch is also a scientific computing framework; however, its focus is primarily on GPU accelerated computation. It is implemented in C and provides its own scripting language, LuaJIT, based on Lua. In addition, Torch is mainly supported on Mac OS X and Ubuntu 12+, while Windows implementations are not officially supported [11]. Nonetheless, implementations have been developed for iOS and Android mobile platforms. Much of the Torch documentation and implementations of various algorithms are community driven and hosted on GitHub. Despite the GPU-centric implementation, a recent benchmarking study [119] demonstrated that Torch
does not surpass the competition (CNTK, MXNet, Caffe) in single- or multi-GPU computation in any meaningful way, but is still ideal for certain types of networks.

G. MXNet

Apache MXNet supports Python, R, Scala, Julia, C++, and Perl APIs, as well as the new Gluon API, and supports both imperative and symbolic programming. The project began around mid-2015, with version 1.0.0 released in December of 2017. MXNet was intended to be scalable, and was designed from a systems perspective to reduce data loading and I/O complexity [1]. It has proven to be highly efficient primarily in single- and multi-GPU implementations, while CPU implementations are typically lacking [119].
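As a small illustration of the imperative style (symbolic graphs are also supported through MXNet's symbol API), the sketch below records eager NDArray operations and backpropagates through them; the array values and expression are assumptions made purely for illustration.

# Imperative MXNet sketch: eager NDArray operations with autograd recording
# (values chosen arbitrarily for illustration).
import mxnet as mx
from mxnet import nd, autograd

x = nd.array([[1.0, 2.0], [3.0, 4.0]])
x.attach_grad()                        # allocate space for the gradient

with autograd.record():                # record the imperative operations
    y = (x * x).sum()

y.backward()                           # backpropagate through the recorded graph
print(x.grad)                          # dy/dx = 2x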
H. Keras

Though not a deep learning framework on its own, Keras provides a high-level API that integrates with TensorFlow, Theano, and CNTK. The strength of Keras is the ability to rapidly prototype a deep learning design with a user-friendly, modular, and extensible interface. Keras operates on CPUs and GPUs, supports CNNs and RNNs, is developer-friendly, and can integrate other common machine learning packages, such as scikit-learn for Python [7]. In addition, it has been widely adopted by researchers and industry groups over the last year.
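The rapid-prototyping workflow mentioned above amounts to three calls: define, compile, and fit. The sketch below trains a small fully-connected network on synthetic data and hands the predictions to scikit-learn for scoring; the data, layer sizes, and training settings are illustrative assumptions rather than recommended values.

# Keras prototyping sketch: define, compile, fit, then score with scikit-learn
# (synthetic data and all sizes are illustrative).
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import accuracy_score

X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype(int)          # toy binary labels

model = Sequential([Dense(32, activation='relu', input_shape=(20,)),
                    Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2, verbose=0)

preds = (model.predict(X) > 0.5).astype(int).ravel()
print("accuracy:", accuracy_score(y, preds))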
A. Image and Video Recognition/Classification

As generally the largest area of deep learning investigation, image and video processing, recognition, and detection have seen explosive growth in recent years. This is in no small part due to the many machine learning competitions, such as ImageNet [73]. In image and video processing, typical deep structures are convolutional neural networks, which first convolve multiple channels of images and then pool the image layers, layer by layer reducing the size of the image or frame field, before passing the results to fully-connected layers. Image and video processing has been applied to many fields of study, including autonomous systems, medical imaging, astrophysics, biometric analysis, etc.

For example, in the area of bioinformatics, Thammasorn et al. [129] created a three-layered extractor or triplet network of CNNs, fed into a comparator network, to extract features from gamma images, in which no known suitable features exist. These images, and the resulting feature, can be utilized to
detect potential errors in radiation therapy delivery. To carry out human action recognition, Ijjina et al. [59] utilized RGB-D camera data from ConvNet, NATOPS, and other datasets to develop learned temporal templates for pattern recognition via CNN.

Other examples in the category of human action recognition include [124] and [54]. Particularly, Srinivasa et al. [124] utilized Long Short Term Memory (LSTM) for regression analysis of facial expression responses of individuals watching advertisements. Using the Affectiva-MIT Facial Expression Dataset, the authors train a model to extract expressiveness over a group of frames to better understand the advertisement’s impact on the viewer. The authors additionally cluster the results to extract meaningful user states during viewing. Hong et al. [54] extracted 3-dimensional hand poses from 2-dimensional images, known as Hand Pose Recovery (HPR). The authors implement semi-supervised learning (combining labeled and unlabeled data), using low rank representation to map unlabeled data into labeled data space achieved via autoencoders, and utilize a ReLU activated neural network to perform classification.

As applied for general object recognition tasks, Guo et al. [47] developed an approach to improve 3D object recognition based on multi-view 2D images. Specifically, the authors increase intra-object variation and reduce inter-object variation through the application of a Deep Embedding Network supervised by triplet and classification loss. This framework converts the problem to a set-to-set matching problem, and the resulting DeepEm(M) implementation outperforms thirteen other methods on the MV-RED-721 dataset, and significantly improves upon precision and recall. In addition, Hickson et al. [52] investigated semantic classification of objects in images via weak supervision, and proposed a fully differentiable unsupervised deep clustering model. In their study, K-means clustering was used to learn parameters of the network, building features while simultaneously learning to cluster them, and storing cluster means as weights. Data was provided as objects vs. background using segmentation masks, and clustering was performed only on foreground objects. Another object recognition example is the image-based search engine developed by retraining a pretrained GoogleNet Inception-v3 CNN model using transfer learning [61]. In this study, the network was applied as a feature extractor for nearest neighbor comparison, using Euclidean distance as the similarity metric of the last-but-one fully connected layer, which was taken as the feature vector.

In considering the reconstruction of compressed data, Iliadis et al. [60] demonstrated the use of deep neural network architecture for compressed video sensing. Their proposed schemes for trained fully-connected networks outperform competing schemes in reconstructing compressed high-definition video. Similarly, Adler et al. [12] applied deep learning to block-based compressed sensing (BCS) for simultaneous learning of a linear sensing matrix and non-linear reconstruction operator. As a means to reconstruct sparse signals that have linear transforms from high-dimensional images and video, their proposed method outperforms comparable BCS mechanisms in terms of peak signal to noise ratio vs. sensing rate, as well as in execution time.

Various works have also explored the improvement of deep image learning networks through variations on architectures and constraints. For example, Higgins et al. [53] developed a deep unsupervised generative framework for disentangled factor learning on raw image data. The authors applied constraints inspired by neuroscience (i.e., data continuity, redundancy reduction, and statistical independence), and demonstrated that disentangled representations enable zero-shot inference and can individually encode factors of variation. Likewise, He et al. [50] explored significantly increasing DNN depth, which causes accuracy degradation and high training error. In response, the authors developed a deep residual learning architecture, in which layers fit a residual of the previous mapping via shortcut connections. The layers learn residual functions, which are referenced to layer inputs, are easy to optimize, and gain accuracy and reduce error. Borkar and Karam [23] investigated the impact of image distortion on pre-trained convolutional filters used in deep learning neural networks and designed an approach, called DeepCorrect, to improve the robustness of deep learning neural networks against image distortion. In addition, Dodge and Karam [33] investigated the performance impact of quality distortions on several deep learning neural networks for image classification.
B. Audio Processing

Audio signal processing is typically concerned with noise reduction and data compression that maintains the maximum value for human listeners, highly relevant to the field of audiology. Deep auto-encoders have shown great promise in this area, and the ability to discriminate voices, languages, and background noise from multiple and singular microphone input is a significant gain that deep learning has the potential to realize. In addition, natural language processing is concerned with the detection, comprehension, and also the translation of spoken language. Widely used smart assistants have only increased in complexity, and translation applications are becoming ever more sophisticated.

As mentioned previously, Marquardt and Doclo [92] conducted noise power spectral density estimation for binaural noise reduction. They exploited the direction of arrival to improve noise reduction, as demonstrated in their simulation. Also relevant to noise reduction and enhancement, Zhang [164] proposed a cost-sensitive deep ensemble learning mechanism based upon a cost-sensitive objective function, cost-sensitive oversampling, and cost-sensitive undersampling, to improve multi-condition (i.e., multi-environment) training. Specifically, as applied to speech separation in varyingly noisy environments, the influence of low signal-to-noise ratio (SNR) on training error can be improved via varying the learning objective and sampling with SNR. In addition, Sainath et al. [112] applied a deep neural network framework to jointly perform multichannel enhancement and acoustic modeling for automatic speech recognition (ASR). The acoustic model was trained by applying a convolutional filter network to reduce multiple microphone signals, and then the acoustic model was learned in a convolutional LSTM network. A neural adaptive beamforming model was additionally developed to allow adaptation to changing conditions during decoding.

Toward the classification of audio signals, Sharan and Moir [118] compared SVM and DNN performance in classifying environmental noise and sound. Despite increased training time in DNNs, which more than doubled the testing time of the SVM implementation, the accuracy of the DNN was greater in all scenarios. In a relevant but distinct task, Luo et al. [84] applied DNN with dropout and SAE to detect and classify audio recordings as original versus captured. As a means to determine whether some audio recording might have been illegally re-recorded, samples are normalized and segmented into 20 or 40 non-overlapping frames for evaluation. Both methods are able to reduce error to approximately 7.5 %, and after applying majority voting over all frames in a 2-second clip, the detection rate can reach over 99 %.

Finally, regarding the synthesis of audio signals, and in particular, human speech, Gonzalez et al. [43] presented a technique for synthesizing audible speech from sensed articulator movement. Based on permanent magnet articulography (PMA), the authors synthesize speech from learned biosignals via Gaussian Mixture Model and RNNs, achieving 92 % intelligibility. In addition, Saito et al. [113] utilized GANs for statistical parametric speech synthesis to alleviate common over-smoothing effects. Through a series of subjective AB and XAB evaluations, the proposed GAN model outperforms the conventional minimum generation error DNN training method.

C. Text Analysis and Natural Language Processing

Massively adopted mobile devices have enabled continuous on-demand computing from anywhere. The social applications that comprise the bulk of common user interactions create continuous massive data, which can be harvested and analyzed for sentiment and social understanding. Text and natural language processing affords the potential for on-the-fly language translation and the communication of humans and computer systems via natural speech.

Related to sentiment classification through text analysis, there have been a number of research efforts. For instance, Araque et al. [17] developed a deep learning sentiment classifier and proposed two ensemble techniques to aggregate the baseline classifier with other widely used surface classifiers. Combining both surface and deep features, the authors merge information from several common sources and conduct a performance evaluation, which confirms that the performance of the proposed models surpasses that of the baseline. In addition, Severyn et al. [116] utilized deep learning to re-rank short text pairs for optimal representation and similarity approximation without curated feature engineering. Convolutional networks were used to learn query and sentence documents separately, and then joined with an additional similarity matching score into the fully connected network. Experiments demonstrated exceptional performance on the TREC: QA dataset, and comparable performance in tweet re-ranking.

Other types of classification of textual data have been accomplished, such as by Majumder et al. [90], who implemented deep CNNs to extract personality traits from stream of consciousness essays, combining n-gram and document-level features. Each of five major traits was trained in an individual CNN with binary output, and experiments on minor variations of their model showed increased accuracy on individual traits, with only one model outperforming on a majority of traits versus the state of the art. Similarly, Kowsari et al. [71] proposed a hierarchical deep learning architecture for text-based document classification, in which each deep learning model is constructed of fully connected DNN, RNN with gated recurrent units (GRUs) and LSTM, and CNN. The framework employs a parent model trained on a set of global classes, which outputs to child models that each learn on a distinct set of subclasses belonging to a single global class.

Deep learning has also been applied toward the generation of convincing conversational and labeling texts. For example, Li et al. [75] applied deep reinforcement learning for natural dialog simulation. In their study, an LSTM encoder-decoder architecture is applied to simulate two virtual agents and optimize long term reward via policy gradient. The reward combines the constraint of subsequent responses based on the prior responses (forward-looking), penalization of repetition (informativity), and mutual information between prior responses and the current response (grammatical coherency). Additionally, Zhang et al. [162] utilized CNNs for image detection and classification from aerial landscapes, and then
constructed natural language descriptions, utilizing the class labels via a recurrent neural network language model. Their results showed 67 % correct and 30 % partly correct descriptions.

D. Autonomous Systems and Robotics

The field of robotics has seen incredible strides in the ability to create free-standing, un-tethered autonomous walking robots, as well as in autonomous flight, driving, and navigation, among others. Deep learning has been widely applied to enable diverse sensory input to assist in the workings of autonomous machines in manufacturing and commercial spaces, and these cannot be wholly removed from video recognition in most cases.

In the area of robotic manipulation, deep learning has produced significant strides toward the rapid training of robotic arms for repetitive manufacturing tasks in a variety of ways. For instance, Gu et al. [46] utilized deep reinforcement learning and off-policy updates to train robotic arms on manual tasks without human intervention or demonstration. They also utilized multiple robotic arms as simultaneous workers to increase learning efficiency. In contrast, Senft et al. [115] implemented an interactive machine learning architecture called SPARC (Supervised Progressively Autonomous Robot Competencies), in which a human user interactively guides the robot as it learns. This is in the form of reinforcement learning, in which the human teacher has full control over the actions of the learning machine, and positive feedback is supplied to the robot for every action that it completes.

Unlike direct user manipulation training, or human-free training, Yang et al. [146] developed a deep learning pipeline for generalized non-backdrivable humanoid robots through teleoperation training. The model uses a deep convolutional autoencoder for image feature extraction and encoding and applies a time-delay neural network for temporal sequence evaluation with image and motor angle data to generate a continuous operational task. Furthermore, Polydoros et al. [106] developed a deep learning framework for real-time learning of robotic controls through modeling the inverse dynamics of robotic manipulator joint torques from sensory data. Their model employed a self-organized layer to decorrelate inputs and a recursive reservoir to provide fading memory, which require no hyperparameter optimization or kernel selection. They additionally recorded and tested new datasets for inverse dynamics model learning, and demonstrated the adaptability of their model to changes in the inverse dynamics model.

A subfield of robotics, robotic vision affords autonomous systems with situation awareness via object detection. As applied to robotic manipulators, Mahler et al. [88] developed a Grasp Quality Convolutional Neural Network (GQ-CNN) model, called Dex-Net 2.0, to predict the probability of success of grasps from depth images. The authors developed a synthetic dataset of 6.7 million point clouds of 1,500 3D object models and grasp quality metrics. The model achieves a high success rate on both known and unknown objects, and is three times faster than the competing method. In addition, targeting more generalized detection for robotic vision, Poirson et al. [105] developed a model for simultaneous 3D object detection and pose estimation in a single deep CNN, greatly increasing the efficiency of detection and pose estimation over other state-of-the-art systems. They implemented a variant of the single shot detection (SSD) network with additional pose outputs, and conducted tests on the Pascal 3D+ dataset to verify design choices, such as shared pose output across objects, and combined Pascal and ImageNet annotated data with coarse-grained training for increased accuracy.

As applied in particular to autonomous vehicle systems, Dairi et al. [30] developed an unsupervised object detection system based on a hybrid implementation of deep Boltzmann machines (DBMs), auto-encoders (AEs), and support vector machines (SVMs), utilizing stereovision as input. In their system, the DBMs and AEs are combined for feature extraction and encoding, and the SVMs are used for anomaly detection. In addition, Kahn et al. [64] presented a generalized computational graph-based deep reinforcement learning framework that combines value-based and model-based mechanisms, and applied this model to autonomous navigation using monocular images. The model was tested both in simulated and in real environments, and experiments were conducted to evaluate appropriate model horizons and bootstrapping efficacy, sample efficiency of classification and regression for value versus collision probability predictions, and performance against various double Q-learning approaches. Furthermore, in a rather unique application, Rajesh and Mantur [108] applied a deep CNN to eye movement and blink tracking for the control of an electric wheelchair. Trained on the Eye-Chimera dataset and implemented via a head-mounted camera, the achieved accuracy is upwards of 99 %.

E. Medical Diagnostics

Highly influenced by advances in image analysis, medical diagnostics have benefited significantly from the rapid improvements in deep learning. Considerable work has been done toward improving the detection of diseases, tumors, and other abnormalities from MRI images, CT scans, etc. In addition, IoT devices for medical applications can provide autonomous monitoring of patients and extract useful data on medical populations.

In utilizing data from widely deployed smart IoT devices, advances have been made in increasing the accuracy of remote sensor metrics. For example, Jindal [63] utilized deep learning to increase the accuracy of heart rate estimation via photoplethysmography (PPG) by smartphones and wearables during exercise. The authors fuse PPG and accelerometer data, and utilize deep belief networks composed of Restricted Boltzmann Machines (RBMs) implemented in the cloud to classify the PPG signals into subgroups. The PPG signal then undergoes particle filtering to predict the heart rate over time, achieving an average error of 4.88 %. Similarly, Ravì et al. [110] applied deep learning to human activity recognition using inertial sensor data from wearable devices. In their learning model, the authors combine deep convolutional learning in parallel with shallow feature extraction, converging in a fully connected network for more accurate classification. Their model
is implemented in Torch for Android and as an embedded algorithm for the Intel Edison Development Platform. Likewise, Schirrmeister et al. [130] investigated deep learning for the classification of electroencephalography (EEG) pathologies by signal analysis. Specifically, CNNs were developed and automatically optimized using Sequential Model-based Algorithm Configuration (SMAC), resulting in higher accuracy and specificity.

In addition to diagnostics for continuously sensed data, the application of deep learning to diagnostic scans and exam artifacts has produced significant results. For instance, Wang et al. [136] implemented deep learning for automated detection of metastatic breast cancer from lymph node biopsy images as part of the Camelyon Grand Challenge 2016. Their framework assessed patch level predictions via CNN, and aggregated patches to produce tumor probability heatmaps for localization prediction. Additionally, Gulshan et al. [133] applied deep learning to automatically detect diabetic retinopathy and macular edema from retinal fundus images. Their implemented framework consists of an ensemble of 10 CNNs with pre-initialized weights first trained on the ImageNet dataset. Also, Wang et al. [138] proposed a multi-scale rotation-invariant CNN architecture for classifying lung tissues from high-resolution computed tomography (HRCT) scans. The authors applied Gabor filtering and local binary pattern (LBP) feature extraction prior to CNN learning of interstitial lung disease (ILD) classes, and further implemented a mechanism to handle unbalanced data effectively.

In their model, features are learned directly from peptide sequences in parallel CNN and LSTM networks, and further reduced via an ensemble of support vector regression, random forest, and gradient boosting. In addition, Qu et al. [107] presented a deep learning approach to predict DNA-binding proteins solely from their primary sequences using CNNs to detect function domains and LSTM to discover long-term dependencies. Their model demonstrated superior performance on both equalized and asymmetric datasets.

In use for image processing, deep learning has been applied to enhance various analytical tools. For instance, Kraus et al. [72] applied CNNs for the localization of subcellular components of yeast cells in high-content microscopy, and implemented activation maximization to visualize the learned feature morphology by applying and incrementally updating randomized green pixel channels to maximize the feature activation. They also tested their model on unseen yeast cell morphologies, and implemented a transfer learning process to incorporate additional features and apply the model to distinctly different microscopy techniques, which performed significantly better than from-scratch training. Also, Eulenberg et al. [37] applied a deep CNN with nonlinear dimensionality reduction for reconstructing continuous biological processes and t-distributed stochastic neighbor embedding (tSNE) visualization of flow cytometry images. Their method outperformed a comparable boosting-based approach, and the authors further applied their model to the progression of diabetic retinopathy from fundus images.
Deep learning has also been employed in the physical sciences, including for image analysis of dense telescope data. For instance, Gabbard et al. [40] investigated the use of deep convolutional learning to detect gravitational waves produced by binary black hole (BBH) mergers. Simulated training and testing data were generated, consisting of Gaussian-noise-only negative cases and Gaussian noise plus BBH waveform positive cases, and tests demonstrated that the CNN framework closely matches the more computationally expensive matched-filtering technique. In addition, Ma et al. [87] applied deep multimodal learning for solar radio burst classification, specifically treating different frequency channel spectrums as differing modes. The model is composed of autoencoders (AEs) for each channel mode, structured regularization connecting the AEs to the fully connected network, and a concluding softmax layer for classification into coarse burst, non-burst, and calibration classes.

Given the vast body of work validating human-caused climate change, it should be no surprise that the climate, geographic, geological, and meteorological fields have also employed deep learning. For example, Shao et al. [117] utilized stacked sparse autoencoders (SSAE) for wall-to-wall forest above-ground biomass (AGB) prediction. Their model combined discrete and simulated LiDAR-derived data as an AGB reference map, replacing traditional forest inventory, with optical and synthetic aperture radar (SAR) data from Landsat 8 OLI and Sentinel-1A satellite images. In addition, Ducournau and Fablet [35] applied a super-resolution CNN (SRCNN) to reconstruct high-resolution images from low-resolution ocean remote sensing data for satellite-derived sea surface temperature (SST) mapping. Evaluation was conducted to compare the mean peak signal-to-noise ratio (PSNR) gain of different CNN models, and further evaluation against comparable reconstruction methods (e.g., bicubic interpolation, EOF-based) showed improved performance.

H. Finance, Economics, Market Analysis and Others

As mechanisms for prediction and analysis, deep learning tools have the capacity to learn from stochastic data and recognize trends, such that machine learning-based systems have been widely developed for market prediction. In addition, the verification and validation of monetary transactions benefit greatly from the wealth of data generated by users, which can be used to detect anomalous behavior.

The ability to accurately predict market fluctuations provides a material advantage in stock trading and investment. As a powerful predictive tool, deep learning for market and economic analysis has been highly investigated. For example, Korczak and Hernes [70] presented financial time-series forecasting utilizing deep learning architectures. Compared with a multilayer perceptron (MLP), their CNN implementation in the H2O framework significantly decreases the forecasting error rate when trading on the FOREX market, and increases the average rate of return per transaction. In addition, Heaton et al. [51] considered deep learning for financial prediction and classification, particularly in the application of deep models over shallow models for high-dimensionality data reduction, high-dimensionality feature extraction for risk analysis, and event analysis classification. They also provide an example, namely smart indexing, to approximate a stock index through a subset of stocks.

Additionally, the wide variety of deep network architectures and data sources afford strikingly varied implementations. For instance, Fischer and Krauss [39] deployed long short-term memory (LSTM) networks to carry out the prediction of out-of-sample directional movements for S&P 500 stocks from 1992 to 2015. Yielding daily returns of 0.46 percent, their LSTM networks outperformed memory-free classification (random forest, DNN, etc.). Nonetheless, analysis of the results also reveals returns fluctuating around zero after about 2009, due to low exposure of the LSTM to systemic risks. Also, Hu et al. [55] deployed a Hybrid Attention Network (HAN) for the prediction of stock trends based on news reports. Their framework embedded attention values into news vectors, input multiple news vectors into RNNs for sequential and temporal modeling, and further applied temporal attention encoding for trend prediction. In addition, a self-paced learning training mechanism increased accuracy over the basic HAN, and both mechanisms outperformed competing methods in trading simulations.

Furthermore, Ding et al. [31] investigated event-driven stock market prediction through the implementation of a neural tensor network to learn event embeddings, and deep CNNs for short-, medium- and long-term event analysis. Their model significantly improved accuracy and profit in both individual stock prediction and market simulation experiments when compared to baseline neural network methods, especially for low fortune-ranking companies for which news is less available. Also, Zhao et al. [166] introduced a deep learning ensemble approach for crude oil price forecasting using stacked deep autoencoders trained with bootstrap aggregation (bagging). Training and testing were conducted on the West Texas Intermediate (WTI) crude oil spot price series, and experiments demonstrated the benefit of bagging/ensemble architectures for improving neural network and SAE accuracy and error, with the designed SDAE-B performing the best.

Other aspects of market forecasting include the prediction of specific market segments and cycles, such as in the work by Zhao et al. [167], who utilized DBNs to predict customer mobile device or terminal replacement for use in marketing strategies and targeted sales. The authors utilized a combination of customer business data and collected customer device data, and compared their DBN results with several more shallow learning techniques, demonstrating improved performance. Similar predictive analytics mechanisms have been applied to various smart and autonomous systems, such as energy consumption forecasting, traffic prediction, user geolocation, etc. For example, Yu et al. [153] designed several machine learning-based schemes (e.g., neural networks and support vector machines) to carry out the forecasting of energy usage in the smart grid, and conducted a performance comparison using a real-world smart meter dataset. Also, Wang et al. [137] proposed a deep learning scheme using an Error-feedback Recurrent Convolutional Neural Network structure (eRCNN) to carry out the prediction of traffic speed and congestion.
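To make the sequence-modeling approach of studies such as Fischer and Krauss [39] concrete, the sketch below trains a small LSTM over fixed-length windows of daily returns to predict next-day direction. The window length, layer sizes, and randomly generated data are placeholder assumptions, not a reproduction of any surveyed system.

```python
# Toy LSTM for next-day directional movement; data and hyperparameters are invented.
import numpy as np
import tensorflow as tf

WINDOW = 240  # trading days per input sequence (assumed)

# Placeholder data: sequences of daily returns and binary up/down labels.
x = np.random.randn(1000, WINDOW, 1).astype("float32")
y = (np.random.rand(1000) > 0.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, 1)),
    tf.keras.layers.LSTM(25, dropout=0.1),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(next-day return above median)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, batch_size=64, epochs=2, validation_split=0.2)
```

In practice the windows would be built from normalized historical returns rather than random noise, and evaluation would be strictly out-of-sample, as in the study summarized above.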
I. Cyber Security

Given the unprecedented utilization of network-connected devices, and the significant dependence on information communication technology (e.g., software applications, networked servers, and end-user devices) throughout the world, it is no surprise that nefarious users attempt and occasionally succeed in subverting credentials, bypassing security systems, or simply attacking hosts with network traffic. Similar to existing research applying machine learning to conduct accurate cyber situation awareness [24], [56], [41], [49], [154], [158], the use of deep learning technologies for cyber security analysis and intrusion detection is highly relevant, as the majority of attacks use families of intrusive software that can be observed and classified.

Considering the prevalence of malicious software (malware), propagated by increasingly sophisticated obfuscation techniques, deep learning has been widely applied for highly accurate malware analysis and the detection of previously unforeseen threats. For example, Zhu et al. [168] presented an Android malware detection tool called DeepFlow. The designed tool utilizes FlowDroid to obtain data flows from sensitive sources to sensitive sinks. The SUSI technique is also leveraged to transform the feature granularity of the data flows. The applications are then classified via a deep belief network with the transformed data flows as input. In addition, Ding et al. [32] extracted opcode sequences from Windows PE files for malware classification via DBN. Converting the opcode features to n-gram representations, the features were down-selected by maximum information gain and document frequency. The authors demonstrate both the capacity of DBNs to perform classification, as well as to perform autoencoding for unsupervised feature selection to enhance the performance of shallow learning models (K-Nearest Neighbors, Decision Tree, etc.).

The detection of ongoing attacks in real time is paramount to enable timely response and mitigation. Work to secure systems against attacks includes that of Uwagbole et al. [132], who designed a system to detect and prevent SQL injection attacks via hybrid static and dynamic analysis utilizing deep learning techniques. Their proxy-based system combines pattern matching with numerical feature encoding for neural network and logistic regression classification. Similarly, Zolotukhin et al. [170] utilized stacked autoencoders (SAEs) for the detection of application-layer distributed denial-of-service (DDoS) attacks in encrypted traffic. Without decrypting the traffic packets, the system extracts and clusters features into normal traffic patterns, conducting traditional anomaly detection for trivial DoS attacks. In addition, the SAE detects attacks designed to mimic typical browser activity based on the reconstruction error of vectorized conversation traffic groups in time intervals. Additionally, Kim et al. [66] developed a network Intrusion Detection System (IDS) based on an LSTM recursive neural network. The model is trained on the KDD Cup 1999 dataset, which includes 22 attacks in 4 categories. In comparison with other neural networks using the same training data, the authors' model has the highest detection rate and accuracy, particularly on DoS attacks.

Deep learning is also applicable to the security of real-world interactions, primarily in situation awareness and analysis. For instance, Wang et al. [139] applied a deep convolutional architecture for person re-identification as individuals are viewed by different non-overlapping cameras. The network was first pre-trained on the ImageNet dataset, and then further tuned by training on the CUHK03 re-identification dataset. With only minor changes to the fully connected layers of the model in retraining on the second dataset, the authors are able to significantly improve the matching rate over existing schemes.

Also, in the realms of validation and authentication, the security of transactions, identity, and the physical world can be significantly enhanced through deep learning. For instance, Niimi [100] investigated the use of deep learning for credit card approval determination and transaction validation. The implemented framework is written in R and deployed on Amazon's EC2 cloud platform. Their experimental evaluation demonstrated similar accuracy to various shallow learning methods, but with higher precision.

J. Architectural and Algorithmic Enhancement

Necessary for the continued enhancement and progress of deep learning as a generalized framework for diverse applications, the development of state-of-the-art architectural and algorithmic implementations of deep networks is paramount. In particular, the reduction of training and inference time through clever design, as well as the improvement of accuracy through multi-network ensemble approaches and automatic hyperparameter optimization, are necessary for the realization of deep learning in mobile, commodity, and IoT hardware.

A variety of efforts have been applied to analyzing the architectures, activations, optimizers, hyperparameters, etc., of deep learning models, both for particular tasks and more generally. For example, Keskar et al. [121] investigated the effects of batch size in DNN training via Stochastic Gradient Descent (SGD), namely that large batches converge to sharp minima and result in poor generalization. Further investigating possible remedies, attempts using data augmentation and conservative training fail to correct the problem, while the most promising solution is dynamic sampling to gradually increase the batch size. Moreover, Francois Chollet [27] presented an analysis of depthwise separable convolutions and their relationship to the convolutional Inception architecture. Specifically, the designed extreme inception (Xception) architecture decouples the mapping of cross-channel correlations and spatial correlations in convolutional feature maps, and outperforms the Inception V3 architecture.

In addition, adaptive techniques have been developed to down-select or fine-tune deep learning models. These can be helpful in allowing both experts and non-experts to optimize parameters and architectures more quickly. For example, Cortes et al. [29] developed AdaNet, a set of algorithms to adaptively learn ANN network structure and weights utilizing explicit Rademacher complexity measures to define data-dependent learning and generalization bounds. The algorithms are strongly convex, indicating a global solution, and iteratively grow the structure of a neural network while balancing complexity with risk minimization.
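The Xception idea summarized above, factoring a standard convolution into a per-channel spatial (depthwise) step and a cross-channel (pointwise) step, can be expressed directly with off-the-shelf layers. The comparison below is schematic only and is not the Xception architecture itself; the tensor shape is an arbitrary assumption.

```python
# Standard vs. depthwise separable convolution (schematic comparison only).
import tensorflow as tf

inputs = tf.keras.Input(shape=(64, 64, 32))

# Standard convolution: mixes spatial and cross-channel correlations jointly.
standard = tf.keras.layers.Conv2D(64, kernel_size=3, padding="same")(inputs)

# Depthwise separable convolution: a per-channel 3x3 spatial filter followed by
# a 1x1 pointwise convolution that mixes channels (the Xception factorization).
separable = tf.keras.layers.SeparableConv2D(64, kernel_size=3, padding="same")(inputs)

print(tf.keras.Model(inputs, standard).count_params())   # roughly 18k weights
print(tf.keras.Model(inputs, separable).count_params())  # roughly 2.4k weights
```

The order-of-magnitude drop in parameters, with comparable representational reach, is the main reason this factorization appears repeatedly in efficiency-oriented architectures.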
Patel et al. [102] developed a deep learning framework based on the Deep Rendering Mixture Model (DRMM), a theoretical framework that explicitly models nuisance variation via a rendering function. The hierarchical generative model can reproduce the CNN architecture through the relaxation of parameter constraints, and can improve upon CNN performance without hyperparameter tuning.

VI. EMERGING RESEARCH TRENDS

Deep learning has thoroughly saturated commercial industries and suffused business applications. Yet, several primary needs have arisen and persist without any comprehensive solution. These emerging research trends, as outlined in Fig. 4, primarily encompass: acceleration and optimization, distributed deep learning in IoT and CPS, network management and control, and security in deep learning. These needs can be seen as highly interdependent, and arise directly from the emerging IoT/CPS paradigm and trustworthy computing, as well as from the growing need for edge computing. In the following, we present these needs one at a time, elaborating upon the background and characteristics of each, and provide a sampling of future research directions.

A. Acceleration and Optimization

Due to the massive adoption of smart mobile and embedded devices that comprise IoT and support smart-world applications such as the smart grid, smart transportation, smart cities, etc., network congestion and latency are only going to increase without a diverse array of complementary solutions [125], [143], [148], [82], [80], [161], [127], [147]. Advances in edge computing and in-device computing provide avenues to reduce network congestion by providing computing near to users and reducing the communication needed to reach resource-rich services. In addition, advances in network infrastructure and technologies (5G, Software-Defined Networks, etc.) are also forthcoming [13], [155], [157]. Regarding the former, the acceleration and optimization of deep learning architectures through thoughtful design of software, hardware, and algorithms is driven by needs for low-energy, low-resource, cheap, and efficient computation. Notwithstanding neural network architecture design and algorithm implementation, which are continuously evaluated and improved upon, examples of emerging areas of deep learning acceleration include the design of programmable computational arrays, bare-hardware implementation, and stochastic computation mechanisms.

For instance, Lacey et al. [74] investigated the application of Field-Programmable Gate Arrays (FPGAs) as alternative hardware to GPUs or CPUs for implementing deep learning networks, due to their better performance per watt and flexibility in configuration. The authors consider the strengths of FPGAs, including customizable hardware circuits for multithreading and parallelism, and architectures that can be tailored to the intended application. Morcel et al. [96] utilized signal flow graph reduction, fixed-point arithmetic, and modularity to design a deep learning accelerator that minimizes FPGA resource use. Using AlexNet as the case study, they compared an optimized CPU implementation with the developed FPGA-based implementation, realizing increased throughput and reduced energy usage. Kim et al. [65] explored the design of FPGA systems for fully-connected neural network hardware, analyzing synthesis constraints and using the MNIST dataset on Caffe as an example. Further, Li et al. [78] explored the application of Stochastic Computing (SC) to the implementation of deep neural networks. In particular, they investigated hardware-oriented optimization of feature extraction blocks in CNNs through SC implementations of convolution, pooling, and activation, and the arrangement of pooling and activation with respect to hardware implementation. In addition, they developed equations to optimize stochastic tanh calculations, and demonstrate a reduction in absolute error by placing the activation ahead of pooling, structurally the reverse of traditional software-based feature extraction implementations.

In addition, Wang et al. [140] proposed Memsqueezer, an active on-chip memory design for low-overhead deep learning acceleration. It utilizes active buffers, network weight and intermediate data compression, and in-memory redundancy detection to boost performance and reduce the memory footprint of CNN inference. Simulation results demonstrate the ability to significantly reduce energy consumption. Tang et al. [128] explored executing convolutional neural networks in IoT hardware to overcome the latency limitations of offloading computation to the cloud. Utilizing a bare-metal ARM v7 system on a chip (SoC), the authors compare TensorFlow against the recent ARM Compute Library (ACL). Implementing a SqueezeNet architecture with ARM NEON vector computation optimization, the results show that the ACL outperforms TensorFlow by nearly 150 ms in execution time, though memory usage and power consumption are higher in the ACL. The authors additionally describe their ongoing work of developing an integrated deep learning IoT ecosystem consisting of a lightweight OS with sensor interfacing, a compiler based on NNVM to optimize deep learning models, and a message passing framework based on Nanomsg. Also, Du et al. [34] proposed a CNN acceleration architecture for IoT devices using streaming data flows to achieve high efficiency. In their architecture, convolution can be carried out in parallel with maximum pooling, and filter decomposition allows large kernel sizes with only a small computation unit. In comparison with other acceleration designs, their architecture achieves higher peak throughput and greater energy efficiency on a smaller core area.

Despite these advances, challenges remain in the development of accelerators for deep learning, and further investigation is necessary in several particular areas. First, many of these studies on acceleration remain largely developmental, and more work is necessary to refine them for hardware implementation. While simulation studies can demonstrate the potential for various improvements, these concepts must be transferred to hardware to realize the potential benefits in actuality. Second, the many diverse acceleration mechanisms should be compared with one another whenever possible, and combined where applicable. To this end, the various methods of acceleration should be quantitatively compared with each other, and not simply with their CPU or GPU counterparts. There remains great promise in these areas, as deep learning is being driven by industry groups to provide more efficiency and reduce costs, and new experimental hardware devices are coming to market. In addition, how to automatically select features and parameters in deep learning, and the development of a science of deep learning, remain challenging issues.
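The hardware accelerators surveyed in this subsection cannot be captured in a short code sample, but a complementary software-side route to the same goal (smaller, lower-latency models on constrained devices) is post-training quantization. The sketch below uses the TensorFlow Lite converter as one assumed tool chain; it is illustrative only and is not drawn from the cited works.

```python
# Post-training quantization of a trained Keras model for constrained devices.
# Illustrative only; the small model here stands in for any trained network.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable weight quantization
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)  # compact flatbuffer suitable for edge/IoT runtimes
```

Such software-level compression is orthogonal to the FPGA, stochastic-computing, and on-chip memory designs above, and the two directions can reasonably be combined.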
B. Distributed Deep Learning in IoT and CPS

We now discuss distributed deep learning as applied to IoT and CPS.

1) Internet of Things: In considering the applications of deep learning for IoT, significant work has been carried out toward broadly applying the typical categories mentioned above (image/video/audio processing, text analysis, etc.) across centralized and distributed cloud computing frameworks, utilizing IoT devices and some novel mechanisms [85], [67], [95], [142], [134], [160], [101].

For instance, Kim [67] proposed a deep learning system for identifying and tracking the motion of individuals via the Channel State Information (CSI) of IoT devices and widely deployed MIMO-enabled access points. Mohammadi et al. [95] developed a semi-supervised deep reinforcement learning system to support smart city applications based on both structured and unstructured data. Utilizing Variational Autoencoders (VAEs), the authors studied indoor user localization using Bluetooth Low Energy (BLE), collecting the received signal strength indicator (RSSI) from a grid of iBeacon devices. In addition, Wu et al. [142] developed an efficient road scene segmentation deep learning model for embedded devices, termed ApesNet. Via time profiling and analysis, the authors developed an asymmetric encoder-decoder network, and limited the size of large feature maps in convolutional layers. In comparison with a complementary encoder-decoder network, ApesNet improves accuracy and reduces model size and runtime when tested on the CamVid and Cityscapes datasets. Valipour et al. [134] developed a deep convolutional network for parking stall vacancy detection. Designed with existing parking lot cameras and infrastructure in mind, the system provides web and mobile interfaces for users. Additionally, the inference time of their model running on an embedded Raspberry Pi architecture was only 0.22 seconds.

Despite these and many other works, several critical issues have yet to be resolved. Particularly, while a number of early efforts have shown the potential to run inference operations in IoT devices, the training of deep models in IoT hardware remains a practical impossibility. Nonetheless, local training on distributed and partial neural network input in IoT devices provides an opportunity to reduce network overhead and latency in training by offloading pre-trained feature output for additional training at higher layers. This would be particularly practical for image and object recognition processing offloaded to edge computing nodes, where the dimensionality of transmitted data can be reduced. Some relevant examples include [160] and [101]. For example, Yuan and Jia [160] proposed and demonstrated the use of sparse autoencoder networks on distributed servers to perform anomaly detection on smart electricity meter data. The distributed slave nodes perform the anomaly detection individually, alerting the master node and reducing computing overhead for the centralized master node, and outperform complementary learning algorithms. Park et al. [101] designed a Situation Reasoning framework that extracts multiple low-level contexts in DNN modules, and combines them in a higher-level Situation Reasoning module based on the Feature Comparison Model of cognitive psychology. Utilizing the spatio-temporal contexts of IoT data, the authors' framework demonstrates good performance in comparison with other Situation Reasoning methods.

While the examples provided show some promise, significant work must still be done. Given that IoT systems facilitate near-infinite potential for integrating deep learning networks for innumerable applications, the development of appropriate paradigms to analyze such data in a timely manner is imperative. While not all applications will require real-time analysis and inference, those that converge with critical infrastructure and safety applications surely will. The requirements of such real-time functionality can be considered from two domains: distributed deep learning at the network edge, and in-device deep learning.

In the first case, distributed deep learning is a solution to the inability to resolve deep learning in-device because of complexity and processing power. This almost certainly necessitates the intervention of edge computing to offset the network latency that would critically reduce the effectiveness of the target learning system. To this end, though the edge computing paradigm has recently seen significant study, the intersection of edge computing infrastructures with deep learning remains to be thoroughly investigated. Specifically, parallel simultaneous learning network implementations for edge architectures should be developed and optimized for self-organization and runtime.

In the second case, we consider that deep learning inference has only recently been realized in IoT hardware, with scalability at cost still on the horizon. In general, then, the implementation of deep networks in IoT devices is a preeminent concern that requires continued investigation. This development is significantly affected by advances in hardware and computational capabilities. In addition, in-device deep learning provides the potential for reductions in network overhead in terms of data transfer and signaling, the impact of which has yet to be considered.

2) Cyber-Physical Systems: In addition, Cyber-Physical Systems (CPS), more than just network-connected devices like IoT, include the vertical layering of IoT devices, networking, services, applications, and command and control (C&C) platforms. Examples of CPS include smart transportation systems with self-driving vehicles, smart cities, smart electrical grids, etc. [94], [58], [125], [143], [148], [82], [80], [161], [127], [147], [145], [165]. More specifically, as applied to power generation, monitoring, and control, Mocanu et al. [94] utilized Factored Four-Way Conditional Restricted Boltzmann Machines (FFW-CRBMs) and Disjunctive Factored Four-Way Conditional Restricted Boltzmann Machines (DFFW-CRBMs) to carry out energy disaggregation, and flexibility classification and prediction, on smart appliance data.
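As a single-node illustration of the autoencoder-based anomaly detection that Yuan and Jia [160] distribute across slave nodes, the sketch below trains an autoencoder on presumed-normal meter windows and flags inputs whose reconstruction error is abnormally high. The data, dimensions, and threshold rule are assumptions made for illustration.

```python
# Autoencoder anomaly detection on smart-meter-style windows (illustrative assumptions).
import numpy as np
import tensorflow as tf

normal = np.random.rand(5000, 48).astype("float32")  # 48 half-hourly readings per day (assumed)

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(48,)),
    tf.keras.layers.Dense(16, activation="relu"),     # compressed code
    tf.keras.layers.Dense(48, activation="sigmoid"),  # reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal, normal, epochs=5, batch_size=64, verbose=0)

# Threshold derived from the training-set reconstruction error (assumed decision rule).
errors = np.mean((normal - autoencoder.predict(normal, verbose=0)) ** 2, axis=1)
threshold = errors.mean() + 3 * errors.std()

def is_anomalous(window):
    recon = autoencoder.predict(window[None, :], verbose=0)[0]
    return float(np.mean((window - recon) ** 2)) > threshold
```

In the distributed setting described above, each slave node would run this detection locally and forward only flagged windows or alerts to the master node, reducing both bandwidth and central compute load.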
Likewise, Liangzhi et al. [58] investigated electrical load forecasting in the smart grid via deep learning. Utilizing seven years of smart meter and IoT device data, the designed system first forecasts daily total consumption via a DNN with complex input features, and then predicts intra-day load variation by applying the daily consumption prediction, along with a more limited set of features, to a second DNN. In addition, Zhao et al. leverage convolutional neural networks to develop a new deep heartbeat classification system, which can accurately analyze the raw electrocardiogram (ECG) signal in healthcare smart-world systems.

As applied to CPS, distributed deep learning takes on a new dimension, as we consider the distinct layered structure and the heterogeneity of the data within. In this case, we shall integrate the aforementioned distributed IoT context with multi-modal data fusion. Relevant examples include [141] and [76], the former of which also seeks to adapt new features to a pre-trained model to improve the overall system. Particularly, Wang et al. [141] explored image classification in CPS, and proposed a fast feature fusion algorithm. The authors extracted features from various deep and shallow learning mechanisms in parallel, utilized a Genetic Algorithm (GA) to convert those features into weights for a fusion feature vector, and introduced partial selection to choose a classifier. Using this mechanism, pre-trained neural network models could be combined with original models, which add new features and classes, to outperform any single constituent model.

Further, Li et al. [76] proposed a deep convolutional computation model for conducting hierarchical feature learning on IoT big data. Utilizing a tensor representation, which preserves raw data structures and thus mutuality and complementarity, they can better represent hierarchical multi-modal data. Designing tensor-based convolution, pooling, and fully-connected layers, as well as high-order back-propagation, the authors demonstrate the effectiveness of their approach against multi-modal deep learning, as well as deep computation models, on three datasets (CUAVE, SNAE2, and STL-10). Another relevant work adapts the ST-ResNet structure to predict the hourly distribution of crime in parceled areas in the city of Los Angeles [135]. In this work, the necessary spatial and temporal resolutions for optimal prediction were investigated, and a ternarization of the model was additionally developed to reduce model size and execution time, with a minor increase in error.

Finally, in considering CPS, autonomous command and control can be distributed to the lowest levels necessary via deep learning for in-time analysis, and can be configured uniquely for each layer. In this way, resource use can be reduced throughout the system. Indeed, if IoT enables the convergence of many technologies (networking, distributed computing, deep learning, big data, etc.), then CPS compounds this through the imperative of infrastructure security and communication. This, then, presents a challenging issue of how to integrate the various CPS layers that include deep learning mechanisms. For instance, in considering smart vehicle technologies, autonomous transportation must enable inter-vehicle communication, but must also communicate with both the smart transportation infrastructure in transit and with smart city and smart grid infrastructures when locating parking and acting as secondary electricity storage to enhance grid function. We can envision many such communication exchanges across different domains, including in user identification and tracking, autonomous services such as delivery or manufacturing, and even in localized network and electrical load prediction via massive fine-grained IoT device transit data. Further investigation is necessary to understand hierarchically combined deep learning models and the policies to optimize and secure their use in critical infrastructure systems, as well as best practices for managing and updating individual aspects of such a system using deep learning.

C. Network Management and Control

In future 5G broadband systems [13], [157] and other networking systems, as well as evolving traditional network infrastructures, complex heterogeneous protocols, interfaces, and hardware will be massively implemented to realize throughput and bandwidth gains, supporting a massive number of users with diverse quality-of-service requirements. Solutions to the growing complexity and the need for agile service include software-defined networking (SDN), network function virtualization, and edge computing paradigms. While these technologies are indeed poised to provide solutions to address future network challenges, the architecture, management, and security of these future networks will be highly dependent on the effective optimization of services and hardware. In this regard, deep learning offers a viable technique that can effectively learn the characteristics of the network and the behavior of users, leading to better network management and control decisions and outcomes. Furthermore, the massive increase in users, including humans and autonomous machine-to-machine equipment, will necessitate analysis, density estimation, and complexity reduction to handle such massive data. With the continuous development of deep learning, these challenges can be resolved, yet thorough research is necessary to achieve these goals.

In this regard, very little research has been conducted so far. For example, Zhu et al. [169] implemented stacked auto-encoders (SAEs) to realize Q-learning for transmission scheduling in cognitive IoT relays. Modeling the system as a Markov decision process, and seeking to maximize system utility, a simulation evaluation shows improved performance over W-learning, but not over strategy iteration. Nonetheless, strategy iteration considers all states of the system at a given time, instead of only the current state, and is not scalable. Likewise, Lopez-Martin et al. [83] demonstrated flow statistics-based network traffic classification via deep neural networks. Using only packet headers, the authors investigated the use of RNN, CNN, and combined CNN/RNN models, convolving over the time series of the incoming data. Their designed models demonstrate good performance, especially on labels with a frequency higher than 1%. It is worth noting that not all CNN/RNN models outperform the basic RNN.
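Relating to the flow-statistics classifiers of Lopez-Martin et al. [83] discussed above, the following sketch convolves a one-dimensional CNN over a fixed-length sequence of per-packet header features to predict a traffic class. The feature layout, sequence length, class count, and synthetic data are assumptions, and this is not the authors' architecture.

```python
# Toy 1D-CNN traffic classifier over per-packet header features (assumed layout).
import numpy as np
import tensorflow as tf

PACKETS, FEATURES, CLASSES = 20, 3, 10  # first 20 packets; size, inter-arrival time, direction

x = np.random.rand(2000, PACKETS, FEATURES).astype("float32")
y = np.random.randint(0, CLASSES, size=2000)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(PACKETS, FEATURES)),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),  # convolve over the packet sequence
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=2, batch_size=64, validation_split=0.2)
```

Because only header-derived statistics are used, such a classifier can in principle operate on encrypted traffic, which is precisely the setting the surveyed work targets.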
Aminanto et al. [14] developed a three-layer Wi-Fi impersonation attack detection system. In their tiered model, stacked autoencoders first performed feature extraction on the original dataset. Feature selection was then performed via ANN, SVM, or Decision Tree on the original data plus the newly extracted features, and an ANN was used for final classification. Their system achieves a 99.918% detection rate and a 0.012% false positive rate, significantly outperforming comparable systems.

Despite the works outlined here, the majority of applications for deep learning in network management and control remain unexplored. Indeed, deep learning has the potential to fundamentally transform network design, management, and service through integration with advanced architectures such as cognitive radio and network function virtualization, as well as in optimization and analysis to enable adaptability and autonomy. For instance, deep learning models can be applied to learn the characteristics of the network and the behaviors of connected users. In this way, optimal decisions (e.g., routing optimization and node placement) can be made. In addition, network traffic analysis could be implemented in routing devices and utilized for traffic offloading, or implemented to provide hierarchical prioritized relay. While we have yet to see significant research in this regard, this area is garnering increased attention. Additionally, this remains a challenging area, due to the limitations of hardware for deep learning and the latency that deep learning may introduce into networking systems, as the smaller time scale necessary for inference (compared to training) cannot be considered trivial. Furthermore, traditional network transmission considerations are aimed at minimizing data size and transmission frequency to reduce network load. Nonetheless, in the context of deep learning, additional data generally increases the accuracy of the learning model. Thus, a balance must be struck that satisfies the needs of any implemented deep learning system with those of congestion reduction, quality of service, energy efficiency, and latency. Furthermore, the automation of network management via intelligent networked systems must be scalable, secure, and fault-tolerant.

D. Secure Deep Learning

Given the increasing number of devices, operating systems, and communication protocols that abound in IoT, security is an ever-ballooning problem [151], [82]. Securing the data, operation, and mechanisms of deep learning is all the more relevant in considering edge computing, which can be a viable computing infrastructure to provision deep learning schemes [155], supporting a variety of smart-world systems (smart cities, smart manufacturing, smart grid, smart transportation, and many others). As computing nodes will be more dispersed and local to the user, they will also have fewer resources and be more available to would-be adversaries. The investigation and application of increasingly sophisticated security mechanisms, such as homomorphic encryption, are thus significant. For example, Li et al. [77] proposed multiple schemes for machine learning on multi-key homomorphic encrypted data in the cloud. In the first scheme, deep learning is conducted on the data of multiple users who share the same public key. In the second scheme, using double decryption, training is performed on the ciphertexts of users with different public keys. While these are novel methods that leverage the state-of-the-art in security research, encrypting not only data but computation as well, they can still be considered expansions of traditional security techniques.

Beyond these traditional mechanisms, attacks that seek to undermine the output of deep learning systems have recently received deeper consideration. Indeed, the widespread adoption of machine learning is cause for concern, as attacks that are solely intended to thwart the normal operation of the learning network can lead to catastrophic harm. For instance, a recent work by Yuan et al. [159] specifically investigated the space of attacks that target only the inference mechanism through adversarial input. The authors classified no less than sixteen different attack methods, which have been shown to be effective against various targets, including subverting segmentation (removal of objects from detection) and facial recognition. This is similar to an investigation by Huang et al. [57] from 2011, which investigated security in machine learning, provided a taxonomy of causative and exploratory attacks, and formulated game-theory-based formalisms to understand each attack. Nonetheless, the latter focused on the shallow learning methods of the time.

In the interim, Goodfellow et al. [45] proposed the generative adversarial network (GAN), pitting generator and discriminator networks against one another in a minimax game. Here, a discriminator is used to discern the data distribution from the generated model distribution via a learning process, while a generator learns to better undermine the discriminator, improving both in the process. A relevant insight observed from the development of GANs is that they do not necessarily resolve adversarial examples: those which fail in ways that are imperceptible to humans, or succeed while not retaining any of the human-perceptible attributes. Though GANs have been used to great effect in increasing accuracy in generative and discriminative networks, they fail to address the problems posed by these corner cases.

While Yuan et al. [159] did point out various defensive mechanisms against adversarial input, such as network distillation, adversarial retraining, adversarial detection, and input reconstruction, significantly more work is needed. In addition, Pei et al. [104] developed DeepXplore, the first whitebox testing framework for evaluating deep learning systems. Their work developed the concept of neuron coverage, characterized as the amount of deep network logic or neurons activated by a given input. They also leveraged multiple complementary deep networks as cross-referencing oracles, and formalized the maximization of neuron coverage and differential behaviors as a joint optimization problem solved with gradient ascent. Beyond demonstrating the effectiveness of their framework in terms of runtime and neuron coverage, they also leveraged their framework to augment network training to demonstrably improve accuracy. In addition, Booz et al. [22] investigated how to fine-tune the parameters of deep learning to improve the accuracy of detecting Android malware.

Though considerations for the limitations of deep learning go back a few years, and adversarial learning has helped increase the accuracy achieved in training models, as well as the generation of unique data, significant work is still needed to secure deep learning systems.
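To ground the discussion of adversarial input above and below, the sketch here crafts a fast gradient sign method (FGSM) perturbation against an arbitrary differentiable classifier; FGSM is one of the attack families catalogued by Yuan et al. [159]. The model, epsilon value, and input are placeholders.

```python
# FGSM adversarial perturbation against a differentiable classifier (placeholder model/data).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),  # raw logits
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def fgsm(image, label, epsilon=0.1):
    """Return image + epsilon * sign(gradient of loss w.r.t. the image)."""
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        loss = loss_fn(label, model(image))
    grad = tape.gradient(loss, image)
    return tf.clip_by_value(image + epsilon * tf.sign(grad), 0.0, 1.0)

x = tf.random.uniform((1, 28, 28, 1))
x_adv = fgsm(x, tf.constant([3]))  # perturbed input intended to change the prediction
```

The perturbation is bounded element-wise by epsilon and is therefore often imperceptible to a human observer, which is exactly the property that makes defenses such as adversarial retraining and input reconstruction necessary.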
In particular, further study is necessary to fully develop standardized testing practices for deep learning that reveal hidden or unforeseen vulnerabilities. This should include two primary directions: securing deep learning models, and securing deep learning systems. While the latter can be considered within the traditional realm of security analysis and prevention, deep learning should nonetheless be further applied to enhance security detection systems at all levels. Examples include deep learning for static and dynamic analysis in intrusion detection to secure deep learning systems and their underlying architectures. In addition, the verification of the appropriateness and resiliency of trained deep learning models, as well as their further improvement, should be the primary goals of future adversarial investigations.

Of particular interest are attacks against deep models, as inappropriately or insufficiently tested models may be easily subverted by an attacker, causing damage to digital or physical property, and potentially endangering human lives. Deep learning, like many technologies, is a double-edged sword that can be used by both adversaries and defenders in the cybersecurity field. In fact, advancements in deep learning are likely to have a profound impact on future cyber attacks, as attackers leverage the technology to enact more encompassing, effective, autonomous, and potentially novel attacks. Therefore, systematically investigating threats across the full lifecycle (training and inference) of deep learning in its use for cybersecurity, in addition to adversarial input, becomes critical. Further, understanding the capabilities of deep learning in detecting cyber threats, and investigating how to optimize deep learning networks to achieve the highest detection accuracy while dealing with both known and unknown threats, remain challenging issues.

VII. FINAL REMARKS

Deep learning is a technology that continues to mature, and has clearly been applied to a multitude of applications and domains to great effect. While the full-scale adoption of deep learning technologies in industry is ongoing, measured steps should be taken to ensure the appropriate application of deep learning, as the subversion of deep learning models may result in significant loss of monetary value, trust, or even life in extreme cases. In this survey, we have provided an overview of deep learning operation, distinguishing deep learning from traditional shallow learning methods, and outlining prominent structural implementations. We have reviewed deep learning architectures in detail based on learning mechanisms (supervised, unsupervised, and reinforcement) and target output structures, and provided typical examples in each case. We have also introduced many common and widely adopted deep learning frameworks, and considered them from the perspectives of design, extensibility, and comparative efficacy. It is worth mentioning that each of the frameworks implements the basic elements of deep learning in different ways using different libraries, is optimized for different hardware systems, and provides varying degrees of control over model design.

Additionally, we have thoroughly investigated the state-of-the-art in deep learning research. These categories include multimedia (audio, visual, and text) processing, autonomous systems, medical diagnostics, biological and physical sciences, financial applications, security analysis, and algorithmic enhancement. Finally, having surveyed the landscape of completed works, we have highlighted areas in which deep learning research has yet to make significant strides, or where significant advances are immediately forthcoming. These include the acceleration and optimization of deep learning via fundamental hardware and encoding methods, distributed deep learning for IoT and CPS, network management and control applications of deep learning, and, perhaps most importantly, securing deep learning models and systems. Given the widespread adoption of deep learning, especially in multimedia fields, and the inevitability of increasingly sophisticated cyber threats, the development of mechanisms to harden systems against adversarial data input is imperative. We hope this work provides a valuable reference for researchers and computer science practitioners alike in considering the techniques, tools, and applications of deep learning, and provokes interest in areas that desperately need further consideration.

ACKNOWLEDGEMENT

This work was supported in part by the US National Science Foundation (NSF) under grant CNS 1350145 (Faculty CAREER Award), and by the University System of Maryland (USM) Endowed Wilson H. Elkins Professorship Award Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the agencies.

REFERENCES

[1] Apache MXNet: A flexible and efficient library for deep learning. 2017. https://mxnet.apache.org/.
[2] Caffe. 2017. http://caffe.berkeleyvision.org/.
[3] Caffe2: A new lightweight, modular, and scalable deep learning framework. 2017. https://caffe2.ai/.
[4] Cloud TPU alpha: Train and run machine learning models faster than ever before. 2017. https://cloud.google.com/tpu/.
[5] Deep learning: For data scientists who need to deliver. 2017. https://skymind.ai/.
[6] Deep learning for Java: Open-source, distributed, deep learning library for the JVM. 2017. https://deeplearning4j.org/.
[7] Keras: The Python deep learning library. 2017. https://keras.io/.
[8] The Microsoft Cognitive Toolkit. 2017. https://docs.microsoft.com/en-us/cognitive-toolkit/.
[9] An open-source software library for machine intelligence. 2017. https://www.tensorflow.org/.
[10] Theano. 2017. http://deeplearning.net/software/theano/.
[11] Torch: A scientific computing framework for LuaJIT. 2017. http://torch.ch/.
[12] A. Adler, D. Boublil, M. Elad, and M. Zibulevsky. A deep learning approach to block-based compressed sensing of images. ArXiv e-prints, June 2016.
[13] M. Agiwal, A. Roy, and N. Saxena. Next generation 5G wireless networks: A comprehensive survey. IEEE Communications Surveys Tutorials, 18(3):1617–1655, thirdquarter 2016.
[14] M. E. Aminanto, R. Choi, H. C. Tanuwidjaja, P. D. Yoo, and K. Kim. Deep abstraction and weighted feature selection for Wi-Fi impersonation detection. IEEE Transactions on Information Forensics and Security, 13(3):621–636, March 2018.
[15] C. Angermueller, H. J. Lee, W. Reik, and O. Stegle. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biology, 18(1):67, Apr 2017.
[16] C. Angermueller, T. Pärnamaa, L. Parts, and O. Stegle. Deep learning for computational biology. Molecular Systems Biology, 12(7), 2016.
[17] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias. Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Expert Systems with Applications, 77(Supplement C):236–246, 2017.
[18] D. M. S. Arsa, G. Jati, A. J. Mantau, and I. Wasito. Dimensionality reduction using deep belief network in big data case study: Hyperspectral image classification. In 2016 International Workshop on Big Data and Information Security (IWBIS), pages 71–76, Oct 2016.
[19] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–38, Nov 2017.
[20] E. Barberio, B. Le, E. Richter-Was, Z. Was, D. Zanzi, and J. Zaremba. Deep learning approach to the Higgs boson CP measurement in H → τ τ decay and associated systematics. Phys. Rev., D96(7):073002, 2017.
[21] A. Bonarini, A. Lazaric, F. Montrone, and M. Restelli. Reinforcement distribution in fuzzy Q-learning. Fuzzy Sets and Systems, 160(10):1420–1443, 2009. Special Issue: Fuzzy Sets in Interdisciplinary Perception and Intelligence.
[22] J. Booz, J. McGiff, W. G. Hatcher, W. Yu, and C. Lu. Tuning deep learning performance for Android malware detection. In Technical Report, Towson University, 2018, Feb. 2018.
[23] T. S. Borkar and L. J. Karam. DeepCorrect: Correcting DNN models against image distortions. CoRR, abs/1705.02406, 2017.
[24] A. L. Buczak and E. Guven. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Communications Surveys Tutorials, 18(2):1153–1176, Secondquarter 2016.
[25] X. W. Chen and X. Lin. Big data deep learning: Challenges and perspectives. IEEE Access, 2:514–525, 2014.
[26] Z. Chen, H. Zhang, W. G. Hatcher, J. Nguyen, and W. Yu. A streaming-based network monitoring and threat detection system. In 2016 IEEE 14th International Conference on Software Engineering Research, Management and Applications (SERA), pages 31–37, June 2016.
[27] F. Chollet. Xception: Deep learning with depthwise separable convolutions. ArXiv e-prints, Oct. 2016.
[28] O. Cominetti, A. Matzavinos, S. Samarasinghe, D. Kulasiri, S. Liu, and P. K. Maini. DifFUZZY: a fuzzy clustering algorithm for complex datasets. 1, 01 2010.
[29] C. Cortes, X. Gonzalvo, V. Kuznetsov, M. Mohri, and S. Yang. AdaNet: Adaptive structural learning of artificial neural networks. ArXiv e-prints, July 2016.
[30] A. Dairi, F. Harrou, M. Senouci, and Y. Sun. Unsupervised obstacle detection in driving environments using deep-learning-based stereovision. Robotics and Autonomous Systems, pages –, 2017.
[31] X. Ding, Y. Zhang, T. Liu, and J. Duan. Deep learning for event-driven stock prediction. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pages 2327–2333. AAAI Press, 2015.
[32] Y. Ding, S. Chen, and J. Xu. Application of deep belief networks for opcode based malware detection. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 3901–3908, July 2016.
[33] S. Dodge and L. Karam. Understanding how image quality affects deep neural networks. In 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6, June 2016.
[34] L. Du, Y. Du, Y. Li, J. Su, Y. C. Kuan, C. C. Liu, and M. C. F. Chang. A reconfigurable streaming deep convolutional neural network accelerator for Internet of Things. IEEE Transactions on Circuits and Systems I: Regular Papers, 65(1):198–208, Jan 2018.
[35] A. Ducournau and R. Fablet. Deep learning for ocean remote sensing: an application of convolutional neural networks for super-resolution on satellite-derived SST data. In 2016 9th IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS), pages 1–6, Dec 2016.
[36] N. Ekedebe, C. Lu, and W. Yu. Towards experimental evaluation of intelligent transportation system safety and traffic efficiency. In 2015 IEEE International Conference on Communications (ICC), pages 3757–3762, June 2015.
[37] P. Eulenberg, N. Köhler, T. Blasi, A. Filby, A. E. Carpenter, P. Rees, F. J. Theis, and F. A. Wolf. Reconstructing cell cycle and disease progression using deep learning. Nature Communications, 8(1):463, 2017.
[38] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani. State-of-the-art deep learning: Evolving machine intelligence toward tomorrow's intelligent network traffic control systems. IEEE Communications Surveys and Tutorials, 19(4):2432–2455, Fourthquarter 2017.
[39] T. Fischer and C. Krauss. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, pages –, 2017.
[40] H. Gabbard, M. Williams, F. Hayes, and C. Messenger. Matching matched filtering with deep networks in gravitational-wave astronomy. ArXiv e-prints, Dec. 2017.
[41] L. Ge, H. Zhang, G. Xu, W. Yu, C. Chen, and E. P. Blasch. Towards mapreduce based machine learning techniques for processing massive network threat monitoring data. Networking for Big Data, published by CRC Press & Francis Group, USA, Yu, S. (Ed.), Lin, X. (Ed.), Misic, J. (Ed.), Shen, X., 2015.
[42] L. L. Ge, Y. H. Wu, B. Hua, Z. M. Chen, and L. Chen. Image registration based on SOFM neural network clustering. In 2017 36th Chinese Control Conference (CCC), pages 6016–6020, July 2017.
[43] J. A. Gonzalez, L. A. Cheah, A. M. Gomez, P. D. Green, J. M. Gilbert, S. R. Ell, R. K. Moore, and E. Holdsworth. Direct speech reconstruction from articulatory sensor data by machine learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(12):2362–2374, Dec 2017.
[44] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
[45] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. ArXiv e-prints, June 2014.
[46] S. Gu, E. Holly, T. Lillicrap, and S. Levine. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 3389–3396, May 2017.
[47] H. Guo, J. Wang, Y. Gao, J. Li, and H. Lu. Multi-view 3D object retrieval with deep embedding network. IEEE Transactions on Image Processing, 25(12):5526–5537, Dec 2016.
[48] W. G. Hatcher, J. Booz, J. McGiff, C. Lu, and W. Yu. Edge computing based machine learning mobile malware detection. In National Cyber Summit, 2017.
[49] W. G. Hatcher, D. Maloney, and W. Yu. Machine learning-based mobile threat monitoring and detection. In 2016 IEEE 14th International Conference on Software Engineering Research, Management and Applications (SERA), pages 67–73, June 2016.
[50] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. ArXiv e-prints, Dec. 2015.
[51] J. B. Heaton, N. G. Polson, and J. H. Witte. Deep learning in finance. ArXiv e-prints, Feb. 2016.
[52] S. Hickson, A. Angelova, I. Essa, and R. Sukthankar. Object category learning and retrieval with weak supervision. ArXiv e-prints, Jan. 2018.
[53] I. Higgins, L. Matthey, X. Glorot, A. Pal, B. Uria, C. Blundell, S. Mohamed, and A. Lerchner. Early visual concept learning with unsupervised deep learning. ArXiv e-prints, June 2016.
[54] C. Hong, J. Yu, R. Xie, and D. Tao. Weakly supervised hand pose recovery with domain adaptation by low-rank alignment. In 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), pages 446–453, Dec 2016.
[55] Z. Hu, W. Liu, J. Bian, X. Liu, and T.-Y. Liu. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. ArXiv e-prints, Dec. 2017.
[56] H. H. Huang and H. Liu. Big data machine learning and graph analytics: Current state and future challenges. In 2014 IEEE International Conference on Big Data (Big Data), pages 16–17, Oct 2014.
[57] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar. Adversarial machine learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, AISec '11, pages 43–58, New York, NY, USA, 2011. ACM.
[58] Y. Huang, X. Ma, X. Fan, J. Liu, and W. Gong. When deep learning meets edge computing. In 2017 IEEE 25th International Conference on Network Protocols (ICNP), volume 00, pages 1–2, Oct. 2017.
[59] E. P. Ijjina and K. M. Chalavadi. Human action recognition in RGB-D videos using motion sequence information and deep learning. Pattern Recognition, 72(Supplement C):504–516, 2017.
[60] M. Iliadis, L. Spinoulas, and A. K. Katsaggelos. Deep fully-connected networks for video compressive sensing. Digital Signal Processing, 72(Supplement C):9–18, 2018.
[61] S. Jain and J. Dhar. Image based search engine using deep learning. In 2017 Tenth International Conference on Contemporary Computing (IC3), pages 1–7, Aug 2017.
[62] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[63] V. Jindal. Integrating mobile and cloud for PPG signal selection to monitor heart rate during intensive physical exercise. In 2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft), pages 36–37, May 2016.
[64] G. Kahn, A. Villaflor, B. Ding, P. Abbeel, and S. Levine. Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation. ArXiv e-prints, Sept. 2017.
[65] J. Kim, J. Kim, B. Kim, M. Lee, and J. Lee. Hardware design exploration of fully-connected deep neural network with binary parameters. In 2016 International SoC Design Conference (ISOCC), pages 305–306, Oct 2016.
[66] J. Kim, J. Kim, H. L. T. Thu, and H. Kim. Long short term memory recurrent neural network classifier for intrusion detection. In 2016 International Conference on Platform Technology and Service (PlatCon), pages 1–5, Feb 2016.
[67] S. C. Kim. Device-free activity recognition using CSI big data analysis: A survey. In 2017 Ninth International Conference on Ubiquitous and Future Networks (ICUFN), pages 539–541, July 2017.
[68] P. V. Klaine, M. A. Imran, O. Onireti, and R. D. Souza. A survey of machine learning techniques applied to self-organizing cellular networks. IEEE Communications Surveys Tutorials, 19(4):2392–2431, Fourthquarter 2017.
[69] P. T. Komiske, E. M. Metodiev, and M. D. Schwartz. Deep learning in color: towards automated quark/gluon jet discrimination. Journal of High Energy Physics, 2017(1):110, Jan 2017.
[70] J. Korczak and M. Hemes. Deep learning for financial time series forecasting in A-Trader system. In 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), pages 905–912, Sept 2017.
[71] K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber, and L. E. Barnes. HDLTex: Hierarchical Deep Learning for Text Classification. ArXiv e-prints, Sept. 2017.
[72] O. Z. Kraus, B. T. Grys, J. Ba, Y. Chong, B. J. Frey, C. Boone, and B. J. Andrews. Automated analysis of high-content microscopy data with deep learning. Molecular Systems Biology, 13(4), 2017.
[73] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS’12, pages 1097–1105, USA, 2012. Curran Associates Inc.
[74] G. Lacey, G. W. Taylor, and S. Areibi. Deep Learning on FPGAs: Past, Present, and Future. ArXiv e-prints, Feb. 2016.
[75] J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, and D. Jurafsky. Deep Reinforcement Learning for Dialogue Generation. ArXiv e-prints, June 2016.
[76] P. Li, Z. Chen, L. T. Yang, Q. Zhang, and M. J. Deen. Deep convolutional computation model for feature learning on big data in Internet of Things. IEEE Transactions on Industrial Informatics, 14(2):790–798, Feb 2018.
[77] P. Li, J. Li, Z. Huang, T. Li, C.-Z. Gao, S.-M. Yiu, and K. Chen. Multi-key privacy-preserving deep learning in cloud computing. Future Generation Computer Systems, 74(Supplement C):76–85, 2017.
[78] Z. Li, A. Ren, J. Li, Q. Qiu, Y. Wang, and B. Yuan. DSCNN: hardware-oriented optimization for stochastic computing based deep convolutional neural networks. In 2016 IEEE 34th International Conference on Computer Design (ICCD), pages 678–681, Oct 2016.
[79] F. Liang, W. Yu, D. An, Q. Yang, X. Fu, and W. Zhao. A survey on big data market: Pricing, trading and protection. IEEE Access, 2018.
[80] J. Lin, W. Yu, X. Yang, Q. Yang, X. Fu, and W. Zhao. A real-time en-route route guidance decision scheme for transportation-based cyberphysical systems. IEEE Transactions on Vehicular Technology, 66(3):2551–2566, March 2017.
[81] J. Lin, W. Yu, N. Zhang, X. Yang, and L. Ge. On data integrity attacks against route guidance in transportation-based cyber-physical systems. In 2017 14th IEEE Annual Consumer Communications Networking Conference (CCNC), pages 313–318, Jan 2017.
[82] J. Lin, W. Yu, N. Zhang, X. Yang, H. Zhang, and W. Zhao. A survey on Internet of Things: Architecture, enabling technologies, security and privacy, and applications. IEEE Internet of Things Journal, 4(5):1125–1142, Oct 2017.
[83] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret. Network traffic classifier with convolutional and recurrent neural networks for Internet of Things. IEEE Access, 5:18042–18050, 2017.
[84] D. Luo, H. Wu, and J. Huang. Audio recapture detection using deep learning. In 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), pages 478–482, July 2015.
[85] X. Luo, Y. Lv, M. Zhou, W. Wang, and W. Zhao. A laguerre neural network-based ADP learning scheme with its application to tracking control in the Internet of Things. 20, 04 2016.
[86] C. Ma, Z. Zhu, J. Ye, J. Yang, J. Pei, S. Xu, R. Zhou, C. Yu, F. Mo, B. Wen, and S. Liu. DeepRT: deep learning for peptide retention time prediction in proteomics. ArXiv e-prints, May 2017.
[87] L. Ma, Z. Chen, L. Xu, and Y. Yan. Multimodal deep learning for solar radio burst classification. Pattern Recognition, 61:573–582, 2017.
[88] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. Aparicio Ojea, and K. Goldberg. Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics. ArXiv e-prints, Mar. 2017.
[89] M. S. Mahmud, H. Wang, E. E. Alam, and H. Fang. A real time and non-contact multiparameter wearable device for health monitoring. In 2016 IEEE Global Communications Conference (GLOBECOM), pages 1–6, Dec 2016.
[90] N. Majumder, S. Poria, A. Gelbukh, and E. Cambria. Deep learning-based document modeling for personality detection from text. IEEE Intelligent Systems, 32(2):74–79, Mar 2017.
[91] S. Mallapuram, N. Ngwum, F. Yuan, C. Lu, and W. Yu. Smart city: The state of the art, datasets, and evaluation platforms. In 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), pages 447–452, May 2017.
[92] D. Marquardt and S. Doclo. Noise power spectral density estimation for binaural noise reduction exploiting direction of arrival estimates. In 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 234–238, Oct 2017.
[93] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518:529 EP –, Feb 2015.
[94] D. C. Mocanu, E. Mocanu, P. H. Nguyen, M. Gibescu, and A. Liotta. Big IoT data mining for real-time energy disaggregation in buildings. In 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 003765–003769, Oct 2016.
[95] M. Mohammadi, A. Al-Fuqaha, M. Guizani, and J. S. Oh. Semi-supervised deep reinforcement learning in support of iot and smart city services. IEEE Internet of Things Journal, PP(99):1–1, 2017.
[96] R. Morcel, H. Akkary, H. Hajj, M. Saghir, A. Keshavamurthy, R. Khanna, and H. Artail. Minimalist design for accelerating convolutional neural networks for low-end FPGA platforms. In 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 196–196, April 2017.
[97] T. Naseer and W. Burgard. Deep regression for monocular camera-based 6-DoF global localization in outdoor environments. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1525–1530, Sept 2017.
[98] N. D. Nguyen, T. Nguyen, and S. Nahavandi. System design perspective for human-level agents using deep reinforcement learning: A survey. IEEE Access, 5:27091–27102, 2017.
[99] M. Nielsen. Neural networks and deep learning. 2017. http://neuralnetworksanddeeplearning.com/.
[100] A. Niimi. Deep learning for credit card data analysis. In 2015 World Congress on Internet Security (WorldCIS), pages 73–77, Oct 2015.
[101] S. Park, M. Sohn, H. Jin, and H. Lee. Situation reasoning framework for the internet of things environments using deep learning results. In 2016 IEEE International Conference on Knowledge Engineering and Applications (ICKEA), pages 133–138, Sept 2016.
[102] A. B. Patel, M. T. Nguyen, and R. Baraniuk. A probabilistic framework for deep learning. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2558–2566. Curran Associates, Inc., 2016.
[103] M. Paul. Multiclass and multi-label classification. 2017. http://cmci.colorado.edu/classes/INFO-4604/files/slides-7_multi.pdf.
[104] K. Pei, Y. Cao, J. Yang, and S. Jana. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. ArXiv e-prints, May 2017.
[105] P. Poirson, P. Ammirato, C. Y. Fu, W. Liu, J. Kosecka, and A. C. Berg. Fast single shot detection and pose estimation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 676–684, Oct 2016.
[106] A. S. Polydoros, L. Nalpantidis, and V. Krüger. Real-time deep learning of robotic manipulator inverse dynamics. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3442–3448, Sept 2015.
[107] Y.-H. Qu, H. Yu, X.-J. Gong, J.-H. Xu, and H.-S. Lee. On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach. PLOS ONE, 12(12):1–18, 12 2017.
[108] A. Rajesh and M. Mantur. Eyeball gesture controlled automatic wheelchair using deep learning. In 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), pages 387–391, Dec 2017.
[109] D. Ramachandram and G. W. Taylor. Deep multimodal learning: A survey on recent advances and trends. IEEE Signal Processing Magazine, 34(6):96–108, Nov 2017.
[110] D. Ravì, C. Wong, B. Lo, and G. Z. Yang. A deep learning approach to on-node sensor data analytics for mobile or wearable devices. IEEE Journal of Biomedical and Health Informatics, 21(1):56–64, Jan 2017.
[111] Y. P. Raykov, A. Boukouvalas, F. Baig, and M. A. Little. What to do when K-Means clustering fails: A simple yet principled alternative algorithm. PLOS ONE, 11(9):1–28, 09 2016.
[112] T. N. Sainath, R. J. Weiss, K. W. Wilson, B. Li, A. Narayanan, E. Variani, M. Bacchiani, I. Shafran, A. Senior, K. Chin, A. Misra, and C. Kim. Multichannel signal processing with deep neural networks for automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(5):965–979, May 2017.
[113] Y. Saito, S. Takamichi, and H. Saruwatari. Statistical parametric speech synthesis incorporating generative adversarial networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(1):84–96, Jan 2018.
[114] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller. SchNet - a deep learning architecture for molecules and materials. ArXiv e-prints, Dec. 2017.
[115] E. Senft, P. Baxter, J. Kennedy, S. Lemaignan, and T. Belpaeme. Supervised autonomy for online learning in human-robot interaction. Pattern Recognition Letters, 99(Supplement C):77–86, 2017. User Profiling and Behavior Adaptation for Human-Robot Interaction.
[116] A. Severyn and A. Moschitti. Learning to rank short text pairs with convolutional deep neural networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’15, pages 373–382, New York, NY, USA, 2015. ACM.
[117] Z. Shao, L. Zhang, and L. Wang. Stacked sparse autoencoder modeling using the synergy of airborne LiDAR and satellite optical and SAR data to map forest above-ground biomass. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(12):5569–5582, Dec 2017.
[118] R. V. Sharan and T. J. Moir. Robust acoustic event classification using deep neural networks. Information Sciences, 396(Supplement C):24–32, 2017.
[119] S. Shi, Q. Wang, P. Xu, and X. Chu. Benchmarking State-of-the-Art Deep Learning Software Tools. ArXiv e-prints, Aug. 2016.
[120] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3(5):637–646, Oct 2016.
[121] N. Shirish Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. ArXiv e-prints, Sept. 2016.
[122] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529:484 EP –, Jan 2016. Article.
[123] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis. Mastering the game of Go without human knowledge. Nature, 550:354 EP –, Oct 2017. Article.
[124] K. G. Srinivasa, S. Anupindi, R. Sharath, and S. K. Chaitanya. Analysis of facial expressiveness captured in reaction to videos. In 2017 IEEE 7th International Advance Computing Conference (IACC), pages 664–670, Jan 2017.
[125] J. A. Stankovic. Research directions for the Internet of Things. IEEE Internet of Things Journal, 1(1):3–9, Feb 2014.
[126] B. Su, X. Ding, H. Wang, and Y. Wu. Discriminative dimensionality reduction for multi-dimensional sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(1):77–91, Jan 2018.
[127] Y. Sun, H. Song, A. J. Jara, and R. Bie. Internet of Things and big data analytics for smart and connected communities. IEEE Access, 4:766–773, 2016.
[128] J. Tang, D. Sun, S. Liu, and J. L. Gaudiot. Enabling deep learning on IoT devices. Computer, 50(10):92–96, 2017.
[129] P. Thammasorn, L. Wootton, E. Ford, and M. Nyflot. Deep convolutional triplet network for quantitative medical image analysis with comparative case study of gamma image classification. In 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 1119–1122, Nov 2017.
[130] R. Tibor Schirrmeister, L. Gemein, K. Eggensperger, F. Hutter, and T. Ball. Deep learning with convolutional neural networks for decoding and visualization of EEG pathology. ArXiv e-prints, Aug. 2017.
[131] ujjwalkarn. An intuitive explanation of convolutional neural networks. 2016. https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/.
[132] S. O. Uwagbole, W. J. Buchanan, and L. Fan. Numerical encoding to tame SQL injection attacks. In NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium, pages 1253–1256, April 2016.
[133] V. Gulshan, L. Peng, M. Coram, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22):2402–2410, 2016.
[134] S. Valipour, M. Siam, E. Stroulia, and M. Jagersand. Parking-stall vacancy indicator system, based on deep convolutional neural networks. In 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT), pages 655–660, Dec 2016.
[135] B. Wang, P. Yin, A. L. Bertozzi, P. J. Brantingham, S. J. Osher, and J. Xin. Deep Learning for Real-Time Crime Forecasting and its Ternarization. ArXiv e-prints, Nov. 2017.
[136] D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck. Deep Learning for Identifying Metastatic Breast Cancer. ArXiv e-prints, June 2016.
[137] J. Wang, Q. Gu, J. Wu, G. Liu, and Z. Xiong. Traffic speed prediction and congestion source exploration: A deep learning method. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 499–508, Dec 2016.
[138] Q. Wang, Y. Zheng, G. Yang, W. Jin, X. Chen, and Y. Yin. Multi-scale rotation-invariant convolutional neural networks for lung texture classification. IEEE Journal of Biomedical and Health Informatics, 22(1):184–195, Jan 2018.
[139] S. Wang, Y. Shang, J. Wang, L. Mei, and C. Hu. Deep features for person re-identification. In 2015 11th International Conference on Semantics, Knowledge and Grids (SKG), pages 244–247, Aug 2015.
[140] Y. Wang, H. Li, and X. Li. Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices. In Proceedings of the 35th International Conference on Computer-Aided Design, ICCAD ’16, pages 13:1–13:6, New York, NY, USA, 2016. ACM.
[141] Y. Wang, B. Song, P. Zhang, N. Xin, and G. Cao. A fast feature fusion algorithm in image classification for cyber physical systems. IEEE Access, 5:9089–9098, 2017.
[142] C. Wu, H. P. Cheng, S. Li, H. Li, and Y. Chen. ApesNet: a pixel-wise efficient segmentation network for embedded devices. IET Cyber-Physical Systems: Theory Applications, 1(1):78–85, 2016.
[143] J. Wu and W. Zhao. Design and realization of WInternet: From Net of Things to Internet of Things. ACM Trans. Cyber-Phys. Syst., 1(1):2:1–2:12, Nov. 2016.
[144] H. Xu, J. Lin, and W. Yu. Smart Transportation Systems: Architecture, Enabling Technologies, and Open Issues, pages 23–49. Springer Singapore, Singapore, 2017.
[145] Y. Xu, X. Luo, W. Wang, and W. Zhao. Efficient DV-HOP localization for wireless cyber-physical social sensing system: A correntropy-based neural network learning scheme. Sensors, 17(135), 2017.
[146] P. C. Yang, K. Sasaki, K. Suzuki, K. Kase, S. Sugano, and T. Ogata. Repeatable folding task by humanoid robot worker using deep learning. IEEE Robotics and Automation Letters, 2(2):397–403, April 2017.
[147] Q. Yang, D. An, R. Min, W. Yu, X. Yang, and W. Zhao. Optimal PMU placement based defense against data integrity attacks in smart grid. IEEE Transactions on Information Forensics and Security (T-IFS), 12(7):1735–1750, 2017.
[148] X. Yang, X. Ren, J. Lin, and W. Yu. On binary decomposition based privacy-preserving aggregation schemes in real-time monitoring systems. IEEE Transactions on Parallel and Distributed Systems, 27(10):2967–2983, Oct 2016.
[149] X. Yang, T. Wang, X. Ren, and W. Yu. Survey on improving data utility in differentially private sequential data publishing. IEEE Transactions on Big Data, PP(99):1–1, 2017.
[150] X. Yang, P. Zhao, X. Zhang, J. Lin, and W. Yu. Toward a gaussian-mixture model-based detection scheme against data integrity attacks in the smart grid. IEEE Internet of Things Journal, 4(1):147–161, Feb 2017.
[151] Y. Yang, L. Wu, G. Yin, L. Li, and H. Zhao. A survey on security and
privacy issues in Internet-of-Things. IEEE Internet of Things Journal,
4(5):1250–1258, Oct 2017.
[152] C. Yeshwanth, P. S. A. Sooraj, V. Sudhakaran, and V. Raveendran.
Estimation of intersection traffic density on decentralized architectures
with deep networks. In 2017 International Smart Cities Conference
(ISC2), pages 1–6, Sept 2017.
[153] W. Yu, D. An, D. Griffith, Q. Yang, and G. Xu. Towards statistical
modeling and machine learning based energy usage forecasting in smart
grid. SIGAPP Appl. Comput. Rev., 15(1):6–16, Mar. 2015.
[154] W. Yu, L. Ge, G. Xu, and X. Fu. Towards neural network based malware detection on android mobile devices. In R. Pino, A. Kott, and M. Shevenell, editors, Cybersecurity Systems for Human Cognition Augmentation, Advances in Information Security, 61, 2014.
[155] W. Yu, F. Liang, X. He, W. G. Hatcher, C. Lu, J. Lin, and X. Yang. A
survey on the edge computing for the Internet of Things. IEEE Access,
PP(99):1–1, 2017.
[156] W. Yu, G. Xu, Z. Chen, and P. Moulema. A cloud computing based
architecture for cyber security situation awareness. In 2013 IEEE
Conference on Communications and Network Security (CNS), pages
488–492, Oct 2013.
[157] W. Yu, H. Xu, H. Zhang, D. Griffith, and N. Golmie. Ultra-dense
networks: Survey of state of the art and future directions. In 2016 25th
International Conference on Computer Communication and Networks
(ICCCN), pages 1–10, Aug 2016.
[158] W. Yu, H. Zhang, L. Ge, and R. Hardy. On behavior-based detection of
malware on android platform. In 2013 IEEE Global Communications
Conference (GLOBECOM), pages 814–819, Dec 2013.
[159] X. Yuan, P. He, Q. Zhu, R. Rana Bhat, and X. Li. Adversarial
Examples: Attacks and Defenses for Deep Learning. ArXiv e-prints,
Dec. 2017.
[160] Y. Yuan and K. Jia. A distributed anomaly detection method of opera-
tion energy consumption using smart meter data. In 2015 International
Conference on Intelligent Information Hiding and Multimedia Signal
Processing (IIH-MSP), pages 310–313, Sept 2015.
[161] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi. Internet
of Things for smart cities. IEEE Internet of Things Journal, 1(1):22–32,
2014.
[162] X. Zhang, X. Li, J. An, L. Gao, B. Hou, and C. Li. Natural language
description of remote sensing images based on deep learning. In
2017 IEEE International Geoscience and Remote Sensing Symposium
(IGARSS), pages 4798–4801, July 2017.
[163] X. Zhang, X. Pan, and S. Wang. Fuzzy DBN with rule-based knowl-
edge representation and high interpretability. In 2017 12th International
Conference on Intelligent Systems and Knowledge Engineering (ISKE),
pages 1–7, Nov 2017.
[164] X. L. Zhang. Speech separation by cost-sensitive deep learning. In 2017
Asia-Pacific Signal and Information Processing Association Annual
Summit and Conference (APSIPA ASC), pages 159–162, Dec 2017.
[165] P. Zhao, W. Yu, D. Quan, and X. Yang. Deep learning-based detection
scheme with raw ECG signal for wearable telehealth systems.
Technical Report 2018-CS-021, Dept. of Computer and Information
Science, Towson University, March 2018.
[166] Y. Zhao, J. Li, and L. Yu. A deep learning ensemble approach for
crude oil price forecasting. Energy Economics, 66:9–16, 2017.
[167] Z. Zhao, J. Guo, E. Ding, Z. Zhu, and D. Zhao. Terminal replacement
prediction based on deep belief networks. In 2015 International
Conference on Network and Information Systems for Computers, pages
255–258, Jan 2015.
[168] D. Zhu, H. Jin, Y. Yang, D. Wu, and W. Chen. DeepFlow: deep
learning-based malware detection by mining Android application for
abnormal usage of sensitive data. In 2017 IEEE Symposium on
Computers and Communications (ISCC), pages 438–443, July 2017.
[169] J. Zhu, Y. Song, D. Jiang, and H. Song. A new deep-q-learning-
based transmission scheduling mechanism for the cognitive Internet
of Things. IEEE Internet of Things Journal, PP(99):1–1, 2017.
[170] M. Zolotukhin, T. Hämäläinen, T. Kokkonen, and J. Siltanen. In-
creasing web service availability by detecting application-layer DDoS
attacks in encrypted traffic. In 2016 23rd International Conference on
Telecommunications (ICT), pages 1–6, May 2016.