
A Brief Survey of Machine Learning Methods and their Sensor and IoT Applications


Uday Shankar Shanthamallu, Andreas Spanias, Cihan Tepedelenlioglu, and Mike Stanley*
SenSIP Center, School of ECEE, Arizona State University, NXP Semiconductors*
Tempe, AZ 85287, USA
[email protected]

Abstract—This paper provides a brief survey of the basic concepts and algorithms used for machine learning and its applications. We begin with a broad definition of machine learning and then introduce various learning modalities, including supervised and unsupervised methods and deep learning paradigms. In the rest of the paper, we discuss applications of machine learning algorithms in various fields including pattern recognition, sensor networks, anomaly detection, the Internet of Things (IoT) and health monitoring. In the final sections, we present some of the software tools and an extensive bibliography.

I. INTRODUCTION

Machine learning [1-10,89], as described by Arthur Samuel in 1959 [11], is a "field of study that gives computers the ability to learn without being explicitly programmed." In 1997, Tom Mitchell [12] gave a more formal definition, namely: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."

Although the term machine learning has its origins in computer science, several vector quantization methods [106] were developed in telecommunications and signal processing for coding and compression [105]. In computer and data science, learning is accomplished from examples (data samples) and experience. A basic signal/data processing [86-88,90] framework that includes pre-processing, noise removal and segmentation is shown in Figure 1: the signal is acquired from the sensor and then processed, typically in frame-by-frame or batch mode [94]. Noise removal and feature extraction follow, and a classification stage, which provides either an estimate or a decision, is at the end of the process.

Figure 1: Basic signal processing framework including pre-processing, feature extraction and classification.

Typically, the feature extraction stage extracts compact, information-bearing parameters that characterize the data. The classification stage must be trained by a machine learning algorithm to recognize and classify the collection of features. The field of machine learning is vast, and applications are expanding rapidly, especially with the emergence of fast mobile devices that also have access to cloud computing [108]. Compressing and extracting information from sensors and big data have recently elevated interest in the area. Smart city projects, mobile health monitoring, networked security, manufacturing, self-driven automobiles, surveillance, and intelligent border control: every application has its idiosyncrasies and requires customized features, adaptive learning, and data fusion. Data compression and statistical signal and data analysis play a large role in transmitting and interpreting data and producing meaningful analytics. Machine learning algorithms can be broadly classified into three categories based on their properties, style of learning, and the way data are used [13]: supervised, unsupervised and semi-supervised algorithms. This type of classification is important in identifying the role of the input data and the utility of the algorithms and learning models relative to the applications.

II. SUPERVISED LEARNING

In supervised learning, "true" or "correct" labels of the input dataset are available. The algorithm is "trained" using the labeled input dataset (the training data), which means that ground truth samples are available for training. In the training process, the algorithm makes predictions on the input data and improves its estimates using the ground truth, reiterating until it reaches a desired level of accuracy. In almost all machine learning algorithms, we optimize a cost (or objective) function, typically a measure of the error between the ground truth and the algorithm's estimates. By minimizing the cost function, we train the model to produce estimates that are close to the correct values (the ground truth). Minimization of the cost function is usually achieved using the gradient descent technique [116-118,121,122]. Variants of gradient descent, such as stochastic gradient descent on a minibatch, momentum-based gradient descent [123,124], and Nesterov accelerated gradient descent [119], have been used in many machine learning training paradigms. Suppose we have $m$ training examples; each one is labelled and can be represented as a pair $(x, y)$, where $x$ represents the input data and $y$ represents the class label. The input data can be $n$-dimensional, where each dimension corresponds to a feature or variable. Supervised learning methods are used in various fields including the identification of phytoplankton species [14], mapping rainfall-induced landslides [15], and the classification of biomedical data [16]. In [91], a machine learning algorithm is integrated on an embedded sensor system for IoT applications. In the following sub-sections, we present supervised learning algorithms.
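To make the training loop described above concrete, the following minimal sketch (in Python/NumPy, one of the environments discussed in Section VI) minimizes a quadratic cost with plain gradient descent; the matrix, step size, and stopping rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Gradient descent sketch: minimize the cost J(theta) = 0.5 * ||A @ theta - b||^2,
# whose gradient is A.T @ (A @ theta - b). A, b, alpha and the stopping rule
# are illustrative assumptions.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

theta = np.zeros(2)                    # initial parameter estimate
alpha = 0.01                           # learning rate (step size)
for _ in range(5000):                  # fixed iteration budget
    grad = A.T @ (A @ theta - b)       # gradient of the cost
    theta -= alpha * grad              # descent step
    if np.linalg.norm(grad) < 1e-8:    # stop once the gradient vanishes
        break

print(theta)                           # approaches the least-squares solution
```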
A. Linear Regression

Regression [17-19] is a statistical technique for estimating the relationship between input and output variables; it maps the input variables to a continuous function. A simple univariate linear regression [20-22,24] model is shown in Figure 2.

Figure 2: A simple linear regression example with one feature/variable.

The training dataset consists of $m$ labelled training pairs $(x, y)$, where $x$ is the independent variable and $y$ is the dependent variable. The linear regression model assumes that the relationship between the independent variable and the dependent variable is linear, and fits a straight line to the data points. This relationship is expressed by a hypothesis (or prediction) function:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$    (1)

where $x_1, x_2, \ldots, x_n$ are the features and $\theta_0, \theta_1, \ldots, \theta_n$ are the weights of the model. As shown in [142], an FIR filtering approach can be used to perform linear regression through slope filtering. Equation (1) describes a multivariate linear regression model: the output is the linear sum of the weighted input features. The weights are typically learned by a weighted least-squares minimization process. We can also make use of quadratic, cubic or higher-order polynomial [144-145] terms to obtain a different hypothesis function that fits quadratic [143], cubic or polynomial curves, rather than a simple straight line. Multivariate linear regression is used in several applications, including activity recognition and classification [23] and steady state visual evoked potential (SSVEP) recognition for BCI data [25,26].
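A minimal sketch of the least-squares weight estimation described above, solving for the weights of (1) with a single feature; the toy data are assumed for illustration.

```python
import numpy as np

# Least-squares linear regression sketch: fit h(x) = theta0 + theta1 * x.
# The toy data below are illustrative, not from the paper.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])        # roughly y = 1 + 2x plus noise

X = np.column_stack([np.ones_like(x), x])      # design matrix with a bias column
theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||X @ theta - y||^2
print(theta)      # ~[1.0, 2.0]: intercept and slope
print(X @ theta)  # fitted values h(x)
```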
B. Logistic Regression

The objective of a multivariate regression model is to determine a hypothesis function that outputs a continuous value. We now present another class of supervised learning algorithms, classification, in which the objective is to obtain a discrete output. Logistic regression [30,31] is a statistical way of modelling a binomial outcome. As before, the input can have one or more features (or variables). For binary logistic regression, the outcome is either 0 or 1, performing binary classification of the positive class versus the negative class. Logistic regression uses the sigmoid curve shown in Figure 3 to output a probability value and thus performs the classification. The hypothesis function for logistic regression is given by

$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n)$    (2)

where $g(z)$ is the sigmoid function given by

$g(z) = \frac{1}{1 + e^{-z}}$    (3)

Figure 3: Sigmoid curve with a bound between 0 and 1.

The output of the sigmoid function is a value between 0 and 1: all values below 0.5 are assigned to the negative class, and values greater than or equal to 0.5 to the positive class. Applications of logistic regression appear in various fields, including evaluating trauma care [27], patient severity assessment [28], determining the risk of heart disease [29], early detection and recognition of glaucoma in ocular thermographs [32], and computer vision and adaptive object tracking [33]. For a multiclass classification problem, we can use a one-vs-all implementation.
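The following sketch implements the prediction rule in (2) and (3); the trained weights and the test point are assumed values for illustration.

```python
import numpy as np

# Logistic regression prediction sketch implementing (2) and (3).
# The weights and the sample point are illustrative assumptions.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 0.8, 0.5])   # [theta0, theta1, theta2], assumed trained
x = np.array([1.0, 2.0, 1.5])        # leading 1 is the bias term

p = sigmoid(theta @ x)               # h_theta(x): probability of the positive class
label = 1 if p >= 0.5 else 0         # 0.5 decision threshold
print(p, label)                      # ~0.79 -> positive class
```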
them. Naïve Bayes is based on Bayes’ theorem. Being a
probabilistic model, Naïve Bayes’ outputs a posterior III. UNSUPERVISED LEARNING
probability of belonging to a class given the input features. In the case of unsupervised algorithms [70,71], there are no
explicit labels associated with the training dataset. The
( | )= ( | , , ,... ) (4) objective is to draw inferences from the input data and then
model the hidden or the underlying structure and the
( | ) ( ) (5) distribution in the data, in order to learn more about the data.
( | )=
( ) Clustering is the most common example of an unsupervised
algorithm. The details of the same is mentioned below.
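A short soft-margin SVM sketch using the scikit-learn library cited in Section VI [114,179]; the XOR-style toy data (not linearly separable, hence the Gaussian/RBF kernel) are an illustrative assumption.

```python
import numpy as np
from sklearn.svm import SVC

# Soft-margin SVM sketch with an RBF (Gaussian) kernel. The XOR-style toy
# data are an illustrative assumption: no single line separates the classes,
# so the kernel trick is needed.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # C penalizes slack (soft margin)
clf.fit(X, y)
print(clf.support_vectors_)       # points closest to the decision boundary
print(clf.predict([[0.9, 0.1]]))  # -> class 1
```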
D. Naïve Bayes

Naïve Bayes [68] classifiers are simple probabilistic classifiers. The term "naïve" refers to the strong assumption of the algorithm that all the input features are independent of each other, with no correlation between them. Naïve Bayes is based on Bayes' theorem. Being a probabilistic model, Naïve Bayes outputs the posterior probability of belonging to a class given the input features:

$P(C_k \mid x) = P(C_k \mid x_1, x_2, \ldots, x_n)$    (4)

$P(C_k \mid x) = \frac{P(x \mid C_k)\, P(C_k)}{P(x)}$    (5)

for each of the $K$ possible outcomes (classes). Here, $P(C_k \mid x)$ is the posterior probability that the given feature vector $x$ belongs to the $k$-th class $C_k$, $P(C_k)$ is the prior probability of the class independent of the data, $P(x \mid C_k)$ is the likelihood, i.e., the probability of the predictor given the class, and $P(x)$ is the prior probability of the predictor, which acts as the normalizing factor. There are many variations of Naïve Bayes, some of which tackle its poor assumptions [54,55,56]. The Naïve Bayes algorithm is used for text classification [57], credit scoring [58], emotion classification and recognition [67], and detection of epileptic seizures from EEG signals [146].
E. k-Nearest Neighbors

The k-Nearest Neighbors (k-NN) algorithm [1,60,61,65] is one of the simplest supervised machine learning algorithms. k-NN can be used for classification of input points into discrete outcomes; a simple k-NN model is shown in Figure 5. It can also be used for regression analysis [64,147], where the outcome of a dependent variable is predicted from the input independent variables.

Figure 5: A simple k-NN model for different values of k.

In Figure 5, for k = 3 the test point (star) is classified as belonging to class B, while for k = 6 the point is classified as belonging to class A. k-NN is a non-probabilistic and non-parametric model [62,63,93], and hence it is a natural first choice for classification studies when there is no prior knowledge about the distribution of the data. k-NN stores all the labelled input points in order to classify any unknown sample, which makes it computationally expensive. The classification is based on a similarity measure (a distance metric): an unknown sample is classified by the majority vote of its k nearest neighbors. The complexity increases with dimensionality, and hence dimensionality reduction techniques [164] are applied before using k-NN to avoid the effects of the curse of dimensionality [66]. The k-NN classifier has been used for stress detection using physiological signals [69] and for detection of epileptic seizures [146].
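A compact sketch of the k-NN majority vote using a Euclidean distance metric; the stored points, labels and query point are illustrative assumptions.

```python
import numpy as np
from collections import Counter

# k-NN majority-vote sketch with a Euclidean distance metric.
# The labeled points and the query are illustrative assumptions.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],    # class 'A' cluster
              [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])   # class 'B' cluster
y = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

def knn_predict(query, k=3):
    dists = np.linalg.norm(X - query, axis=1)        # distance to every stored point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    return Counter(y[nearest]).most_common(1)[0][0]  # majority vote among neighbors

print(knn_predict(np.array([2.9, 3.0]), k=3))        # -> 'B'
```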
III. UNSUPERVISED LEARNING

In the case of unsupervised algorithms [70,71], there are no explicit labels associated with the training dataset. The objective is to draw inferences from the input data and then model the hidden or underlying structure and distribution of the data, in order to learn more about it. Clustering, described below, is the most common example of an unsupervised algorithm.

A. Clustering

Clustering [75,81,82] deals with finding a structure or pattern in a collection of unlabeled data. For a given dataset, a clustering algorithm groups the data into K clusters such that the data points within each cluster are similar to one another, while data points from different clusters are dissimilar. As in the k-NN algorithm, we make use of a similarity or distance metric; different metrics such as the Euclidean, Mahalanobis, cosine and Minkowski distances are used. Although the Euclidean distance is used most often, it is shown in [74] that it is not always a suitable metric for capturing the quality of the clustering. The K-means algorithm is one of the simplest clustering algorithms and is intuitive and iterative. It clusters the data by separating them into K groups of equal variance, minimizing the inertia, or within-cluster sum-of-squares. However, the algorithm requires the number of clusters to be specified in advance. Each observation (data point) is assigned to the cluster with the nearest mean $\mu_k$, also referred to as the centroid of that cluster; thus the K clusters are specified by the K centroids. After a random initialization of the K centroids, the algorithm's inner loop iterates over the following two steps (a code sketch follows Figure 6):

(i) Assign each observation $x^{(i)}$ (the $i$-th sample point, $i = 1, 2, \ldots, m$) to the closest cluster centroid $\mu_k$, $k = 1, 2, \ldots, K$.
(ii) Update each cluster's centroid to the mean of the points assigned to it.

The inertia, or within-cluster sum-of-squares, is given by

$\sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$    (6)

where $c^{(i)}$ denotes the index of the centroid closest to the $i$-th sample. K-means clustering leads to a Voronoi tessellation, and the iterations stop (converge) when the cluster means no longer change. A converged K-means result is shown in Figure 6. Clustering has applications in many fields. In biology, it has been used to determine groups of genes with similar functions [77-79]; other applications include detection of brain tumors [76], cardiogram data clustering [80], business and e-commerce analysis [83], information retrieval [92], image segmentation [72] and compression [84], the study of quantitative resolutions of nanoparticles [95], fault detection in solar PV panels [101,187,188], and speech recognition [148].

Figure 6: The K-means clusters and the cluster centroids.
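A minimal K-means sketch implementing steps (i) and (ii) above and the inertia in (6); the synthetic data, K = 2, and the random initialization are illustrative assumptions.

```python
import numpy as np

# K-means sketch: alternate assignment step (i) and update step (ii),
# then report the inertia in (6). Data and K are illustrative assumptions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)),          # cluster around (0, 0)
               rng.normal(4, 0.5, (20, 2))])         # cluster around (4, 4)
K = 2
centroids = X[rng.choice(len(X), K, replace=False)]  # random initialization

while True:
    # Step (i): assign each point to its closest centroid
    labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
    # Step (ii): move each centroid to the mean of its assigned points
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):        # converged: means stopped changing
        break
    centroids = new_centroids

inertia = sum(np.linalg.norm(X[labels == k] - centroids[k])**2 for k in range(K))
print(centroids, inertia)
```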

B. Vector Quantization

In its simplest form, vector quantization [102,103,106] organizes data in vectors and represents them by their centroids. It typically uses a K-means clustering algorithm to train the quantizer. The centroids form codewords, and all the codewords are stored in a codebook. Vector quantization is a lossy compression method and is used in several coding applications. The compressed data has errors that are inversely proportional to the local data density; this property is shown in Figure 8 and compared with uniform quantization in Figure 7.

Figure 7: Uniform quantization of 2-dimensional data. Figure 8: Vector quantization of 2-dimensional data.

Vector quantization is used in various applications including speech coding [103,107], emotion recognition [104], audio compression [105], large-scale image classification [149] and image compression [150].
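A small sketch of vector quantization as nearest-codeword encoding followed by codebook lookup; the codebook here is assumed rather than trained, though in practice it would come from K-means as described above.

```python
import numpy as np

# Vector quantization sketch: encode each input vector as the index of its
# nearest codeword, then reconstruct by table lookup. The codebook is an
# illustrative assumption; training it with K-means is the usual approach.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 3.0]])
data = np.array([[0.2, -0.1], [4.1, 3.9], [0.9, 1.2]])

# Encoder: nearest-codeword search (one index per input vector)
indices = np.argmin(np.linalg.norm(data[:, None] - codebook[None], axis=2), axis=1)
# Decoder: codebook lookup (lossy reconstruction)
reconstruction = codebook[indices]

print(indices)                                # compressed representation
print(np.linalg.norm(data - reconstruction))  # quantization error
```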
IV. DEEP LEARNING

In this section, a brief introduction to the field of artificial neural networks is provided, with a focus on deep learning [151,153,161] methodologies and their applications. Artificial neural networks are widely used in image classification and pattern recognition, and they have achieved superior results in various fields including signal processing [163,168,171], computer vision [157], speech processing [162,165,166] and natural language processing [158,186]. Deep learning is a branch of machine learning that has gained popularity quite recently and is capable of learning multiple levels of abstraction. Although the inception of neural networks dates to around 1960 [156], deep learning gained popularity after 2012 [155] because of great advancements in GPUs [99] and the availability of large labeled datasets. In Figure 9, a simple artificial neural network with 4 hidden layers is shown; the last layer, namely the output layer, performs classification. The term "deep learning" [159] refers to the several layers used to learn multiple levels of representation [152,154,170]: each successive layer takes the output of the previous layer and feeds its result to the next layer.

Figure 9: Artificial neural network with four hidden layers.

Typical artificial neural network challenges include initialization of the network parameters, overfitting, and long training times. Various techniques now address these problems: batch normalization [182], normalization propagation [183], weight normalization [184], and layer normalization [185] all help in accelerating the training of deep neural networks, while dropout [160] helps in reducing overfitting. There are several network architectures, including the one shown in Figure 9, which consists of dot-product (fully connected) layers. A convolutional layer [167] processes a volume of activations rather than a vector and produces feature maps; it also makes use of a subsampling or max-pooling layer to reduce the size of the feature maps. Figure 10 shows an example of a convolutional neural network (CNN). Networks whose output depends on present and past inputs, namely recurrent neural networks (RNNs) [169,172,173], have also been used in several applications.

Figure 10: A CNN with 3 convolutional and 2 subsampling layers.
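A forward-pass sketch of a fully connected network like the one in Figure 9, where each layer feeds its output to the next; the layer sizes, random weights, ReLU activations and softmax output are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Forward pass of a small fully connected network (four hidden layers, as in
# Figure 9). All sizes, weights and activation choices are assumptions.
rng = np.random.default_rng(1)
sizes = [8, 16, 16, 16, 16, 3]   # input, four hidden layers, output classes
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    # Each successive layer consumes the previous layer's output.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)       # ReLU hidden activation
    logits = x @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())        # softmax output layer
    return e / e.sum()                       # class probabilities

print(forward(rng.normal(size=8)))           # sums to 1 over the 3 classes
```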
V. SENSOR AND IOT APPLICATIONS

The Internet of Things (IoT) [189] is a system of connected physical devices, smart machines or objects that have unique identifiers. The devices typically consist of electronics, software, sensors, and radios that enable these objects to continuously collect and transfer data. A sensor consists of a transducer that converts some form of physical process into an electrical signal; examples include microphones, cameras, accelerometers, thermometers and pressure sensors. A mobile phone is a good example of a connected device that embeds several heterogeneous sensors, including microphone arrays, at least two cameras, magnetometers and accelerometers. First-generation smart phones typically included six sensors; a Galaxy S5 has 26 sensors, including microphones, cameras, a magnetometer, an accelerometer, and proximity, IR, pressure, humidity and gyro sensors. Accelerometers and magnetometers (Fig. 11) have been used in many applications, including machine monitoring, structural monitoring, human activity, and healthcare [190-193]. Other areas of collaborative sensing and machine learning include localization [199-201].

Figure 11: A magnetometer can help align with the earth's field.

Clever entertainment and information exchange systems such as smart speakers combine multiple technologies, including circular microphone arrays (Fig. 12) and local and cloud-based machine learning and information retrieval algorithms. The Amazon Echo is a recent example of an IoT device that has a circular microphone array along with voice recognition capabilities. Local and cloud computing allow this device to interface with various other systems, exchange information, provide e-services, play back music and news on demand, and provide a human-to-machine interface for a smart home.

Figure 12: Microphone array on the Amazon Echo™ (from [202]).

The interconnection of IoT smart devices is also enabling advanced large-scale applications such as smart cities [194,195], large-scale smart networks and radios, and smart campus systems [196-198]. The field of sensor and IoT applications is vast, and large-scale applications, including several smart and connected health and community systems, are beginning to emerge.

VI. IMPLEMENTATION AND SOFTWARE TOOLS

This section introduces some of the machine learning tools. All the algorithms explained in Sections II and III can be implemented in various platforms and libraries, e.g., the R [110,113] and Python [180] languages. Python is one of the most utilized environments for machine learning, with a number of libraries available such as SciKit-Learn [114,179] and NumPy [177,178]. TensorFlow [115,181] is an open-source software library for numerical computation using data flow graphs and is very popular in deep learning and computer vision. The Azure Machine Learning Studio [111,112,176] is a drag-and-drop tool for analytics. IBM Bluemix [174,175] is a cloud platform that supports several programming languages as well as integrated DevOps.
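As a brief illustration of these tools, the following assumed end-to-end example trains and evaluates a classifier with SciKit-Learn [114,179]; the dataset, model choice and split ratio are illustrative, not prescribed by the paper.

```python
# End-to-end SciKit-Learn sketch: load labeled data, train, and evaluate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)          # small labeled benchmark dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3)  # k-NN classifier from Section II.E
clf.fit(X_train, y_train)                  # "training" = storing labeled points
print(clf.score(X_test, y_test))           # held-out classification accuracy
```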
CONCLUSION

This machine learning short survey paper supported the tutorial session of IISA 2017. The paper covered supervised and unsupervised learning models. We also provided a brief introduction to current deep learning methodologies and outlined several applications including pattern recognition, anomaly detection, computer vision, speech processing, and IoT applications. The paper provides an extensive bibliography of machine learning algorithms and their applications.

ACKNOWLEDGMENT

This work was supported in part by the NSF I/UCRC award 1540040, IUSE award 1525716, NXP, and the ASU SenSIP Center.

REFERENCES

[1] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). New York: Springer-Verlag, 2008.
[2] R. O. Duda, D. G. Stork, and P. E. Hart, Pattern Classification, 2nd ed. New York: John Wiley & Sons, 2000.
[3] S. Marsland, Machine Learning: An Algorithmic Perspective. Boca Raton: Chapman & Hall/CRC, 2009.
[4] Y. Kodratoff, Introduction to Machine Learning. Morgan Kaufmann, 1993.
[5] R. S. Michalski et al., Machine Learning: An Artificial Intelligence Approach. Berlin, Heidelberg: Springer, 1983.
[6] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer, 2009.
[7] M. Kubat, An Introduction to Machine Learning. Springer International Publishing, ISBN 978-3-319-20009-5, 2015.
[8] "Machine learning," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Machine_learning.
[9] E. Alpaydin, Introduction to Machine Learning. MIT Press, 2010.
[10] A. J. Smola and S. Vishwanathan, Introduction to Machine Learning. Cambridge University Press, 2008.
[11] A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM Journal of R&D, vol. 3, no. 3, pp. 210–229, Jul. 1959.
[12] T. M. Mitchell, Machine Learning, 7th ed. NY: McGraw Hill, 1997.
[13] J. L. Berral-García, "A quick view on current techniques and machine learning algorithms for big data analytics," ICTON, Trento, pp. 1-4, 2016.
[14] T. Phan et al., "Comparative study on supervised learning methods for identifying phytoplankton species," ICCE, Vietnam, pp. 283-288, 2016.
[15] S. Heleno, M. Silveira, M. Matias, and P. Pina, "Assessment of supervised methods for mapping rainfall induced landslides in VHR images," IGARSS, Milan, pp. 850-853, 2015.
[16] P. Drotar and Z. Smekal, "Comparative study of machine learning techniques for supervised classification of biomedical data," Acta Electrotechnica et Informatica, vol. 14, pp. 5–10, Sep. 2014.
[17] F. Galton, Natural Inheritance. London: Macmillan, 1889.
[18] F. Galton, "Regression towards mediocrity in hereditary stature," The Journal of the Anthropological Institute of Great Britain and Ireland, pp. 246–263, 1886.
[19] F. Galton, "Co-relations and their measurement, chiefly from anthropometric data," Proc. Royal Soc. of London, pp. 135-145, 1888.
[20] G. A. F. Seber and A. J. Lee, Linear Regression Analysis, 2nd ed. New York: Wiley, 2003.
[21] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis, 5th ed. Oxford: Wiley-Blackwell, 2012.
[22] H. Motulsky and A. Christopoulos, Fitting Models to Biological Data Using Linear and Nonlinear Regression: A Practical Guide to Curve Fitting. New York: Oxford University Press, 2004.
[23] S. Gayathri et al., "Multivariate linear regression based activity recognition and classification," ICICES, Chennai, pp. 1-6, 2014.
[24] P. Chandler et al., "Constrained linear regression for flight control system failure identification," ACC, San Francisco, p. 3141, 1993.
[25] H. Wang et al., "SSVEP recognition using multivariate linear regression for brain computer interface," ICCC, Chengdu, pp. 176-180, 2015.
[26] H. Wang et al., "Discriminative feature extraction via multivariate linear regression for SSVEP-based BCI," IEEE Trans. on Neural Systems and Rehabilitation Eng., vol. 24, no. 5, pp. 532-541, May 2016.
[27] C. R. Boyd et al., "Evaluating trauma care," The Journal of Trauma: Injury, Infection, and Critical Care, vol. 27, pp. 370–378, Apr. 1987.
[28] J. R. Le Gall, "A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study," JAMA, vol. 270, no. 24, pp. 2957–2963, Dec. 1993.
[29] J. Truett, J. Cornfield, and W. Kannel, "A multivariate analysis of the risk of coronary heart disease in Framingham," Journal of Chronic Diseases, vol. 20, no. 7, pp. 511–524, Jul. 1967.
[30] J. M. Hilbe, Logistic Regression Models. Boca Raton: Chapman and Hall/CRC, 2016.
[31] F. C. Pampel, Logistic Regression: A Primer. Thousand Oaks, CA: Sage Publications, 2000.
[32] Harshvardhan G. et al., "Assessment of glaucoma with ocular thermal images using GLCM techniques and logistic regression classifier," WiSPNET, Chennai, India, pp. 1534-1537, 2016.
[33] J. Song and B. Fan, "Adaptive object tracking with logistic regression," CCDC, Yinchuan, pp. 5403-5408, 2016.
[34] N. Cristianini et al., An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
[35] I. Steinwart and A. Christmann, Support Vector Machines. New York: Springer-Verlag, 2008.
[36] B. E. Boser et al., "A training algorithm for optimal margin classifiers," Proceedings of the Fifth Annual Workshop on COLT, p. 144, 1992.
[37] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Knowledge Discovery and Data Mining, 2(2), 1998.
[38] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, Sep. 1995.
[39] V. Vapnik et al., "Support vector clustering," Journal of Machine Learning Research, vol. 2, pp. 125–137, 2001.
[40] A. Ben-Hur, "Support vector clustering," Scholarpedia, 3, p. 5187, 2008.
[41] H. Xiao et al., "Indicative support vector clustering with its application on anomaly detection," ICMLA, Miami, FL, pp. 273-276, 2013.
[42] D. Huang, J. H. Lai, and C. D. Wang, "Incremental support vector clustering with outlier detection," ICPR, 2012.
[43] F. de Morsier et al., "Unsupervised change detection via hierarchical support vector clustering," PRRS, 2012.
[44] J. A. K. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Processing Letters, 9(3), pp. 293-300, 1999.
[45] B. Schölkopf et al., "SVM method for novelty detection," MIT Press, 2000.
[46] B. Schölkopf et al., "Single class support vector machines," Unsupervised Learning, Dagstuhl Seminar Report, pp. 19-20, 1999.
[47] D. M. J. Tax and R. P. W. Duin, "Data description by support vectors," in M. Verleysen, ed., Proceedings ESANN, Brussels, pp. 251-256, 1999.
[48] X. Peng et al., "Efficient support vector data descriptions for novelty detection," Neural Comp. and App., vol. 21, pp. 2023–2032, May 2011.
[49] S. Wang et al., "A modified support vector data description based novelty detection approach for machinery components," Applied Soft Computing, vol. 13, no. 2, pp. 1193–1205, Feb. 2013.
[50] M. Yao and H. Wang, "One-class support vector machine for functional data novelty detection," 3rd Cong. Int. Sys., Wuhan, p. 172, 2012.
[51] Zhou Guangping, "The study of the application in intrusion detection based on SVM," Journal of Conv. Inf. Tech., vol. 8, p. 11, Mar. 2013.
[52] N. Chand et al., "A comparative analysis of SVM and its stacking with other classification algorithms," ICACCA, Dehradun, pp. 1-6, 2016.
[53] M. A. Oskoei et al., "Adaptive schemes applied to online SVM for BCI data classification," IEEE EMBS, Minneapolis, pp. 2600-2603, 2009.
[54] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 1994.
[55] I. Rish, "An empirical study of the naive Bayes classifier," IJCAI Workshop on Empirical Methods in AI, 2001.
[56] J. Rennie, L. Shih, J. Teevan, and D. Karger, "Tackling the poor assumptions of naive Bayes classifiers," ICML, 2003.
[57] K. Chai, H. T. Ng, and H. L. Chieu, "Bayesian online classifiers for text classification and filtering," ACM SIGIR, pp. 97-104, August 2002.
[58] R. Vedala et al., "An application of naive Bayes classification for credit scoring in e-lending platform," ICDSE, pp. 81-84, 2012.
[59] D. D. Lewis, "Naive (Bayes) at forty: The independence assumption in information retrieval," Proceedings of ECML, 1998.
[60] StatSoft, Inc., "K-nearest neighbors," 2016. [Online]. Available: http://www.statsoft.com/textbook/k-nearest-neighbors.
[61] T. M. Cover and P. E. Hart, "Nearest neighbour pattern classification," IEEE Trans. Inform. Theory, vol. IT-13, pp. 21-27, Jan. 1967.
[62] L. Peterson, "K-nearest neighbor," Scholarpedia, vol. 4, p. 1883, 2009.
[63] Y. Lifshits, "Nearest neighbor search," SIGSPATIAL, vol. 2, p. 12, 2010.
[64] N. S. Altman, "An introduction to kernel and nearest-neighbor nonparametric regression," The Amer. Stat., vol. 46, p. 175, 1992.
[65] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
[66] K. Beyer et al., "When is 'nearest neighbor' meaningful?" Database Theory: ICDT, pp. 217–235, 1999.
[67] E. H. Jang, B. J. Park, S. H. Kim, Y. Eum, and J. H. Sohn, "A study on analysis of bio-signals for basic emotions classification: Recognition using machine learning algorithms," 2014 ICISA, Seoul, pp. 1-4, 2014.
[68] W. Wu et al., "Bayesian machine learning: EEG/MEG signal processing measurements," IEEE SPM, vol. 33, no. 1, pp. 14-36, Jan. 2016.
[69] A. Ghaderi et al., "Machine learning-based signal processing using physiological signals for stress detection," ICBME, Tehran, 2015.
[70] M. Khanum et al., "A survey on unsupervised machine learning algorithms for automation, classification and maintenance," IJCA, vol. 119, no. 13, pp. 34–39, Jun. 2015.
[71] M. E. Celebi and K. Aydin, Eds., Unsupervised Learning Algorithms, 1st ed. Switzerland: Springer International Publishing, 2016.
[72] A. Albiol et al., "An unsupervised color image segmentation algorithm for face detection applications," ICIP, Thessaloniki, pp. 681-684, 2001.
[73] C. K. Lee, P. F. Sum, and K. S. Tan, "An unsupervised learning algorithm for character recognition," IJCNN, 1992.
[74] N. Bouhmala, "How good is the Euclidean distance metric for the clustering problem," IIAI-AAI, Kumamoto, pp. 312-315, 2016.
[75] A. Bindal and A. Pathak, "A survey on k-means clustering and web-text mining," IJSR, vol. 5, no. 4, pp. 1049–1052, Apr. 2016.
[76] A. A. Mandwe and A. Anjum, "Detection of brain tumor using k-means clustering," IJSR, vol. 5, no. 6, pp. 420–423, Jun. 2016.
[77] K. Dhiraj and S. K. Rath, "Gene expression analysis using clustering," Int. Journal of Comp. and Elec. Engg., pp. 155–164, 2009.
[78] A. Bhattacharya and R. De, "Bi-correlation clustering algorithm for determining a set of co-regulated genes," Bioinformatics, vol. 25, p. 2795, 2009.
[79] E. Zeng, C. Yang, T. Li, and G. Narasimhan, "Clustering genes using heterogeneous data sources," IJKDB, vol. 1, no. 2, pp. 12–28, 2010.
[80] C. Sundar, "An analysis on the performance of k-means clustering algorithm for cardiotocogram clustering," IJCSA, vol. 2, p. 11, Oct. 2012.
[81] J. Sun, "Clustering algorithms research," J. Software, vol. 19, Jun. 2008.
[82] G. Gan, C. Ma, and J. Wu, Data Clustering: Theory, Algorithms, and Applications. Philadelphia: SIAM, 2007.
[83] X. Huang and Z. Song, "Clustering analysis on e-commerce transaction based on k-means clustering," J. Networks, vol. 9, Feb. 2014.
[84] C. W. Wang and J. H. Jeng, "Image compression using PCA with clustering," ISPACS, New Taipei, pp. 458-462, 2012.
[85] K. Li and G. Teng, "Unsupervised SVM based on p-kernels for anomaly detection," ICICIC, Beijing, pp. 59-62, 2006.
[86] N. Kovvali, M. Banavar, and A. Spanias, An Introduction to Kalman Filtering with MATLAB Examples, Synthesis Lect. Signal Proc., Morgan & Claypool Publ., Ed. J. Moura, vol. 6, Sep. 2013.
[87] B. Widrow and S. Stearns, Adaptive Signal Processing. Prentice Hall, 1985.
[88] J. Foutz, A. Spanias, and M. Banavar, Narrowband Direction of Arrival Estimation for Antenna Arrays, Synthesis Lectures on Antennas, Morgan & Claypool Publishers, ISBN-13: 978-1598296501, Aug. 2008.
[89] S. Theodoridis, Machine Learning: A Bayesian and Optimization Perspective, 1st ed. Academic Press, December 2015.
[90] A. Spanias, Digital Signal Processing: An Interactive Approach, 2nd ed., ISBN 978-1-4675-9892-7, Lulu Press, May 2014.
[91] J. Lee, M. Stanley, A. Spanias, and C. Tepedelenlioglu, "Integrating machine learning in embedded sensor systems for Internet-of-Things applications," IEEE ISSPIT, Limassol, Cyprus, Dec. 2016.
[92] J. Thiagarajan, K. Ramamurthy, P. Turaga, and A. Spanias, Image Understanding Using Sparse Representations, Synth. Lect. on Image, Video, and Multimedia Proc., Morgan & Claypool Publ., April 2014.
[93] V. Berisha, A. Wisler, A. Hero, and A. Spanias, "Empirically estimable classification bounds based on a nonparametric divergence measure," IEEE Trans. on Signal Processing, vol. 64, pp. 580-591, Feb. 2016.
[94] G. Wichern, J. Xue, H. Thornburg, B. Mechtley, and A. Spanias, "Segmentation, indexing, and retrieval for environmental and natural sounds," IEEE Trans. on ASLP, vol. 18, no. 3, pp. 688-707, 2010.
[95] X. Bi, S. Lee, J. F. Ranville, P. Sattigeri, A. Spanias, P. Herckes, and P. Westerhoff, "Quantitative resolution of nanoparticle sizes using single particle inductively coupled plasma mass spectrometry with the K-means clustering algorithm," J. Analy. Atomic Spectr., 29, p. 1630, 2014.
[96] H. Braun, P. Turaga, and A. Spanias, "Direct tracking from compressive imagers: A proof of concept," IEEE ICASSP, Florence, 2014.
[97] J. J. Thiagarajan, K. N. Ramamurthy, P. Sattigeri, and A. Spanias, "Supervised local sparse coding of sub-image features for image retrieval," IEEE ICIP, Orlando, Sept. 2012.
[98] P. Sattigeri, J. J. Thiagarajan, M. Shah, K. N. Ramamurthy, and A. Spanias, "A scalable feature learning and tag prediction framework for natural environment sounds," 48th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pp. 1779-1783, 2014.
[99] P. Sattigeri, J. J. Thiagarajan, K. N. Ramamurthy, and A. Spanias, "Implementation of a fast image coding and retrieval system using a GPU," IEEE ESPA, Las Vegas, NV, pp. 5-8, 2012.
[100] P. Sattigeri, J. J. Thiagarajan, K. N. Ramamurthy, A. Spanias, M. Goryll, and T. Thornton, "Robust PSD features for ion-channel signals," SSPD, London, UK, 27-29 September 2011.
[101] A. Spanias, C. Tepedelenlioglu, E. Kyriakides, D. Ramirez, S. Rao, H. Braun, J. Lee, D. Srinivasan, J. Frye, S. Koizumi, and Y. Morimoto, "An 18 kW solar array research facility for fault detection experiments," Proc. 18th MELECON, Cyprus, April 2016.
[102] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, 6th ed. Boston, MA: Kluwer Academic Publishers, 1991.
[103] J. Makhoul et al., "Vector quantization in speech coding," Proceedings of the IEEE, vol. 73, no. 11, pp. 1551-1588, Nov. 1985.
[104] M. Shah, C. Chakrabarti, and A. Spanias, "Within and cross-corpus speech emotion recognition using latent topic model-based features," EURASIP J. Audio, Speech, and Music Processing, 2015:4, Jan. 2015.
[105] A. Spanias, T. Painter, and V. Atti, Audio Signal Processing and Coding. Wiley, March 2007.
[106] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantization," IEEE Trans. on Communications, vol. COM-28, no. 1, pp. 84-95, Jan. 1980.
[107] A. S. Spanias, "Speech coding: A tutorial review," Proceedings of the IEEE, vol. 82, no. 10, pp. 1541-1582, October 1994.
[108] E. G. Ularu et al., "Mobile computing and cloud maturity - introducing machine learning for ERP configuration automation," Informatica Economica, vol. 17, no. 1, pp. 40–52, Mar. 2013.
[109] I. H. Witten et al., Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. USA: Morgan Kaufmann Publishers, 2011.
[110] R. Schumacker, Understanding Statistics Using R, S. Tomek, Ed. Springer Publishing Company, 2013.
[111] G. Webber-Cross, Learning Microsoft Azure: A Comprehensive Guide to Cloud Application Development Using MS Azure. UK: Packt Publishing, 2014.
[112] V. Fontama et al., Predictive Analytics with MS Azure Machine Learning: Build and Deploy Solutions in Minutes. Apress, 2014.
[113] R Development Core Team (2008), R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
[114] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research 12, p. 2825, 2011. https://scikit-learn.org
[115] M. Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015. URL https://tensorflow.org
[116] S. Amari, "Backpropagation and stochastic gradient descent method," Neurocomputing, vol. 5, no. 4-5, pp. 185-196, 1993.
[117] H. Blockeel, Machine Learning and Knowledge Discovery in Databases. Berlin: Springer, 2013.
[118] V. J. Mathews and Z. Xie, "A stochastic gradient adaptive filter with gradient adaptive step size," IEEE Trans. SP, vol. 41, p. 2075, Jun. 1993.
[119] G. Qu and N. Li, "Accelerated distributed Nesterov gradient descent for smooth and strongly convex functions," 54th Annual Allerton Conf. on Comm., Control, and Computing, Monticello, IL, pp. 209-216, 2016.
[120] Y. Wong, "How Gaussian radial basis functions work," IJCNN-91 Int. Joint Conf. on Neural Networks, Seattle, WA, pp. 133-138, 1991.
[121] J. A. Flanagan and T. Novosad, "Maximizing WCDMA network packet traffic performance: Multi-parameter optimization by gradient descent minimization of a cost function," IEEE PIMRC, vol. 1, pp. 311-315, 2003.
[122] F. F. Lubis et al., "Gradient descent and normal equations on cost function minimization for online predictive using linear regression with multiple variables," 2014 ICISS, Bandung, pp. 202-205, 2014.
[123] S. K. Lenka et al., "Gradient descent with momentum based neural network pattern classification for the prediction of soil moisture content in precision agriculture," 2015 IEEE iNIS, Indore, p. 63, 2015.
[124] M. Tivnan et al., "A modified gradient descent reconstruction algorithm for breast cancer detection using microwave radar and digital breast tomosynthesis," 2016 10th EuCAP, Davos, pp. 1-4, 2016.
[125] D. Chen et al., "Similarity learning on an explicit kernel feature map for person re-identification," IEEE CVPR, Boston, p. 1565, 2015.
[126] P. Sahoo et al., "On the study of GRBF and polynomial kernel based support vector machine in web logs," 1st Int. Conf. on Emerging Trends and Applications in Computer Science, Shillong, pp. 1-5, 2013.
[127] P. Panavaranan and Y. Wongsawat, "EEG-based pain estimation via fuzzy logic and polynomial kernel support vector machine," 6th Biomedical Engg. Int. Conf., Amphur Muang, pp. 1-4, 2013.
[128] S. Yaman et al., "Using polynomial kernel SVMs for speaker verification," IEEE Sig. Process. Lett., vol. 20, pp. 901-904, Sept. 2013.
[129] J. Bai et al., "Application of SVM with modified Gaussian kernel in a noise-robust speech recognition system," IEEE Int. Symp. on Knowledge Acquisition and Modeling Workshop, pp. 502-505, 2008.
[130] P. Baldi and S. Brunak, "Gaussian processes, kernel methods, and SVM," Bioinformatics: Machine Learning, MIT Press, p. 387, 2001.
[131] M. Varewyck et al., "A practical approach to model selection for support vector machines with a Gaussian kernel," IEEE Trans. on SMC, Part B (Cybernetics), vol. 41, no. 2, pp. 330-340, April 2011.
[132] D. Zhang et al., "Time series classification using SVM with Gaussian elastic metric kernel," ICPR, Istanbul, pp. 29-32, 2010.
[133] J. Tian and L. Zhao, "Weighted Gaussian kernel with multiple widths and support vector classifications," Int. Symposium on Info. Engg. and Electronic Commerce, Ternopil, pp. 379-382, 2009.
[134] Y. Tang et al., "Efficient model selection for SVM with Gaussian kernel function," 2009 IEEE CIDM, Nashville, TN, pp. 40-45, 2009.
[135] A. Betancourt et al., "Filtering SVM frame-by-frame binary classification in a detection framework," ICIP, Quebec, p. 2552, 2015.
[136] Chen Donghui and Liu Zhijing, "A new text categorization method based on HMM and SVM," 2nd Int. Conf. on Computer Engg. and Tech., Chengdu, pp. V7-383–V7-386, 2010.
[137] M. Kumar and M. Gopal, "An investigation on linear SVM and its variants," Int. Conf. Mach. Learn. and Comp., Bangalore, p. 27, 2010.
[138] Z. Wang and X. Qian, "Text categorization based on LDA and SVM," Int. Conf. on Comp. Sci. and Soft. Eng., Hubei, p. 674, 2008.
[139] A. Sharma, "Handwritten digit recognition using SVM," arXiv:1203.3847, 2012.
[140] D. Gorgevik et al., "Handwritten digit recognition by combining support vector machines using rule-based reasoning," Proc. 23rd Int. Conf. on Info. Tech. Interfaces, vol. 1, pp. 139-144, 2001.
[141] E. Tuba et al., "Handwritten digit recognition by SVM optimized by bat algorithm," 24th Int. Conf. WSCG, 2016.
[142] C. S. Turner, "Slope filtering: An FIR approach to linear regression [DSP Tips & Tricks]," IEEE Sig. Proc. Mag., pp. 159-163, Nov. 2008.
[143] Y. T. Chang and K. Cheng, "Sensorless position estimation of switched reluctance motor at startup using quadratic polynomial regression," IET Electric Power Applications, vol. 7, pp. 618-626, Aug. 2013.
[144] E. Masry, "Multivariate regression estimation of continuous-time processes from sampled data: Local polynomial fitting approach," IEEE Trans. on Info. Theory, vol. 45, no. 6, pp. 1939-1953, Sep. 1999.
[145] T. Banerjee et al., "PERD: Polynomial-based event region detection in wireless sensor networks," 2007 IEEE ICC, pp. 3307-3312, 2007.
[146] A. Sharmila and P. Geethanjali, "DWT based detection of epileptic seizure," IEEE Access, vol. 4, pp. 7716-7727, 2016.
[147] V. Agrawal et al., "Application of K-NN regression for predicting coal mill related variables," 2016 ICCPCT, India, pp. 1-9, 2016.
[148] X. Li et al., "Speech recognition based on k-means clustering and NN ensembles," 7th Int. Conf. on Natural Comp., Shanghai, p. 614, 2011.
[149] E. C. Ozan et al., "A vector quantization based k-NN approach for large-scale image classification," IPTA, Oulu, pp. 1-6, 2016.
[150] D. Valsesia and P. Boufounos, "Multispectral image compression using vector quantization," IEEE ITW, Cambridge, p. 151, 2016.
[151] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 1st ed. Cambridge, MA: The MIT Press, 2017.
[152] Y. Bengio et al., "Representation learning: A review and new perspectives," IEEE Transactions on PAMI, vol. 35, p. 1798, Aug. 2013.
[153] I. Arel et al., "Deep machine learning - a new frontier in artificial intelligence research [Research Frontier]," IEEE Computational Intelligence Magazine, vol. 5, no. 4, pp. 13-18, Nov. 2010.
[154] Y. LeCun et al., "Deep learning," Nature, 521(7553), 2015.
[155] A. Krizhevsky et al., "ImageNet classification with deep convolutional NN," Adv. Neural Info. Process. Sys., vol. 25, pp. 1090-1098, 2012.
[156] F. Rosenblatt, "The perceptron: A probabilistic model for information storage in the brain," Psychological Review, vol. 65, pp. 386-408, 1958.
[157] K. Kavukcuoglu et al., "Learning convolutional feature hierarchies for visual recognition," Advances in Neural Info. Process. Sys., 2010.
[158] T. Mikolov et al., "Distributed representations of words and phrases and their compositionality," NIPS, 2013.
[159] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85-117, 2015.
[160] N. Srivastava et al., "Dropout: A simple way to prevent neural networks from overfitting," J. Machine Learning Res., 15(1), pp. 1929-1958, 2014.
[161] L. Deng, "A tutorial survey of architectures, algorithms, and applications for deep learning," APSIPA Trans. on Signal and Info. Process., 3, 2014.
[162] G. Hinton et al., "Deep NN for acoustic modeling in speech recognition," IEEE Signal Process. Magazine, vol. 29, no. 6, pp. 82-97, 2012.
[163] D. Yu and L. Deng, "Deep learning and its applications to signal and information processing," IEEE Sig. Proc. Mag., vol. 28, no. 1, pp. 145-154, 2011.
[164] G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, 313(5786), pp. 504–507, 2006.
[165] D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach. Springer, 2014.
[166] O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," Proc. IEEE ICASSP, Vancouver, 2013.
[167] C. Szegedy et al., "Going deeper with convolutions," Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, 2015.
[168] H. Song et al., "Auto-context modeling using multiple kernel learning," 2016 IEEE ICIP, Phoenix, pp. 1868-1872, Sep. 2016.
[169] Y. Bengio et al., "Advances in optimizing recurrent networks," Proc. IEEE ICASSP, Vancouver, 2013.
[170] R. Salakhutdinov et al., "Deep Boltzmann machines," Proceedings of the Int. Conf. on AI and Statistics, vol. 5, Cambridge: MIT Press, 2009.
[171] H. Song, J. Jayaraman, and A. Spanias, "A deep learning approach to multiple kernel fusion," Proc. ICASSP 2017, New Orleans.
[172] T. Mikolov et al., "Recurrent neural network based language model," Proc. IEEE ICASSP, pp. 1045–1048, 2010.
[173] G. Mesnil et al., "Investigation of RNN architectures and learning methods for spoken language understanding," Proc. Interspeech, 2013.
[174] K. Kobylinski et al., "Enterprise application development in the cloud with IBM Bluemix," Proc. 24th Conf. Comp. Sc. Soft. Eng., IBM, 2014.
[175] A. Gheith et al., "IBM Bluemix mobile cloud services," IBM Journal of Research and Development, 60(2-3), 2016.
[176] S. Klein, "Azure Machine Learning," in IoT Solutions in Microsoft's Azure IoT Suite. Apress, pp. 227-252, 2017.
[177] S. van der Walt et al., "The NumPy array: A structure for efficient numerical computation," Comp. in Science & Eng., 13(2), pp. 22-30, 2011.
[178] W. McKinney, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, 2012.
[179] G. Hackeling, Mastering ML with scikit-learn. Packt Publishing, 2014.
[180] G. Van Rossum, "Python programming language," USENIX Annual Technical Conf., vol. 41, 2007.
[181] F. Chollet, "Keras: Deep learning library for Theano and TensorFlow," URL: https://keras.io/, 2015.
[182] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv:1502.03167, 2015.
[183] D. Arpit et al., "Normalization propagation: A parametric technique for removing covariate shift in deep networks," arXiv preprint, 2016.
[184] T. Salimans et al., "Weight normalization: A reparameterization to accelerate training of neural networks," Adv. Neu. Info. Sys., 2016.
[185] J. Ba et al., "Layer normalization," arXiv:1607.06450, 2016.
[186] P. Loizou and A. Spanias, "High performance alphabet recognition," IEEE Trans. on Speech and Audio, vol. 4, pp. 439-445, Nov. 1996.
[187] S. Rao, S. Katoch, P. Turaga, A. Spanias, C. Tepedelenlioglu, R. Ayyanar, H. Braun, J. Lee, U. Shanthamallu, M. Banavar, and D. Srinivasan, "A cyber-physical system approach for photovoltaic array monitoring and control," Proc. 8th International Conference on Information, Intelligence, Systems and Applications (IEEE IISA 2017), Larnaca, August 2017.
[188] A. Spanias, "Solar energy management as an Internet of Things (IoT) application," Proc. 8th International Conference on Information, Intelligence, Systems and Applications (IEEE IISA 2017), Larnaca, August 2017.
[189] J. Gubbi et al., "Internet of Things (IoT): A vision, architectural elements, and future directions," Future Generation Computer Systems, vol. 29, no. 7, pp. 1645-1660, 2013.
[190] C. Aldrich and L. Auret, Unsupervised Process Monitoring and Fault Diagnosis with Machine Learning Methods. Springer, 2013.
[191] X. Long, B. Yin, and R. M. Aarts, "Single-accelerometer-based daily physical activity classification," Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009.
[192] D. Rajan, A. Spanias, S. Ranganath, M. Banavar, and P. Spanias, "Health monitoring laboratories by interfacing physiological sensors to mobile Android devices," IEEE FIE, 2013.
[193] J. P. Lynch, "A summary review of wireless sensors and sensor networks for structural health monitoring," The Shock and Vibration Digest, vol. 38, no. 2, pp. 91-128, 2006.
[194] J.-S. Hwang and Y. H. Choe, "Smart Cities Seoul: A case study," ITU-T Technology Watch, February 2013.
[195] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, "Internet of Things for smart cities," IEEE Internet of Things Journal, vol. 1, no. 1, pp. 22–32, February 2014.
[196] "Sensor networks and the smart campus," 2014. [Online]. Available: https://beaverworks.ll.mit.edu/CMS/bw/smartcampusfuture. Accessed: Dec. 12, 2016.
[197] P. Bellavista et al., "Convergence of MANET and WSN in IoT urban scenarios," IEEE Sens. J., vol. 13, no. 10, pp. 3558–3567, Oct. 2013.
[198] A. Zanella et al., "Internet of Things for smart cities," IEEE Internet of Things Journal, vol. 1, no. 1, February 2014.
[199] S. Miller, X. Zhang, and A. Spanias, Multipath Effects in GPS Receivers, Synthesis Lectures on Communications, Morgan & Claypool Publishers, ISBN 978-1627059312, Ed. W. Tranter, no. 1, Dec. 2015.
[200] X. Zhang, C. Tepedelenlioglu, M. Banavar, and A. Spanias, Node Localization in Wireless Sensor Networks, Synth. Lectures on Communications, Morgan & Claypool Publ., ISBN 9781627054850, Ed. W. Tranter, Dec. 2016.
[201] Q.-H. Phan and S.-L. Tan, "Mitigation of GPS periodic multipath using nonlinear regression," 19th European Signal Processing Conference, Barcelona, 2011.
[202] www.ifixit.com/Teardown/Amazon+Echo+Teardown/
