
IOT AND MACHINE LEARNING
Area of Research

 With the rise of the Internet of Things (IoT), applications have become smarter, and connected devices are being exploited in every aspect of a modern city. As the volume of collected data increases, Machine Learning (ML) techniques are applied to further enhance the intelligence and capabilities of an application.
 Smartphones, embedded systems, wireless sensors, and almost every other electronic device are now connected to a local network or to the internet, leading to the era of the Internet of Things (IoT). As the number of devices increases, the amount of data collected by those devices increases as well. New applications emerge that analyze the collected data to draw meaningful correlations and support decisions, leading to Artificial Intelligence (AI) via Machine Learning (ML) algorithms.
 In recent years, researchers have developed and applied new machine learning technologies.
These new technologies have driven many new application domains. Before we discuss that,
we will first provide a brief introduction to a few important machine learning technologies,
such as deep learning, reinforcement learning, adversarial learning, dual learning, transfer
learning, distributed learning, and meta learning.
Hot topics in Machine Learning for thesis and research:
 Deep Learning
 Human-computer interaction
 Genetic Algorithm
 Image Annotation
 Reinforcement Learning
 Natural Language Processing
 Supervised Learning
 Unsupervised Learning
 Support Vector Machines (SVMs)
 Sentiment Analysis
Literature Survey
 Rob Law (1998) [7] applies neural networks to forecast room occupancy rates for Hong Kong hotels and finds that the neural network outperforms both a naïve extrapolation model and multiple regression. The study examined the feasibility of incorporating neural networks to predict room occupancy rates in the Hong Kong hotel industry.
 Hua et al. (2006) [8] describe a support vector machine (SVM) approach to predict occurrences of non-zero demand and lead-time demand of spare parts used in a petrochemical enterprise in China for inventory management. They use an integrated procedure that relates explanatory variables and the autocorrelation of the demand time series to spare-parts demand. When the performance of the SVM-based LRSVM model is compared with Croston's model, exponential smoothing, the IFM method and a Markov bootstrapping procedure, it performs best.
 Vahidov et al. (2008) [9] compare traditional methods of predicting demand at the end of a supply chain, namely naive forecasting, linear regression and trend moving average, with advanced machine learning methods such as neural networks, recurrent neural networks and support vector machines, and find that recurrent neural networks and support vector machines show the best performance.
 Chen Hung et al. (2014) [14] propose a forecasting model for tourist arrivals in Taiwan and Hong Kong based on logarithm least-squares support vector regression (LLSSVR). Combined with fuzzy c-means (FCM) and genetic algorithms (GA), the method shows better predictive performance than the other methods compared.
 Guang-Bin Huang et al. (2015) [15] explore the basic features of extreme learning machines (ELMs), such as kernels, random features and random neurons, compare the performance of ELMs, and show that they tend to outperform support vector machines in classification and regression applications.
 Wang et al. (2016) [16] propose a novel forecasting method, CMCSGM, a Markov-chain grey model that uses the cuckoo search optimization algorithm to improve the performance of the Markov-chain grey model. The study indicates that the proposed model performs better than traditional MCGM models.
 Barzegar et al. (2017) [17] demonstrate models for multi-step-ahead prediction of electrical conductivity, an indicator of water quality needed to estimate the mineralization, purification and salinity of water, based on extreme learning machines and hybrid wavelet extreme learning machine (WA-ELM) models exploiting a boosting ensemble method. The findings show that the multi WA-ELM and multi WA-ANFIS ensemble models outperform the individual WA-ELM and WA-ANFIS constructions.
 Fouilloy et al. (2018) [18] suggest machine learning models for hourly solar irradiation forecasting and analyze them at sites with low, medium and high meteorological variability (Ajaccio, Odeillo and Tilos). They compare the models with autoregressive moving average and multi-layer perceptron approaches.
 Wang (2007) [10] describes a machine learning method combining support vector regression with real-valued genetic algorithms (GA-SVR). The experimental findings show that SVR outperforms ARIMA models and BPNN in terms of normalized mean square error and mean absolute percentage error.
 Chen et al. (2011) [11] present a method to forecast tourism demand, an SVR built with a chaotic genetic algorithm (SVRCGA), which overcomes the problem of premature convergence to local optima. The paper reveals that the suggested SVRCGA model outperforms the other methodologies reviewed in the research paper.
 Turksen et al. (2012) [12] present a next-day stock price prediction model based on a four-layer fuzzy multi-agent system (FMAS) structure. This artificial intelligence model uses the coordination of intelligent agents for the task. The authors find that FMAS is a suitable tool for stock price prediction problems, as it outperforms all previous methods.
 Shahrabi et al. (2013) [13] propose a new combined intelligent model for estimating tourism demand, the Modular Genetic-Fuzzy Forecasting System (MGFFS), built from genetic fuzzy expert systems, and find that the predictive accuracy of MGFFS is better than approaches such as classical time series models, making it a suitable tool for tourism demand prediction problems.
Challenges in Machine learning
While there has been much progress in machine learning, there are also challenges:
 Black Box Problem: The black box is a challenge for in-app recommendation services. It turns out that web application users feel more comfortable when they know, at least roughly, how the automatic suggestions work.
 High Complexity: The computational complexity of machine learning algorithms is usually very high, and we may want to invent lightweight algorithms or implementations.
 Tool Chaos: It takes time to achieve any satisfying results, and planning is difficult. Machine learning takes much more time: you have to gather and prepare data, then train the algorithm, and there are many more uncertainties.
 Expensive: Data is not free at all. It takes time to collect a sufficient amount of data, and buying ready-made datasets is expensive.
 Isolated Data: One can store all the data required, but it is very difficult to collect it all, as data has to be gathered from many different locations.
 Data Cleaning: Businesses typically face challenges in feeding the right data to machine learning algorithms and in cleaning irrelevant and error-prone data. In other words, when it comes to utilizing ML data, most of the time is spent on cleaning datasets or creating a dataset that is free of errors.
Solutions to Challenges in Machine Learning
Tool Chaos Problem Solution:
 Since its formulation by Sir Isaac Newton, the problem of solving the equations of
motion for three bodies under their own gravitational force has remained practically
unsolved. Currently, the solution for a given initialization can only be found by
performing laborious iterative calculations that have unpredictable and potentially
infinite computational cost, due to the system's chaotic nature. We show that an
ensemble of solutions obtained using an arbitrarily precise numerical integrator can
be used to train a deep artificial neural network (ANN) that, over a bounded time
interval, provides accurate solutions at fixed computational cost and up to 100 million
times faster than a state-of-the-art solver. Our results provide evidence that, for
computationally challenging regions of phase-space, a trained ANN can replace
existing numerical solvers, enabling fast and scalable simulations of many-body
systems to shed light on outstanding phenomena such as the formation of black-hole
binary systems or the origin of the core collapse in dense star clusters.
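The same idea, replacing an expensive numerical solver with a network trained on its output, can be illustrated on a much simpler system. The following sketch (Python with NumPy, SciPy and scikit-learn; the toy damped-oscillator system and all parameter values are illustrative assumptions, not the setup of the three-body study above) generates an ensemble of solutions with a conventional integrator and fits a small neural network that maps an initial state and a query time directly to the state, so that, once trained, queries no longer call the solver.

    # Sketch: train a neural-network surrogate on solutions produced by a numerical
    # integrator, so that trained queries no longer need the (expensive) solver.
    # The toy system and hyperparameters are illustrative assumptions only.
    import numpy as np
    from scipy.integrate import solve_ivp
    from sklearn.neural_network import MLPRegressor

    def damped_oscillator(t, y):
        # y = [position, velocity]; a simple stand-in for an expensive dynamical system
        return [y[1], -y[0] - 0.1 * y[1]]

    rng = np.random.default_rng(0)
    X, Y = [], []
    for _ in range(500):                      # ensemble of solutions from the integrator
        y0 = rng.uniform(-1.0, 1.0, size=2)   # random initial condition
        t_eval = np.linspace(0.0, 5.0, 20)    # bounded time interval
        sol = solve_ivp(damped_oscillator, (0.0, 5.0), y0, t_eval=t_eval)
        for t, state in zip(t_eval, sol.y.T):
            X.append([y0[0], y0[1], t])       # input: initial state + query time
            Y.append(state)                   # target: state at that time
    X, Y = np.array(X), np.array(Y)

    # The trained network maps (initial state, t) -> state at fixed computational cost.
    surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    surrogate.fit(X, Y)
    print(surrogate.predict([[0.5, 0.0, 2.5]]))   # approximate state without calling the solver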
Cross Validation
 In cross-validation, not all of the available data is used to train the model. The data is usually divided into three parts that support the cross-validation method: the training data, the test data, and the validation dataset. You can use a combination of training and test data alone, or use all three splits.
 [Training data = for model training
 Test data = for model hyperparameter tuning
 Validation data = for model validation and accuracy estimation]
 There are many ways to work with these splits; typically the training data is 60% of the total dataset, the test dataset is 20%, and the validation dataset comprises the remaining 20%.
 The quality of the trained model is assessed by first training the model using just the training data and then comparing it with the model trained with the test data. In this manner, we can identify which data points lead to better predictions. There are many variations of cross-validation:
 Hold-Out: The data is divided once into training data and test data, and the results are later compared. In the hold-out method, a single portion of the data is kept on hold and never used for training. For example, out of 100 samples, 60 go to training, 20 to test, and 20 to the validation dataset; accuracy is calculated during training, and the test set measures accuracy after the model is trained.
 K-Fold Cross Validation: Here the data is divided into k folds (say, k = 10). In each iteration, one fold is held out as the validation set and the remaining k-1 folds are used for training; this is repeated until every fold has served as the validation set exactly once. This method is effective yet requires considerable computational power (a minimal sketch of both methods follows below).
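The splits described above can be reproduced in a few lines. The following sketch (Python with scikit-learn; the synthetic dataset, the 60/20/20 proportions and the choice of logistic regression are illustrative assumptions) performs a hold-out split into training, test and validation sets and then runs 10-fold cross-validation on the same data.

    # Sketch: hold-out split (60/20/20) and k-fold cross-validation with scikit-learn.
    # The synthetic data and model choice are illustrative assumptions.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split, KFold, cross_val_score
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=100, n_features=5, random_state=0)

    # Hold-out: 60% training, then the remaining 40% split evenly into test and validation
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6, random_state=0)
    X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))
    print("validation accuracy:", model.score(X_val, y_val))

    # K-fold: each of the k folds is used exactly once as the validation fold,
    # while the remaining k-1 folds are used for training.
    kfold = KFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(LogisticRegression(), X, y, cv=kfold)
    print("10-fold accuracies:", scores, "mean:", scores.mean())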
Data Cleaning:
 The main aim of Data Cleaning is to identify and remove errors & duplicate
data, in order to create a reliable dataset. This improves the quality of the
training data for analytics and enables accurate decision-making. There are
various methods to identify and classify data for data cleansing:
 Data Cleaning consists of two basic stages, first is error identification and
second is error solving. For any data cleaning activity, the first step is to
identify the anomalies.
 No matter which technique you employ to analyze errors, it should address three questions:
 What types of errors to identify?
 How to identify these errors?
 Where to identify these errors?
 Answering these questions will help you clean the data and improve the quality of your machine learning data. Apart from this, there are certain best practices that can be used for data error identification and data cleaning (a minimal example follows below).
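To make the error-identification and error-solving stages concrete, the following sketch (Python with pandas; the small example table, the sensor error code and the plausibility range are illustrative assumptions) flags duplicates, missing values and out-of-range readings, then removes or repairs them to produce a cleaner training set.

    # Sketch: basic data cleaning - identify and then remove or repair duplicates,
    # missing values and obviously invalid readings. Example data and rules are assumptions.
    import pandas as pd

    raw = pd.DataFrame({
        "sensor_id":   [1, 1, 2, 3, 3, 4],
        "temperature": [21.5, 21.5, None, 19.0, 19.0, -999.0],   # -999 = assumed sensor error code
        "humidity":    [40, 40, 55, 61, 61, 38],
    })

    # Stage 1: error identification
    duplicates = raw.duplicated()
    missing = raw["temperature"].isna()
    out_of_range = ~raw["temperature"].between(-40, 60)   # assumed physically plausible range
    print("duplicates:", int(duplicates.sum()),
          "missing:", int(missing.sum()),
          "out of range (incl. missing):", int(out_of_range.sum()))

    # Stage 2: error solving
    clean = raw.drop_duplicates()                          # remove exact duplicate rows
    clean = clean[clean["temperature"].between(-40, 60) | clean["temperature"].isna()]
    clean = clean.fillna({"temperature": clean["temperature"].mean()})   # impute missing values
    print(clean)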
