Deployment of Analytics Solutions - Module VII - Students
Module VII
Anomaly Detection, Predictive Analytics and
Streaming Analytics; integrating analytics models
with cloud/edge methods
By
Dr Shola Usharani
Anomaly detection
What is anomaly detection?
• An unexpected change within the data patterns, or an event that does not
conform to the expected data pattern, is considered an anomaly.
• Anomaly detection identifies data points, events, and/or observations
that deviate from a dataset’s normal behavior.
• In other words, an anomaly is a deviation from business as usual.
• Anomalous data can indicate critical incidents, such as a technical glitch,
or potential opportunities, for instance a change in consumer behavior.
• Machine learning is progressively being used to automate anomaly
detection.
• Definition
– Within this dataset are data patterns that represent business as usual. An unexpected
change within these data patterns, or an event that does not conform to the expected data
pattern, is considered an anomaly.
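The definition above can be made concrete with a small sketch: a minimal z-score detector that flags readings deviating too far from the series mean. The sensor values and threshold below are invented for illustration; real systems would use more robust statistics.

```python
# Minimal sketch: flag anomalies in a sensor series via z-scores.
# Data and threshold are illustrative, not from the slides.

def detect_anomalies(readings, threshold=3.0):
    """Return indices of readings whose z-score exceeds the threshold."""
    n = len(readings)
    mean = sum(readings) / n
    var = sum((x - mean) ** 2 for x in readings) / n
    std = var ** 0.5
    if std == 0:
        return []
    return [i for i, x in enumerate(readings) if abs(x - mean) / std > threshold]

temps = [21.0, 21.2, 20.9, 21.1, 21.0, 35.0, 21.2, 20.8]  # one obvious spike
print(detect_anomalies(temps, threshold=2.0))  # index of the spike
```

A z-score threshold of 2 to 3 is a common starting point; in practice the threshold is tuned on data known to represent "business as usual".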
Why anomaly detection?
• In IoT data, signal time series are produced by sensors strategically located on or around
a mechanical component.
• A time series is the sequence of values of a variable over time.
• In this case, the variable describes a mechanical property of the object, and it is
measured via one or more sensors.
• Usually, the mechanical piece is working correctly.
• As a consequence, we have tons of examples for the piece working in normal
conditions and close to zero examples for the piece failure.
• This is especially true if the piece plays a critical role in a mechanical chain because it is
usually retired before any failure happens and compromises the whole machinery.
• In IoT, a critical problem is to predict the chance of a mechanical failure before it actually
happens.
• In this way, we can use the mechanical piece throughout its entire life cycle without
endangering the other pieces in the mechanical chain
In enterprise IT, anomaly detection is
commonly used for:
– Data cleaning
– Intrusion detection
– Fraud detection
– Systems health monitoring (machine failures)
– Event detection in sensor networks (anomalies in
healthcare sensor devices)
– Ecosystem disturbances
• air quality measurements, water levels, or temperature fluctuations, to identify potential environmental hazards, pollution
incidents, or natural disasters
Anomaly detection with ML
• Machine learning suits the engineer’s
purpose to create an anomaly detection (AD) system that:
– Works better
– Is adaptive and on time
– Handles large datasets
– Applies a machine learning algorithm to predict or recognize deterioration of mechanical
pieces, or to detect cybersecurity breaches
• Healthcare
• Manufacturing
• Agriculture
• Energy
• Smart homes
• Transportation
Reference
• https://www.digiteum.com/iot-data-
collection/
Types of IoT Analytics
• Descriptive analytics on IoT data
• Diagnostic analytics on IoT data
• Predictive analytics on IoT data
• Prescriptive analytics on IoT data
Predictive analytics on IoT data
• Raises the question: what will happen?
• Assesses the likelihood that something will happen within a specific
timeframe, according to historical data.
• The aim is to proactively take corrective action before an undesired
outcome occurs, to mitigate risk, or to isolate opportunities.
• Typically implemented via machine learning models that are
trained with historical data, and stationed on the cloud so that they
can be accessed by end-user applications.
• Addresses questions such as:
– What’s the likelihood of this machine failing in the next 24 hours?
– What is the anticipated useful life of this tool?
– When should I service this machine?
– What will be the demand for this feature or product?
Top 5 Predictive Analytics Models
• Classification Model
• Clustering Model
• Forecast Model
• Outliers Model
• Time Series Model
Classification Model
• The classification model is the simplest of the several types of
predictive analytics models.
• It puts data in categories based on what it learns from
historical data.
• Classification models are best to answer yes or no questions,
providing broad analysis that’s helpful for guiding decisive
action.
• These models can answer questions such as:
– For a retailer, “Is this customer about to churn?”
– For a loan provider, “Will this loan be approved?” or “Is this
applicant likely to default?”
– For an online banking provider, “Is this a fraudulent
transaction?”
• The breadth of possibilities with the classification model—and
the ease by which it can be retrained with new data—means it
can be applied to many different industries.
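A yes/no classifier of the kind described above can be sketched with a nearest-centroid model: each class gets a mean feature vector, and a new point is assigned to the closest one. The churn features and labels below are invented for illustration.

```python
# Minimal sketch of a classification model: nearest-centroid classifier
# answering "is this customer about to churn?". Data is invented.

def train_centroids(samples, labels):
    """Compute one mean feature vector (centroid) per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        s = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: dist(centroids[y]))

# features: (monthly logins, support tickets)
X = [(30, 0), (28, 1), (2, 5), (1, 6)]
y = ["stay", "stay", "churn", "churn"]
model = train_centroids(X, y)
print(predict(model, (25, 1)))  # close to the "stay" centroid
```

Retraining with new data, as the slide notes, is just a matter of calling `train_centroids` again on the extended dataset.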
Clustering Model
• The clustering model sorts data into separate, nested smart groups
based on similar attributes.
– If an ecommerce shoe company is looking to implement targeted
marketing campaigns for their customers, they could go through the
hundreds of thousands of records to create a tailored strategy for each
individual.
– But is this the most efficient use of time? Probably not.
• Using the clustering model, they can quickly separate customers
into similar groups based on common characteristics and devise
strategies for each group at a larger scale.
• Other use cases of this predictive modeling technique might include
grouping loan applicants into “smart buckets” based on loan
attributes, identifying areas in a city with a high volume of crime,
and benchmarking SaaS customer data into groups to identify global
patterns of use.
Forecast Model
• One of the most widely used predictive analytics models, the forecast
model deals in metric value prediction, estimating numeric value
for new data based on learnings from historical data.
• This model can be applied wherever historical numerical data is
available. Scenarios include:
– A SaaS company can estimate how many customers they are likely to
convert within a given week.
– A call center can predict how many support calls they will receive per
hour.
– A shoe store can calculate how much inventory they should keep on
hand in order to meet demand during a particular sales period.
• The forecast model also considers multiple input parameters.
• If a restaurant owner wants to predict the number of customers she is
likely to receive in the following week, the model will take into account
factors that could impact this, such as: Is there an event close by? What
is the weather forecast? Is there an illness going around?
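A one-parameter version of the forecast model can be sketched with ordinary least squares: fit a trend line to historical counts, then extrapolate one step ahead. The weekly numbers below are invented; a real model would add the extra inputs the slide mentions (events, weather, illness).

```python
# Minimal sketch of a forecast model: least-squares trend line over
# historical counts, extrapolated one step ahead. Data is invented.

def fit_line(ys):
    """Least-squares slope and intercept for y over x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mx, my = (n - 1) / 2, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return slope, my - slope * mx

weekly_customers = [100, 110, 120, 130]            # a clean upward trend
slope, intercept = fit_line(weekly_customers)
forecast = slope * len(weekly_customers) + intercept  # next week (x = 4)
print(round(forecast))
```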
Outliers Model
• The outliers model is oriented around anomalous data entries within a
dataset.
• It can identify anomalous figures either by themselves or in conjunction
with other numbers and categories.
– Recording a spike in support calls, which could indicate a product failure that might
lead to a recall
– Finding anomalous data within transactions, or in insurance claims, to identify fraud
– Finding unusual information in your NetOps logs and noticing the signs of impending
unplanned downtime
• The outlier model is particularly useful for predictive analytics in retail
and finance.
• For example, when identifying fraudulent transactions, the model can
assess not only amount, but also location, time, purchase history and the
nature of a purchase (i.e., a $1000 purchase on electronics is not as likely to
be fraudulent as a purchase of the same amount on books or common
utilities).
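A simple instance of the outliers model is the interquartile-range (IQR) rule: anything outside 1.5 × IQR of the middle half of the data is flagged. The transaction amounts below are invented for illustration.

```python
# Minimal sketch of the outliers model: flag values outside the
# 1.5 * IQR fences. Transaction amounts are invented.

def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(values)
    n = len(s)

    def q(p):  # quartile via linear interpolation on sorted data
        idx = p * (n - 1)
        lo, hi = int(idx), min(int(idx) + 1, n - 1)
        return s[lo] + (s[hi] - s[lo]) * (idx - lo)

    q1, q3 = q(0.25), q(0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

amounts = [12, 15, 14, 13, 16, 14, 500]  # one suspicious transaction
print(iqr_outliers(amounts))
```

As the slide notes, a production fraud model would also weigh location, time, purchase history, and purchase category, not just amount.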
Time Series Model
• The time series model comprises a sequence of data points captured, using
time as the input parameter.
• It uses the last year of data to develop a numerical metric and predicts the next
three to six weeks of data using that metric.
• Use cases for this model include the number of daily calls received in the
past three months, sales for the past 20 quarters, or the number of patients
who showed up at a given hospital in the past six weeks.
• It is a potent means of understanding the way a singular metric is developing over
time with a level of accuracy beyond simple averages.
• It also takes into account seasons of the year or events that could impact the
metric.
• Where growth is not static or linear, the time series model can better capture
exponential growth and better align the model to a company’s trend.
• It can also forecast for multiple projects or multiple regions at the same time
instead of just one at a time.
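The seasonality idea above can be sketched with a seasonal-naive forecast: repeat the most recent full season, which preserves weekly cycles such as a weekend spike in calls. The daily call counts below are invented.

```python
# Minimal sketch of a time series model: a seasonal-naive forecast
# that repeats the last full season. Call counts are invented.

def seasonal_naive(history, season, steps):
    """Forecast `steps` future points by repeating the most recent season."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(steps)]

# two weeks of daily support calls with a clear weekly cycle
calls = [30, 32, 31, 33, 35, 50, 52,
         29, 33, 32, 34, 36, 51, 53]
print(seasonal_naive(calls, season=7, steps=3))
```

This baseline captures the season but not trend; real time series models (e.g. exponential smoothing or ARIMA-family models) combine both.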
References
• https://insightsoftware.com/blog/top-5-
predictive-analytics-models-and-algorithms/
References
• https://www.anodot.com/blog/what-is-
anomaly-detection/
• https://www.bmc.com/blogs/machine-
learning-anomaly-detection/
Streaming
Analytics
What is streaming or real-time
analytics?
• Streaming analytics or real-time analytics is
a type of data analysis that presents real-time
data and allows for performing simple
calculations with it.
• Working with real-time data involves different
mechanisms as compared to working with
historical data.
• It uses a specific way of processing large
amounts of constantly updating data,
called stream processing.
Source: www.altexsoft.com/blog/real-time-analytics
Batch processing (Traditional
analytics processing)
• Traditional analytics means business
intelligence (BI) methods and technical
infrastructure.
• BI is a practice of supporting data-driven
business decision-making.
• It mainly focuses on historical data, which in
most cases doesn’t lose its importance or
relevance.
Source: www.altexsoft.com/blog/real-time-analytics
Batch processing architecture
• Historical data is stored as a stable unit
that can be divided into pieces.
• In the process of ETL and warehousing, the
data is moved and processed by batches.
• A batch has to be queried by a user or a
software program.
• So, the system would understand when to
fetch data and which pieces of it, how to
process it, and present it to the end user.
Source: www.altexsoft.com/blog/real-time-analytics
Stream processing
• Streaming processing deals with data streams.
• A data stream is a constant flow of data, which updates with
high frequency and loses its relevance in a short period of
time.
• For example, these could be transactional data, information
from IoT devices, hardware sensors, etc.
• As data streams have no beginning or end, they can’t be
broken into batches.
• So there is no time when the data can be uploaded into
storage and processed.
• Instead, data streams are processed on the fly.
Source: www.altexsoft.com/blog/real-time-analytics
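On-the-fly processing as described above can be sketched with a generator: each reading is consumed as it arrives, a bounded window is updated, and a result is emitted immediately, with no batch ever stored. The values are illustrative.

```python
# Minimal sketch of stream processing: readings are consumed one at a
# time and a sliding-window mean is emitted on the fly. Data invented.
from collections import deque

def rolling_means(stream, window):
    """Yield the mean of the last `window` values as each value arrives."""
    buf = deque(maxlen=window)  # bounded memory: old values fall out
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)

sensor_stream = iter([10, 20, 30, 40])  # stands in for an endless feed
print(list(rolling_means(sensor_stream, window=2)))
```

The key property is bounded memory: because a stream has no end, only the current window is ever held, unlike batch processing which stores the whole unit first.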
How Does Stream Analytics Work?
• Streaming analytics, also known as event stream processing, is the
analysis of huge pools of current and “in-motion” data through the
use of continuous queries, called event streams.
• These streams are triggered by a specific event that happens as a
direct result of an action or set of actions, like a financial
transaction, equipment failure, a social post or a website click or
some other measurable activity.
• The data can originate from the Internet of Things (IoT),
transactions, cloud applications, web interactions, mobile devices,
and machine sensors.
• By using streaming analytics platforms, organizations can extract
business value from data in motion just like traditional analytics
tools would allow them to do with data at rest.
• Real-time streaming analytics help a range of industries by spotting
opportunities and risks.
Source: databricks.com/glossary/streaming-analytics
The Advantages of Streaming Analytics
• Data visualization. Keeping an eye on the most important
company information can help organizations manage their key
performance indicators (KPIs) on a daily basis. Streaming data can
be monitored in real time allowing companies to know what is
occurring at every single moment
• Business insights. If an out-of-the-ordinary business event
occurs, it will first show up in the relevant dashboard. It can be used
in cybersecurity, to automate detection and response to the threat
itself. This is an area where abnormal behavior should be flagged
for investigation right away.
• Increased competitiveness. Businesses looking to gain a
competitive advantage can use streaming data to discern trends and
set benchmarks faster. This way they can outpace their competitors
who are still using the sluggish process of batch analysis.
Source: databricks.com/glossary/streaming-analytics
• Cutting preventable losses. With the help of
streaming analytics, we can prevent or at least
reduce the damage of incidents like security
breaches, manufacturing issues, customer churn,
stock exchange meltdowns, and social media
crises.
• Analyzing routine business operations.
Streaming analytics offers organizations an
opportunity to ingest and obtain an instant
insight from the real-time data that is pouring in.
Source: databricks.com/glossary/streaming-analytics
Example Scenario
• In a smart agriculture setting, IoT sensors are
deployed throughout a farm to monitor
various environmental conditions and
optimize crop production. The farm owner,
Jane, utilizes an IoT data stream application to
collect, analyze, and visualize real-time data
from these sensors to make informed
decisions and maximize crop yield while
minimizing resource usage.
Reference
• https://databricks.com/glossary/streaming-
analytics
• https://www.altexsoft.com/blog/real-time-
analytics/
• https://cloud.google.com/learn/what-is-
streaming-analytics
• https://cloud.google.com/blog/products/data
-analytics/how-streaming-data-analytics-
works-for-real-time-processing
Performance or effectiveness of analytical models:
checking whether the chosen model predicts correctly
• You can train your supervised machine
learning models all day long, but unless you
evaluate their performance, you can never
know if a model is useful.
Why evaluation is necessary?
• A machine learning model can be trained
extensively with many parameters and new
techniques, but as long as you are skipping
its evaluation, you cannot trust it.
How to read the Confusion Matrix?
• A confusion matrix tabulates the predictions
of a model against the actual class labels of
the data points.
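Reading a confusion matrix is easiest with a worked example. The sketch below builds a 2x2 matrix (true/false positives and negatives) from predicted vs. actual labels; the labels themselves are invented for illustration.

```python
# Minimal sketch: build a 2x2 confusion matrix and compute accuracy
# from predicted vs. actual labels. Labels are invented.

def confusion_matrix(actual, predicted, positive="anomaly"):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

actual    = ["normal", "anomaly", "normal", "anomaly", "normal"]
predicted = ["normal", "anomaly", "anomaly", "anomaly", "normal"]
m = confusion_matrix(actual, predicted)
accuracy = (m["TP"] + m["TN"]) / len(actual)
print(m, accuracy)
```

From the same four counts one can also derive precision (TP / (TP + FP)) and recall (TP / (TP + FN)), which matter more than accuracy when anomalies are rare.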
Integration patterns
● Direct Integration: used for devices that support HTTP and TCP/IP and can therefore expose a web API; useful when a device can connect directly to the internet, for example via Wi-Fi or Ethernet
● Gateway Integration: resource-constrained devices use non-web protocols to talk to a more powerful device (the gateway), which then exposes a REST API for those non-web devices; useful for devices that can’t connect directly to the internet (for example, they support only Bluetooth or ZigBee) or can’t serve HTTP requests directly
● Cloud Integration: a powerful and scalable web platform acts as a gateway; useful for any device that can connect to a cloud server over the internet, regardless of whether it uses HTTP or not
Direct Integration pattern (REST
implementation)
Used: when the device isn’t battery
powered and when direct
access from clients such as
mobile web apps is required.
Example : home automation,
where power is usually available
and low-latency local
interactions are important—for
instance, turning lights on/off.
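The home-automation example above can be sketched as a device-side request router: the device itself maps REST-style requests to JSON responses. The paths and the `lamp_on` state below are hypothetical, invented for illustration; a real device would serve this logic over HTTP.

```python
# Minimal sketch of the Direct Integration pattern: the device answers
# REST-style requests itself. Paths and state are hypothetical.
import json

lamp_on = False

def handle_request(method, path):
    """Map a (method, path) pair to (status, JSON body), like a device API."""
    global lamp_on
    if method == "GET" and path == "/lights/1":
        return 200, json.dumps({"on": lamp_on})
    if method == "PUT" and path == "/lights/1/on":
        lamp_on = True
        return 200, json.dumps({"on": lamp_on})
    return 404, json.dumps({"error": "unknown resource"})

print(handle_request("PUT", "/lights/1/on"))  # turn the light on
print(handle_request("GET", "/lights/1"))     # read its state back
```

Because the client talks to the device directly, the round trip stays on the local network, which is why latency is low in this pattern.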
Disadvantages of Direct Integration
It is a power drag: Wi-Fi or Ethernet
requires too much power for
battery-powered devices.
Intermediate or proxy devices as
gateways (Gateway Integration)
● Intermediary devices can expose the device’s
functionality through a web-based API
○ These intermediaries are called application gateways or IoT
gateways
● they can talk to Things using any non-web application
protocols and then translate those into a clean REST API that
any HTTP client can use.
● Some gateways can add a layer of security or authentication,
aggregate and store data temporarily, expose semantic
descriptions for Things
● Example:
○ exposing a CoAP device through an HTTP and WebSockets API
CoAP device
● CoAP is an interesting protocol based on REST
● It isn’t HTTP, and it uses UDP instead of TCP
● A gateway that translates CoAP messages from/to
HTTP is therefore needed
● It enables device-to-device communication over low-power
radio links
● You can’t talk to a CoAP device from a JavaScript application
in your browser without installing a special plugin or
browser extension
● Example
○ Pi as a gateway to CoAP devices
Application
A mobile phone acts as the gateway to upload the data from
your Bluetooth bracelet, by bridging/translating
the various protocols.
Cloud Integration Pattern
Need of Cloud Integration Pattern
● You need to manage large quantities of devices
and data
● You need a much more powerful environment
● You need a scalable platform to store and process the
data
● Example :
○ Industry Automation, smart city, logistic support
Introduction to Cloud Integration
pattern
It is an extension of the gateway pattern where
the gateway is a remote server that devices and
applications access via the internet
Advantages of cloud Integration pattern
● It doesn’t have the physical constraints of devices and gateways
● It’s much more scalable and can process and store a virtually unlimited amount
of data
● A cloud platform can support many protocols at the same time and handle
protocol translation
● It can support many more concurrent clients than an IoT device platform
● It enables building complex applications in a fraction of the time, thanks to features
that might take considerable time to build from scratch: from industry-grade security,
to specialized analytics capabilities, to flexible data visualization tools and user
and access management
● Because platforms are natively connected to the web, data and services from
your devices can be easily integrated into third-party systems to extend your
devices
Supported cloud computing tools
Xively
ThingWorx
ThingSpeak
Carriots
thethings.io
Problems in IoT data analysis
• Analytics must be brought closer to the IoT data source in
order to remove unnecessary delays.
• However, bringing analytics closer to the IoT data source
puts forth a new set of challenges, including
limited power, storage and computing
resources.
IoT Cloud/edge Computing
technologies
Issues in cloud technology
• CPUs are not sufficient to process all data and
provide accurate, real-time results in time
• Optimal decisions based on comprehensive SA
(Situational Awareness) are needed
– Surrounding devices data and options
• Not all data is available at a node
• Collected data size is too big to compute at one
node
• Computing capacity at mobile devices is very limited
– Limited battery energy restricts fast & large amounts
of computing
IoT Integration computing
technologies
• Mobile cloud computing
• Mobile edge computing
Mobile cloud computing
• Definition:
– MCC (Mobile Cloud Computing) brings cloud
computing to mobile and IoT users
• UE (User Equipment) and IoT systems can use the
powerful computing and storage resources of a
distant CC (Centralized Cloud) through the CN
(Core Network) of a mobile operator and the
Internet
Mobile cloud computing:
cloud computing on mobile
communication (4G/LTE) networks
Mobile cloud computing technology
• Huge amounts of data from different IoT
devices are sent to the cloud
• Cloud computing is accessed over the internet and is
very far away from the IoT devices
Operation of MCC
• Distance from the UE to the cloud
– Traffic flows from the UE to baseband nodes, then through
the LTE network, and finally to the cloud over the internet
– The distance the data must travel is very long
Advantages of MCC compared to
traditional cloud computing
• Extends battery lifetime by offloading energy-
consuming computations to the cloud
• Enables faster, sophisticated application
support (i.e., IaaS, PaaS, SaaS) for mobile users
and IoT systems
– Results are computed in the cloud and sent back,
thanks to the service models
• Provides massive data storage to mobile users
and IoT systems
Drawbacks of MCC
• In conventional MCC, cloud services are accessed by
the mobile device via the Internet connection
• Conventional MCC Characteristics
– Long delay time & QoS performance is low
– High usage of the network resources
– High battery usage of IoT & Smart Devices
• Need for the technology:
– Fast, reliable, quick responses require bringing the
cloud technology near to the devices
• Edge computing
– Bringing cloud resources near to the edge of the network, close to where
we are, so that the network is used efficiently, is edge computing
Mobile edge computing
Edge computing
• Computing & storage resources are moved
closer to the edge (i.e., near the BS (Base
Station) or AP (Access Point)) of the network
closer to the UE
• Edge Computing Characteristics
– Very short delay time with high QoS support
– Low usage of the network resources
– Low battery usage of IoT & Smart Devices
Comparison of MCC & Edge computing
Technical aspect | MCC | Edge computing
Deployment | Centralized | Distributed
Distance to the UE | Far | Close
Latency | Long | Short
Jitter | High | Low
Computational power | Abundant | Limited (uses small devices)
Storage capacity | Abundant | Limited
IoT & Mobile edge technology
Type of cloud technologies in edge
computing for IoT
• Fog Computing
• Mobile Edge computing (MEC)
• Cloudlet
IoT & Mobile cloud technology
Fog Computing
Definition
• Fog Computing or Fog Networking
• Uses one or more collaborative end-user
– Clients or near-user Edge devices
• Fog support services
– Substantial amount of storage
• Instead of storing in a cloud data center (stores locally)
– Communications
• Instead of routing over the Internet backbone
– Control, configuration, measurement and
management through Fog Device
• Instead of being controlled by Internet gateways or LTE S-
GWs or the P-GW
Characteristics
• Decentralized Computing infrastructure is based on FCNs (Fog
Computing Nodes)
• FCNs can be placed anywhere between the end devices and cloud
• FCNs are heterogeneous (not one form, one protocol and one
structure) in nature, and can include various functional elements
(such as storage, memory, and services)
– Routers, switches, AP (Access Point), IoT gateways, set-top box, etc.
– The components above can be converted into fog devices.
• FCN heterogeneity enables support for different protocols as well as
IP and non-IP based access to Cloud services
• Uniform Fog abstraction layer has functions to support resource
allocation, monitoring, security, device management, storage, and
compute services for various types of end devices
Mobile Edge Computing(MEC)
• MEC is based on mobile networks
• It can also use Wi-Fi-based networks, with an MEC node
acting as the MEC host that runs applications inside
MEC
Definition
• MECs enable cloud computing and IT services at the edge of the
mobile cellular network
• Why use MECs?
– Faster cloud services to mobile UEs
– It reduces network congestion because:
• In MCC every user request needs to go to the internet, but that is not needed here
– The number of users connected is directly proportional to the traffic.
• Information is stored locally
• Content is delivered by caching the data that clients will
use later
• Content is delivered to the user quickly
• If the information is available locally, it is shared among the users
– More reliable application support
• When a packet must be transmitted, the MEC’s proximity to the UE enables very
effective and reliable data transfer
• Without MEC, retransmissions must go all the way to the cloud
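The local-storage benefit above boils down to edge caching, which can be sketched in a few lines: the first request for a piece of content crosses the core network to the cloud, and later requests from nearby users are served from the edge. The function names and content keys are invented for illustration.

```python
# Minimal sketch of MEC-style edge caching: the first request goes to
# the distant cloud, later nearby requests are served locally.
# Names and keys are hypothetical.

cloud_fetches = 0

def fetch_from_cloud(key):
    """Stands in for an expensive round trip over the core network."""
    global cloud_fetches
    cloud_fetches += 1
    return f"content-for-{key}"

edge_cache = {}

def mec_get(key):
    """Serve from the edge cache when possible; fall back to the cloud."""
    if key not in edge_cache:
        edge_cache[key] = fetch_from_cloud(key)
    return edge_cache[key]

mec_get("video-42")   # first user: one cloud round trip
mec_get("video-42")   # second nearby user: served from the edge
print(cloud_fetches)  # only one trip crossed the core network
```

This is why MEC reduces both latency for the second user and load on the core network: repeated requests never leave the edge.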
MEC Controller
• MEO (Mobile Edge Orchestrator)
– Manages MEC hosts
– Controls information flow for services offered by
each MEC host
– Controls resources and network topology
– Manages the Mobile Edge applications
Cloudlet
• Accessed through Wi-Fi networks; internally based on
virtual machines
• Essentially a small-scale cloud technology
Cloudlet
[Figure: a cloud server interconnected with a modem device; a cloudlet
may be a single device, or a cloudlet host device connecting to all the
other local devices]
Definition
• Mobility-enhanced small-scale cloud data centre
• Located at the edge of the Internet nearby the mobile devices
– In some cases, mobile devices themselves can act as cloudlets
• Supporting resource-intensive and interactive mobile applications
• Provides powerful computing resources to mobile devices with low
latency
• Used between UE → Cloudlet→ Cloud
– Middle of the mobile based cloud computing architecture
• A cloudlet is a data center in a box that brings the cloud closer to the
UE
– Much like a cloud, but with limited features: a miniature cloud device
providing Wi-Fi connectivity, etc.
• Cloudlets use a VM (Virtual Machine) to provision resources for
UEs in real-time over Wi-Fi networks
– Direct one-hop cloud access → low latency
Cloudlet Architecture: 3 Layers
• Cloudlet layer
– Group of co-located nodes
– Managed by a Cloudlet Agent
• Node layer
– Multiple Execution Environment(s) running on top of
the OS (Operating System)
– Managed by a Node Agent
• Component layer
– Includes a set of services that interface to the (higher
layer) execution environment
Comparison
Scenario
• Design the smart logistics for any application
based on cloud based, edge based techniques.
– Write about cloud based technique based on
scenario and its diagram
– Write about edge based technique based on
scenario and its diagram
Important topics for FAT
• Module-I:
– Data modelling
– Semantic data modelling
– Information modelling
• Module-II
– Tagging the data
– Processing large data set
– Predictions models
• Module-III
– Problems based on Decision tree & K-Means & linear regressions
– Stages of data life models
• Module-IV
– Requirements gathering
– Requirements tracing
– Use cases, problem statement & user stories
– Use case development
• Module-V
– Value engineering importance and its stages, SDLC models, design of IoT hardware
• Module-VI
– Application of analytics, data generation to analytics, EDA
• Module-VII
– Anomaly detection, predictive analytics, streaming analytics, performance analysis & integration methods, IoT data collection
Reference
• IoT Wireless & Cloud Emerging Technology
(course era) by Prof. Jong-Moon Chung