UNIT-4-IOT Notes

Structured and unstructured data from IoT sensors pose challenges for data analytics. Modern tools are needed to analyze massive amounts of both structured sensor readings and unstructured data like images. Machine learning is widely used to classify patterns in IoT data and take intelligent actions. Both supervised and unsupervised learning help analyze IoT data to detect anomalies, predict failures, and group similar readings.

Uploaded by

Jay Ram
Data and Analytics for IoT
UNIT-4
An Introduction to Data Analytics for IoT

 In the world of IoT, the creation of massive amounts of
data from sensors is common and one of the biggest
challenges, not only from a transport perspective but
also from a data management standpoint.
 Modern jet engines are fitted with thousands of sensors
that generate a whopping 10GB of data per second.
 Analyzing this amount of data in the most efficient manner
possible falls under the umbrella of data analytics.
 Not all data is the same; it can be categorized and thus
analyzed in different ways.
 Depending on how data is categorized, various data
analytics tools and processing methods can be applied.
 Two important categorizations from an IoT
perspective are whether the data is structured or
unstructured and whether it is in motion or at rest.
Structured Versus Unstructured Data
 Structured data and unstructured data are important
classifications as they typically require different toolsets
from a data analytics perspective
 Structured data means that the data follows a
model or schema that defines how the data is represented
or organized, meaning it fits well with a traditional
relational database management system (RDBMS).
 In many cases you will find structured data in a simple
tabular form, for example, a spreadsheet where
data occupies a specific cell and can be explicitly defined and
referenced.
 Structured data can be found in most computing
systems and includes everything from banking
transactions and invoices to computer log files and
router configurations.

 IoT sensor data often uses structured values, such as
temperature, pressure, humidity, and so on, which are
all sent in a known format.
 Structured data is easily formatted, stored, queried, and
processed.
 Because of the highly organized format of
structured data, a wide array of data analytics tools
are readily available for processing this type of
data.
 These range from custom scripts to commercial software like
Microsoft Excel and Tableau.
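Because structured sensor readings arrive in a known format, even a short custom script can load and query them. The following is a minimal sketch only; the CSV layout, column names, and sample values are assumptions made for illustration and do not come from these notes.

```python
import csv
import io

# Hypothetical sample readings; column names and values are illustrative assumptions.
SAMPLE = """sensor_id,temperature_c,pressure_kpa,humidity_pct
s1,21.5,101.2,40
s2,85.0,99.8,35
s3,22.1,101.0,42
"""


def load_readings(text):
    """Parse CSV text into a list of dicts, converting numeric fields to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "sensor_id": row["sensor_id"],
            "temperature_c": float(row["temperature_c"]),
            "pressure_kpa": float(row["pressure_kpa"]),
            "humidity_pct": float(row["humidity_pct"]),
        })
    return rows


def hot_sensors(rows, limit_c):
    """Return the IDs of sensors whose temperature exceeds limit_c."""
    return [r["sensor_id"] for r in rows if r["temperature_c"] > limit_c]


readings = load_readings(SAMPLE)
print(hot_sensors(readings, 50.0))
```

Every field can be referenced by name because the schema is fixed in advance, which is what makes this kind of data easy to query.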
 Unstructured data lacks a logical schema for
understanding and decoding the data through traditional
programming means.
 Examples of this data type include text, speech, images,
and video.
 As a general rule, any data that does not fit neatly into
a predefined data model is classified as unstructured
data.
 According to some estimates, around 80% of a business’s
data is unstructured.
 Because of this fact, data analytics methods that can
be applied to unstructured data, such as cognitive
computing and machine learning, are deservedly garnering
a lot of attention.
 With machine learning applications, such as natural
language processing (NLP), you can decode
speech.
 With image/facial recognition applications, you can
extract critical information from still images and
video
 Smart objects in IoT networks generate
both structured and unstructured data.
 Structured data is more easily managed and processed
due to its well-defined organization.
 On the other hand, unstructured data can be harder to
deal with and typically requires very different analytics
tools for processing the data.
Data in Motion Versus Data at Rest
 From an IoT perspective, the data from smart objects is
considered data in motion as it passes through the network en
route to its final destination.
 This data is often processed at the edge, using fog computing.
 When data is processed at the edge, it may be filtered and
deleted or forwarded on for further processing and possible
storage at a fog node or in the data center.
 Data does not come to rest at the edge.
 When data arrives at the data center, it is possible to process
it in real-time, just like at the edge, while it is still in
motion.
 Tools with this sort of capability include Spark, Storm, and Flink.
 Data at rest in IoT networks can typically be
found in IoT brokers or in some sort of storage
array at the data center.
 Hadoop not only helps with data processing but also
with data storage.
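The idea that data in motion may be filtered and deleted at the edge, or forwarded on for further processing, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name, the "normal band" rule, and the sample values are hypothetical and not taken from any particular fog computing product.

```python
def edge_filter(readings, low, high):
    """Yield only readings outside the normal band [low, high].

    Readings inside the band are dropped at the edge; the rest are
    forwarded (here, simply yielded) for further processing upstream.
    """
    for value in readings:
        if value < low or value > high:
            yield value


# Hypothetical temperature stream arriving from a sensor.
stream = [70.1, 70.3, 95.2, 70.0, 12.4]
forwarded = list(edge_filter(stream, 60.0, 90.0))
print(forwarded)
```

A generator is used so that each reading is handled as it arrives, which mirrors the point that data does not come to rest at the edge.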
Types of Data Analysis Results
There are four types of data analysis results:
 Descriptive:
 Descriptive data analysis tells you what is happening,
either now or in the past.
 For example, a thermometer in a truck engine
reports temperature values every second.
 From a descriptive analysis perspective, you can pull this
data at any moment to gain insight into the current
operating condition of the truck engine.
 If the temperature value is too high, then there may
be a cooling problem or the engine may be
experiencing too much load.
 Diagnostic:
 When you are interested in the “why,” diagnostic data analysis
can provide the answer.
 Continuing with the example of the temperature sensor in
the truck engine, you might wonder why the truck engine
failed.
 Diagnostic analysis might show that the temperature
of the engine was too high, and the engine overheated.
 Applying diagnostic analysis across the data generated by a
wide range of smart objects can provide a clear picture of
why a problem or an event occurred
 Predictive:
 Predictive analysis aims to foretell problems or issues
before they occur.
 For example, with historical values of temperatures for the
truck engine, predictive analysis could provide an
estimate on the remaining life of certain components in
the engine.
 Prescriptive:
 Prescriptive analysis goes a step beyond predictive and
recommends solutions for upcoming problems.
 A prescriptive analysis of the temperature data from a truck
engine might calculate various alternatives to cost-effectively
maintain the truck.
 These calculations could range from the cost necessary for more frequent
oil changes and cooling maintenance to installing new cooling equipment on the
engine or upgrading to a lease on a model with a more powerful engine.
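The four result types can be illustrated on the truck engine temperature example with a small sketch. Everything here is an assumption made for illustration: the function names, the sample temperatures, the naive straight-line trend used for prediction, and the option costs are all hypothetical.

```python
def descriptive(temps):
    """What is happening: the latest and the average temperature."""
    return {"latest": temps[-1], "mean": sum(temps) / len(temps)}


def diagnostic(temps, limit):
    """Why it happened: indices of readings that exceeded the limit."""
    return [i for i, t in enumerate(temps) if t > limit]


def predictive(temps, limit):
    """What will happen: samples left until the limit is reached.

    Uses the average rise per sample (a naive linear trend).
    Returns None if the temperature is not rising.
    """
    slope = (temps[-1] - temps[0]) / (len(temps) - 1)
    if slope <= 0:
        return None
    return (limit - temps[-1]) / slope


def prescriptive(options):
    """What to do: pick the lowest-cost option from a name -> cost dict."""
    return min(options, key=options.get)


# Hypothetical engine temperatures, one reading per second.
temps = [90.0, 92.0, 94.0, 96.0, 98.0]
# Hypothetical maintenance costs.
options = {
    "more frequent oil changes": 400,
    "new cooling equipment": 1500,
    "lease upgrade": 6000,
}

print(descriptive(temps))
print(diagnostic(temps, 95.0))
print(predictive(temps, 110.0))
print(prescriptive(options))
```

Real predictive and prescriptive analysis would use far richer models; the sketch only shows how each type answers a different question about the same data.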
IoT Data Analytics Challenges
Problems with using an RDBMS in IoT:
1. Scaling problems (performance issues that are costly to
resolve, requiring more hardware and architecture changes)
2. Volatility of data (changes in schema)
Machine Learning
 ML is central to IoT.
 Data collected by smart objects needs to be analyzed, and
intelligent actions need to be taken based on these analyses.
 Performing this kind of operation manually is almost
impossible (or very, very slow and inefficient).
 A simple example is an app that can help you
find your parked car.
 GPS readings of your position at regular intervals
are used to calculate your speed.
 A basic threshold system determines whether you are
driving (for example, “if speed > 20 mph or 30 km/h,
then start calculating speed”).
 When you park and disconnect from the car
Bluetooth system, the app simply records the
location when the disconnection happens.
 This is where your car is parked.
 ML is a vast field but can be simply divided into two
main categories: supervised and unsupervised
learning.
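The parked-car example above is a simple threshold rule rather than learning, and it can be sketched in a few lines. This is a minimal sketch under assumptions: the sample format (speed, Bluetooth state, location) and the function name are hypothetical, and speed is assumed to be already derived from the GPS readings.

```python
DRIVING_THRESHOLD_MPH = 20  # threshold taken from the example in the notes


def find_parked_location(samples):
    """Return the location where the car was parked, or None.

    samples: iterable of (speed_mph, bluetooth_connected, location).
    The car counts as parked at the first sample where Bluetooth
    disconnects after driving has been detected.
    """
    driving = False
    was_connected = False
    for speed, connected, location in samples:
        if speed > DRIVING_THRESHOLD_MPH:
            driving = True
        if driving and was_connected and not connected:
            return location
        was_connected = connected
    return None


# Hypothetical trip: start, drive, slow down, then disconnect.
trip = [
    (0, True, (1.0, 1.0)),
    (35, True, (1.1, 1.1)),
    (5, True, (1.2, 1.2)),
    (0, False, (1.2, 1.2)),
]
print(find_parked_location(trip))
```

Requiring that driving was detected first avoids recording a location when the phone merely disconnects from Bluetooth while the car was never moving.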
Supervised Learning
 In supervised learning, the machine is trained with input
for which there is a known correct answer.
 For example, suppose that you are training a system to
recognize when there is a human in a mine tunnel.
 A sensor equipped with a basic camera can capture
shapes and return them to a computing system that is
responsible for determining whether the shape is a
human or something else (such as a vehicle, a pile of ore,
a rock, a piece of wood, and so on).
 In other cases, the learning process is not about classifying into
two or more categories but about finding a correct value.
 For example, the speed of the flow of oil in a pipe is a
function of the size of the pipe, the viscosity of the oil, pressure, and a
few other factors.
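Finding a correct value from inputs with known answers is a regression problem. As a hedged illustration, the sketch below fits a straight line by ordinary least squares to labeled examples. It deliberately simplifies the oil flow case to a single input (pressure) and a linear relationship; the training values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the output value for a new input x."""
    return slope * x + intercept


# Hypothetical training data: pressure readings with measured flow speeds.
pressures = [1.0, 2.0, 3.0, 4.0]
flow_speeds = [2.1, 3.9, 6.1, 7.9]

slope, intercept = fit_line(pressures, flow_speeds)
print(predict(5.0, slope, intercept))
```

The measured flow speeds play the role of the known correct answers; once the line is fitted, the model can estimate the flow speed for a pressure it has not seen.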
Unsupervised Learning
 In some cases, supervised learning is not the best method
for a machine to help with a human decision.
 Suppose that you are processing IoT data from a factory
manufacturing small engines.
 You know that about 0.1% of the produced engines on
average need adjustments to prevent later defects, and your
task is to identify them before they get mounted into machines
and shipped away from the factory.
 With hundreds of parts, it may be very difficult to detect the
potential defects, and it is almost impossible to train a machine
to recognize issues that may not be visible.
 However, you can test each engine and record multiple
parameters, such as sound, pressure, temperature of
key parts, and so on.
 Once data is recorded, you can graph these elements in
relation to one another (for example, temperature as
a function of pressure, sound versus rotating speed
over time).
 You can then input this data into a computer and use
mathematical functions to find groups.
 For example, you may decide to group the engines by the
sound they make at a given temperature.
 A standard function to operate this grouping, K-means
clustering, finds the mean values for a group of engines (for
example, mean value for temperature, mean frequency for
sound).
 Grouping the engines this way can quickly reveal several types of
engines that all belong to the same category (for example, small
engine of chainsaw type, medium engine of lawnmower type).
 All engines of the same type produce sounds and temperatures in
the same range as the other members of the same group.
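The K-means grouping described above can be sketched in plain Python. This is a minimal illustration under assumptions: each engine is reduced to a hypothetical (temperature, sound frequency) pair, and the first k points are used as starting centroids to keep the run deterministic, whereas practical implementations usually pick starting centroids randomly.

```python
def kmeans(points, k, iterations=10):
    """Cluster 2-D points with K-means; return (centroids, labels).

    The first k points serve as initial centroids (an assumption made
    here for determinism).
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


# Hypothetical engines: (temperature, sound frequency).
engines = [(60, 200), (90, 450), (62, 210), (88, 440), (61, 205), (91, 455)]
centroids, labels = kmeans(engines, k=2)
print(labels)
print(centroids)
```

An engine whose readings sit far from the centroid of its own group would be a candidate for closer inspection, which is how grouping helps surface the rare engines that need adjustment.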
Big Data Analytics Tools and Technology
 Big data analytics can consist of many different
software pieces that together collect, store, manipulate,
and analyze all different data types.
 Generally, the industry looks to the “three Vs” to
categorize big data:
 Velocity
 Refers to how quickly data is being collected and analyzed.
 Hadoop Distributed File System is designed to ingest and
process data very quickly.
 Smart objects can generate machine and sensor data at a very fast
rate and require database or file systems capable of equally fast
ingest functions.
 Variety
 Refers to different types of data.
 Often you see data categorized as structured, semi-structured,
or unstructured.
 Different database technologies may only be capable of accepting
one of these types.
 Hadoop is able to collect and store all three types.
 Volume
 Refers to the scale of the data.
 Typically, this is measured from gigabytes on the very
low end to petabytes or even exabytes of data on the
other extreme.
NoSQL Databases
 NoSQL (“not only SQL”) is a class of databases that
support semi-structured and unstructured data, in addition
to the structured data handled by data warehouses and
massively parallel processing (MPP) databases.
 NoSQL is not a specific database technology; rather, it
is an umbrella term that encompasses several different
types of databases.
Hadoop
 Hadoop is the most recent entrant into the data
management market, but it is arguably the most popular
choice as a data repository and processing engine.
 Hadoop was originally developed as a result of projects
at Google and Yahoo!
 The original intent for Hadoop was to index millions
of websites and quickly return search results for open
source search engines.
 Initially, the project had two key elements:
 Hadoop Distributed File System (HDFS):
 A system for storing data across multiple nodes
 MapReduce:
 A distributed processing engine that splits a large task
into smaller ones that can be run in parallel.
 Hadoop relies on a scale-out architecture that leverages
local processing, memory, and storage to distribute
tasks and provide a scalable storage system for data.
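The MapReduce idea of splitting a large task into smaller ones can be illustrated with an in-process sketch. This is not the Hadoop API: the function names, the chunked sample data, and the "maximum temperature per sensor" task are assumptions chosen for illustration, and the chunks are processed sequentially here rather than in parallel on separate nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit (sensor_id, temperature) key-value pairs for one chunk."""
    return [(sensor_id, temp) for sensor_id, temp in chunk]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce each key's values to a single result (here, the maximum)."""
    return {key: max(values) for key, values in groups.items()}


# Hypothetical data, already split into chunks as if stored on two nodes.
chunks = [
    [("s1", 20.0), ("s2", 31.5)],
    [("s1", 25.0), ("s2", 30.0)],
]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
result = reduce_phase(shuffle(mapped))
print(result)
```

In a real cluster each chunk would be mapped on the node that stores it, which is the sense in which the scale-out architecture leverages local processing and storage.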
YARN
 Introduced with version 2.0 of Hadoop, YARN (Yet
Another Resource Negotiator) was designed to
enhance the functionality of MapReduce.
 With the initial release, MapReduce was responsible
for batch data processing and job tracking and
resource management across the cluster.
 YARN was developed to take over the resource
negotiation and job/task tracking, allowing MapReduce to
be responsible only for data processing.
The Hadoop Ecosystem
 Since the release of Hadoop 1.0 in 2011, many
projects have been developed to add incremental
functionality to Hadoop and have collectively become
known as the Hadoop ecosystem.
 Apache Kafka
 Apache Spark
 Apache Storm and Apache Flink
 Lambda Architecture
Comparing Big Data and Edge Analytics
 When you hear the term big data, it is usually in
reference to unstructured data that has been collected
and stored in the cloud
 Tools like Hadoop and MapReduce are great at
tackling problems that require deep analytics on a large
and complex quantity of unstructured data.
 However, due to their distance from the IoT endpoints
and the bandwidth required to bring all the data back
to the cloud, they are generally not well suited to real-time
analysis of data as it is generated.
 In applying data analytics to the car racing example, big
data analytics, run in the data center or cloud, is used to
examine all the statistics of the racing team and players
based on their performance.
 Streaming analytics involves analyzing a race while it
is happening and trying to figure out who is going to win
based on the actual performance in real-time—and this
analysis is typically performed as close to the edge as
possible.
 Streaming analytics allows you to continually monitor
and assess data in real-time so that you can adjust or
fine-tune your predictions as the race progresses.
 In the context of IoT, with streaming analytics
performed at the edge (either at the sensors themselves
or very close to them, in a fog node that is, for
example, integrated into the gateway), it is possible to
process and act on the data in real-time without waiting
for the results from a future batch-processing job in the
cloud.
 The key values of edge streaming analytics
include the following:
 Reducing data at the edge
 Analysis and response at the edge
 Time sensitivity
Edge Analytics Core Functions
 To perform analytics at the edge, data needs to be
viewed as real-time flows.
 Whereas big data analytics is focused on large
quantities of data at rest, edge analytics continually
processes streaming flows of data in motion
 Streaming analytics at the edge can be broken down
into three simple stages:
 Raw input data
 Analytics processing unit (APU)
 Output streams
 In order to perform analysis in real-time, the APU
needs to perform the following functions:
 Filter
 Transform
 Time
 Correlate
 Match patterns
 Improve business intelligence
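Several of the APU functions listed above (filter, transform, time, and match patterns) can be combined in one small streaming sketch. It is illustrative only: the function name, the Fahrenheit input, the window size, and the alert limit are assumptions, and correlation across multiple streams is left out for brevity.

```python
from collections import deque


def apu(stream, window=3, limit=81.0):
    """Process (timestamp, fahrenheit) readings and yield alerts.

    Yields (timestamp, window_average_celsius) whenever the average of
    the last `window` valid readings exceeds `limit`.
    """
    recent = deque(maxlen=window)  # time: keep only the latest readings
    for timestamp, temp_f in stream:
        if temp_f is None:  # filter: drop missing readings
            continue
        temp_c = (temp_f - 32) * 5 / 9  # transform: Fahrenheit to Celsius
        recent.append(temp_c)
        average = sum(recent) / len(recent)
        # match patterns: a full window with a sustained high average
        if len(recent) == window and average > limit:
            yield timestamp, round(average, 1)


# Hypothetical raw input stream; one reading is missing.
raw = [(1, 170.6), (2, None), (3, 176.0), (4, 181.4), (5, 186.8)]
alerts = list(apu(raw))
print(alerts)
```

The raw input, the processing function, and the yielded alerts correspond to the three stages named earlier: raw input data, the analytics processing unit, and output streams.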
Network Analytics
 Another form of analytics that is extremely
important in managing IoT systems is network-based
analytics.
 Network analytics is concerned with discovering patterns
in the communication flows from a network traffic
perspective.
 Network analytics has the power to analyze details
of communications patterns made by protocols and
correlate this across the network.
 It allows you to understand what should be
considered normal behavior in a network and to
quickly identify anomalies that suggest network
problems due to suboptimal paths, intrusive malware,
or excessive congestion.
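Learning what counts as normal behavior and then flagging deviations can be sketched with a simple statistical baseline. This is one possible approach among many, shown under assumptions: the function name, the packets-per-second samples, and the three-standard-deviation threshold are all hypothetical choices, not a method prescribed by these notes.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Return indices of observed values far from the baseline mean.

    A value is flagged when it lies more than `threshold` standard
    deviations away from the mean of the baseline samples.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [i for i, v in enumerate(observed) if abs(v - mu) > threshold * sigma]


# Hypothetical packets-per-second counts for one traffic flow.
baseline = [100, 102, 98, 101, 99]
observed = [101, 97, 250, 100]

print(find_anomalies(baseline, observed))
```

With these sample values only the 250 packets-per-second reading is flagged, the kind of spike that could point to congestion or malware traffic.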
Securing IoT
 Information technology (IT) environments have faced active
attacks and information security threats for many
decades, and the incidents and lessons learned are well-
known and documented.
 Operational technology (OT) environments were
traditionally kept in silos and had only limited
connection to other networks.
 Thus, the history of cyber attacks on OT systems is
much shorter and has far fewer incidents
documented
 A Brief History of OT Security
 Common Challenges in OT Security
 How IT and OT Security Practices and Systems Vary
A Brief History of OT Security
 Cybersecurity incidents in industrial environments can
result in physical consequences that can cause threats to
human lives as well as damage to equipment,
infrastructure, and the environment.
 While there are certainly traditional IT-related
security threats in industrial environments, it is the
physical manifestations and impacts of the OT security
incidents that capture media attention and elicit broad-
based public concern.
 Historically, attackers were skilled individuals with
deep knowledge of technology and the systems they were
attacking.
 However, as technology has advanced, tools have been
created to make attacks much easier to carry out.
 To further complicate matters, these tools have become
more broadly available and more easily obtainable.
 Compounding this problem, many of the legacy protocols
used in IoT environments are many decades old, and there was
no thought of security when they were first developed.
 This means that attackers with limited or no technical
capabilities now have the potential to launch cyber attacks,
greatly increasing the frequency of attacks and the overall
threat to end operators.
Common Challenges in OT Security
 Erosion of Network Architecture
 Two of the major challenges in securing
industrial environments have been initial design
and ongoing maintenance.
 The initial design challenges arose from the concept
that networks were safe due to physical separation
from the enterprise with minimal or no connectivity to
the outside world, and the assumption that attackers
lacked sufficient knowledge to carry out security
attacks.
