IIOT Unit 3 NOTES
IIOT ANALYTICS
Big Data Analytics and Software Defined Networks, Machine Learning and Data Science, Julia
Programming, Data Management with Hadoop.
At first, IoT data is just a curiosity, and it is even useful if handled correctly. However,
given time, as more and more devices are added to IoT networks, the data
generated by these systems becomes overwhelming.
The real value of IoT is not just in connecting things but rather in the data
produced by those things, the new services you can enable via those connected
things, and the business insights that the data can reveal.
In the world of IoT, the creation of massive amounts of data from sensors is
common and one of the biggest challenges—not only from a transport
perspective but also from a data management standpoint.
Analysing large amounts of data in the most efficient manner possible falls
under the umbrella of data analytics.
Data analytics must be able to offer actionable insights and knowledge from
data, no matter the amount or style, in a timely manner, or the full benefits of
IoT cannot be realized.
Example:
Modern jet engines may be equipped with around 5,000 sensors that generate
a whopping 10 GB of data per second.
In fact, a single wing of a modern jumbo jet is equipped with 10,000 sensors.
The potential for a petabyte (PB) of data per day per commercial airplane is
not farfetched—and this is just for one airplane. Across the world, there are
approximately 100,000 commercial flights per day. The amount of IoT data
coming just from the commercial airline business is overwhelming.
IIoT Analytics: Data Science
Structured Versus Unstructured Data
Structured data means that the data follows a model or schema that defines
how the data is represented or organized, meaning it fits well with a traditional
relational database management system (RDBMS).
IoT sensor data often uses structured values, such as temperature, pressure,
humidity, and so on, which are all sent in a known format. Structured data is
easily formatted, stored, queried, and processed; for these reasons, it has been
the core type of data used for making business decisions.
From custom scripts to commercial software like Microsoft Excel and Tableau,
most people are familiar and comfortable with working with structured data.
Unstructured data lacks a logical schema for understanding and decoding the
data through traditional programming means.
Examples of this data type include text, speech, images, and video. As a general
rule, any data that does not fit neatly into a predefined data model is classified
as unstructured data. Data analytics methods that can be applied to unstructured
data, such as cognitive computing and machine learning, are deservedly
garnering a lot of attention.
Smart objects in IoT networks generate both structured and unstructured data.
Structured data is more easily managed and processed due to its well-defined
organization.
On the other hand, unstructured data can be harder to deal with and typically
requires very different analytics tools for processing the data.
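A minimal Julia sketch of this contrast (the struct, field names, and sample values are hypothetical, chosen only for illustration):

# Structured: the data follows a known schema, so fields can be
# stored, queried, and processed directly.
struct SensorReading
    device_id::String
    temperature::Float64   # degrees Celsius
    humidity::Float64      # percent
end

reading = SensorReading("sensor-42", 21.5, 48.0)
println(reading.temperature)   # direct, typed access

# Unstructured: a free-text maintenance note; there is no schema to
# query, so extracting meaning requires text analytics or ML.
note = "Operator reports intermittent vibration near pump 3 after startup."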
Data saved to a hard drive, storage array, or USB drive is data at rest.
➢ From an IoT perspective, the data from smart objects is considered data
in motion as it passes through the network en route to its final destination.
➢ This is often processed at the edge, using fog computing. When data is
processed at the edge, it may be filtered and deleted or forwarded on for further
processing and possible storage at a fog node or in the data center.
➢ Tools with this sort of capability, such as Spark, Storm, and Flink, are
relatively nascent compared to the tools for analysing stored data.
Data at rest in IoT networks can typically be found in IoT brokers or in some
sort of storage array at the data center. From a data analytics perspective,
myriad tools are available, especially tools for structured data in relational
databases.
The best known of these tools is Hadoop. Hadoop not only helps with data
processing but also data storage.
IoT Data Analytics Overview
The true importance of IoT data from smart objects is realized only when the
analysis of the data leads to actionable business intelligence and insights.
Data analysis is typically broken down by the types of results that are
produced.
Descriptive: Descriptive data analysis tells you what is happening, either now or
in the past.
Diagnostic: When you are interested in the “why,” diagnostic data analysis can
provide the answer.
Predictive: Predictive analysis aims to foretell problems or issues before they
occur.
Prescriptive: Prescriptive analysis goes a step beyond predictive and
recommends solutions for upcoming problems.
Both predictive and prescriptive analyses are more resource intensive and
increase complexity, but the value they provide is much greater than the value
from descriptive and diagnostic analysis.
Figure 7-4 illustrates the four data analysis types and how they rank as
complexity and value increase. You can see that descriptive analysis is the least
complex and at the same time offers the least value. On the other end,
prescriptive analysis provides the most value but is the most complex to
implement.
Most data analysis in the IoT space relies on descriptive and diagnostic
analysis, but a shift toward predictive and prescriptive analysis is
understandably occurring for most businesses and organizations.
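As a rough illustration of the difference, here is a Julia sketch of descriptive versus predictive analysis on a toy temperature series (the data and the naive linear model are made up for illustration):

using Statistics

temps = [21.0, 21.4, 21.9, 22.3, 22.8]   # hourly readings (toy data)

# Descriptive: what is happening now or in the past.
println("mean = ", mean(temps), ", max = ", maximum(temps))

# Predictive: a naive linear extrapolation of the trend to the next hour.
n = length(temps)
slope = (temps[end] - temps[1]) / (n - 1)
println("predicted next reading = ", temps[end] + slope)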
Scaling problems: Due to the large number of smart objects in most IoT
networks that continually send data, relational databases can grow incredibly
large very quickly. This can result in performance issues that can be costly to
resolve, often requiring more hardware and architecture changes.
Volatility of data: With relational databases, it is critical that the schema be
designed correctly from the beginning. Changing it later can slow or stop the
database from operating. Because of this lack of flexibility, revisions to the
schema must be kept at a minimum. IoT data, however, is volatile in the sense
that the data model is likely to change and evolve over time.
Some other challenges:
• IoT also brings challenges with the live streaming nature of its data and
with managing data at the network level. Streaming data, which is generated as
smart objects transmit data, is challenging because it is usually of a very high
volume, and it is valuable only if it is possible to analyse and respond to it in
real time.
Open SDN: Open protocols are used to orchestrate and govern both virtual and
physical devices, directing the flow of data packets.
Hybrid Model SDN: Hybrid SDN blends SDN and traditional networking,
allowing the optimal protocol to be selected for each type of traffic. Hybrid SDN
is often used as a phased implementation strategy for a smooth transition into
SDN.
Enhanced Control with Speed and Flexibility: SDN eliminates the need for
manual configuration of various hardware devices from different vendors.
Instead, developers control network traffic by programming a software-based
controller that adheres to open standards. This gives networking managers the
freedom to select networking equipment and to communicate with multiple
hardware devices through a single protocol via a centralized controller,
resulting in remarkable speed and flexibility.
Robust Security: SDN in IoT offers comprehensive visibility across the entire
network, presenting a holistic view of potential security threats. As the number
of intelligent devices connecting to the Internet continues to proliferate, SDN
surpasses traditional networking in terms of security advantages. Operators
can create distinct zones for devices requiring different security levels or
promptly isolate compromised devices to prevent the spread of infections
throughout the network.
Machine Learning
For example, to train a speech-recognition tool, you record a set of
predetermined sentences to help the tool match well-known words to the
sounds you make when you say them. This process is called machine learning.
ML is concerned with any process where the computer needs to receive a set of
data that is processed to help perform a task with more efficiency. ML is a vast
field but can be simply divided into three main categories:
1. Unsupervised Learning
2. Supervised Learning
3. Reinforcement Learning
Unsupervised Learning
In unsupervised learning, the machine is given an unlabelled dataset and is
asked to find patterns or groupings in the dataset, based on the inner structure
of the data without looking into the specific outcome.
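A toy Julia sketch of this idea: unlabelled readings are grouped by their inner structure alone, here with a naive two-cluster split in the spirit of k-means (the data and the number of clusters are assumptions made for illustration):

using Statistics

# Naive two-cluster grouping: assign each reading to the nearest of two
# centroids, then recompute the centroids a few times (the k-means idea).
function two_clusters(readings)
    c1, c2 = minimum(readings), maximum(readings)
    for _ in 1:10
        g1 = [r for r in readings if abs(r - c1) <= abs(r - c2)]
        g2 = [r for r in readings if abs(r - c1) > abs(r - c2)]
        c1, c2 = mean(g1), mean(g2)
    end
    return c1, c2
end

# Unlabelled vibration readings from two unknown machine states (toy data).
readings = [0.9, 1.1, 1.0, 5.2, 4.8, 5.0]
println("cluster centers: ", two_clusters(readings))   # ≈ (1.0, 5.0)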
Supervised Learning
In supervised learning, the machine is trained with input for which there is a
known correct answer. For example, suppose that you are training a system to
recognize when there is a human in a mine tunnel.
A sensor equipped with a basic camera can capture shapes and return them to
a computing system that is responsible for determining whether the shape is a
human or something else (such as a vehicle, a pile of ore, a rock, a piece of
wood, and so on.)
With supervised learning techniques, hundreds or thousands of images are fed
into the machine, and each image is labeled (human or nonhuman in this case).
This is called the training set. An algorithm is used to determine common
parameters and common differences between the images.
The comparison is usually done at the scale of the entire image, or pixel by
pixel. Images are resized to have the same characteristics (resolution, color
depth, position of the central figure, and so on), and each point is analyzed.
Human images have certain types of shapes and pixels in certain locations.
When a new image is submitted, it is compared to the average human image,
and a deviation is calculated to determine how different the new image is from
the average human image and, therefore, the probability that what is shown is
a human figure. This process is called classification.
After training, the machine should be able to recognize human shapes. Before
real field deployments, the machine is usually tested with unlabelled pictures—
this is called the validation or the test set, depending on the ML system used—
to verify that the recognition level is at acceptable thresholds. If the machine
does not reach the level of success expected, more training is needed.
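The following Julia sketch mirrors the classification process described above on a deliberately tiny scale (the "images" are three-pixel vectors and all values are hypothetical):

using Statistics

# Hypothetical training set: each row is a flattened, resized image.
human_imgs    = [0.9 0.8 0.7; 0.85 0.9 0.75]   # labelled "human"
nonhuman_imgs = [0.1 0.2 0.3; 0.15 0.1 0.25]   # labelled "non-human"

# The "average human image" learned from the training set.
avg_human    = vec(mean(human_imgs, dims=1))
avg_nonhuman = vec(mean(nonhuman_imgs, dims=1))

# Deviation of a new image from an average image, pixel by pixel.
deviation(img, avg) = sqrt(sum((img .- avg) .^ 2))

# Classification: the class whose average the new image deviates from least.
classify(img) = deviation(img, avg_human) < deviation(img, avg_nonhuman) ? "human" : "non-human"

println(classify([0.8, 0.85, 0.7]))   # => "human"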
Reinforcement Learning
In reinforcement learning, the machine is not given labelled data; instead, it
learns by trial and error, receiving feedback (rewards or penalties) for the
actions it takes in a specific environment.
Data Science
Julia Programming
With the help of multiple dispatch, the user can define function behavior across
many combinations of arguments. Julia has a powerful shell that makes it able
to manage other processes easily. The user can call C functions without any
wrappers or special APIs. Julia provides efficient support for Unicode. It also
provides its users Lisp-like macros as well as other metaprogramming
facilities, and it offers lightweight green threading, i.e., coroutines.
Code written in Julia is fast because there is no need to vectorize code for
performance.
Open source
Distributed computation and parallelism possible
Efficient Unicode support
Direct calls to C functions
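A brief sketch of two of these features, multiple dispatch and direct C calls (the function names and the choice of strlen are illustrative):

# Multiple dispatch: one function name, behavior chosen by argument types.
area(r::Real) = pi * r^2          # circle of radius r
area(w::Real, h::Real) = w * h    # w-by-h rectangle

println(area(2.0))                # dispatches on one Real argument
println(area(3.0, 4.0))           # dispatches on two Real arguments

# Calling a C function directly, with no wrapper (strlen from libc):
len = ccall(:strlen, Csize_t, (Cstring,), "hello")
println(len)                      # 5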
Basic math
Assigning strings
Use of the $ sign for string interpolation
String concatenation
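A short sketch illustrating these basics (the variable names and values are arbitrary):

# Basic math
x = 10 + 3 * 2                      # 16

# Assigning a string
name = "Julia"

# $ sign for string interpolation
greeting = "Hello, $name! x is $x."

# String concatenation uses the * operator (or the string() function)
full = "Hello, " * name * "!"
println(greeting)
println(full)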
Data structures
1. Tuples
2. Dictionaries
3. Arrays
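A compact sketch of the three data structures (sample values are arbitrary):

# 1. Tuples: fixed-length, immutable collections
t = (1, "two", 3.0)
println(t[2])                # "two" (Julia uses 1-based indexing)

# 2. Dictionaries: key => value pairs
d = Dict("temp" => 22.5, "humidity" => 61.0)
println(d["temp"])

# 3. Arrays: mutable, growable collections
a = [10, 20, 30]
push!(a, 40)
println(a)                   # [10, 20, 30, 40]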
Data Management
Hadoop
MapReduce: A framework for writing applications that process large amounts
of data in parallel across a cluster.
YARN: The next-generation MapReduce layer, which schedules jobs and
manages the resources of a Hadoop cluster.
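To make the MapReduce model concrete, here is a single-process word-count sketch in Julia; real Hadoop distributes the map and reduce tasks across the datanodes of a cluster (the input lines are toy data):

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Map phase: each input line is turned into (word, 1) pairs.
mapped = [(w, 1) for line in lines for w in split(line)]

# Shuffle/reduce phase: pairs are grouped by key and the counts summed.
counts = Dict{String,Int}()
for (w, n) in mapped
    counts[w] = get(counts, w, 0) + n
end
println(counts)   # "the" => 3, "fox" => 2, ...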
Namenode
The namenode is the commodity hardware that contains the GNU/Linux
operating system and the namenode software. The system having the
namenode acts as the master server, and it does the following tasks:
• Manages the file system namespace.
• Regulates clients' access to files.
• Executes file system operations such as renaming, closing, and opening
files and directories.
Datanode
The datanode is a commodity hardware having the GNU/Linux operating
system and datanode software. For every node (Commodity
hardware/System) in a cluster, there will be a datanode. These nodes
manage the data storage of their system.
Block
Generally the user data is stored in the files of HDFS. A file in the file
system is divided into one or more segments, which are stored in
individual datanodes. These file segments are called blocks. In other
words, the minimum amount of data that HDFS can read or write is
called a block. The default block size is 64 MB, but it can be increased as
needed by changing the HDFS configuration. For example, with 64 MB
blocks, a 200 MB file is stored as three full 64 MB blocks plus one 8 MB
block.
Goals of HDFS
Fault detection and recovery − Since HDFS includes a large number of
commodity hardware components, failure of components is frequent.
Therefore HDFS should have mechanisms for quick and automatic fault
detection and recovery.
Inserting Data into HDFS
Step 1
You have to create an input directory.
$HADOOP_HOME/bin/hadoop fs -mkdir /user/input

Retrieving Data from HDFS
Step 1
Initially, view the data from HDFS using the cat command.
$HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile

Shutting Down the HDFS
$ stop-dfs.sh
There are many more commands in "$HADOOP_HOME/bin/hadoop fs"
than are demonstrated here, although these basic operations will get you
started. Running ./bin/hadoop dfs with no additional arguments will list
all the commands that can be run with the FsShell system. Furthermore,
$HADOOP_HOME/bin/hadoop fs -help commandName will display a
short usage summary for the operation in question, if you are stuck.
Users and applications can retrieve data from Hadoop using various
query and analysis tools. SQL-like languages (e.g., Hive’s HQL), scripting
languages (e.g., Pig Latin), and programming languages (e.g., Java,
Python) can be used for data retrieval.
Data Security:
Data stored in Hadoop is typically protected with authentication (commonly
Kerberos), authorization controls, and encryption of data at rest and in
transit.
Metadata Management:
Metadata about data assets, such as data lineage, data definitions, and
data ownership, can be stored in data catalogs and metadata repositories
to aid in data discovery and usage.
Data Compression and Optimization: