IoT & Its Applications Unit-IV

The document discusses data analytics, focusing on structured, unstructured, and semi-structured data, as well as data in motion and at rest. It highlights the role of machine learning in processing and analyzing IoT data, and introduces various data analysis types such as descriptive, diagnostic, predictive, and prescriptive analysis. Additionally, it covers NoSQL databases and big data technologies, emphasizing their significance in managing large volumes of diverse data generated by IoT systems.


UNIT IV

DATA ANALYTICS AND SUPPORTING SERVICES

Structured Vs Unstructured Data and Data in Motion Vs Data at Rest – Role of Machine Learning – NoSQL Databases – Hadoop Ecosystem – Apache Kafka, Apache Spark – Edge Streaming Analytics and Network Analytics – Xively Cloud for IoT, Python Web Application Framework – Django – AWS for IoT – System Management with NETCONF-YANG

4.1 STRUCTURED AND UNSTRUCTURED DATA:

Structured Data:

Structured data is data that is organized according to a well-defined model. All relational databases hold structured data, and structured data is categorized as quantitative data: data that fits neatly into fixed fields and columns, as in a spreadsheet. Examples of structured data include names, dates, addresses, credit card numbers, stock information, geolocation, and more. In relational databases we can input, search, and manipulate structured data quickly.

The language used to query structured data is Structured Query Language, also known as SQL. The data obtained from IoT sensors, such as temperature, pressure, and humidity readings, is structured data.
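As an illustration of how structured sensor data fits fixed columns and can be queried with SQL, here is a minimal sketch using Python's built-in sqlite3 module (the table name and readings are invented for the example):

```python
import sqlite3

# Hypothetical example: IoT sensor readings stored as structured data
# in a relational table with fixed columns, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE readings (
    sensor_id TEXT, metric TEXT, value REAL, ts TEXT)""")
rows = [
    ("s1", "temperature", 22.5, "2024-01-01T10:00"),
    ("s1", "temperature", 23.1, "2024-01-01T10:05"),
    ("s2", "humidity",    41.0, "2024-01-01T10:00"),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)

# Because the data fits fixed fields, SQL can search and aggregate it fast.
avg_temp = conn.execute(
    "SELECT AVG(value) FROM readings WHERE metric = 'temperature'"
).fetchone()[0]
print(round(avg_temp, 1))  # 22.8
```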

Unstructured Data:

Unstructured data is qualitative data, and it cannot be analyzed using standard tools or methods. Examples of unstructured data include text, video, audio, mobile activity, social media activity, satellite imagery, and surveillance footage. No pre-defined model is available for unstructured data; it is not organized the way relational databases are.

Most data acquired by businesses is unstructured. Non-relational, or NoSQL, databases are used for managing unstructured data. More than 80 percent of all data generated by business processes today is considered unstructured. Advanced analytics is required to manipulate unstructured data; for example, data mining, machine learning, and natural language processing techniques are used to analyze unstructured text, video, and images.

For example, data from sensors attached to industrial machinery can alert manufacturers to abnormal activity ahead of time. With this information, a repair can be made before the machine suffers a costly breakdown.
Semi structured data:

This data is a hybrid data which shares the attributes of structured data and
unstructured data. It contains certain schema and consistency.Email, JSON is an example
of the unstructured data.
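A short sketch of semi-structured data: the JSON message below (an invented sensor payload) has a consistent key-based shape without a rigid relational schema, and Python's standard json module can parse it:

```python
import json

# Hypothetical sensor payload: keys give the data some structure, but
# fields and nesting can vary per message, unlike a fixed relational row.
msg = '{"sensor_id": "s1", "type": "temperature", "value": 22.5, "tags": ["engine"]}'
record = json.loads(msg)
print(record["value"])  # 22.5
```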

The sensors in IoT generate both structured and unstructured data. Structured data is managed by a well-defined schema, while unstructured data is managed with analytical tools.

Figure 4.1: Comparison between Structured and Unstructured Data

Figure 4.2 Structured Data VS Unstructured Data

4.2 DATA IN MOTION VS DATA AT REST

Data in IoT is handled either as data in transit (motion) or as data at rest. The data acquired from IoT sensor objects is data in motion. Data in motion is utilized by fog and edge computing, and from there the data is sent on to the data center.
Data in motion:

Data in motion is data actively moving from one location to another, for example data being transferred between two networks.

Data at rest:

Data at rest is data that is not actively moving from device to device or network to network, such as data stored on a hard drive, laptop, or flash drive (e.g., a USB stick).

Protecting sensitive data both in transit and at rest is essential for modern systems, as intruders find ever more sophisticated ways to steal data. Spark, Storm, and Flink are tools used for analysing data in motion (streaming data), and a myriad of tools exist for processing structured data. Hadoop supports both data processing and data storage.

Figure 4.3: Digital Data Examples

IoT Data Analytics overview:

IoT data from smart devices is realized and analysed in many ways. Most IoT systems deploy descriptive analysis and diagnostic analysis. Prescriptive and predictive analysis are more complex to implement, but modern businesses are trending towards them.

There are four types of data analysis, namely

 Descriptive analysis

 Diagnostic analysis

 Predictive analysis

 Prescriptive analysis

Descriptive analysis: This analysis explains what is happening now, or what happened in the past, giving insight into the current working condition. Example: a thermometer reading in a truck engine.

Diagnostic analysis: This analysis provides the details of why something has happened; it answers the "why" question. If the engine is hot, the analysis may give an answer as to why the engine has become hot.

Predictive analysis: This analysis helps to forecast or predict future outcomes (what is likely to happen). The recorded data is analysed and the outcome is predicted; for example, the temperature recorded in the engine determines the remaining life of the engine's parts.

Prescriptive analysis: This analysis goes beyond predictive analysis; it gives solutions for the predicted problems (what should I do about it). If the engine heats up a lot, it suggests the solution of adding a cooling system to the engine.
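The four types of analysis can be sketched on a single series of readings. This is an illustrative toy, not a real analytics engine; the thresholds and the linear trend extrapolation are invented for the example:

```python
# Engine-temperature readings, oldest to newest (invented values).
readings = [70, 75, 82, 90, 97]

descriptive = readings[-1]                    # what is happening now
diagnostic = ("rising load" if readings[-1] > readings[0]
              else "stable")                  # why it is happening
trend = (readings[-1] - readings[0]) / (len(readings) - 1)
predictive = readings[-1] + trend             # what is likely to happen next
prescriptive = ("add cooling" if predictive > 100
                else "no action")             # what should be done about it
```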

The figure below shows data sources (geolocation, sensors, video, social media) feeding big data technologies (collect, integrate, process, aggregate, visualize), which produce the four types of analysis results: descriptive, diagnostic, predictive, and prescriptive.

Figure 4.4: Types Of Data Analysis Results.

Data from IoT sensors poses challenges for relational databases. The challenges include scaling problems and the volatility of the data. NoSQL databases are used to address these two challenges. IoT also faces many other challenges, as it involves huge live data streams from the sensors. Companies like Google and Microsoft provide cloud services for handling the huge volume of data generated by the sensors, and they perform analytics on it. Flexible NetFlow and IPFIX are network analytics tools used to monitor the flow of data in a network.

Machine learning:

The data generated by IoT sensors is processed by a set of algorithms and tools to discover the relationships within the data. This data processing is carried out by machine learning; data obtained from the sensors must be analyzed to take proper decisions.

Machine learning is an important tool for IoT and data analytics. Machine learning, deep learning, neural networks, and convolutional networks are terms related to this field of IoT. Self-driving vehicles embedded with self-learning capacity, able to make intelligent decisions while driving, are possible due to advancements in machine learning.

4.3 ROLE OF MACHINE LEARNING

The role of ML is to provide:

 Predictions

 Forecasting

 High (often over 90%) accuracy

Both Amazon and Netflix make use of machine learning to learn our preferences and deliver a superior experience to the client. The figure below depicts the roles and responsibilities of ML in IoT and data analytics across various industries.
Figure 4.5: Various Fields Integrated With Deep Intelligence.

Machine learning overview:

Machine learning comes under the umbrella of artificial intelligence. Artificial intelligence is framed in such a way that it exhibits the characteristics of human intelligence; a simple application that finds where a car is parked in an area is an example of artificial intelligence. Machine learning deals with recording data and then processing the data to arrive at certain important decisions. Machine learning is a broad concept applied in various fields to analyze data, and it can be categorized as supervised learning and unsupervised learning.

Supervised learning:

Supervised learning involves a set of inputs and their corresponding outputs. The system is trained on a set of inputs called the training set; algorithms work on the training set and learn to separate the inputs into different classes. This process of identifying independent classes from a given set of inputs is called classification, and in classification the inputs are labeled. Training is followed by testing, where testing is done with unlabeled data sets, and the classification aims to find the correct value. Classification and regression are considered the important approaches of supervised learning: classification predicts discrete values, while regression predicts continuous values. A greater number of inputs, or larger datasets, results in better training and good accuracy.
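A minimal supervised-learning sketch: a k-nearest-neighbour classifier written in plain Python. The labelled training set (sensor readings mapped to machine states) and the value of k are invented for illustration:

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Classify `point` by majority vote of its k nearest labelled examples."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Labelled training set: (temperature, vibration) -> machine state.
train = [((20, 1), "normal"), ((22, 2), "normal"), ((21, 1), "normal"),
         ((80, 9), "fault"),  ((85, 8), "fault"),  ((90, 9), "fault")]

print(knn_predict(train, (83, 9)))  # fault
```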

Unsupervised Learning:
Figure 4.6: Unsupervised Learning

When the given data is unlabeled and we are able to find different categories within it, this is said to be unsupervised learning. The algorithm finds the different groups within the given unlabeled data. This grouping is performed by K-means clustering: the mean of each group is calculated, and all data of a similar kind are grouped together. Figure 4.6 depicts three different clusters formed from a given set of unlabeled data.
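The K-means grouping described above can be sketched in plain Python: each point is assigned to its nearest centroid, then each centroid moves to the mean of its assigned points. The sample points and starting centroids are invented:

```python
import math

def kmeans(points, centroids, iters=10):
    """Plain k-means sketch: assign points to the nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[idx]
            for idx, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Two obvious groups of 2-D points, with invented starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```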

Neural networks are extensions of the machine learning approach in which the system learns to recognize, differentiate, and mimic the way the human brain works. The network is formed from different layers, namely an input layer, a first layer, higher layers, a top layer, and an output layer. The following explains how a system is trained, through proper learning, to find a dog in a given set of labeled images of animals.

At the input layer, an unlabeled image is sent to the pretrained network. The first layer finds the different shapes; in the higher layers, complex structures are identified (different features like a face or an arm); and the top layer identifies highly complex structures (differentiating the animal categories). The final output layer predicts the animal based on the training, giving the final output with high accuracy.

Neural networks are a major research focus and have been used in various image processing applications. There are different kinds of neural networks, namely artificial neural networks, convolutional neural networks, and recurrent neural networks. The deep learning concept, which uses a greater number of layers, was developed further: the result of one layer is fed into the next layer, and processing is done quickly at the intermediate layers. Numerous applications nowadays rely on deep learning and neural network approaches.
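The layer-by-layer processing described above can be illustrated with a tiny forward pass: each layer computes weighted sums of its inputs and applies an activation. The weights here are invented; a real network would learn them through training:

```python
def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum of the inputs plus a bias,
    passed through a ReLU activation."""
    return [relu(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Invented weights: 2 inputs -> 2 hidden units -> 1 output unit.
# The hidden layer's result is fed into the next (output) layer.
hidden = layer([0.5, 0.2], [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1])
output = layer(hidden, [[1.0, 1.0]], [0.0])
```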
Figure 4.7: NETCONF – YANG

For every possible use case, it is necessary to determine the proper algorithm to obtain good results when integrated with the IoT application. ML operations can be handled in two ways, namely local learning and remote learning.

Local learning: the data is processed at the sensor node or fog node.

Remote learning: the data is collected and processed in the cloud server.

ML for IoT in major domains: A weather sensor can provide details of the pollution level in a city; a streetlight can change its luminosity based on the local light conditions of the environment. ML integrated with IoT is deployed in various applications. The following actions are performed with the sensors embedded in various places.

Monitoring: The sensors are used for monitoring the environment, for example a temperature sensor. ML integrated with the sensor can detect failure conditions.

Behavior control: For example, if a system detects a hot atmosphere in the environment, ML may be used to control the behavior of the system, inducing it to supply fresh cool air to the environment.

Operations optimization: Behavior control focuses on corrective operation; operations optimization aims at providing increased efficiency and optimized solutions.

Self-healing, self-optimizing: The system identifies faults by itself and can find a corrective action for the fault that was identified.
Predictive analytics: This kind of analytics is done to predict issues that will arise due to faults in the system. Predictive analysis is done to improve the safety and maintenance of the system; sensors embedded in machines can predict faults that are going to occur with the help of big data analytics.

Big Data Analytics Tools and Technology:

Data management is done with big data technologies and Hadoop; Hadoop is the backbone of various big data applications. The data is collected, stored, manipulated, and analyzed. Big data has three Vs:

Velocity: deals with how fast the data is collected and processed. The Hadoop file system is used to quickly process the data collected by the sensor objects.

Variety: deals with the different kinds of data stored in Hadoop: structured, unstructured, and semi-structured. Data from sensors is an example of structured data; data from social media is unstructured data.

Volume: deals with huge volumes of data ranging from gigabytes to exabytes. Clusters of servers are used for big deployments.

Types of Data Sources:

Machine data: Data generated by the sensors embedded in IoT systems.

Transaction data: Data obtained from transactions.

Social data: Data obtained from social media such as Facebook and Twitter (a huge amount of data is generated by social media).

Enterprise data: Data from enterprises, structured in nature.

Industrial automation and control systems feed their data into relational databases and historians. Examples of relational databases include Oracle and Microsoft SQL Server. Historian databases hold the time-series data recorded from the sensors.

There are new technologies for handling the data management. They are

 Massively Parallel Processing Databases

 NoSQL Databases

 Hadoop
Massively Parallel Processing Databases:

The data from enterprises is structured and is stored in relational databases. Groups of these relational databases together constitute data warehouses. MPP is a concept built on top of relational data warehouses for faster access and reduced query time. These systems can process the data in parallel, resulting in faster query processing; MPP databases are also termed analytic databases. Refer to the following figure for the MPP shared-nothing architecture: it possesses a master node to which all nodes are connected, and each node has its own processor, memory, and storage. The whole process is optimized with the help of SQL. Fast processing is an important aspect of MPP.

4.4 NoSQL DATABASES

NoSQL (“non SQL” or “not only SQL”) databases store data in a format other than relational tables. Semi-structured and unstructured data are processed with NoSQL. NoSQL databases are characterized into many types, which include document stores, key-value stores, wide-column stores, and graph stores.

Figure 4.7: MPP Shared Nothing Architecture

Document stores: hold semi-structured documents (XML and JSON).

Key-value stores: store data in the form of associative arrays; a key is paired with a value.

Wide-column stores: store key-value pairs, but formatting takes place row by row.

Graph stores: describe the relationships between elements; well suited for natural language processing and social media.

A common misconception is that NoSQL, or non-relational, databases don’t store relationship data well. NoSQL databases can store relationship data; they just store it differently than relational databases do.

The cost of storage has decreased due to the advent of NoSQL databases.

NoSQL databases are used in real-time web applications.

The data structures used by NoSQL databases differ from those used by default in relational databases, which makes some operations faster in NoSQL.

Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation, Durability) transactions, but a few databases, such as MarkLogic, Aerospike, FairCom c-treeACE, Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB, have made them central to their designs.

Figure 4.8: SQL Databases and No SQL Databases.

Features of NoSQL

Non-relational

 NoSQL databases do not follow the relational model

 Tables are not provided with flat fixed-column records

 Work with self-contained aggregates or BLOBs

Schema-free

 NoSQL databases are either schema-free or have relaxed schemas

 They don’t require any kind of definition of the schema of the data

 They offer heterogeneous structures of data in the same domain
Figure 4.9: Difference Between RDBMS And NoSQL DB

Simple API

 Easy-to-use interfaces for storing and querying data are provided

 Text-based protocols are used with HTTP REST with JSON

 Web-enabled databases running as internet-facing services

Distributed

 Multiple NoSQL databases can be executed in a distributed fashion

 Offers auto-scaling and fail-over capabilities

 Often ACID concept can be sacrificed for scalability and throughput

 Mostly no synchronous replication between distributed nodes; asynchronous multi-master replication, peer-to-peer, and HDFS replication are used

Figure 4.10: No SQL Databases

Types of NoSQL Databases

 Key-value pair based

 Column-oriented

 Graph-based

 Document-oriented

Key-Value Pair Based

Data is stored in key/value pairs. It is designed to handle lots of data.

Key-value pair storage databases store data as a hash table where each key
is unique, and the value can be a JSON, BLOB (Binary Large Objects), string, etc.

For example, a key-value pair may contain a key like "Website" associated
with a value like "Guru99".

Table: 4.1 key value pair based
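The key-value access pattern can be illustrated with a Python dict, since a key-value store is essentially a hash table with unique keys (real stores such as Redis or Riak add networking, persistence, and replication):

```python
# Toy key-value store: each unique key maps to a value, which may be
# a string, a JSON-like object, a BLOB, etc.
store = {}
store["Website"] = "Guru99"             # key paired with a string value
store["session:42"] = {"user": "ann"}   # values can be structured objects too

print(store["Website"])  # Guru99
```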

Column-based

Column-oriented databases work on columns and are based on the BigTable paper by Google. Every column is treated separately.

They deliver high performance on aggregation queries like SUM, COUNT, AVG, and MIN, as the data is readily available in a column.

Column-based NoSQL databases are widely used for managing data warehouses and business intelligence.

HBase and Hypertable are examples of column-based databases.


Table 4.2 Column Family

Document-Oriented:

Document-oriented NoSQL databases store and retrieve data as key-value pairs, but the value part is stored as a document in JSON or XML format.

In the diagram, on the left we have rows and columns, and on the right we have a document database with a structure similar to JSON.

The document type is mostly used for CMS systems, blogging platforms, real-time analytics, and e-commerce applications.

Amazon SimpleDB and MongoDB are popular document-oriented DBMS systems.

Table 4.3: Relational vs. Document

Graph-Based

A graph database stores entities as well as the relations among those entities. An entity is stored as a node, with the relationships as edges; an edge gives the relationship between nodes, and every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature.

Graph databases are mostly used for social networks, logistics, and spatial data.
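A minimal sketch of the node-and-edge model described above, in plain Python; the nodes, relationship names, and helper function are invented for illustration:

```python
# Nodes with unique ids, and edges carrying a named relationship.
nodes = {1: "Alice", 2: "Bob", 3: "Chennai"}
edges = [(1, "friend_of", 2), (1, "lives_in", 3)]

def related(node_id, relation):
    """Follow edges of a given relationship type out of one node."""
    return [nodes[dst] for src, rel, dst in edges
            if src == node_id and rel == relation]

print(related(1, "friend_of"))  # ['Bob']
```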

Figure 4.11: Graph

WHEN TO USE NOSQL

 Some specific cases when NoSQL databases are a better choice than an RDBMS include the following:

 When there is a need to store large amounts of unstructured data with changing schemas.

 When the application relies on cloud computing.

 When you need to develop rapidly.

 When a hybrid data environment is available.

4.5 HADOOP ECOSYSTEM

Hadoop:

Hadoop is a recent data management framework for processing data. The Hadoop system was initially developed to handle millions of websites and to enable fast search. Hadoop has two key elements (HDFS and MapReduce).

Hadoop Distributed File System (HDFS): a system for storing data across different nodes.

MapReduce: a processing engine which divides a big task into small ones and runs them in parallel for a faster result.

Hadoop is an open-source framework. It helps to process big data and store data in a distributed environment. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes. Applications built on the Hadoop framework are capable of running on clusters of computers, and Hadoop has the ability to perform statistical analysis on large volumes of data.

Figure 4.12: Hadoop Eco System.


Figure 4.13: Distributed Hadoop Cluster

The above figure depicts a Hadoop cluster; it includes the name node and the data nodes.

Name node: This node handles data adds, deletes, and reads on the HDFS system. The name node takes requests from clients and assigns the requested blocks to the available data nodes. It instructs the data nodes when to perform replication.

Data nodes: These nodes store the data. The various blocks are distributed across the data nodes, and the same block is copied to one or more nodes as per the replication policy. This is done to ensure data redundancy.

Figure 4.14: Datanode and Namenode


Figure 4.15: Writing A File To HDFS

The steps involved in writing a file to HDFS

Create a DFS

Create a Namenode

Write to Data Input stream

Write a packet to data node

Perform acknowledgement of packet to the input stream

Close the operation

Complete the process with Name node.

Hadoop Architecture

Hadoop framework includes following four modules:

Hadoop Common: These are Java libraries and utilities required by other
Hadoop modules.

Hadoop YARN: This is a framework for job scheduling and cluster resource
management.

Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.

Hadoop MapReduce: This is a YARN-based system for parallel processing of large data sets.
Figure 4.16: Hadoop Framework

Hadoop-related Apache projects:

Pig: Provides a high-level data-flow programming language

Hive: Provides SQL-like access

Mahout: Provides analytical tools

HBase: Provides real-time reads and writes

MapReduce:

Hadoop divides a job into two types of tasks:

Map tasks (Splits & Mapping)

Reduce tasks (Shuffling, Reducing)
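The two phases can be sketched with the classic word-count example in plain Python: the map phase splits lines and emits (word, 1) pairs, a shuffle groups the pairs by key, and the reduce phase sums each group:

```python
from collections import defaultdict

# Invented input lines standing in for a large distributed file.
lines = ["big data big", "data analytics"]

# Map (splits & mapping): emit a (word, 1) pair per word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: sum each group's values.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'big': 2, 'data': 2, 'analytics': 1}
```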

The execution process is controlled by two types of entities:

Job Tracker: acts like a master (responsible for complete execution of the submitted job).

Multiple Task Trackers: act like slaves, each performing part of the job.

YARN (Yet Another Resource Negotiator) was developed to improve the working principle of MapReduce. YARN separates the resource management of the cluster from the scheduling and monitoring of jobs running on the cluster, and it has replaced the work done by the JobTracker and TaskTracker daemons. YARN is the basic requirement for Enterprise Hadoop: it provides resource management and delivers consistent operations, security, and data governance for Hadoop. YARN also extends the power of Hadoop to newer technologies found within the data center. YARN has the advantages of cost-effective, linear-scale storage and processing, and it provides ISVs and developers a consistent framework for writing data-access applications that run in Hadoop.

Figure 4.17: YARN

YARN Features:

Multi-tenancy: YARN allows multiple access engines, providing a common standard for batch, interactive, and real-time engines that can simultaneously access the same data set.

Cluster utilization: YARN provides dynamic allocation of cluster resources.

Scalability: Data center processing power can be expanded rapidly.

Compatibility: Existing MapReduce applications developed for Hadoop 1 can run on YARN without any disruption to existing processes.

APACHE KAFKA:

Apache Kafka is a messaging system based on distributed publishers and subscribers. It is a real-time event streaming system. It delivers messages to stream processing engines such as Spark Streaming or Storm. Numerous producers and consumers connect to the Kafka cluster and exchange information through it: the producers generate the data and the consumers read the data.
Figure 4.18: Apache Kafka Data Flow

Figure 4.19: Kafka Architecture

Kafka is used for the analysis of real-time streams of data and is utilized in real-time streaming data architectures to provide real-time analytics.

Kafka has high throughput, reliability, and replication characteristics. It can work with Flume/Flafka, Spark Streaming, Storm, HBase, Flink, and Spark for real-time ingestion, analysis, and processing of streaming data.

Many companies that handle a lot of data use Kafka. LinkedIn and Twitter use it as part of Storm to provide a stream processing infrastructure. It is also used by companies such as Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box, Cisco, CloudFlare, and Netflix.

Kafka has operational simplicity.
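The producer/consumer decoupling can be sketched with a toy append-only log in plain Python. This is not the real Kafka API; it only illustrates how each consumer keeps its own read offset into a shared topic, as Kafka consumers do:

```python
# Toy topic: an append-only log of messages (invented payloads).
topic = []
offsets = {"analytics": 0, "storage": 0}  # one read offset per consumer

def produce(msg):
    """Producer: append a message to the topic log."""
    topic.append(msg)

def consume(consumer):
    """Consumer: read all messages not yet seen, then advance the offset."""
    msgs = topic[offsets[consumer]:]
    offsets[consumer] = len(topic)
    return msgs

produce({"sensor": "s1", "temp": 22.5})
produce({"sensor": "s2", "temp": 47.0})
first = consume("analytics")    # both messages
second = consume("analytics")   # nothing new yet
stored = consume("storage")     # independent offset: also sees both
```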

Figure 4.20: Apache Kafka (applications at various companies)

APACHE SPARK: Spark was introduced by the Apache Software Foundation to speed up the Hadoop computational software process.

Apache Spark is an open-source distributed processing system used for big data workloads. It utilizes in-memory caching: tasks are performed at a rapid speed because the data is moved into high-speed memory for read and write operations. It provides development APIs in Java, Scala, Python, and R. Data can be processed in real time; real-time processing in the Apache Spark project is termed Spark Streaming. Live streaming and messaging-system activities are performed by the Spark core, which takes data from Kafka. The data collected from Kafka is further divided into small batches, or micro-batches.

Spark uses Hadoop in two ways: one is storage and the second is processing. Since Spark has its own cluster management computation, it uses Hadoop for storage purposes only.
Figure 4.21 SPARK

There are three ways of Spark deployment, as explained below.

Standalone: Spark runs on top of HDFS.

Hadoop YARN: Spark runs on YARN.

Spark in MapReduce (SIMR): Spark in MapReduce is used to launch Spark jobs in addition to standalone deployment.

COMPONENTS OF SPARK
Figure 4.22: Apache spark core

Apache Spark Core

Spark Core is the underlying general execution engine for Spark. It provides in-memory computing and referencing of datasets in external storage systems.

Spark SQL

Spark SQL is a component focused on a new data abstraction called SchemaRDD.

Spark Streaming

Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming analytics. It ingests data in mini-batches and performs RDD (Resilient Distributed Dataset) transformations on those mini-batches of data.

MLlib (Machine Learning Library)

MLlib is a distributed machine learning framework above Spark, owing to the distributed memory-based Spark architecture.

GraphX

GraphX is a distributed graph-processing framework on top of Spark (for user-defined graphs).

Features of Apache Spark

Speed: faster processing.

Supports multiple languages: applications can be written in different languages.

Advanced analytics: Spark supports not only ‘Map’ and ‘Reduce’ but also SQL queries, streaming data, machine learning (ML), and graph algorithms.
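The micro-batching idea behind Spark Streaming can be sketched in plain Python (this is not PySpark): an incoming stream is cut into small batches and each batch is transformed independently:

```python
# Invented temperature stream; in Spark Streaming the batches would be
# mini-batch RDDs cut from a live source such as Kafka.
stream = [21.0, 22.5, 23.0, 40.0, 41.5, 20.0]
batch_size = 2

def micro_batches(data, size):
    """Cut the stream into fixed-size micro-batches."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

# Per-batch transformation: the average temperature of each micro-batch.
averages = [sum(b) / len(b) for b in micro_batches(stream, batch_size)]
print(averages)  # [21.75, 31.5, 30.75]
```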

Figure 4.23: Spark Core


4.6 APACHE STORM AND APACHE FLINK:

Apache Storm and Apache Flink are built for distributed stream processing and are mainly deployed in IoT systems. Storm takes the data from Kafka and processes it as a data stream.

Lambda Architecture: The Lambda architecture is a data management system where processing is performed in three layers: the batch layer, the stream layer, and the serving layer. The stream layer is responsible for the real-time processing of data using Apache Storm or Flink. The batch layer is responsible for batch processing and storage. The serving layer provides the services to the users or consumers.
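A toy sketch of the three layers in plain Python; the sensor names and counts are invented. The batch view holds precomputed totals, the speed (stream) view holds only recent increments, and the serving layer merges the two:

```python
# Batch layer output: totals precomputed over the full historical data.
batch_view = {"s1": 100, "s2": 50}
# Stream (speed) layer output: counts for events since the last batch run.
speed_view = {"s1": 3}

def serve(sensor):
    """Serving layer: merge the batch view with the real-time view."""
    return batch_view.get(sensor, 0) + speed_view.get(sensor, 0)

print(serve("s1"))  # 103
```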

Figure 4.24: Lambda Architecture

4.7 NETWORK ANALYTICS

In the era of information technology, every company relies on the cloud to store and retrieve data or to market their business. IoT integrated with the cloud plays a major role: the data stored in the cloud is analyzed and various decisions are taken from it. In automobile racing, the various sensors in a car produce an enormous amount of data per second, resulting in many gigabytes of data; similarly, weather forecasting involves numerous data streams generated by various sensors. Edge analytics is the collection, processing, and analysis of data at the edge of a network, either at the sensor or near it. Retail, manufacturing, and transportation generate huge volumes of data at the edge of the network. Edge analytics is data analytics performed in real time and on site, where the data collection is happening. Edge analytics can be descriptive, diagnostic, or predictive.

Comparing Big Data and Edge Analytics:

Big data refers to the unstructured data collected and stored in the cloud. Big data analytics is performed on data-center data in the cloud as batch-job analytics. Edge streaming analytics, by contrast, allows you to analyse and monitor the streams of data at the edges so that prediction decisions can be made wisely. In edge analytics the data is not analysed at a single edge; it is analysed at distributed edge nodes, and each node has to communicate with the others. For example, streaming analytics performed on traffic data gives the driver information for taking important decisions. In short, big data analytics is performed on data at rest, while streaming analytics is performed on data in motion.

Key values of edge streaming analytics:

 Reducing data at the edge

 Analysis and response at the edge

 Time sensitivity

Edge Analytics Core Functions:

Data in real time is analysed by streaming analytics. This analytics is performed in three stages:

Raw input data: data from sensors is given as input.

Analytics processing unit (APU): takes the data streams and processes them in time windows, applying analytical functions.

Output streams: the output is communicated using a messaging protocol such as MQTT.

Figure 4.25: Edge Analytics Processing Unit

APU has the following functions

Filter: Filters out the irrelevant data and passes on only the important data
needed for processing; that is the job of the filter in the APU.

Transform: The data extracted is formatted for processing.

Time: Because the data flows in on a real-time basis, it must be framed into time
windows. When values fluctuate at different times, the average value is calculated
from the readings within each time interval.

Correlate: The data obtained from different sensors is finally
combined into a single record. For example, data coming from different instruments is
finally combined into a single health record for a patient. Combining real-time data with the
historical data of the patient gives insight into the current health condition of
the patient. This process is called correlation.
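The merge of per-sensor streams into one record per patient can be sketched as below; the sensor names and patient ids are hypothetical placeholders.

```python
def correlate(streams):
    """Merge readings from several sensor streams into a single
    record per patient id (a minimal illustration of correlation)."""
    records = {}
    for sensor_name, readings in streams.items():
        for patient_id, value in readings:
            records.setdefault(patient_id, {})[sensor_name] = value
    return records

streams = {"heart_rate": [("p1", 72), ("p2", 80)],
           "spo2": [("p1", 98)]}
print(correlate(streams))
# {'p1': {'heart_rate': 72, 'spo2': 98}, 'p2': {'heart_rate': 80}}
```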

Match patterns: Pattern matching aims at alerting the system when there
is an emergency. For example, a matched pattern may alert a nurse through an
alarm notification. Machine learning techniques are adopted to find the matching
patterns in the system.
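A minimal rule-based form of pattern matching might look like the sketch below; the rule names and thresholds are invented for illustration, and a real system could substitute a trained machine learning model for the predicates.

```python
def check_alerts(record, rules):
    """Return the names of alert rules whose condition matches the record.
    Rules map a name to a predicate over the combined sensor record."""
    return [name for name, predicate in rules.items() if predicate(record)]

# Hypothetical clinical thresholds, for illustration only
rules = {
    "low_spo2": lambda r: r.get("spo2", 100) < 92,
    "tachycardia": lambda r: r.get("heart_rate", 0) > 100,
}
print(check_alerts({"spo2": 89, "heart_rate": 110}, rules))
# ['low_spo2', 'tachycardia']
```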

Improve business intelligence: Edge analytics improves business
intelligence by improving the basic operations, which in turn gives better efficiency.

Advantages

 Reduce latency of data analytics.

 Scalability of analytics.

 Avoids the exponential growth in bandwidth that would otherwise be needed to
transmit all the data collected by thousands of edge devices as the number of
these devices increases.

 Edge analytics reduces overall expenses by minimizing bandwidth usage.

Figure 4.26: IoT Edge Analytics

Distributed Analytics Systems:

Fog analytics is performed at many nodes, and data is correlated from
many nodes. Sensors communicate with the help of MQTT. The MQTT message broker
sends data to the fog processing layer, where streaming analytics is performed, and the
data is then communicated to the cloud data center.
Figure 4.27: Distributed Analytics throughout the IoT Systems.

4.8 NETWORK ANALYTICS

Network analytics plays an important role in the management of IoT systems.

It provides a structure for understanding network traffic patterns and analyzes the
communication patterns between different nodes. This analytics helps to find the
abnormal behavior of the network, and it suggests ways to rectify network problems
by providing an optimal solution. Network analytics is a good tool for
troubleshooting. The figure below depicts the traffic analytics performed on the router of a
smart grid. This network analytics is performed to detect abnormal traffic in
distributed systems by analyzing patterns. Protocol identifiers and TCP and UDP port
numbers are used in network analytics.
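The core idea, grouping packets into flows keyed by addresses, protocol, and TCP/UDP port numbers, can be sketched as follows. The packet field names and addresses are assumptions for illustration, not a real NetFlow implementation.

```python
from collections import Counter

def build_flow_cache(packets):
    """Aggregate packets into flows keyed by the classic 5-tuple
    (src IP, dst IP, protocol, src port, dst port), summing bytes."""
    cache = Counter()
    for p in packets:
        key = (p["src"], p["dst"], p["proto"], p["sport"], p["dport"])
        cache[key] += p["bytes"]
    return cache

packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "proto": "tcp", "sport": 40000, "dport": 443, "bytes": 1500},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "proto": "tcp", "sport": 40000, "dport": 443, "bytes": 600},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "proto": "udp", "sport": 5353, "dport": 53, "bytes": 80},
]
cache = build_flow_cache(packets)
print(cache[("10.0.0.1", "10.0.0.9", "tcp", 40000, 443)])  # 2100
```

Unusual flows, for example a sudden new destination port, stand out as new keys in such a cache, which is the basis for the abnormal-traffic detection described above.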

Figure 4.28: NetFlow Example on a Smart Grid

Network management services are given below

Network traffic monitoring and profiling: This feature lets you analyze
the network by monitoring traffic so that problems can be found and rectified.

Application traffic monitoring and profiling: This kind of monitoring
covers application protocols such as MQTT, CoAP, and DNP3.

Capacity planning: Helps in analyzing the data over a certain period of
time. This analysis may help to monitor the traffic growth.

Security analysis: This kind of analysis is done to detect attacks such as
denial of service.

Accounting: For the accounting process, software such as Cisco
Jasper is used for monitoring the flow of data.

Figure 4.29: Flexible NetFlow Overview

FNF Components:

Data warehousing and data mining: Data stored in the warehouse is
analyzed for multiservice applications.

Flexible NetFlow architecture: FNF is used for networks and can be
deployed in an IoT infrastructure. It has the advantages of flexibility and scalability,
can track the progress of network packets, and also monitors network behavior.

FNF flow monitors (NetFlow cache): A flow monitor holds records with key fields (the flow
key) and non-key fields (flow attributes). It monitors the information stored in the
cache; the flow exporter then sends this information on.

FNF flow records: Predefined records for monitoring NetFlow
applications. Security detection, traffic analysis, and capacity planning are the
kinds of information kept in the flow record. User-defined records are also available.

FNF exporter: Defines where the NetFlow data has to be sent
(the destination address). The information from the exporter is sent to the NetFlow
reporting collector.

Flow export timers: Timers indicate how often flows should be
exported to the server.

NetFlow export format: The format in which flow records are sent to the collector.

NetFlow server for collection and reporting: The final destination of the
NetFlow data; the server analyzes it for problems in the network.

Flexible NetFlow in Multiservice IoT Networks:

FNF is installed on routers, and it provides a view of the multiple services
carried in the IoT network. LoRaWAN cannot perform NetFlow analysis; MQTT can
do so only with the help of an IoT broker. Challenges are faced if the network does not
support flow analytics or if the additional bandwidth required has to be reviewed.

4.9 XIVELY CLOUD FOR IOT

Xively is a system for incorporating IoT applications on the cloud. It is

considered to be a PaaS (Platform as a Service). Xively provides a data collection, management, and
distribution infrastructure, along with a platform to connect devices and develop
applications. Xively comes under the category of Connected Product Management (CPM)
platforms. Xively has tools to strengthen your business and is considered very
beneficial in developing IoT-based applications and products. Xively connects
to many IoT frameworks and microcontrollers to develop smart
products.

Xively Python libraries are used to embed Python code as per the Xively
APIs. A Xively web interface is available for creating the front-end part. Xively can work
with different programming language platforms. HTTP, REST APIs, and MQTT are the
protocols used in Xively. All the devices are connected to the Xively cloud for real-time
processing and archiving. IoT application developers can write the front end for
IoT applications as per their requirements. Management of apps is very flexible with the
Xively cloud and other APIs. Xively is very popular with companies that deal with IoT-based
device manufacturing and development. Companies using Xively get secure
connectivity for their devices and good data management capability.

Xively is an IoT cloud platform that is “an enterprise platform for building,
managing, and deriving business value from connected products”. It is a cloud-based
API with an SDK which simplifies and reduces the time of the development process.

It supports several platforms like

 Android

 Arduino

 Arm mbed
 C

 Java and much more.

How to use Xively?

Step 1: Register with Xively to use its cloud services (programmers or
developers).

Step 2: Developers can create the different devices for which they have to
create an IoT app. Templates are provided in the web interface of Xively.

Step 3: A unique FEED_ID is allocated to each connected device. It
identifies the data stream of the connected device.

Step 4: IoT devices are assigned using the available APIs. Permissions
are given to perform Create, Read, Update, and Delete operations.

Step 5: Bidirectional channels are created after we connect a device with
Xively. Each channel is unique to the connected device.

Step 6: The Xively cloud is connected with the help of these channels.

Step 7: Xively APIs are used by IoT devices to create communication-enabled
products.
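As a sketch of the device side, the helper below builds an HTTP update for one datastream following Xively's historical v2 REST API (X-ApiKey header, /v2/feeds endpoint); the feed id, API key, and channel name are placeholder values, and a real device would send this request over HTTPS.

```python
import json

def build_xively_update(feed_id, api_key, channel, value):
    """Build the URL, headers, and JSON body for updating one
    Xively datastream over HTTP (a sketch of the v2 feeds API)."""
    url = "https://api.xively.com/v2/feeds/{}".format(feed_id)
    headers = {"X-ApiKey": api_key, "Content-Type": "application/json"}
    body = json.dumps({
        "version": "1.0.0",
        "datastreams": [{"id": channel, "current_value": str(value)}],
    })
    return url, headers, body

url, headers, body = build_xively_update("1234", "DEMO_KEY", "temperature", 22.5)
print(url)  # https://api.xively.com/v2/feeds/1234
```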

Figure 4.30: Xively Cloud Services


Figure 4.31: XIVELY BY LOG MEIN

4.10 PYTHON WEB APPLICATION FRAMEWORK

Web Frameworks for Python

A web framework is a collection of packages or modules that makes it

easier for developers to write web applications (see WebApplications) or services.
The framework eliminates the need to handle low-level protocols and sockets directly.
Most web frameworks are server-side technology, although some frameworks also include
AJAX code that helps developers with the programming tasks for the client side, i.e. the user's browser.

This "plugging in" aspect of web development is often seen as being in

opposition to the classical distinction between programs and libraries, and the notion of
a "main loop" dispatching events to application code is very similar to that found in GUI
programming. Frameworks provide support for a number of activities such as handling
requests, producing responses, and storing data. Full-stack frameworks supply
components for each layer in the stack.

The need for Python frameworks

A Python framework is a platform for developing software applications. It

provides a foundation on which programmers can build programs for a specific platform. A
framework may include predefined classes and functions that can be used to process
input, manage hardware devices, and interact with system software.
Figure 4.32: Python Web Frameworks

4.11 DJANGO:

Django is a web framework. Django helps us build better web apps

very fast and with less code. Django is a high-level Python web framework that encourages
rapid development and clean design. Web applications built with it are fast and perform
well. Django focuses on automating as much as possible and follows the DRY (Don't Repeat
Yourself) principle. Django is well suited for developing an e-commerce website; the execution
of the work is very fast. It is free and open source.
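To illustrate how little code a Django app needs, the sketch below shows a minimal view function and URL configuration. The route and names are hypothetical, and in a real project the two parts live in separate views.py and urls.py files.

```python
# views.py -- a minimal Django view returning JSON
from django.http import JsonResponse

def sensor_status(request, device_id):
    # A real app would query a model here; the payload is hard-coded
    # to keep the sketch self-contained.
    return JsonResponse({"device": device_id, "status": "online"})

# urls.py -- route incoming requests to the view
from django.urls import path

urlpatterns = [
    path("devices/<int:device_id>/status/", sensor_status),
]
```

A GET request to /devices/7/status/ would then invoke the view with device_id=7 and return the JSON payload.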

Django Python framework advantages:

 The admin structure is powerful and customizing a product is easy.

 Tools like Django REST Framework are helpful for developing mobile apps.

 Django's ORM is powerful; it streamlines the process of dealing with data.

Django Python framework cons:

 The template system is not considered to be the most powerful.

 Third-party libraries are used to configure different types of deployment
environments.

 Upgrading Django is not easy, as it requires a lot of changes to be made in the
code.

Django architecture: The figure below depicts a simple Django framework
with templates and a caching framework.
Figure 4.33: DJANGO

4.12 AWS FOR IOT

Billions of devices are found in homes, factories, oil wells, hospitals, cars, and

thousands of other places. Solutions are needed to connect
them and to collect, store, and analyze device data. AWS IoT provides broad functionality,
spanning the edge to the cloud, for building IoT solutions across virtually any
kind of device. Since AWS IoT integrates with AI services, the
devices become smarter. AWS IoT can easily scale based on the requirements of the
business. AWS IoT provides good security features and preventive security policies,
which respond immediately to all security-related issues.

AWS IoT provides secure, bidirectional communication between Internet-

connected devices, such as sensors, actuators, embedded microcontrollers, or smart
appliances, and the AWS Cloud. Data is collected from multiple devices,
stored, and analyzed. Users or customers can build applications that enable
them to control devices from their phones or tablets.

AWS IoT Components

AWS IoT consists of the following components:

 Alexa Voice Service (AVS) Integration for AWS IoT

 This service brings Alexa Voice to any connected device. AVS for AWS IoT
reduces the cost and complexity of integrating Alexa.

 AVS for AWS IoT enables Alexa built-in functionality on MCUs, such as the
ARM Cortex-M class with less than 1 MB of embedded RAM. To do so, AVS
offloads memory and compute tasks to a virtual Alexa built-in device in the
cloud.

Custom authentication service

This feature allows us to manage our own authentication and

authorization strategy using a custom authentication service and a Lambda function.
Custom authorizers allow AWS IoT to authenticate your devices and authorize
operations using bearer-token authentication and authorization strategies, for example
JSON Web Token verification or an OAuth provider callout.

Device gateway

This feature enables devices to securely communicate with AWS IoT.

Device provisioning service

This feature allows us to provision devices using a template that

describes the resources required for the device: a thing, a certificate, and one or more
policies.

The templates contain variables that are replaced by values from a dictionary
(map).

Device shadow service

A JSON document used to store and retrieve current state information for
a device.

The device can be synchronized with other applications; devices publish
their current state to a shadow for use by other devices and applications.
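A shadow document has a fixed top-level layout: applications write the desired state, the device writes the reported state, and AWS IoT computes the difference between them. A minimal shadow for a hypothetical smart plug might look like:

```json
{
  "state": {
    "desired":  { "power": "on" },
    "reported": { "power": "off" }
  }
}
```

When desired and reported differ, the service publishes a delta message so the device knows which state change to apply.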

Group registry

Several devices are managed at once by categorizing them into groups.

An action performed on a parent group is applied to its child groups, and to all the
devices in its child groups as well.

Jobs service

Defines remote operations that are sent to the devices connected to AWS IoT. For
example, you can define a job that instructs a set of devices to download and install an
application, reboot, or perform a remote troubleshooting process.

Message broker

The MQTT protocol, used directly or over WebSocket, provides secure
publish/subscribe messaging. An HTTP REST interface can also be used to publish.
Registry

Register your devices and associate up to three custom attributes with


each one.

Rules engine

Provides message processing and integration with other AWS services.

It uses a SQL-based language to select data from message payloads, and then processes and
sends the data to other services, such as Amazon S3, Amazon DynamoDB, and AWS
Lambda.
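A rule's SQL statement selects fields from messages arriving on a topic filter; the topic and field names below are hypothetical:

```sql
SELECT temperature, deviceId
FROM 'sensors/+/telemetry'
WHERE temperature > 60
```

Messages matching the WHERE clause can then be routed to an action such as a DynamoDB insert or a Lambda invocation.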

Security and identity service

Provides shared responsibility for security in the AWS Cloud. The

message broker and rules engine use AWS security features to send data securely to
devices or other AWS services.

AWS IoT solutions

Figure 4.34: Industrial

AWS IoT customers are building industrial IoT applications for predictive
quality and maintenance and for remote monitoring of operations.

Figure 4.35: Connected home

AWS IoT customers are building connected home applications for home
automation, home security and monitoring, and home networking.
Figure 4.36: Commercial

AWS IoT customers are building commercial applications for traffic


monitoring, public safety, and health monitoring.

Figure 4.37: AWS IoT Services


Figure 4.38: AWS Analytics Services

Figure 4.39: AWS IOT

4.13 SYSTEM MANAGEMENT WITH NETCONF

NETCONF: The Network Configuration Protocol (NETCONF) is a session-

based network management protocol. NETCONF retrieves state or configuration data
and manipulates configuration data on network devices. It
gives access to a device within a network,
defining methods to manipulate its configuration database, retrieve operational data,
and invoke specific operations. YANG provides the means to define the content carried
via NETCONF, for both data and operations. Together, they help users build network
management applications that meet the needs of network operators.
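NETCONF operations are XML RPCs exchanged over a secure session. As a sketch, the function below builds the standard get-config request defined in RFC 6241 using only the Python standard library; a real client library such as ncclient would also handle the session, capabilities exchange, and message framing.

```python
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"

def build_get_config(datastore="running", message_id="101"):
    """Build a NETCONF <get-config> RPC requesting the given
    configuration datastore (running, candidate, or startup)."""
    ET.register_namespace("", NC_NS)
    rpc = ET.Element("{%s}rpc" % NC_NS, {"message-id": message_id})
    get_config = ET.SubElement(rpc, "{%s}get-config" % NC_NS)
    source = ET.SubElement(get_config, "{%s}source" % NC_NS)
    ET.SubElement(source, "{%s}%s" % (NC_NS, datastore))
    return ET.tostring(rpc, encoding="unicode")

print(build_get_config())
```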
Figure 4.40: Definition of NETCONF and YANG.

The motivation behind NETCONF and YANG was to have a network

management system that manages the network at the service level, instead of managing
individual devices with separate functionalities. This includes:

 Standardized data model (YANG)

 Network-wide configuration transactions

 Validation and roll-back of configuration

 Centralized backup and restore configuration

Businesses have used SNMP for a long time, but it was used more
for reading device states than for configuring devices. NETCONF and YANG address
the disadvantages of SNMP and add various functionality to network
management, such as:

 Configuration transactions

 Network-wide orchestrated activation

 Network-level validation and roll-back.

 Save and restore configurations

Service provider and enterprise network teams are shifting toward
a service-oriented approach for managing their networks, using the IETF's Network
Configuration Protocol (NETCONF) and YANG, a data modelling language, to help
remove the time, cost, and manual steps involved in network element configuration.

NETCONF is the standard for installing, manipulating, and deleting the

configuration of network devices. YANG is used to model both the configuration and state
data of network elements. YANG structures the data definitions into tree structures and
provides many modelling features, including an extensible type system, formal
separation of state and configuration data, and a variety of syntactic and semantic
constraints. YANG data definitions are contained in modules and provide a strong set of
features for extensibility and reuse.
Figure 4.41: IoT System Management with NETCONF-YANG

4.14 YANG

YANG is a data modelling language used to model the configuration and state

data manipulated by the NETCONF protocol.

YANG modules contain the definitions of configuration data, state

data, RPC calls, and notifications.

A YANG module defines the data exchanged between the NETCONF


client and server.

A module comprises a number of 'leaf' nodes which are organized into

a hierarchical tree structure. The 'leaf' nodes are specified using the 'leaf' or 'leaf-list'
constructs, and they are organized using 'container' or 'list' constructs. The figure below
depicts the leaf node structure (a hierarchical tree structure).

A YANG module can import definitions from other modules. Constraints


can be defined on the data nodes, e.g. allowed values.

YANG can model both configuration data and state data using the 'config'
statement

This YANG module is a YANG version of the toaster MIB.

The toaster YANG module begins with the header information followed by
identity declarations which define various bread types.

The leaf nodes ('toasterManufacturer', 'toasterModelNumber' and

'toasterStatus') are defined in the 'toaster' container.

Each leaf node definition has a type and optionally a description and
default value.

The module has two RPC definitions (‘make-toast’ and ‘cancel-toast’).
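An abridged sketch of that module's structure is shown below, simplified from the well-known toaster example (identity declarations and the RPC bodies are shortened for illustration):

```yang
module toaster {
  namespace "http://netconfcentral.org/ns/toaster";
  prefix toast;

  container toaster {
    presence "Indicates the toaster service is available";

    leaf toasterManufacturer { type string; config false; }
    leaf toasterModelNumber  { type string; config false; }
    leaf toasterStatus {
      type enumeration { enum up; enum down; }
      config false;
    }
  }

  rpc make-toast {
    input {
      leaf toasterDoneness { type uint32 { range "1..10"; } default 5; }
    }
  }
  rpc cancel-toast;
}
```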


Figure 4.42: YANG Features With Leaf Nodes

Figure 4.43: YANG Modules


PART-A

1. List the six pillars/components of Cisco IoT Systems.

2. Summarize the use of Watson Conversation services.

3. Summarize on the Grid Blocks reference model.

4. Summarize in detail the architecture model of CPwE.

5. Interpret the design and implementation guidance of CPwE.

PART-B

1. Analyze the purpose of the Six-Pillar Approach for Cisco IoT System.

2. Examine the features of IBM on the IoT platform, and brief on the services
provided in it.

3. Analyze in detail the architecture of the Converged Plantwide Ethernet
Model with suitable illustration.

4. Explain in detail about connected lighting architecture with
necessary diagrams.

5. Summarize the IoT strategy for a smart city and design the layered
architecture for implementing smart cities.

