

IoT Cloud Platform

Every IoT network must be capable of scaling to securely handle massive amounts of data in each of the stages of IoT
data handling.

Figure 1: IoT Cloud Platform proliferation

IoT Architecture:

1. IoT architectures must be capable of scaling connectivity of devices, data ingestion, data processing, and data
storage.
2. Sending ever-increasing amounts of data to the cloud slows processing and requires more bandwidth and storage to transfer and store the data.

Key Points:

Fog or Edge computing: The "edge" refers to computing nodes that are geographically distributed through the network, such as Internet of Things devices, which sit at the edge of the network. Processing at the edge increases the demand for devices that are capable of cleaning, processing, and analyzing data locally. The result is that only cleaned metadata is sent to the cloud.
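
As a rough illustration of local cleaning at the edge, the following Python sketch drops invalid readings and reduces a batch to a small summary before anything is uploaded. The function name, the valid range, and the summary fields are assumptions made for this example; they are not part of any Google API.

```python
from statistics import mean


def summarize_at_edge(readings, low=-40.0, high=85.0):
    """Clean a batch of raw sensor readings locally and return a small summary.

    `low` and `high` are a hypothetical valid range for the sensor.
    """
    # Cleaning step: discard missing values and values outside the valid range.
    cleaned = [r for r in readings if r is not None and low <= r <= high]
    if not cleaned:
        return {"count": 0, "dropped": len(readings)}
    # Only this compact summary would be uploaded, not the raw samples.
    return {
        "count": len(cleaned),
        "dropped": len(readings) - len(cleaned),
        "min": min(cleaned),
        "max": max(cleaned),
        "mean": round(mean(cleaned), 2),
    }


# One missing value and one out-of-range spike are removed before upload.
summary = summarize_at_edge([21.5, 22.0, None, 999.0, 22.5])
```

Five raw readings become one small record, which is the bandwidth saving that edge processing is meant to provide.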

Scaling: The ability to easily monitor and maintain thousands of devices must scale along with the data. A system that allows for asynchronous communication is less brittle, so a communication protocol that separates sending from receiving, such as MQTT, is a necessity in IoT architecture.

Figure 2: IoT architecture components

Google's Cloud IoT Architecture:

GCP Services:

Figure 3: Google Cloud Services

Stages: Google's IoT architecture can be divided into four stages: data gathering, data ingest, data processing, and data analysis.

1. Data Gathering: Sensors gather data from the environment and send it to the cloud, either directly or through an intermediary device. Depending on the network, preparation can include cleaning, preprocessing, analysis, and even machine learning inference.
2. Cloud IoT Edge: Cloud IoT Edge is a collection of devices capable of doing real-time analytics and ML. Edge devices can act on data in real time and predict outcomes locally. There are two components of Cloud IoT Edge: Edge IoT Core and Edge ML.
3. Ingest and process data: Data ingestion and processing on Google Cloud IoT involve Cloud IoT Core, Cloud Functions, Cloud Pub/Sub, and Cloud Dataflow. Cloud Pub/Sub receives messages from devices and publishes them for subscribers to read. Cloud Dataflow creates data pipelines from the device to its destination, which can be BigQuery, Cloud Storage, or Bigtable. Pipelines can be created with Google-provided templates and Cloud Functions.
4. Data Analytics and ML: Data analysis and ML can be done on the edge or in the cloud. Google's data analytics and ML services are fully integrated with IoT data.


Figure 4: Google's Cloud IoT Architecture stages

Figure 5: Ingest and processing of data

An illustration: Often, the value of IoT analytics comes from combining data from the physical world with data from other sources; for example, online information systems or customer-relationship data. This data often accumulates in various storage systems, such as Cloud Storage, and is accessible to BigQuery and Cloud Bigtable. Combining historical data, metadata, and real-time streaming data can lead to deeper and more actionable insights.

Cloud IoT platform stages: Google Cloud IoT platform includes the three stages necessary for an IoT pipeline: data ingestion, data processing, and data analysis.

Data ingestion: This stage manages and optimizes IoT device data through secure device connections. Real-time data is collected with sensors, devices are authorized through Cloud IoT Core, and the data is then uploaded to the cloud through Cloud Pub/Sub.

Data processing: This stage cleans and stores the data with on-demand solutions that scale. A Cloud Dataflow pipeline can also be used to direct data to Cloud Storage or BigQuery.

Data analysis: This stage visualizes data and predicts outcomes to generate actionable insights. BigQuery, Cloud Dataprep, and Cloud Machine Learning Engine can be used to analyze the data.

Some important points:

1. Cloud Functions: A lightweight compute solution for developers to create single-purpose, stand-alone functions that respond to cloud events without the need to manage a server or runtime environment. Functions can be written in languages such as Python, Go, and Node.js.
2. Cloud Dataflow: A fully managed service for transforming and enriching data in stream (real-time) or batch (historical) modes.
3. Cloud IoT Core: A fully managed service to easily and securely connect, manage, and ingest data from globally dispersed devices.
4. Cloud Bigtable: A sparsely populated table that can scale to billions of rows and thousands of columns, enabling you to store terabytes or even petabytes of data. It is ideal for storing very large amounts of single-keyed data with very low latency.
5. Cloud Storage: An online file storage service used for storing and accessing data on Google Cloud Platform. It offers secure, scalable, high-performance access to your data.
6. BigQuery: A highly scalable enterprise data warehouse that stores large datasets and queries them with fast SQL, helping you understand your devices' behavior at scale.
7. Cloud Machine Learning (ML) Engine: A managed service that enables developers and data scientists to build machine learning models and bring them to production. Cloud ML Engine offers training and prediction services that can be used together or individually.
8. Cloud Datalab: An interactive tool created to explore, analyze, transform, and visualize data and to build machine learning models on GCP.
9. Cloud Dataprep: An intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning. It is serverless and works at any scale; there is no infrastructure to deploy or manage.

Figure 6: Cloud IoT platform stages


A must: Devices in an IoT network must be securely connected to the network, new devices must be easy to add, and all devices must be easy to update when necessary.

Cloud IoT Core is where users create registries and devices. A Pub/Sub topic is selected when a registry is created. Authorizations and keys are associated with each device as it is added to the registry.

Device management on Cloud IoT covers the three main concerns of sensor and device management: adding new devices, monitoring devices, and updating devices.

Adding new devices:
1. Each device must have an ID and basic metadata.
2. Credentials and authentication are checked.
3. Devices are registered and tracked. Tracked details include: heartbeat, telemetry event received, config set, config acknowledged, and errors.

Monitoring devices: Cloud IoT Core monitors the daily operations and status of devices through Stackdriver Logging, at one of four levels:
None - no logging of the device is maintained by Stackdriver
Error - record only error messages associated with the device
Info - log errors, status, and state of the device
Debug - record debug level information for the device

Updating devices: Cloud IoT gives you the option to push updates over the air (OTA).

Security: Cloud IoT Core has several security features to protect your IoT network.
1. Devices are authenticated individually, which means that an attack on your IoT network is limited to one device and not the whole fleet.
2. There are four public key formats available for devices: RS256 and RS256_X509, and ES256 and ES256_X509.
3. You can also define an expiration time for each device credential (public key). After it expires, the key is ignored but not automatically deleted. If you don't specify an expiration time for a key, it will not expire.
4. The connection to the cloud is a TLS 1.2 connection, using root certificate authorities (required for MQTT).
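
The expiration rule for device credentials can be sketched in a few lines of Python. This is only a model of the behavior described above (expired keys are ignored but kept, and keys without an expiration time never expire); the class and function names are hypothetical and do not come from the Cloud IoT Core API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DeviceCredential:
    key_format: str                        # e.g. "ES256"
    expires_at: Optional[datetime] = None  # None means the key never expires

    def is_usable(self, now: datetime) -> bool:
        # An expired key is ignored for authentication but stays stored.
        return self.expires_at is None or now < self.expires_at


def usable_credentials(credentials, now):
    """Return the credentials that may still authenticate; nothing is deleted."""
    return [c for c in credentials if c.is_usable(now)]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
creds = [
    DeviceCredential("ES256"),                                              # no expiry
    DeviceCredential("RS256", datetime(2023, 6, 1, tzinfo=timezone.utc)),   # expired
    DeviceCredential("ES256_X509", datetime(2025, 1, 1, tzinfo=timezone.utc)),
]
active = usable_credentials(creds, now)
```

The expired credential is skipped when authenticating but is still present in `creds`, mirroring the "ignored but not automatically deleted" behavior.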

Google Cloud IoT features:
1. It is serverless by design: the cloud itself acts as the server, so there are no servers for you to manage.
2. It has intelligence built in, with ML and AI capabilities: Google Cloud includes the ML models and AI capacity available with Google Cloud ML Engine. The Edge TPU board can also perform ML. Performing ML on the edge increases privacy and reduces latency.

Sensors and Devices

Data flow process: Sensors gather real-world data, and devices prepare it for the cloud. Once the data is ready to send, a communication protocol is needed to send it to the cloud; this can be HTTP or MQTT.
Sensor: A module that observes changes in its environment and sends information about these changes to a device.

Figure 7: Data flow process: Sensors

Sensors can be divided by their external power requirements:
Type | Definition | Example
Passive | Does not require external power to operate; responds to input from its environment | A temperature sensor that changes resistance in response to temperature changes
Active | Requires external power to operate | Camera, etc.

According to the signals produced by the sensors:
Type | Definition | Example
Analog | Outputs an analog continuous signal | Accelerometers, temperature sensors
Digital | The output is converted to discrete values (digital 1s and 0s) before transmitting to a device | Digital pressure sensor, digital temperature sensor

According to the kind of data measured:
Type | Definition | Example
Chemical | Responds to chemical changes in its environment | Gas sensor
Mechanical | Responds to physical changes in its environment | Microswitch
Electrical | Responds to electrical changes in its environment | Optical sensor

Figure 8: Types of sensors

Selection of a sensor:

1. Durability: A sensor should operate for a reasonable period of time without incurring unnecessary maintenance cost. For example, a water-resistant temperature sensor may be acceptable for a remote weather station, but it would be completely unsuitable for monitoring water temperature in a pool because it is not waterproof.
2. Accuracy: A sensor must be accurate enough to monitor the environment correctly, but not at a cost beyond what the application justifies.
3. Versatility: Sensors must be able to operate within reasonable variations of their environment.
4. Power consumption: Prefer low-power, or even very-low-power, devices.
5. Special environmental considerations: For example, when designing a system for monitoring water quality, a sensor that can be placed within the main water supply piping is far more cost-effective and accurate than a sensor that requires diverting water samples.
6. Cost: IoT networks usually involve hundreds or even thousands of sensors and devices. Consideration must be given to the cost of placement, maintenance, reliability, etc.

Devices: A "Thing" in the "Internet of Things" is a processing unit that is capable of connecting to the internet and
exchanging data with the cloud. Devices communicate two types of data: telemetry and state.

Figure 9: Flow of Information: Devices

Information of devices:

1. Metadata: Metadata contains information about a device. Examples of metadata fields include:
a. Identifier (ID) - An identifier that uniquely identifies a device.
b. Class or type
c. Model
d. Revision
e. Date manufactured
f. Hardware serial number
2. Telemetry: It is the data collected by the device. Telemetry is read-only data about the environment, usually
collected through sensors.
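
One way to picture the split between metadata and telemetry is as two separate data structures: metadata that rarely changes, and telemetry events that carry the device ID plus a reading. The sketch below is illustrative only; the field names follow the list above, while the class name, function name, and sample values are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: metadata describes the device and rarely changes
class DeviceMetadata:
    device_id: str
    device_class: str
    model: str
    revision: str
    date_manufactured: str  # ISO date string
    serial_number: str


def telemetry_message(meta: DeviceMetadata, reading: dict) -> str:
    """Serialize one telemetry event, tagged with the device ID, as JSON."""
    return json.dumps({"device_id": meta.device_id, "telemetry": reading}, sort_keys=True)


meta = DeviceMetadata("thermo-001", "thermostat", "T100", "B", "2019-04-01", "SN12345")
message = telemetry_message(meta, {"temp_c": 21.5})
```

Only the identifier travels with each telemetry event; the rest of the metadata can be looked up by ID when needed.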

Communicating with Devices

Note: When connecting devices to Google Cloud Platform, you will need to specify which communication protocol your
devices will use. The choices are MQTT, HTTP, or both.
Figure 10: Google Cloud Platform

MQTT: MQTT (Message Queuing Telemetry Transport) is an industry-standard IoT protocol. It is a publish/subscribe (pub/sub) messaging protocol.

Figure 11: publish/subscribe model of MQTT

Working of MQTT: The publish/subscribe model is event-driven. The broker is the hub of communication: clients publish messages to the broker, and the broker pushes each message out to the clients that are subscribed to its topic. MQTT is a highly scalable architecture. Each client must keep an open TCP connection to the broker.
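
The publish/subscribe flow described above can be modeled with a small in-memory broker. This is a teaching sketch only: it has no networking, no QoS, and no topic wildcards, and `ToyBroker` is a hypothetical name rather than a real MQTT library class.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for an MQTT broker (no networking, no QoS)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # The broker pushes the message to every subscriber of the topic.
        # The publisher never learns who, if anyone, received it.
        for callback in self._subscribers[topic]:
            callback(topic, payload)


broker = ToyBroker()
received = []
broker.subscribe("sensors/temperature", lambda t, p: received.append((t, p)))
broker.publish("sensors/temperature", b"21.5")
broker.publish("sensors/humidity", b"40")  # no subscriber, so nothing is delivered
```

Because publishers and subscribers only know the broker and the topic name, either side can be added or removed without changing the other.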

HTTP: It is a "connectionless" protocol: with the HTTP bridge, devices do not maintain a connection to the cloud. Instead, they send requests and receive responses.

Figure 12: HTTP model

Working of HTTP: In connectionless communication, client requests are sent without first checking that the recipient is available. Devices therefore have no way of knowing whether they are in a conversation with the server, and vice versa. As a result, some of the features that Cloud IoT Core provides, for example last heartbeat detected, are not available with an HTTP connection.

MQTT | HTTP
Lower bandwidth usage | Lighter weight
Lower latency, higher throughput | Fewer firewall issues
Supports raw binary data | Binary data must be base64 encoded
Data focused | Document focused
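
To make the binary-data row of the comparison concrete, the sketch below packs a small binary reading and wraps it in a JSON body with base64, as an HTTP-style request would need. The payload layout and the `binary_data` field name are assumptions chosen for illustration.

```python
import base64
import json
import struct


def http_body(raw: bytes) -> str:
    """Wrap binary telemetry in a JSON body, base64-encoding the bytes."""
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"binary_data": encoded})


# Pack a float32 temperature and a uint16 humidity value into 6 raw bytes.
raw = struct.pack("<fH", 21.5, 40)
body = http_body(raw)

# The receiver reverses the steps to recover the original bytes.
decoded = base64.b64decode(json.loads(body)["binary_data"])
```

With MQTT the 6 raw bytes could be sent as they are; base64 turns them into 8 characters, roughly a one-third size increase, before the JSON wrapper is added.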

MQTT has three levels of service (QoS):

1. At most once. A single best-effort delivery attempt is made; the message may not arrive.
2. At least once. Guarantees the message will be delivered at least once.
3. Exactly once. Guarantees the message is delivered only once.

MQTT also provides:

1. Last will and testament. If a client (i.e. a device) is disconnected unexpectedly, the subscribers will be notified by the MQTT broker.
2. Retained messages. New subscribers get an immediate status update.

Note: Both bridges (MQTT and HTTP) use public key (asymmetric) device authentication and JSON Web Tokens (JWTs).

Google Cloud Services


Cloud Pub/Sub: It is an integral part of Google Cloud IoT. It interacts with Cloud IoT Core, Cloud Functions, and Cloud Dataflow.

Figure 13: Google Cloud Pub/Sub

1. Cloud Pub/Sub is a fully managed real-time messaging service that allows you to send and receive messages between independent applications.
2. It is an independent, scalable, managed message queuing service that guarantees delivery of each individual message. It will hold on to that data for up to seven days.
3. It does not guarantee first-in, first-out delivery, so messages may arrive out of order.
4. It is a globally managed service with extremely low latency.
5. Cloud Pub/Sub uses two levels of indirection between the publisher and the subscriber.
6. Cloud Pub/Sub brings message-oriented middleware to the cloud. It is a simple, reliable, scalable foundation for streaming analytics and event-driven computing.
7. It does this by decoupling senders and receivers, allowing for secure, highly available communication between devices and services.
8. Cloud Pub/Sub ingests event streams and delivers them to Cloud Dataflow. Cloud Dataflow processes the data and delivers it to BigQuery for analysis and storage or to Google Cloud Storage.
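
The points above about indirection and guaranteed delivery can be pictured with a toy model in which a topic fans messages out to subscriptions, and each subscription keeps a message until it is acknowledged. Reading the "two levels of indirection" as topics and subscriptions is an assumption of this sketch, and `ToyTopic` and `ToySubscription` are hypothetical names, not the Cloud Pub/Sub client library.

```python
from collections import deque


class ToySubscription:
    def __init__(self):
        self._pending = deque()  # messages not yet delivered
        self._unacked = {}       # ack_id -> message delivered but not acknowledged
        self._next_ack_id = 0

    def _enqueue(self, message):
        self._pending.append(message)

    def pull(self):
        """Deliver the next message as (ack_id, message), or None if none is pending."""
        if not self._pending:
            return None
        message = self._pending.popleft()
        ack_id = self._next_ack_id
        self._next_ack_id += 1
        self._unacked[ack_id] = message
        return ack_id, message

    def ack(self, ack_id):
        self._unacked.pop(ack_id, None)

    def redeliver_unacked(self):
        """Put unacknowledged messages back in the queue (at-least-once delivery)."""
        self._pending.extend(self._unacked.values())
        self._unacked.clear()


class ToyTopic:
    def __init__(self):
        self._subscriptions = []

    def attach(self, subscription):
        self._subscriptions.append(subscription)

    def publish(self, message):
        # Each subscription gets its own copy, so subscribers are independent.
        for sub in self._subscriptions:
            sub._enqueue(message)


topic = ToyTopic()
dataflow_sub, alerts_sub = ToySubscription(), ToySubscription()
topic.attach(dataflow_sub)
topic.attach(alerts_sub)
topic.publish("temp=21.5")

ack_id, msg = dataflow_sub.pull()
dataflow_sub.ack(ack_id)          # acknowledged: will not be delivered again
alerts_sub.pull()                 # delivered but never acknowledged
alerts_sub.redeliver_unacked()    # so it becomes available again
```

The publisher only knows the topic, each consumer only knows its subscription, and an unacknowledged message is offered again rather than lost.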

Figure 14: Process in Cloud Pub/Sub_1


Figure 15: Process in Cloud Pub/Sub_2 for Multiple IoT devices

Figure 16: Process in Cloud Pub/Sub_2 for N IoT devices with behemoth complexity
Figure 17: Integration of Publisher & Subscribers

Figure 18: Applications for Cloud Pub/Sub


Figure 19: Process flow for Cloud Pub/Sub

Flow of Pub/Sub: Cloud Pub/Sub ingests event streams and delivers them to Cloud Dataflow. Cloud Dataflow processes
the data and delivers it to BigQuery for analysis and storage or to Google Cloud Storage.

Cloud IoT Core

Definition: Cloud IoT Core is a fully managed service, which means there is no need to handle autoscaling, redundancy setup, database partitioning, or resource pre-provisioning yourself. You can connect one device or millions, and Cloud IoT Core will scale to meet your needs.

Two main components of Cloud IoT Core:

1. Device manager: Registers devices with the service so that they can be monitored and configured.
2. Protocol bridges (MQTT and HTTP): Connect devices to Google Cloud Platform.

Note: Device telemetry data is forwarded to a Cloud Pub/Sub topic, which can then be used to trigger Cloud Functions.
We can also perform streaming analysis with Cloud Dataflow or custom analysis with our own subscribers.

Cloud IoT Core, using Cloud Pub/Sub, can combine device data that is widely distributed into a single global system.

Process: Cloud IoT Core combines the MQTT protocol with strong transport security (TLS 1.2 with certificates) behind a single global endpoint. When communicating with a device, we don't need to know the device location, and we don't have to replicate its configuration in each region. Data is automatically published to Cloud Pub/Sub and is accessible globally.

Device Registration: In order for a device to connect, it must first be registered in the device manager. The device manager can be used through the Google Cloud Platform Console, gcloud commands, or the REST-style API.

Device registries: A device registry is a container of devices. It is created with the MQTT protocol, the HTTP protocol, or both enabled.

Note:

1. Each device registry is created in a specific cloud region and belongs to a cloud project.
2. A registry is identified in the cloudiot.googleapis.com service by its full name: projects/{project-id}/locations/{cloud-region}/registries/{registry-id}.
3. The device registry is configured with one or more Cloud Pub/Sub topics to which telemetry events are published for all devices in that registry. A single topic can be used to collect data across all regions.
4. Stackdriver monitoring is automatically enabled for each registry.
5. Cloud Identity and Access Management (IAM) can be used for access control, granting users permission to view, provision, or fully manage devices.
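
The full registry name format in the note above is easy to build programmatically. The helper below is a minimal sketch; the function name and the sample project, region, and registry values are hypothetical.

```python
def registry_path(project_id: str, cloud_region: str, registry_id: str) -> str:
    """Build the full registry name in the format given in the note above."""
    for name, value in (
        ("project_id", project_id),
        ("cloud_region", cloud_region),
        ("registry_id", registry_id),
    ):
        # Reject empty parts and parts that would break the path structure.
        if not value or "/" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return f"projects/{project_id}/locations/{cloud_region}/registries/{registry_id}"


path = registry_path("my-project", "us-central1", "my-registry")
```

Validating each part before formatting keeps a stray slash or empty value from silently producing a name that points at the wrong resource.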

Note: Cloud IoT Core supports HTTP 1.1 only.


Figure 20

MQTT bridge | HTTP bridge
Device connection is maintained | Connectionless (request/response)
Full-duplex TCP connection | Half-duplex TCP connection
JWT is sent in the password field of the CONNECT message | JWT is sent in the header of the HTTP request
Telemetry events are pushed to Cloud Pub/Sub | Telemetry events are pushed to Cloud Pub/Sub
Device connection status is reported | No device connection status reported
Device configurations are propagated via subscriptions | Device configurations must be explicitly requested (via polling)
Most recent configuration (whether newer or not) is always received by devices on subscription | Devices can specify that only newer configurations should be received
Device configurations are acknowledged (ACKed) when using QoS 1 | No explicit ACK for device configurations
Last device heartbeat time is retained | No device heartbeat data

Google Cloud Storage

Definition: It is unified object storage. You can store and/or retrieve data from anywhere in the world, at any time.
Types of Storage: Multi-regional, regional, nearline and coldline

All storage classes offer low latency (time to first byte is typically tens of milliseconds) and high durability. The
classes differ by their availability, minimum storage durations, and pricing for storage and access.

Storage Class | Name for APIs and gsutil | Minimum storage duration | Typical monthly availability
Standard Storage | STANDARD | None | >99.99% in multi-regions and dual-regions; 99.99% in regions
Nearline Storage | NEARLINE | 30 days | 99.95% in multi-regions and dual-regions; 99.9% in regions
Coldline Storage | COLDLINE | 90 days | 99.95% in multi-regions and dual-regions; 99.9% in regions

 | Multi-regional | Regional | Nearline | Coldline
Good for | Highest availability of frequently accessed data | Data accessed frequently within a region | Data accessed less than once a month | Data accessed less than once a year
Redundancy | Geo-redundant | Regional, redundant across availability zones | Regional | Regional
Applications | Video, multimedia, business continuity | Transcoding, data analytics, within a region | Store infrequently accessed content | Archive storage, backup and recovery
Project: All data in Cloud Storage belongs inside a project. A project consists of a set of users, a set of APIs, and billing,
authentication, and monitoring settings for those APIs.

Buckets: Buckets are the basic containers that hold the data. Unlike directories and folders, buckets cannot be nested. A bucket's name and location can only be changed by deleting and re-creating the bucket.

A bucket has three properties:

1. A globally unique name
2. A location where the bucket and its contents are stored
3. A default storage class for objects added to the bucket

Buckets are used to integrate storage into your apps and to access data instantly from any storage class, and they are designed for secure and durable storage.

Objects: These are the individual pieces of data that you store in Cloud Storage. There is no limit on the number of objects that can be created.

Objects have two components:

1. Object data: The file that is stored in Cloud Storage.
2. Object metadata: A collection of name-value pairs that describe various object qualities.

Cloud Dataflow

Definition: It is a fully managed service for transforming and enriching data in stream (real-time) or batch (historical) modes. It uses a serverless approach to resource provisioning and management.

Figure 21: Cloud Dataflow

Types of dataflow pipelines:

1. Batch type: processing bounded input like a file or database table
2. Streaming type: processing unbounded input from a source like Cloud Pub/Sub

Google Cloud pipelines can be created by three methods:

1. Cloud Functions: Cloud Dataflow provides a highly capable analytics tool that can be applied to streaming and batch data. Cloud Functions, in turn, allow you to write custom logic that is applied to each event as it arrives. A function can trigger alerts, filter invalid data, or invoke other APIs, and it operates on each published event individually.
2. Apache Beam SDK-based pipelines
3. Cloud Dataflow templates

Illustration: IoT events and data can be sent to the cloud at a high rate and need to be processed quickly. For many IoT applications, the decision to place the device into the physical environment is made in order to provide faster access to data. For example, fruit exposed to high temperatures during shipping may become damaged. Using data gathered from IoT devices, the produce can be flagged and disposed of immediately. To analyze data with more sophisticated techniques, we can apply time windowing or combine data from multiple streams.
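
Time windowing can be illustrated without any streaming framework. The sketch below groups temperature events into fixed one-minute windows and flags the windows whose average is too high, in the spirit of the fruit-shipping example. It is plain Python rather than Cloud Dataflow code, and the function name, window size, and threshold are assumptions for this example.

```python
from collections import defaultdict


def flag_hot_windows(events, window_seconds=60, threshold_c=30.0):
    """Group (timestamp_seconds, temperature_c) events into fixed windows and
    return the start times of windows whose average exceeds the threshold."""
    windows = defaultdict(list)
    for timestamp, temp in events:
        # Fixed (tumbling) windows: each event belongs to exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start].append(temp)
    return sorted(
        start
        for start, temps in windows.items()
        if sum(temps) / len(temps) > threshold_c
    )


events = [(5, 24.0), (30, 25.0), (65, 31.0), (90, 33.0), (130, 29.0)]
hot = flag_hot_windows(events)
```

Averaging per window smooths out single noisy readings, so a shipment is flagged only when a whole window runs hot.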

Figure 22:

Dataflow pipelines are based on Apache Beam:

The basic concepts of Apache Beam programming include:

1. PCollections: The PCollection abstraction represents a potentially distributed, multi-element data set that acts as the pipeline's data. Beam transforms use PCollection objects as inputs and outputs.
2. Transforms: These are the operations in the pipeline. A transform takes a PCollection (or multiple PCollections) as input, performs an operation that you specify on each element in that collection, and produces a new output PCollection.
3. Pipeline I/O: Beam provides read and write transforms for several common data storage types and allows you to create your own.

Figure 23: Dataflow by Apache Beam

Templates: Templates allow you to stage pipelines on Cloud Storage and execute them from a variety of environments.

Templates have some advantages over traditional Cloud Dataflow deployments:

1. Pipeline execution does not require code compilation for each run (as it does with Apache Beam SDK pipelines and Cloud Functions).
2. You can execute pipelines without the development environment and associated dependencies, which is useful for recurring batch jobs.
3. You can customize the execution of the pipeline with runtime parameters.

Templates can be executed using:

1. Google Cloud Platform Console
2. gcloud commands in the shell
3. The REST API

Google-provided templates

Source | Destination
Cloud Bigtable | Cloud Storage SequenceFile
Cloud Pub/Sub | BigQuery
Cloud Pub/Sub | Cloud Storage Text
Cloud Pub/Sub | Cloud Pub/Sub
Cloud Storage Text | Cloud Pub/Sub (batch)
Cloud Storage Text | Cloud Pub/Sub (stream)
Cloud Storage Text | BigQuery
Cloud Storage Text | Cloud Datastore
Cloud Datastore | Cloud Storage Text
Cloud Storage SequenceFile | Cloud Bigtable
Cloud Spanner | Cloud Storage Avro
Cloud Storage Avro | Cloud Spanner
Cloud Storage (Word Count) | Cloud Storage

Template | Task
Bulk Compress Cloud Storage Files | Bulk compression of files
Bulk Decompress Cloud Storage Files | Bulk decompression of files
Cloud Datastore Bulk Delete | Bulk delete of entities

Traditional Cloud Dataflow jobs:

Figure 24: Traditional Cloud dataflow

1. Developers create a development environment and develop their pipeline. The environment includes the
Apache Beam SDK and other dependencies.
2. Users execute the pipeline from the development environment. The Apache Beam SDK stages files in Cloud
Storage, creates a job request file, and submits the file to the Cloud Dataflow service.
3. Staging and execution therefore happen together. In Cloud Dataflow templates, by contrast, staging and execution are separate steps. This separation gives additional flexibility to decide who can run jobs and where the jobs are run from.

Templated Cloud Dataflow jobs:

1. Developers create a development environment and develop their pipeline. The environment includes the
Apache Beam SDK and other dependencies.
2. Developers execute the pipeline and create a template. The Apache Beam SDK stages files in Cloud Storage,
creates a template file (similar to job request), and saves the template file in Cloud Storage.
3. Non-developer users can easily execute jobs with the GCP Console, gcloud command-line tool, or the REST API
to submit template file execution requests to the Cloud Dataflow service.

Figure 25: Templated Cloud dataflow


Pipelines

Definition: Pipelines manage data after it arrives on Google Cloud Platform, similar to how parts are managed on a
factory line.

Process steps in Pipeline:

1. Transforming data: Data can be converted into another format, for example, converting a captured device signal voltage to a calibrated unit measure of temperature.
2. Aggregating and computing data: Data can be aggregated and mathematical operations can be applied to it.
3. Enriching data: Data can be combined with datasets from other devices or sources, e.g. weather or traffic data, for use in subsequent analysis.
4. Moving data: Data can be stored in more than one final storage location.
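
The four steps above can be sketched end to end in plain Python. This is not Cloud Dataflow code; it only shows the order of operations. The calibration formula (10 mV per degree Celsius with a 0.5 V offset), the function names, and the list-based "sinks" are all assumptions made for the example.

```python
from statistics import mean


def volts_to_celsius(volts: float) -> float:
    # Hypothetical linear calibration: 10 mV per degree C with a 0.5 V offset.
    return (volts - 0.5) * 100.0


def run_pipeline(voltages, weather, sinks):
    # 1. Transform: raw signal voltage -> calibrated temperature.
    temps = [volts_to_celsius(v) for v in voltages]
    # 2. Aggregate and compute: reduce the readings to an average.
    record = {"avg_temp_c": round(mean(temps), 1)}
    # 3. Enrich: combine with another dataset (here, outdoor weather).
    record["outdoor_temp_c"] = weather["temp_c"]
    # 4. Move: write the result to more than one destination.
    for sink in sinks:
        sink.append(record)
    return record


warehouse, archive = [], []  # stand-ins for two storage destinations
result = run_pipeline([0.70, 0.72, 0.74], {"temp_c": 12.0}, [warehouse, archive])
```

Keeping each step as a separate stage makes it straightforward to swap one out, for example a different calibration or an extra destination, without touching the others.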

Cloud Dataflow: It is built to perform all pipeline tasks on both batch and streaming data
