Figure 1: IoT cloud platform proliferation and IoT architecture
Every IoT network must be capable of scaling to securely handle massive amounts of data in each of the stages of IoT
data handling.
IoT Architecture:
1. IoT architectures must be capable of scaling connectivity of devices, data ingestion, data processing, and data
storage.
2. Sending ever-increasing amounts of data to the cloud slows processing and requires more bandwidth to transfer and store the data.
Key Points:
Fog or Edge computing: The edge refers to computing nodes, such as Internet of Things devices, that are geographically distributed at the "edge" of the network. This in turn increases the demand for devices that are capable of cleaning, processing, and analyzing data locally, so that only cleaned metadata is sent to the cloud.
Scaling: Scaling also means that the ability to easily monitor and maintain thousands of devices must grow with the network. A system that allows for asynchronous communication is less brittle, so a communication protocol that separates sending and receiving, such as MQTT, is a necessity in IoT architecture.
Figure 2: IoT architecture components
GCP Services:
Stages: To accomplish this, Google's IoT architecture can be divided into four stages: data gathering, data ingest, data processing, and data analysis.
1. Data Gathering: Sensors gather data from the environment and send it to the cloud, either directly or through an intermediary device. Depending on the network, preparation can include cleaning, preprocessing, analysis, and even machine learning inference.
2. Cloud IoT Edge: Cloud IoT Edge extends real-time analytics and ML to edge devices, so they can act on data in real time and predict outcomes locally. There are two components of Cloud IoT Edge: Edge IoT Core and Edge ML.
3. Ingest and process data: Google Cloud's data processing encompasses Cloud IoT Core, Cloud Functions, Cloud Pub/Sub, and Cloud Dataflow. Cloud Pub/Sub receives messages from devices and publishes them for subscribers to read. Cloud Dataflow creates data pipelines from the device to a destination such as BigQuery, Cloud Storage, or Bigtable. We can use Google-provided templates and Cloud Functions to create pipelines.
4. Data Analytics and ML: Data analysis and ML can be done on the edge or in the cloud. Google's data analytics and ML services are fully integrated with IoT data.
An illustration: Often, the value of IoT analytics comes from combining data from the physical world with data from other sources, for example, online information systems or customer-relationship data. This data often accumulates in various storage systems, such as Cloud Storage, and is accessible to BigQuery and Cloud Bigtable. Combining historical data, metadata, and real-time streaming data can lead to deeper, more actionable insights.
Cloud IoT platform stages: The Google Cloud IoT platform includes the three stages necessary for an IoT pipeline: data ingestion, data processing, and data analysis.
Data ingestion: It includes managing and optimizing IoT device data through secure device connections. Real-time data is collected with sensors, devices are authorized through Cloud IoT Core, and the data is then uploaded to the cloud through Cloud Pub/Sub.
Data processing: It includes cleaning and storing the data with on-demand solutions that scale. We can also use a Cloud Dataflow pipeline to direct data to Cloud Storage or BigQuery.
Data analysis: It includes visualizing data and predicting outcomes to generate actionable insights. We can use BigQuery, Cloud Dataprep, and Cloud Machine Learning Engine to analyze data and gain valuable insights.
1. Cloud Functions: Google Cloud Functions is a lightweight compute solution for developers to create single-purpose, stand-alone functions that respond to Cloud events without the need to manage a server or runtime environment. Cloud Functions can be written in three languages: Python, Go, and Node.js (see the sketch after this list).
2. Cloud Dataflow: A fully managed service for transforming and enriching data in stream (real-time) or batch (historical) modes (covered in detail later in this section).
3. Cloud IoT Core: A fully managed service to easily and securely connect, manage, and ingest data from globally dispersed devices.
4. Cloud Bigtable: It is a sparsely populated table that can scale to billions of rows and thousands of columns,
enabling you to store terabytes or even petabytes of data. It is ideal for storing very large amounts of single-
keyed data with very low latency.
5. Cloud Storage: It is an online file storage service that is used for storing and accessing data on Google Cloud
Platform. It offers secure, scalable, high-performance access to your data
6. BigQuery: It is a highly scalable enterprise data warehouse that stores and queries large datasets with super-fast SQL queries, helping you understand your devices' behavior at scale.
7. Cloud Machine Learning (ML) Engine: It is a managed service that enables developers and data scientists to build
and bring machine learning models to production. Cloud ML Engine offers training and prediction services that
can be used together or individually.
8. Cloud Datalab: It is a powerful interactive tool created to explore, analyze, transform, and visualize data and
build machine learning models on GCP.
9. Cloud Dataprep: It is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning. It is serverless and works at any scale; there is no infrastructure to deploy or manage.
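As referenced in item 1 above, below is a minimal sketch of a Pub/Sub-triggered Cloud Function in Python. The function and topic names are hypothetical; the (event, context) signature and the base64-encoded data field follow the documented shape of Python background functions.

```python
import base64

def process_telemetry(event, context):
    """Background Cloud Function triggered by a message on a Cloud Pub/Sub topic.

    event: dict with the Pub/Sub message; 'data' arrives base64 encoded.
    context: event metadata such as event_id and timestamp.
    """
    payload = base64.b64decode(event["data"]).decode("utf-8")
    # Single-purpose, stand-alone logic applied to one event at a time,
    # e.g. filtering an invalid reading or triggering an alert.
    print(f"Received telemetry: {payload} (event_id={context.event_id})")
```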
Security: Cloud IoT Core has several security features to protect your IoT network.
1. Devices are authenticated individually, which means that if one device is compromised, the attack is limited to that device and not the whole fleet.
2. There are four public key formats available for devices: RS256, RS256_X509, ES256, and ES256_X509 (see the sketch after this list).
3. We can also define an expiration time for each device credential (public key). After it expires, the key is
ignored but not automatically deleted. If you don't specify an expiration time for a key, it will not
expire.
4. The connection to the cloud is a TLS 1.2 connection, using root certificate authorities (required for
MQTT).
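To make the credential scheme above concrete, here is a minimal sketch that mints the JWT a device presents when it connects, assuming the PyJWT package and a hypothetical rsa_private.pem file holding the private half of an RS256 key pair. The audience claim is the GCP project ID, per Cloud IoT Core's documented claim set.

```python
import datetime
import jwt  # the PyJWT package

def create_device_jwt(project_id, private_key_file):
    """Mint the JWT a device presents when connecting to Cloud IoT Core."""
    now = datetime.datetime.utcnow()
    claims = {
        "iat": now,                                   # issued-at time
        "exp": now + datetime.timedelta(minutes=60),  # credential lifetime
        "aud": project_id,                            # audience is the GCP project ID
    }
    with open(private_key_file) as f:
        private_key = f.read()
    # RS256 matches the RS256/RS256_X509 key formats listed above.
    return jwt.encode(claims, private_key, algorithm="RS256")
```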
GC IoT Features:
1. It is serverless by design: the cloud itself acts as the server.
2. The cloud has intelligence built in, with ML and AI capabilities: Google Cloud includes the ML models and AI capacity available with Google Cloud ML Engine. The Edge TPU board can also perform ML; performing ML on the edge increases privacy and reduces latency.
Sensors and Devices
Data flow process: Sensors gather real-world data, and devices prepare it for the cloud. Once it is ready to send, you need a communication protocol to send this data to the cloud, which can be done over HTTP or MQTT.
Sensor: It is a module that observes changes in its environment and sends information about these changes to
a device.
Selection of a sensor:
1. Durability: A sensor should operate for a reasonable period of time without incurring unnecessary maintenance costs. For example, a water-resistant temperature sensor may be acceptable for a remote weather station, but it would be completely unsuitable for monitoring water temperature in a pool because it is not waterproof.
2. Accuracy: Sensors must be accurate enough to monitor the environment correctly, but accuracy beyond what the application requires adds unnecessary cost.
3. Versatility: Sensors must be able to operate within reasonable variations of environment.
4. Power Consumption: Prefer low-power, or even very low-power, sensors and devices.
5. Special Environmental Considerations: E.g. when designing a system for monitoring water quality, a sensor that
can be placed within the main water supply piping is far more cost-effective and accurate than a sensor that
requires diverting water samples.
6. Cost: IoT networks usually involve hundreds or even thousands of sensors and devices. Consideration must be
given to the cost of placement, maintenance, reliability, etc.
Devices: A "Thing" in the "Internet of Things" is a processing unit that is capable of connecting to the internet and
exchanging data with the cloud. Devices communicate two types of data: telemetry and state.
Information of devices:
1. Metadata: Metadata contains information about a device. Example metadata fields include:
a. Identifier (ID) - An identifier that uniquely identifies a device.
b. Class or type
c. Model
d. Revision
e. Date manufactured
f. Hardware serial number
2. Telemetry: It is the data collected by the device. Telemetry is read-only data about the environment, usually
collected through sensors.
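To make the two data types concrete, here is an illustrative sketch of what one device might report; the field values are hypothetical and this is not a GCP-mandated schema.

```python
# Metadata describes the device itself and rarely changes.
device_metadata = {
    "id": "thermo-0042",           # unique identifier
    "type": "temperature-sensor",  # class or type
    "model": "TS-100",
    "revision": "B",
    "date_manufactured": "2018-06-01",
    "serial_number": "SN-88731",
}

# Telemetry is read-only data about the environment, collected through sensors.
telemetry_reading = {
    "device_id": "thermo-0042",
    "timestamp": "2019-03-14T09:26:53Z",
    "temperature_c": 21.4,
}
```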
Note: When connecting devices to Google Cloud Platform, you will need to specify which communication protocol your
devices will use. The choices are MQTT, HTTP, or both.
Figure 10: Google Cloud Platform
MQTT: It is an industry-standard IoT protocol (Message Queue Telemetry Transport). It is a publish/subscribe (pub/sub)
messaging protocol.
Working of MQTT: The publish/subscribe model is event-driven. Messages are pushed to clients that are subscribed to
the topic. The broker is the hub of communication. Clients publish messages to the broker, and the broker pushes
messages out to subscribers. MQTT is a highly scalable architecture. There must be an open TCP connection to the
broker.
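Below is a minimal sketch of a device publishing telemetry over the MQTT bridge, assuming the paho-mqtt package and the create_device_jwt helper from the earlier security sketch. The bridge hostname, client-ID format, and topic follow Cloud IoT Core's documented conventions; the project, registry, and device names are hypothetical.

```python
import ssl
import paho.mqtt.client as mqtt

project, region = "my-project", "us-central1"   # hypothetical names
registry, device = "my-registry", "thermo-0042"

# Cloud IoT Core expects this exact client-ID format.
client_id = (f"projects/{project}/locations/{region}/"
             f"registries/{registry}/devices/{device}")

client = mqtt.Client(client_id=client_id)
# The bridge ignores the username; the JWT goes in the password field.
client.username_pw_set(username="unused",
                       password=create_device_jwt(project, "rsa_private.pem"))
client.tls_set(tls_version=ssl.PROTOCOL_TLSv1_2)  # TLS 1.2, as noted above

client.connect("mqtt.googleapis.com", 8883)       # open TCP connection to the broker
client.loop_start()

# Telemetry published here is forwarded to the registry's Cloud Pub/Sub topic.
client.publish(f"/devices/{device}/events", b'{"temperature_c": 21.4}', qos=1)
```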
HTTP: It is a "connectionless" protocol: with the HTTP Bridge, devices do not maintain a connection to the cloud.
Instead, they send requests and receive responses.
Figure 12: HTTP model
Working of HTTP: In connectionless communication, client requests are sent without first checking that the recipient is available. This means that devices have no way of knowing whether they are in a conversation with the server, and vice versa. As a result, some of the features that Cloud IoT Core provides, for example last heartbeat detected, are not available over an HTTP connection.
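For comparison, here is the same publish over the HTTP bridge, reusing the names from the MQTT sketch and assuming the requests package; the publishEvent endpoint and base64 binary_data field follow the bridge's documented REST shape.

```python
import base64
import requests

device_path = (f"projects/{project}/locations/{region}/"
               f"registries/{registry}/devices/{device}")
url = f"https://cloudiotdevice.googleapis.com/v1/{device_path}:publishEvent"

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {create_device_jwt(project, 'rsa_private.pem')}"},
    # Over HTTP, binary data must be base64 encoded (see the table below).
    json={"binary_data": base64.b64encode(b'{"temperature_c": 21.4}').decode()},
)
resp.raise_for_status()  # each request stands alone: no persistent connection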
MQTT                               HTTP
Lower bandwidth usage              Lighter weight
Lower latency, higher throughput   Fewer firewall issues
Supports raw binary data           Binary data must be base64 encoded
Data focused                       Document focused
Note: Both bridges use public key (asymmetric) device authentication and JSON Web Tokens (JWTs)
Cloud Pub/Sub
1. Cloud Pub/Sub is a fully managed real-time messaging service that allows you to send and receive messages between independent applications.
2. It is an independent, scalable, managed message queuing service that guarantees delivery of all individual messages, and it holds on to that data for up to seven days.
3. It does not guarantee first-in-first-out delivery, so messages are not guaranteed to arrive in order.
4. It is a globally managed service with extremely low latency.
5. Cloud Pub/Sub uses two levels of indirection between the publisher and the subscriber (a topic and a subscription).
6. Cloud Pub/Sub is message-oriented middleware for the cloud, providing a simple, reliable, scalable foundation for streaming analytics and event-driven computing.
7. It does this by decoupling senders and receivers, allowing for secure, highly available communication between devices and services.
Figure 16: Cloud Pub/Sub message flow for N IoT devices
Figure 17: Integration of Publisher & Subscribers
Flow of Pub/Sub: Cloud Pub/Sub ingests event streams and delivers them to Cloud Dataflow. Cloud Dataflow processes
the data and delivers it to BigQuery for analysis and storage or to Google Cloud Storage.
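A minimal sketch of both sides of this flow with the google-cloud-pubsub client library; the project, topic, and subscription names are hypothetical and assumed to already exist.

```python
from google.cloud import pubsub_v1

project = "my-project"  # hypothetical

# Publisher side: this is where device telemetry enters Pub/Sub.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project, "device-telemetry")
publisher.publish(topic_path, b'{"temperature_c": 21.4}').result()  # wait for the ack

# Subscriber side: an independent application reads the stream.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project, "telemetry-sub")

def callback(message):
    print("received:", message.data)  # order is not guaranteed
    message.ack()                     # unacked messages are retained up to seven days

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
```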
Cloud IoT Core
Definition: Cloud IoT Core is a fully managed service, which means there is no need for us to handle autoscaling, set up redundancy, partition databases, or pre-provision resources. We can connect one device or millions, and Cloud IoT Core will scale to meet our needs. It has two main components:
1. Device manager: To register devices with the service, and hence to monitor and configure them.
2. MQTT/HTTP protocol bridges: Used by devices to connect to Google Cloud Platform.
Note: Device telemetry data is forwarded to a Cloud Pub/Sub topic, which can then be used to trigger Cloud Functions.
We can also perform streaming analysis with Cloud Dataflow or custom analysis with our own subscribers.
Cloud IoT Core, using Cloud Pub/Sub, can combine device data that is widely distributed into a single global system.
Process: Cloud IoT Core combines the MQTT protocol with the highest level of security (TLS 1.2 with certificates), and it exposes a single global endpoint. When communicating with a device, we don't need to know the device's location, and we don't have to replicate its configuration in each region. Data is automatically published to Cloud Pub/Sub and is accessible globally.
Device Registration: In order for a device to connect, it must first be registered in the device manager. The device manager can be used through the Google Cloud Platform Console, gcloud commands, or the REST-style API (see the sketch after the notes below).
Device registries: A device registry is a container of devices; it is created with the MQTT protocol, the HTTP protocol, or both enabled.
Note:
1. Each device registry is created in a specific cloud region and belongs to a cloud project
2. A registry is identified in the cloudiot.googleapis.com service by its full name: projects/{project-id}/locations/{cloud-region}/registries/{registry-id}.
3. The device registry is configured with one or more Cloud Pub/Sub topics to which telemetry events are
published for all devices in that registry. A single topic can be used to collect data across all regions.
4. Stackdriver monitoring is automatically enabled for each registry.
5. Cloud Identity and Access Management (IAM) can be used for access control, granting users permission to view,
provision, or fully manage devices.
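As referenced above, here is a sketch of the device manager in use: creating a registry bound to a Pub/Sub topic and registering one device with the google-cloud-iot client library. Resource names are hypothetical, and exact client signatures vary across library versions.

```python
from google.cloud import iot_v1

client = iot_v1.DeviceManagerClient()
project, region = "my-project", "us-central1"  # hypothetical

# A registry is a container of devices, bound to a telemetry topic.
registry = client.create_device_registry(
    parent=f"projects/{project}/locations/{region}",
    device_registry=iot_v1.DeviceRegistry(
        id="my-registry",
        event_notification_configs=[
            iot_v1.EventNotificationConfig(
                pubsub_topic_name=f"projects/{project}/topics/device-telemetry"
            )
        ],
    ),
)

# Register a device with the public half of its RS256_X509 key pair.
client.create_device(
    parent=registry.name,
    device=iot_v1.Device(
        id="thermo-0042",
        credentials=[
            iot_v1.DeviceCredential(
                public_key=iot_v1.PublicKeyCredential(
                    format=iot_v1.PublicKeyFormat.RSA_X509_PEM,
                    key=open("rsa_cert.pem").read(),
                )
            )
        ],
    ),
)
```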
Cloud Storage
Definition: It is unified object storage. You can store and/or retrieve data from anywhere in the world, at any time.
Types of storage: Multi-Regional, Regional, Nearline, and Coldline.
All storage classes offer low latency (time to first byte is typically tens of milliseconds) and high durability. The
classes differ by their availability, minimum storage durations, and pricing for storage and access.
Storage Class      Name for APIs and gsutil   Minimum storage duration   Typical monthly availability
Standard Storage   standard                   None                       >99.99% in multi-regions and dual-regions; 99.99% in regions
Nearline Storage   nearline                   30 days                    99.95% in multi-regions and dual-regions; 99.9% in regions
Coldline Storage   coldline                   90 days                    99.95% in multi-regions and dual-regions; 99.9% in regions
Project: All data in Cloud Storage belongs inside a project. A project consists of a set of users, a set of APIs, and billing,
authentication, and monitoring settings for those APIs.
Buckets
Definition: Buckets are the basic containers that hold your data; unlike directories and folders, buckets cannot be nested. A bucket's name and location can only be changed by deleting and re-creating the bucket.
Buckets are used to integrate storage into your apps and to access data instantly from any storage class, and they are designed for secure and durable storage.
Objects: These are the individual pieces of data that you store in Cloud Storage. There is no limit on the number of objects you can create.
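A minimal sketch of writing and reading an object with the google-cloud-storage client; the bucket and object names are hypothetical and the bucket is assumed to exist (method names vary slightly across client versions).

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-iot-archive")  # hypothetical, existing bucket

# Objects are addressed by name within a bucket; buckets cannot be nested.
blob = bucket.blob("telemetry/2019-03-14/thermo-0042.json")
blob.upload_from_string('{"temperature_c": 21.4}', content_type="application/json")

print(blob.download_as_bytes())  # read the object back
```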
Cloud Dataflow
Definition: It is a fully managed service for transforming and enriching data in stream (real-time) or batch (historical) modes. It uses a serverless approach to resource provisioning and management.
Figure 21: Cloud dataflow
1. Cloud Functions: While Cloud Dataflow provides a highly capable analytics tool for streaming and batch data, Cloud Functions allows you to write custom logic that is applied to each event as it arrives. It can trigger alerts, filter invalid data, or invoke other APIs, operating on each published event individually.
2. Apache Beam SDK-based pipelines
3. Cloud Dataflow templates
Illustration: IoT events and data can be sent to the cloud at a high rate and need to be processed quickly. For many IoT applications, devices are placed in the physical environment precisely to provide faster access to data. For example, fruit exposed to high temperatures during shipping may become damaged; using data gathered from IoT devices, the produce can be flagged and disposed of immediately. To analyze data with more sophisticated techniques, we can apply time-windowing or converge data from multiple streams, as in the sketch below.
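Here is a minimal Apache Beam sketch of the streaming case just described: telemetry is read from Cloud Pub/Sub, grouped into fixed one-minute windows, and per-device averages are written to BigQuery. The topic, table, and field names are hypothetical, and the output table is assumed to exist.

```python
import json
import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # Pub/Sub sources require streaming mode

with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromPubSub(topic="projects/my-project/topics/device-telemetry")
     | beam.Map(json.loads)                                 # bytes -> dict
     | beam.Map(lambda d: (d["device_id"], d["temperature_c"]))
     | beam.WindowInto(window.FixedWindows(60))             # time-windowing: 60 s windows
     | beam.combiners.Mean.PerKey()                         # average per device per window
     | beam.Map(lambda kv: {"device_id": kv[0], "avg_temp_c": kv[1]})
     | beam.io.WriteToBigQuery(
           "my-project:iot.temperature_by_minute",
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))
```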
Templates: They allow you to stage pipelines on Cloud Storage and execute them from a variety of environments.
1. Pipeline execution does not require code compilation for each run (unlike non-templated Apache Beam pipelines).
2. One can execute your pipelines without the development environment and associated dependencies, which is
useful for recurring batch jobs.
3. One can customize the execution of the pipeline with runtime parameters.
Source                       Destination
Cloud Bigtable               Cloud Storage SequenceFile
Cloud Pub/Sub                BigQuery
Cloud Pub/Sub                Cloud Storage Text
Cloud Pub/Sub                Cloud Pub/Sub
Cloud Storage Text           Cloud Pub/Sub (batch)
Cloud Storage Text           Cloud Pub/Sub (stream)
Cloud Storage Text           BigQuery
Cloud Storage Text           Cloud Datastore
Cloud Datastore              Cloud Storage Text
Cloud Storage SequenceFile   Cloud Bigtable
Cloud Spanner                Cloud Storage Avro
Cloud Storage Avro           Cloud Spanner
Word Count                   Cloud Storage to Cloud Storage
Template                              Task
Bulk Compress Cloud Storage Files     Bulk compression of files
Bulk Decompress Cloud Storage Files   Bulk decompression of files
Cloud Datastore Bulk Delete           Bulk delete of entities
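As an illustration, the sketch below launches the Cloud Pub/Sub-to-BigQuery template from the table above via the Dataflow REST API, assuming the google-api-python-client package and default credentials; the project, topic, and table names are hypothetical.

```python
from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")

# Launch a Google-provided template from its staged location in Cloud
# Storage; no code compilation or development environment is needed.
request = dataflow.projects().locations().templates().launch(
    projectId="my-project",
    location="us-central1",
    gcsPath="gs://dataflow-templates/latest/PubSub_to_BigQuery",
    body={
        "jobName": "telemetry-to-bigquery",
        "parameters": {
            "inputTopic": "projects/my-project/topics/device-telemetry",
            "outputTableSpec": "my-project:iot.raw_telemetry",
        },
    },
)
print(request.execute())
```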
Non-templated pipeline workflow:
1. Developers create a development environment and develop their pipeline. The environment includes the Apache Beam SDK and other dependencies.
2. Users execute the pipeline from the development environment. The Apache Beam SDK stages files in Cloud Storage, creates a job request file, and submits the file to the Cloud Dataflow service.
In Cloud Dataflow templates, by contrast, staging and execution are separate steps. This separation gives additional flexibility to decide who can run jobs and where the jobs are run from.
Templated pipeline workflow:
1. Developers create a development environment and develop their pipeline. The environment includes the
Apache Beam SDK and other dependencies.
2. Developers execute the pipeline and create a template. The Apache Beam SDK stages files in Cloud Storage,
creates a template file (similar to job request), and saves the template file in Cloud Storage.
3. Non-developer users can easily execute jobs with the GCP Console, gcloud command-line tool, or the REST API
to submit template file execution requests to the Cloud Dataflow service.
Pipelines
Definition: Pipelines manage data after it arrives on Google Cloud Platform, similar to how parts are managed on a factory line. Pipeline tasks include:
1. Transforming data: Data can be converted into another format, for example, converting a captured device signal voltage to a calibrated unit measure of temperature.
2. Aggregating and computing data: Data can be aggregated and mathematical operations can be applied to it.
3. Enriching data: Data can be combined with other datasets, e.g. weather or traffic data, for use in subsequent analysis.
4. Moving data: Data can be stored in more than one final storage location.
Cloud Dataflow: It is built to perform all of these pipeline tasks on both batch and streaming data, as in the sketch below.
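A final sketch of the "transforming data" task above: a batch Beam pipeline that converts captured signal voltages into calibrated temperatures. The calibration constants and Cloud Storage paths are hypothetical.

```python
import apache_beam as beam

def voltage_to_celsius(volts, offset=-40.0, scale=25.0):
    """Hypothetical linear calibration from raw signal voltage to degrees C."""
    return offset + scale * volts

with beam.Pipeline() as p:
    (p
     | beam.io.ReadFromText("gs://my-iot-archive/raw/voltages.csv")  # one reading per line
     | beam.Map(float)
     | beam.Map(voltage_to_celsius)   # the "transforming data" pipeline task
     | beam.Map(str)
     | beam.io.WriteToText("gs://my-iot-archive/clean/temperatures"))
```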