009 Ex 7

EX NO:

DATE:

AIM:

To implement real-time IoT data analytics by collecting IoT device data through Azure IoT Hub, streaming it using Azure Stream Analytics, transforming the data with PySpark scripts in Azure Synapse Spark Pools, and storing the cleaned data in Azure Data Lake Storage Gen2 for downstream analytics.

PROCEDURE:

● First, create an IoT Hub named iothubnum1 and register a device called device1.

● Then, simulate IoT data using a Python script in VS Code that sends random temperature and humidity values to the IoT Hub.

● Create an Azure Data Lake Storage Gen2 account named iotdatalakenum, enabling the hierarchical namespace option during creation.

● Inside the storage account, create two containers named raw-data and processed-data to store incoming raw data and processed data separately.

● Next, set up a Stream Analytics job named iotStreamJob with the IoT Hub (iotInput) as input and the raw-data container of the storage account as output, using the path pattern raw/{date}/{time}.

● Create an Azure Synapse Analytics workspace using basic settings, allow public network access, and complete the workspace creation.

● After that, create an Apache Spark pool inside Synapse named iotSparkPool, choose Small as the node size, and set auto-pause to 5 minutes to save credits.

● Then, in Synapse Studio, create a new notebook, attach it to iotSparkPool, and write a PySpark script to read data from the raw-data container.

● Apply transformations to clean the data and convert the data types (temperature to float and humidity to int) using PySpark commands.

● Write the transformed data to the processed-data container in Parquet format for efficient storage and future analytics.

● Finally, the entire process of ingesting IoT data, transforming it with PySpark, and storing the cleaned data in ADLS Gen2 is completed successfully.
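The telemetry simulation described in the steps above can be sketched as follows. The payload fields match the procedure (random temperature and humidity from device1); the value ranges, message count, and connection-string placeholder are illustrative assumptions, and actually sending messages requires the azure-iot-device package and the device connection string from the IoT Hub portal.

```python
import json
import random
import time

def make_telemetry(device_id="device1"):
    # Build one telemetry message with random temperature/humidity values
    # (the 20-35 C and 40-80 % ranges are illustrative assumptions).
    return {
        "deviceId": device_id,
        "temperature": round(random.uniform(20.0, 35.0), 2),
        "humidity": random.randint(40, 80),
    }

def send_loop(conn_str, count=10, interval=1.0):
    # Requires: pip install azure-iot-device. conn_str is the device
    # connection string copied from the IoT Hub portal for device1.
    from azure.iot.device import IoTHubDeviceClient, Message
    client = IoTHubDeviceClient.create_from_connection_string(conn_str)
    try:
        for _ in range(count):
            client.send_message(Message(json.dumps(make_telemetry())))
            time.sleep(interval)
    finally:
        client.shutdown()

if __name__ == "__main__":
    print(make_telemetry())  # dry run without an IoT Hub connection
```

Running send_loop with a real device connection string would push messages that the Stream Analytics job then routes into the raw-data container under raw/{date}/{time}.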

PYSPARK CODE:

PySpark ETL Job for IoT Data Processing in Azure Synapse:

from pyspark.sql.functions import col

# Read the raw CSV data from the ADLS Gen2 raw-data container
df = spark.read.option("header", "true").csv(
    "abfss://raw-data@iotdatalakenum.dfs.core.windows.net/raw/"
)

# Transformations: cast temperature to float and humidity to int
df_clean = df.withColumn("temperature", col("temperature").cast("float")) \
             .withColumn("humidity", col("humidity").cast("int"))

# Write the cleaned data to the processed-data container in Parquet format
df_clean.write.mode("overwrite").parquet(
    "abfss://processed-data@iotdatalakenum.dfs.core.windows.net/cleaned/"
)
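Outside Spark, the effect of the casting step can be illustrated on a single record. This is a plain-Python sketch, not part of the Synapse job; the raw field values shown are hypothetical, and the field names mirror the telemetry schema used above.

```python
def clean_record(raw):
    # Raw CSV fields arrive as strings; cast them to numeric types,
    # mirroring the PySpark cast("float") and cast("int") calls.
    return {
        "deviceId": raw["deviceId"],
        "temperature": float(raw["temperature"]),
        "humidity": int(raw["humidity"]),
    }

raw_row = {"deviceId": "device1", "temperature": "27.45", "humidity": "63"}
cleaned = clean_record(raw_row)
# cleaned now holds temperature as a float and humidity as an int
```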

OUTPUT:

AZURE DASHBOARD:
PYTHON SCRIPT FOR SENDING SIMULATED IOT TELEMETRY DATA TO AZURE IOT
HUB:

IOT SPARK POOL


BANDWIDTH UTILIZATION REPORT - EGRESS AND INGRESS TRAFFIC

SERVICE MONITORING: END-TO-END LATENCY AND REQUEST SUCCESS RATES

STORAGE – CONTAINER CREATED:


STREAM ANALYTICS TEST QUERY FOR IOT DATA PROCESSING

AZURE SYNAPSE SPARK POOL RESOURCE ALLOCATION:

SYNAPSE ANALYTICS WORKSPACE:


DATA LAKE STORAGE STRUCTURE IN AZURE SYNAPSE:

AZURE SYNAPSE SPARK NOTEBOOK: IOT DATA TRANSFORMATION PIPELINE


AZURE SYNAPSE SPARK JOB MONITORING DASHBOARD

SPARK LOGS:

SPARKPOOL ACTIVE SESSIONS DASHBOARD:

RESULT:

The real-time IoT data analytics system was successfully implemented by streaming IoT data from the IoT Hub to Azure Synapse Analytics, transforming the data using PySpark in Synapse notebooks, and storing the processed data in Azure Data Lake Storage Gen2 for further analysis.
