009 Ex 7
DATE:
AIM:
To implement real-time IoT data analytics by collecting IoT device data through Azure
IoT Hub, streaming it using Azure Stream Analytics, transforming the data using
PySpark scripts in Azure Synapse Spark Pools, and storing the cleaned data into Azure
Data Lake Storage Gen2.
PROCEDURE:
● First, create an IoT Hub named iothubnum1 and register a device called
device1.
● Then, simulate IoT data using a Python script in VS Code by sending random
temperature and humidity readings from device1 to the IoT Hub.
● Create an Azure Data Lake Storage Gen2 account named iotdatalakenum.
● Inside the storage account, create two containers named raw-data and
processed-data.
● Next, set up a Stream Analytics Job named iotStreamJob where the input will
be the IoT Hub (iotInput) and the output will be the raw-data container of the
Data Lake Storage account.
● After that, create an Apache Spark Pool inside Synapse named iotSparkPool,
choose node size as Small, and set auto-pause to 5 minutes to save credits.
● Open a Synapse notebook attached to iotSparkPool and write a PySpark script
to read data from the raw-data container.
● Apply transformations to clean and convert the data types (temperature to float,
humidity to int).
● Finally, the entire process of ingesting IoT data, transforming it using PySpark,
and storing the cleaned data into ADLS Gen2 is completed successfully.
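The Stream Analytics step above can be sketched as a simple pass-through query. The input alias iotInput comes from the procedure; the output alias rawOutput is an assumed name and must match whatever the raw-data container output is called in the job:

```sql
SELECT
    *
INTO
    [rawOutput]   -- assumed alias for the raw-data container output
FROM
    [iotInput]
```

This forwards every telemetry event from the IoT Hub input to the raw-data container unchanged; any filtering or windowing would be added here.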
PYSPARK CODE:
from pyspark.sql.functions import col

# Read the raw CSV data written by Stream Analytics to the raw-data container
df = spark.read.option("header", "true") \
    .csv("abfss://raw-data@iotdatalakenum.dfs.core.windows.net/raw/")

# Transformations: cast temperature to float and humidity to int
df_clean = df.withColumn("temperature", col("temperature").cast("float")) \
             .withColumn("humidity", col("humidity").cast("int"))

# Write the cleaned data as Parquet to the processed-data container
df_clean.write.mode("overwrite").parquet(
    "abfss://processed-data@iotdatalakenum.dfs.core.windows.net/cleaned/")
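For a quick local sanity check of the casts above, the same conversion can be mimicked on a single record in plain Python (raw_record is a made-up sample value; no Spark session is needed):

```python
# A minimal local sketch of the type conversions the PySpark job applies.
raw_record = {"deviceId": "device1", "temperature": "27.4", "humidity": "61"}

def clean_record(record):
    # Mirror the .cast("float") / .cast("int") calls in the PySpark script.
    cleaned = dict(record)
    cleaned["temperature"] = float(record["temperature"])
    cleaned["humidity"] = int(record["humidity"])
    return cleaned

print(clean_record(raw_record))
```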
OUTPUT:
AZURE DASHBOARD:
PYTHON SCRIPT FOR SENDING SIMULATED IOT TELEMETRY DATA TO AZURE IOT
HUB:
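A minimal sketch of such a simulator, assuming the azure-iot-device package is installed and a device connection string has been copied from device1 in the portal (the connection string argument is a placeholder):

```python
import json
import random
import time

def make_reading(device_id="device1"):
    # Build one simulated telemetry record with random temperature/humidity.
    return {
        "deviceId": device_id,
        "temperature": round(random.uniform(20.0, 35.0), 2),
        "humidity": random.randint(40, 80),
    }

def send_telemetry(connection_string, count=10, interval=2):
    # Imported here so the rest of the module works without the SDK installed.
    from azure.iot.device import IoTHubDeviceClient, Message
    client = IoTHubDeviceClient.create_from_connection_string(connection_string)
    client.connect()
    for _ in range(count):
        msg = Message(json.dumps(make_reading()))
        msg.content_type = "application/json"
        msg.content_encoding = "utf-8"
        client.send_message(msg)
        time.sleep(interval)
    client.shutdown()

# Usage (requires the real device connection string from the portal):
#   send_telemetry("HostName=...;DeviceId=device1;SharedAccessKey=...")
```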
SPARK LOGS:
RESULT:
The real-time IoT data analytics system was successfully implemented by streaming
IoT data from the IoT Hub to Azure Synapse Analytics, transforming the data using
PySpark in Synapse Notebooks, and storing the processed data into Azure Data Lake
Storage Gen2.