Big Data Analytics Lecture 1
April/May 2024
The course is designed to deepen learners' expertise in Big Data Analytics, with a focus
on IoT and mobile data applications.
● Equips students with skills to analyze, visualize, and interpret Big Data using scalable
machine learning algorithms, preparing them to deliver actionable insights in practical
settings.
● Grasp the broad implications of Big Data Analytics in various sectors, emphasizing its impact
on IoT and mobile data.
● Demonstrate expertise in fundamental Big Data platforms such as Hadoop and Spark,
ensuring the capability to process and manage large datasets.
● Apply knowledge of diverse data storage systems, including key-value (KV) stores, document
databases, graph databases, and time-series databases, to optimize data structuring and
retrieval in different Big Data scenarios.
● Utilize Message Brokers and the MQTT protocol to effectively manage real-time data flows
from IoT devices and mobile sources, addressing the unique requirements of streaming
data.
● Employ scalable machine learning algorithms to conduct comprehensive analytics on
multi-structured data from various platforms, drawing actionable insights from complex
datasets.
● Develop sophisticated data visualization skills to clearly present and interpret analytical
findings, catering to both general and specialized audiences including mobile analytics.
Key Topics
Introduction and Big Data Ecosystem
Component Weight
Final 40%
Midterm 30%
Mobile Big Data as a Subset: Advances in mobile computing, mobile internet, IoT, and
crowdsensing have intensified the generation of Mobile Big Data (MBD). This data comes from a
vast array of mobile and wireless devices, capturing diverse information from sensors carried
by moving objects, people (e.g., wearables), or vehicles.
Significance: The analysis of MBD and other Big Data sources provides critical insights that can
drive decision-making and operational efficiencies across multiple sectors.
Applications: Big Data Analytics is pivotal in fields like healthcare, finance, retail, and urban
planning, where large-scale, real-time data analysis can lead to impactful outcomes.
Examples & Use Cases
MBD/IoT
Agenda
● IoT Data
● Big Data vs Mobile Big Data (MBD)
● Characteristics of MBD
● Applications of MBD
● Mobile Big Data Analytics
● Summary
Internet of Things (IoT): Definition
● Big data: data that is too big (volume), too fast (velocity), and too diverse
(variety)
● Other characteristics: veracity, variability.
● Data can be streaming or historical.
● IoT is both a source and sink of Big Data
Big Data
● Convergence of:
○ Internet technologies,
○ Mobile Computing,
○ Cloud Computing,
○ Big Data,
○ Data Analytics, and
○ IoT
● There are challenges that are peculiar to IoT Big Data Analytics
Characteristics of Classical/Traditional Big Data
IoT Big Data vs Big Data
MBD Characteristics: Multi-sensory
MBD Characteristics: Multi-dimensional
MBD Characteristics: Personalized
MBD Characteristics: Real-time
MBD Characteristics: Spatio-temporal
IoT Analytics aka MBD Analytics
Issues with IoT Analytics (1/3)
Issues with IoT Analytics (2/3)
Issues with IoT Analytics (3/3)
PROJECT
Problem Statement 1
● You have just been hired by the World Bank to monitor 50 million
energy meters in Sub-Saharan Africa.
● Each energy meter sends consumption data (I, V, Hz, kW, kWh,
timestamp) every 15 minutes.
● Your first task is to capture and save these records for efficient
retrieval and analysis.
Challenges? Issues?
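One way to start on the "Challenges?" question is a back-of-the-envelope ingest estimate. The sketch below uses only the meter count and reporting interval from the problem statement; the 64-byte record size is an illustrative assumption, not a figure from the lecture.

```python
# Back-of-the-envelope ingest estimate for Problem Statement 1.
METERS = 50_000_000          # from the problem statement
INTERVAL_MINUTES = 15        # from the problem statement
BYTES_PER_RECORD = 64        # ASSUMPTION: illustrative size of one stored record

readings_per_meter_per_day = 24 * 60 // INTERVAL_MINUTES
records_per_day = METERS * readings_per_meter_per_day
records_per_second = records_per_day / 86_400
raw_bytes_per_day = records_per_day * BYTES_PER_RECORD

print(f"Readings per meter per day: {readings_per_meter_per_day}")
print(f"Records per day:            {records_per_day:,}")
print(f"Average records per second: {records_per_second:,.0f}")
print(f"Raw volume per day:         {raw_bytes_per_day / 1e9:,.1f} GB")
```

Under these assumptions the system must absorb 4.8 billion records per day, an average of roughly 55,600 writes per second before any bursts are considered.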
Architecture diagram: Sensors -> Message Broker -> Headend/Controller -> TimescaleDB
Device Layer
Message Broker
● Mosquitto
○ Receives and forwards MQTT messages between devices and
the Headend System
○ Handles device connections, authentication, and topic
subscriptions
Headend System
● Server-side application
○ Subscribes to MQTT topics for device telemetry data
○ Processes and analyzes device data
○ Sends control commands to devices via MQTT
○ Stores time-series data in TimescaleDB
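The "process, then store" step of the headend can be sketched as a small parsing function. This is only an illustration: the JSON payload format, the `meter_id` key, and the epoch-seconds timestamp are assumptions, and the measurement names are borrowed from the fields listed in Problem Statement 1.

```python
import json
from datetime import datetime, timezone

# ASSUMPTION: each MQTT payload is a JSON object with these measurement keys
# (hypothetical names taken from Problem Statement 1) plus a meter_id.
FIELDS = ("I", "V", "Hz", "kW", "kWh")

def payload_to_row(payload: bytes):
    """Turn one telemetry payload into a tuple ready for a parameterized INSERT."""
    record = json.loads(payload.decode("utf-8"))
    # ASSUMPTION: the timestamp is sent as Unix epoch seconds.
    ts = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
    return (ts, record["meter_id"], *(float(record[f]) for f in FIELDS))

sample = (b'{"meter_id": "m-001", "timestamp": 1714521600, '
          b'"I": 4.2, "V": 229.8, "Hz": 50.0, "kW": 0.96, "kWh": 1234.5}')
row = payload_to_row(sample)
print(row)
```

In a real headend, a function like this would be called from the MQTT message callback, and the resulting tuples would be batched before being written to the database.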
Time-series Database
● TimescaleDB
○ Stores and manages large amounts of time-series data from
IoT devices
○ Optimized for efficient storage and querying of time-series data
MQTT Crash Course 1/4
● Protocol Basics:
○ MQTT is a lightweight, publish-subscribe network protocol that
transports messages between devices.
○ It is designed for connections with remote locations where a "small
code footprint" is required or network bandwidth is limited.
● Publish-Subscribe Model:
○ Unlike traditional client-server models, MQTT uses a broker-based
publish-subscribe pattern.
○ In this model, clients (publishers) do not send messages directly to
other clients (subscribers). Instead, they publish messages to a broker,
which then distributes these messages to interested subscribers
based on the topic of the messages.
MQTT Crash Course 2/4
● Quality of Service Levels:
○ MQTT supports three levels of quality of service (QoS) to deliver
messages:
■ QoS 0: At most once delivery (fire-and-forget).
■ QoS 1: At least once delivery (ensures the message is delivered at
least once).
■ QoS 2: Exactly once delivery (ensures the message is delivered
one time only).
○ These levels allow for message delivery guarantees according to the
requirements of different applications.
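The practical difference between the three QoS levels shows up when the network drops something. The following toy model is a deliberately simplified sketch: it scripts which transmissions and acknowledgements get through and counts the copies the application sees. The real protocol uses acknowledgement packets (and a four-step handshake for QoS 2) that are not modelled here.

```python
def copies_received(qos, link_events):
    """Count how many copies of one message reach the subscribing application.

    link_events holds one (message_arrives, ack_arrives) pair per
    transmission attempt; this is a toy model, not the real MQTT handshake.
    """
    delivered = 0
    for message_arrives, ack_arrives in link_events:
        if message_arrives:
            if qos == 2 and delivered > 0:
                pass  # QoS 2: the receiver recognises the packet id and drops the duplicate
            else:
                delivered += 1
        if qos == 0:
            break  # QoS 0: fire-and-forget, never retransmitted
        if message_arrives and ack_arrives:
            break  # QoS 1/2: acknowledged, so the sender stops retrying
    return delivered

message_lost = [(False, False), (True, True)]   # first transmission never arrives
ack_lost = [(True, False), (True, True)]        # message arrives, acknowledgement does not

for qos in (0, 1, 2):
    print(f"QoS {qos}: message lost -> {copies_received(qos, message_lost)} copies, "
          f"ack lost -> {copies_received(qos, ack_lost)} copies")
```

In these two scripted scenarios, QoS 0 loses the message when the first transmission fails, QoS 1 delivers a duplicate when the acknowledgement is lost, and QoS 2 delivers exactly one copy in both cases.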
MQTT Crash Course 3/4
● Topics and Wildcards:
○ MQTT uses topics to filter messages for each connected client. Clients
subscribe to a topic or topics, and messages are sent to clients based
on their subscriptions.
○ MQTT also supports wildcards in topic subscription, allowing for
greater flexibility in message delivery to subscribers.
● Security Features:
○ MQTT itself does not encrypt traffic; secure transmission is obtained by
running it over SSL/TLS.
○ The CONNECT packet can carry a user name and password, and the broker
enforces authentication and access control.
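Returning to the topic wildcards on this slide: MQTT defines `+` to match exactly one topic level and `#` to match all remaining levels. A minimal matcher might look like the sketch below. The function name and the `meters/...` topics are hypothetical, and special cases such as `$`-prefixed system topics are ignored.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the subscription `topic_filter`."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True    # multi-level wildcard: matches everything from here on
        if i >= len(topic_levels):
            return False   # filter has more levels than the topic
        if level != "+" and level != topic_levels[i]:
            return False   # literal level differs ('+' accepts any single level)
    return len(filter_levels) == len(topic_levels)

print(topic_matches("meters/+/kWh", "meters/kigali/kWh"))        # single-level wildcard
print(topic_matches("meters/+/kWh", "meters/kigali/zone1/kWh"))  # '+' spans one level only
print(topic_matches("meters/#", "meters/kigali/zone1/kWh"))      # multi-level wildcard
```

A headend that wants every meter's data could therefore subscribe once with a filter such as `meters/#` instead of subscribing to each device topic individually.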
MQTT Crash Course 4/4
● Use Cases:
○ Ideal for IoT applications, telemetry in low-bandwidth scenarios, and
any application where minimal network overhead and low power
consumption are required.
○ Commonly used in real-time analytics, monitoring of remote sensors,
controlling devices over networks, and various M2M
(machine-to-machine) contexts.
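Python Code: Publish 1/3
import paho.mqtt.client as mqtt

# MQTT settings
broker_url = "3.138.185.79"
broker_port = 1883
username = "auca"
password = "gishushu"
topic = "auca_class"
client_id = "my_mqtt_client"

# Callback when the client receives a CONNACK response from the server
def on_connect(client, userdata, flags, rc):
    if rc == 0:
        print("Connected to the broker")
    else:
        print(f"Connection failed with result code {rc}")
        client.loop_stop()
Python Code: Publish 2/3
# Create a new instance of the MQTT client with a specific client ID
# (paho-mqtt 1.x constructor; 2.x also expects a callback_api_version argument)
client = mqtt.Client(client_id=client_id)
client.username_pw_set(username, password)
client.on_connect = on_connect
client.connect(broker_url, broker_port)
client.loop_start()
try:
    while True:
        # Read a line from the keyboard and publish it until the user types 'exit'
        message = input("Message to publish ('exit' to quit): ")
        if message.lower() == 'exit':
            break
        client.publish(topic, message)
except KeyboardInterrupt:
    pass
client.loop_stop()
client.disconnect()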
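Python Code: Subscribe 1/2
import paho.mqtt.client as mqtt
# MQTT settings
broker_url = "3.138.185.79"
broker_port = 1883
username = "auca"
password = "gishushu"
topic = "auca_class"
client_id = "my_mqtt_client_subscriber"

# Callback when the client receives a CONNACK response from the server
def on_connect(client, userdata, flags, rc):
    if rc == 0:
        client.subscribe(topic, qos=2)
    else:
        print(f"Connection failed with result code {rc}")

# Callback when a message arrives on a subscribed topic
def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")

# paho-mqtt 1.x constructor; 2.x also expects a callback_api_version argument
client = mqtt.Client(client_id=client_id)
client.username_pw_set(username, password)
client.on_connect = on_connect
client.on_message = on_message
client.connect(broker_url, broker_port)
client.loop_forever()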
Hypertables
Create a Hypertable 2/2
● Convert the table to a hypertable. Specify the name of the table
you want to convert, and the column that holds its time values.
SELECT create_hypertable('sensor', 'time', chunk_time_interval => interval '1 week');
● Retrieve the name and size of each hypertable in the database.
SELECT hypertable_name, hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass)
FROM timescaledb_information.hypertables;
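To get a feel for what `chunk_time_interval` means at the scale of Problem Statement 1, the sketch below estimates how many rows would fall into a single chunk for a few candidate intervals. It assumes, purely for illustration, that all 50 million meters write into one hypertable and report on schedule; it does not query TimescaleDB.

```python
from datetime import timedelta

METERS = 50_000_000                      # from Problem Statement 1
REPORT_EVERY = timedelta(minutes=15)     # from Problem Statement 1

def rows_per_chunk(chunk_interval: timedelta) -> int:
    """Rows that land in one chunk if every meter reports on schedule."""
    return METERS * (chunk_interval // REPORT_EVERY)

for label, interval in [("1 hour", timedelta(hours=1)),
                        ("1 day", timedelta(days=1)),
                        ("1 week", timedelta(weeks=1))]:
    print(f"{label:>6}: {rows_per_chunk(interval):,} rows per chunk")
```

Under these assumptions a one-week chunk would hold 33.6 billion rows, which suggests that the interval is a tuning parameter worth revisiting for a workload of this size.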
https://docs.timescale.com/self-hosted/latest/install/