Big Data Analytics Lecture 1

The document outlines the MSDA9215: Big Data Analytics course, focusing on IoT and mobile data applications, taught by Temitope Oguntade. It covers course logistics, learning outcomes, key topics, and assessment methods, emphasizing the use of technologies like Hadoop, Spark, and MQTT for real-time data management. The course aims to equip students with skills in analyzing, visualizing, and interpreting Big Data to address real-world challenges.

MSDA9215: Big Data Analytics

April/May 2024

Week 1: Introduction & Big Data Ecosystem


Temitope Oguntade
Agenda

● Logistics and Introductions


● Course Description
● Course Learning Outcomes
● Student Assessment and Grading
● Course Schedule
● Academic Integrity & Well-being
Logistics

● Instructor: Temitope Oguntade


● Email: [email protected]
● TAs:
● Class Time: 08:30 - 17:00 Mon, 18:00 - 20:45 Wed
● Course Credit:
● Prerequisite: As may be determined
● Code Source:
https://github.com/toguntad/AUCA_IOT/blob/9bd866ac48a6d3071a20007e9ac5a64b3e9020a1/AUCA_IOT.ipynb
Brief
Temitope Oguntade is the CEO and Founder of Spiral
Systems, a startup dedicated to building innovative, AI-driven
metering solutions specifically designed for micro-utilities in
Sub-Saharan Africa. These solutions are tailored to be
cost-effective, addressing unique regional challenges and
improving utility management. He holds an M.Sc. in
Information Technology from Carnegie Mellon University and
a B.Eng in Electrical and Computer Engineering from the
Federal University of Technology Minna. With a career
spanning over 15 years, Temitope has developed a deep
expertise in entrepreneurship, distributed computing, cloud
computing, and technology management, driving significant
advancements in the tech sector.
It is your turn

Let’s start with 10 people

● Who are you?


● What MSc program?
● What’s your background?
● What are your expectations for this course?
● Experience with data analytics?
Course Description

The course is designed to deepen learners' expertise in Big Data Analytics, with a focus
on IoT and mobile data applications.

● Leverages foundational platforms like Hadoop and Spark alongside advanced timeseries databases to enhance Big Data processing capabilities.
● Emphasizes real-time data management using Message Brokers and MQTT, addressing the specific needs of IoT device data streams.
● Equips students with skills to analyze, visualize, and interpret Big Data using scalable machine learning algorithms, preparing them to deliver actionable insights in practical settings.
● Focuses on applying these technologies to real-world challenges, enhancing analytical and decision-making skills in IoT and mobile data contexts.
Course Learning Outcomes
Upon completion of this course, students will:

● Grasp the broad implications of Big Data Analytics in various sectors, emphasizing its impact
on IoT and mobile data.
● Demonstrate expertise in fundamental Big Data platforms such as Hadoop and Spark,
ensuring the capability to process and manage large datasets.
● Apply knowledge of diverse data storage systems, including key-value (KV) stores, document
databases, graph databases, and timeseries databases, to optimize data structuring and
retrieval in different Big Data scenarios.
● Utilize Message Brokers and the MQTT protocol to effectively manage real-time data flows
from IoT devices and mobile sources, addressing the unique requirements of streaming
data.
● Employ scalable machine learning algorithms to conduct comprehensive analytics on
multi-structured data from various platforms, drawing actionable insights from complex
datasets.
● Develop sophisticated data visualization skills to clearly present and interpret analytical
findings, catering to both general and specialized audiences including mobile analytics.
Key Topics
Introduction and Big Data Ecosystem

● Overview of Big Data Analytics: Scope and applications.


● Incorporating Message Brokers for Big Data applications.
● Understanding and using MQTT in the context of IoT
● Understanding timeseries databases in Big Data.

Data Storage Methods and Real-Time Data Handling

● Introduction to foundational platforms: Hadoop Ecosystem and Spark.


● In-depth exploration of HDFS and its role in the Hadoop ecosystem.
● Introduction to HBase: Concepts, architecture, and how it integrates with
Hadoop.
● Discussion on the types and characteristics of databases: KV stores,
document databases, and graph databases.
Key Topics (2)

Big Data Processing and Analytics

● Big Data processing frameworks: MapReduce and beyond.


● Analytics algorithms: Understanding the basics and applications.
● Parallel processing and scalability concerns in Big Data.
● Special focus on analytics for mobile and IoT Big Data.

Visualization and Real-world Applications

● Principles of data visualization in Big Data Analytics.


● Mobile issues and solutions in Big Data contexts.
● Case studies: Real-world Big Data challenges and solutions.
Assessment

Component Weight

Final 40%

Midterm 30%

Assignments | Quizzes | Participation 30%


What is Big Data Analytics
Overview: Big Data Analytics involves examining large and varied data sets to uncover hidden
patterns, unknown correlations, market trends, customer preferences, and other useful
information.

Mobile Big Data as a Subset: Advances in mobile computing, mobile internet, IoT, and
crowdsensing have intensified the generation of Mobile Big Data (MBD). This data comes from a
vast array of mobile and wireless devices, capturing diverse information from sensors carried
by moving objects, people (e.g., wearables), or vehicles.

Significance: The analysis of MBD and other Big Data sources provides critical insights that can
drive decision-making and operational efficiencies across multiple sectors.

Applications: Big Data Analytics is pivotal in fields like healthcare, finance, retail, and urban
planning, where large-scale, real-time data analysis can lead to impactful outcomes.
Examples & Use Cases
MBD/IOT
Agenda

● IOT Data
● Big Data vs Mobile Big Data (MBD)
● Characteristics of MBD
● Applications of MBD
● Mobile Big Data Analytics
● Summary
Internet of Things (IoT): Definition

● “A global infrastructure for the information society, enabling advanced services by interconnecting (physical and virtual) things based on existing and evolving interoperable information and communication technologies.” - https://www.itu.int/en/ITU-T/gsi/iot/Pages/default.aspx

● Data is exchanged between systems and devices using the Internet or other communications networks.

Source: https://www.techtarget.com/iotagenda/definition/Internet-of-Things-IoT
IoT Data

● IoT generates large amounts of data from multiple components.


● Data should be analysed to enable action/decision making.
● Data management and data mining are key technical and managerial
challenges in IoT development.
● Intrinsic properties of IoT data contribute to the two challenges above.
Properties of IoT Data

● Categorized into data generation, data quality, and data interoperability


● Data generation properties:
○ Velocity- generated at different rates
○ Scalability- large scale
○ Dynamics- changing location & environments, intermittent
connections
○ Heterogeneity- different data generators, different data formats
Properties of IoT Data (2)

● Data quality properties:


○ Incompleteness- need to find data sources to address
incompleteness
○ Semantics- need to inject semantics
Properties of IoT Data (3)

● Data interoperability properties:


○ Uncertainty- originating from different sources
○ Redundancy- multiple measures of same thing/metric
○ Ambiguity- means different things
Data types

● Textual (un/semi structured)


● Time series
● Geospatial
● Numerical
● Categorical
● Multimodal (image/video/audio)
Sources of IoT Data

● Sensors and actuators
● Users/crowd
● Social media
● Web
● Documents
● Graphs/ontologies
● Databases
● Expert/knowledge bases
Big Data

● Big data: data that is too big (volume), too fast (velocity), and too diverse
(variety)
● Other characteristics: veracity, variability.
● Data can be streaming or historical.
● IoT is both a source and sink of Big Data
Big Data

● Convergence of:
○ Internet technologies,
○ Mobile Computing,
○ Cloud Computing,
○ Big Data,
○ Data Analytics, and
○ IoT
● There are challenges that are peculiar to IoT Big Data Analytics
Characteristics of Classical/Traditional Big Data
IoT Big Data vs Big Data
MBD Characteristics: Multi-sensory
MBD Characteristics: Multi-dimensional
MBD Characteristics: Personalized
MBD Characteristics: Real-time
MBD Characteristics: Spatio-temporal
IoT Analytics aka MBD Analytics
Issues with IoT Analytics (1/3)
Issues with IoT Analytics (2/3)
Issues with IoT Analytics (3/3)
PROJECT
Problem Statement 1

● You have just been hired by the World Bank to monitor 50 million
energy meters in Sub-Saharan Africa.
● Each energy meter sends consumption data (I, V, Hz, kW, kWh,
timestamp) every 15 minutes.
● Your first task is to capture and save these records for efficient
retrieval and analysis.
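Before picking technologies, it helps to size the workload. The sketch below is a back-of-envelope estimate in Python, under the simplifying assumption that meters report on an evenly spread schedule (real fleets tend to cluster around the quarter-hour marks) and a guessed record size of 64 bytes, which is not given in the problem statement.

```python
# Back-of-envelope sizing for the metering workload described above.
METERS = 50_000_000
REPORT_INTERVAL_S = 15 * 60   # one reading every 15 minutes
RECORD_BYTES = 64             # assumed size per record (I, V, Hz, kW, kWh, timestamp)

msgs_per_second = METERS / REPORT_INTERVAL_S
msgs_per_day = METERS * (24 * 3600 // REPORT_INTERVAL_S)
bytes_per_day = msgs_per_day * RECORD_BYTES

print(f"~{msgs_per_second:,.0f} messages/second on average")  # ~55,556
print(f"~{msgs_per_day:,} records/day")                       # ~4,800,000,000
print(f"~{bytes_per_day / 1e9:,.0f} GB/day")                  # ~307 GB
```

Even at a modest 64 bytes per record, the fleet produces on the order of hundreds of gigabytes per day, which is why ingestion, partitioning, and compression dominate the design discussion that follows.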
Challenges? Issues?

Five points for each.


Let’s go!
Technologies & Tools

● Protocol: Message Queuing Telemetry Transport (MQTT)


● MQTT Comm API: Paho
● MQTT Broker: Mosquitto (https://mosquitto.org/)
● Headend(or Server) System
● Timeseries Database: TimescaleDB
System Architecture

Sensors → Message Broker → Headend/Controller → TimescaleDB
Device Layer

● IoT Devices (sensors, actuators, gateways)


○ Communicate with the MQTT Broker using the Paho MQTT
Comm API
○ Publish telemetry data to MQTT topics
○ Subscribe to control commands from the Headend System
MQTT Broker

● Mosquitto
○ Receives and forwards MQTT messages between devices and
the Headend System
○ Handles device connections, authentication, and topic
subscriptions
Headend System

● Server-side application
○ Subscribes to MQTT topics for device telemetry data
○ Processes and analyzes device data
○ Sends control commands to devices via MQTT
○ Stores time-series data in TimescaleDB
Timeseries Database

● TimescaleDB
○ Stores and manages large amounts of time-series data from
IoT devices
○ Optimized for efficient storage and querying of time-series data
MQTT Crash Course 1/4
● Protocol Basics:
○ MQTT is a lightweight, publish-subscribe network protocol that
transports messages between devices.
○ It is designed for connections with remote locations where a "small
code footprint" is required or network bandwidth is limited

● Publish-Subscribe Model:
○ Unlike traditional client-server models, MQTT uses a broker-based
publish-subscribe pattern.
○ In this model, clients (publishers) do not send messages directly to
other clients (subscribers). Instead, they publish messages to a broker,
which then distributes these messages to interested subscribers
based on the topic of the messages.
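The decoupling described above can be sketched with a toy in-memory broker in pure Python. The `TinyBroker` class is hypothetical, for illustration only; it is not part of MQTT or Paho, but it shows why publishers and subscribers never need to know about each other.

```python
# A toy, in-memory sketch of the broker-based publish-subscribe pattern.
from collections import defaultdict

class TinyBroker:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher never talks to subscribers directly; the broker
        # fans the message out to everyone subscribed to the topic.
        for callback in self.subscribers[topic]:
            callback(topic, message)

broker = TinyBroker()
received = []
broker.subscribe("meters/energy", lambda t, m: received.append((t, m)))
broker.publish("meters/energy", "kWh=12.5")
broker.publish("meters/other", "ignored")   # no subscriber on this topic
print(received)   # [('meters/energy', 'kWh=12.5')]
```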
MQTT Crash Course 2/4
● Quality of Service Levels:
○ MQTT supports three levels of quality of service (QoS) to deliver
messages:
■ QoS 0: At most once delivery (fire-and-forget).
■ QoS 1: At least once delivery (ensures the message is delivered at
least once).
■ QoS 2: Exactly once delivery (ensures the message is delivered
one time only).
○ These levels allow for message delivery guarantees according to the
requirements of different applications.
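A consequence of QoS 1 worth internalizing: "at least once" means duplicates are possible, because the sender retransmits until it sees an acknowledgement. The deterministic simulation below (a pure-Python sketch, not real MQTT traffic) shows how a single lost ACK produces a duplicate at the receiver.

```python
# Deterministic simulation of QoS 1 ("at least once") delivery:
# a lost ACK forces a retransmission, so the receiver sees the
# same message twice.
def qos1_send(message, drop_first_ack=True):
    delivered = []          # what the receiver actually sees
    ack_lost = drop_first_ack
    acked = False
    attempts = 0
    while not acked:
        attempts += 1
        delivered.append(message)   # message reaches the receiver
        if ack_lost:
            ack_lost = False        # the first ACK is lost in transit
        else:
            acked = True            # the retransmission's ACK gets through
    return delivered, attempts

delivered, attempts = qos1_send("reading-42")
print(delivered)   # ['reading-42', 'reading-42'] -- duplicate under QoS 1
print(attempts)    # 2
```

QoS 2 adds a second handshake precisely to eliminate such duplicates, at the cost of extra round trips.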
MQTT Crash Course 3/4
● Topics and Wildcards:
○ MQTT uses topics to filter messages for each connected client. Clients
subscribe to a topic or topics, and messages are sent to clients based
on their subscriptions.
○ MQTT also supports wildcards in topic subscription, allowing for
greater flexibility in message delivery to subscribers.
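The two wildcards are `+` (matches exactly one topic level) and `#` (matches the remainder of the topic). The simplified matcher below illustrates the rules in pure Python; it is a sketch that ignores edge cases such as `$SYS` topics, and in practice the Paho library ships its own matcher.

```python
# Simplified MQTT topic matching: '+' matches one level, '#' the rest.
def topic_matches(subscription, topic):
    sub_levels = subscription.split("/")
    top_levels = topic.split("/")
    for i, s in enumerate(sub_levels):
        if s == "#":
            return True                  # '#' swallows everything after it
        if i >= len(top_levels):
            return False                 # subscription is longer than topic
        if s != "+" and s != top_levels[i]:
            return False                 # literal level must match exactly
    return len(sub_levels) == len(top_levels)

print(topic_matches("meters/+/energy", "meters/m1/energy"))   # True
print(topic_matches("meters/#", "meters/m1/voltage"))         # True
print(topic_matches("meters/+/energy", "meters/m1/voltage"))  # False
```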

● Security Features:
○ Although MQTT itself does not provide intrinsic security features, it
supports secure transmission via SSL/TLS.
○ Additional security measures such as user name/password
authentication and access control can be implemented at the broker
level.
MQTT Crash Course 4/4
● Use Cases:
○ Ideal for IoT applications, telemetry in low-bandwidth scenarios, and
any application where minimal network overhead and low power
consumption are required.
○ Commonly used in real-time analytics, monitoring of remote sensors,
controlling devices over networks, and various M2M
(machine-to-machine) contexts.

● Last Will and Testament:


○ A unique feature where a "last will" message is defined in case of an
unexpected disconnection of the client. This message is sent by the
broker to notify other clients about the disconnection.
Broker Setup Information
● Broker URL: tcp://3.138.185.79:1883
● QoS: 2
● Clean Session: true
● Username: auca
● Password: gishushu

Setup Python Env: Install Paho


● pip install paho-mqtt==1.6.1
Python Code: Publish 1/3
import paho.mqtt.client as mqtt

# MQTT settings
broker_url = "3.138.185.79"
broker_port = 1883
username = "auca"
password = "gishushu"
topic = "auca_class"
client_id = "my_mqtt_client"

# Callback when the client receives a CONNACK response from the server
def on_connect(client, userdata, flags, rc):
    if rc == 0:
        print("Connected successfully to broker")
    else:
        print(f"Failed to connect, return code {rc}\n")
        # If the client fails to connect then we should stop the loop
        client.loop_stop()
Python Code: Publish 2/3
# Create a new instance of the MQTT client with a specific client ID
client = mqtt.Client(client_id, clean_session=True)
client.on_connect = on_connect              # attach the callback function to the client
client.username_pw_set(username, password)  # set username and password
client.connect(broker_url, broker_port, 60) # connect to the broker

# Start the network loop in a separate thread
client.loop_start()

try:
    while True:
        message = input("Enter message to publish or type 'exit' to quit: ")
        if message.lower() == 'exit':
            break
        client.publish(topic, message, qos=2)
except KeyboardInterrupt:
    print("Program interrupted by user, exiting...")

Python Code: Publish 3/3

# Stop the network loop and disconnect
client.loop_stop()
client.disconnect()
Python Code: Subscribe 1/2
import paho.mqtt.client as mqtt

# MQTT settings
broker_url = "3.138.185.79"
broker_port = 1883
username = "auca"
password = "gishushu"
topic = "auca_class"
client_id = "my_mqtt_client_subscriber"

# Callback when the client receives a CONNACK response from the server
def on_connect(client, userdata, flags, rc):
    if rc == 0:
        print("Connected successfully to broker")
        # Subscribe to the topic once connected
        client.subscribe(topic, qos=2)
    else:
        print(f"Failed to connect, return code {rc}\n")

Python Code: Subscribe 2/2
# Callback for when a PUBLISH message is received from the server
def on_message(client, userdata, msg):
    print(f"Message received on topic {msg.topic}: {msg.payload.decode()}")

# Create a new instance of the MQTT client with a specific client ID
client = mqtt.Client(client_id, clean_session=True)
client.on_connect = on_connect              # attach the connection callback function to the client
client.on_message = on_message              # attach the message callback function to the client
client.username_pw_set(username, password)  # set username and password
client.connect(broker_url, broker_port, 60) # connect to the broker

# Run the network loop in the main thread (blocks until disconnect)
client.loop_forever()
Hypertables

● Hypertables are PostgreSQL tables with special features that make it easy to handle time-series data. Anything you can do with regular PostgreSQL tables, you can do with hypertables. In addition, you get the benefits of improved performance and user experience for time-series data.
● They automatically partition your data by time.
● In Timescale, hypertables exist alongside regular PostgreSQL tables. Use hypertables to store time-series data. This gives you improved insert and query performance, and access to useful time-series features. Use regular PostgreSQL tables for other relational data.
Hypertable Partitioning

● When you create and use a hypertable, it automatically partitions data by time, and optionally by space.
● Each hypertable is made up of child tables called chunks.
● Each chunk is assigned a range of time, and only contains data from that range. If the hypertable is also partitioned by space, each chunk is also assigned a subset of the space values.
Time Partitioning
● Each chunk of a
hypertable only holds
data from a specific time
range.
● When you insert data
from a time range that
doesn't yet have a chunk,
Timescale automatically
creates a chunk to store
it.
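The bucketing idea can be simulated in a few lines of Python. This is an illustration of the partitioning concept only: TimescaleDB does this internally, and the fixed epoch-based bucketing here (with an arbitrary origin date) is a simplification of its actual chunk management.

```python
# Simulation of time partitioning: with a 1-week chunk interval, every
# timestamp maps to the integer index of the week-long window it falls in.
from datetime import datetime, timezone, timedelta

CHUNK_INTERVAL = timedelta(weeks=1)
EPOCH = datetime(2024, 1, 1, tzinfo=timezone.utc)   # arbitrary chunking origin

def chunk_for(ts):
    # Integer index of the 1-week window this timestamp falls into.
    return (ts - EPOCH) // CHUNK_INTERVAL

a = chunk_for(datetime(2024, 1, 2, tzinfo=timezone.utc))    # same week as the epoch
b = chunk_for(datetime(2024, 1, 6, tzinfo=timezone.utc))    # still the same week
c = chunk_for(datetime(2024, 1, 10, tzinfo=timezone.utc))   # next week -> new chunk
print(a, b, c)   # 0 0 1
```

Rows a and b would land in the same chunk; row c triggers (or reuses) the next week's chunk, which is what makes dropping or compressing old data a cheap per-chunk operation.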
Create a Hypertable 1/2
● To create a hypertable, you need to create a standard PostgreSQL
table, and then convert it into a hypertable.
● Create a standard PostgreSQL table.
CREATE TABLE sensor (
    time      TIMESTAMPTZ NOT NULL,
    location  TEXT NOT NULL,
    device    TEXT NOT NULL,
    voltage   DOUBLE PRECISION NOT NULL,
    current   DOUBLE PRECISION NOT NULL,
    frequency DOUBLE PRECISION NOT NULL,
    power     DOUBLE PRECISION NOT NULL,
    energy    DOUBLE PRECISION NOT NULL
);
Create a Hypertable 2/2
● Convert the table to a hypertable. Specify the name of the table
you want to convert, and the column that holds its time values.
SELECT create_hypertable('sensor', 'time', chunk_time_interval => interval '1 week');

● Some possible chunk intervals:
- '1 hour'
- '1 day'
- '1 month'
- '1 year'
Hypertable Auto Compression
● Enable Compression
ALTER TABLE sensor SET (timescaledb.compress, timescaledb.compress_orderby = 'time');

● Add compression Policy


SELECT add_compression_policy('sensor', INTERVAL '7 days');
Hypertable Manual Compression
● Find Chunks (all chunks older than x minutes)
SELECT show_chunks('sensor', older_than => INTERVAL '3 minutes');

● Compress all chunks older than 3 minutes


SELECT compress_chunk(i)
FROM show_chunks('sensor', older_than => INTERVAL '3 minutes') AS i;
Timescale: Essential Commands 1/2
● Disk Size of a hypertable; both compressed and uncompressed chunks
SELECT hypertable_size('sensor');

● Retrieves the name and size of each hypertable present in the database
SELECT hypertable_name, hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass)
FROM timescaledb_information.hypertables;

● Detailed view of the disk space usage of a hypertable. If running on a distributed hypertable, ordering by node_name shows the size distribution across different data nodes.

SELECT * FROM hypertable_detailed_size('sensor') ORDER BY node_name;


Timescale: Essential Commands 2/2
● Manually compresses a specific chunk identified by its internal name.
SELECT compress_chunk( '_timescaledb_internal._hyper_1_1_chunk');

● Provides compression statistics for all chunks of the specified hypertable.
SELECT * FROM chunk_compression_stats('sensor');

● Attempts to compress chunks of the 'sensor' hypertable that were created between three weeks ago and one week ago.
SELECT compress_chunk(i) FROM show_chunks('sensor', now() - interval '1 week', now() - interval '3 weeks') i;
TimescaleDB Installation

https://docs.timescale.com/self-hosted/latest/install/
