Topic 1:: Spark Structured Streaming
Spark Structured Streaming is the part of Apache Spark that processes and analyses real-time
data streams. Think of it as a tool that takes live data (like tweets, website clicks, or sensor
readings) and processes it to provide useful insights, like counting events or detecting patterns.
Under the hood it handles streaming data efficiently as a series of small, fast micro-batches.
In Structured Streaming, "sliding window analytics" means analysing data within overlapping time
periods. Instead of dividing data into separate, non-overlapping chunks, a sliding window
looks at data from overlapping intervals. You set a window size (how much data to analyse)
and a slide interval (how often to update), so you can continuously track trends and
calculate results as new data comes in.
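The overlapping-window idea can be sketched without a cluster. The helper below is a plain-Python illustration (not Spark's API; the function name and timestamps are hypothetical) of how a window size and slide interval assign each event to several overlapping windows:

```python
from collections import defaultdict

def sliding_window_counts(events, window_size, slide):
    """Count timestamped events in overlapping windows.

    events: list of (timestamp_seconds, value) pairs
    window_size, slide: in seconds; each event lands in
    window_size // slide overlapping windows.
    """
    counts = defaultdict(int)
    for ts, _ in events:
        # The latest window containing ts starts at the slide boundary
        # at or before ts; earlier windows start every `slide` seconds
        # back, as long as the window still covers ts.
        start = (ts // slide) * slide
        while start > ts - window_size:
            counts[start] += 1
            start -= slide
    return dict(counts)

# Three events, 5-minute (300 s) windows sliding every 1 minute (60 s):
# each event is counted in 300 // 60 = 5 overlapping windows
events = [(310, "a"), (320, "b"), (370, "c")]
print(sliding_window_counts(events, window_size=300, slide=60))
```

Note that an event is counted once per window it falls into, so totals across windows exceed the raw event count by design.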
Use the window function within a Spark SQL query to define a sliding window with a specified
duration and slide interval.
When using the window function, you specify the time column to group by, the window
duration, and the slide interval.
Aggregations on windows:
Once you've defined the sliding window, you can apply aggregations such as sum, average, and
count to calculate values within each overlapping window.
Example scenario:
Monitoring website traffic: You could use a 5-minute sliding window with a 1-minute
slide interval to analyse website hits over a continuous period, capturing changes in
traffic volume as new data arrives.
Topic 2:: The CAP Theorem
The CAP theorem, or Consistency, Availability, and Partition tolerance theorem, describes
the trade-offs between these three properties in distributed systems. It states that it's not
possible to guarantee all three properties at the same time.
Consistency: Every read receives the most recent write, or an error
Availability: Every read receives a response, but it might not contain the most recent data
Partition tolerance: The system can continue operating even if there's a network
fault that splits the system into partitions
When a partition occurs, the system must choose between consistency and
availability
Systems can prioritize availability and partition tolerance, accepting temporary data
inconsistency to ensure the system remains operational
Topic 3:: Amazon DynamoDB
Amazon DynamoDB is a NoSQL database that uses a key-value storage model. It's a fully
managed database service from Amazon Web Services (AWS).
Key features
Scalability: DynamoDB is serverless and can scale to zero. It also has auto-scaling,
which automatically adjusts throughput capacity based on traffic demands.
Data models: DynamoDB supports both key-value and document data models.
Primary keys: DynamoDB uses primary keys to uniquely identify each item in a table.
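The primary-key lookup can be sketched as an in-memory dictionary. This is an illustration of the key-value model only, not the real boto3 API; the key format and attribute names are hypothetical. A composite primary key pairs a partition key with a sort key to uniquely identify each item:

```python
# Toy key-value table: items addressed by (partition_key, sort_key),
# mirroring DynamoDB's composite primary key
table = {}

def put_item(pk, sk, attributes):
    table[(pk, sk)] = attributes

def get_item(pk, sk):
    # Returns None when no item has this primary key
    return table.get((pk, sk))

put_item("user#42", "order#2024-01-01", {"total": 19.99})
put_item("user#42", "order#2024-02-01", {"total": 5.00})

print(get_item("user#42", "order#2024-01-01"))  # {'total': 19.99}
```

Because every item is addressed by its full primary key, reads are constant-time lookups rather than scans, which is what lets DynamoDB scale throughput with demand.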
Topic 4:: Apache Cassandra
Apache Cassandra is a distributed, open-source database that uses a simple data model to
store structured data. It was originally developed at Facebook and released in 2008.
Data model
Cassandra's data model is simple and flexible, with dynamic control over data layout
and format
The partition key (the first part of the primary key) determines how data is partitioned
across nodes, allowing for partial or full fetches of a partition's rows
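How a partition key places data on a node can be sketched as follows. This is a simplification under stated assumptions: real Cassandra hashes the partition key to a Murmur3 token and assigns token ranges around a ring, whereas this sketch uses an MD5 hash and a modulo over a hypothetical three-node cluster, purely for illustration:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster

def owner(partition_key):
    # Hash the partition key to a token, then map the token to a node.
    # Real Cassandra assigns token *ranges* on a ring; modulo is a
    # simplification of that placement step.
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return NODES[token % len(NODES)]

# All rows sharing a partition key hash to the same token and therefore
# live on the same node, which is what makes fetching a whole partition
# (or a slice of it) efficient
print(owner("sensor-17"))
```

Deterministic placement also means any coordinator node can compute which replica owns a key without a central lookup.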