
Spark Streaming Assignment

The assignment involves processing real-time advertisement data from a Kafka topic named ads_data using Spark Streaming. The objective is to perform window-based aggregation on the data, calculating total clicks, views, and average cost per view per ad_id, and then store the results in a Cassandra table. The submission requires the Spark Streaming application code and a report detailing the results and challenges encountered.

Spark Streaming Assignment: Real-Time Advertisement Data Aggregation

Objective: Process real-time advertisement data using Spark Streaming to gain
business insights and store the aggregated data into Cassandra.

Background:
You have been provided with a Kafka topic named ads_data that contains
advertisement data in the following format:

{
  "ad_id": "12345",
  "timestamp": "2023-08-23T12:01:05Z",
  "clicks": 5,
  "views": 10,
  "cost": 50.75
}
The goal is to process this real-time data, compute business insights using
window-based aggregation, and write the aggregated results into a Cassandra
table. The aggregation key is ad_id, and aggregated values should update
previous values in the Cassandra table.

Tasks:

● Kafka setup and mock data producer:
○ Set up Confluent Kafka on the cloud or locally.
○ Create a topic named ads_data.
○ Write a Python script that continuously publishes random mock data, in
the format shown above, to the Kafka topic in Avro-serialized form.
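A minimal sketch of the mock-data producer. It assumes the confluent-kafka package and a broker at localhost:9092 (both assumptions, not given in the assignment), and for brevity it sends JSON rather than Avro; a full solution would serialize with confluent_kafka's schema-registry Avro serializer instead.

```python
# Mock producer sketch for the ads_data topic.
# Assumption: a local Kafka broker at localhost:9092 and the
# confluent-kafka package; JSON is used here in place of Avro for brevity.
import json
import random
import time
from datetime import datetime, timezone

def make_mock_ad() -> dict:
    """Build one random record matching the assignment's data format."""
    views = random.randint(1, 100)
    return {
        "ad_id": str(random.randint(10000, 99999)),
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "clicks": random.randint(0, views),  # clicks should not exceed views
        "views": views,
        "cost": round(random.uniform(1.0, 100.0), 2),
    }

if __name__ == "__main__":
    from confluent_kafka import Producer  # assumes confluent-kafka is installed
    producer = Producer({"bootstrap.servers": "localhost:9092"})
    while True:
        record = make_mock_ad()
        producer.produce("ads_data", key=record["ad_id"],
                         value=json.dumps(record).encode("utf-8"))
        producer.poll(0)  # serve delivery callbacks
        time.sleep(1)     # one record per second
```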
● Reading Data from Kafka:
○ Set up a Spark Streaming application.
○ Use the Kafka connector to read data from the ads_data topic.
○ Parse and deserialize the incoming data into the appropriate
structure.
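The reading step can be sketched as below with Spark Structured Streaming. It assumes the spark-sql-kafka-0-10 package is on the classpath and that messages are JSON-encoded (matching the producer sketch above); an Avro producer would instead be decoded with pyspark.sql.avro.functions.from_avro.

```python
# Sketch: reading the ads_data topic with Spark Structured Streaming.
# Assumption: spark-sql-kafka-0-10 on the classpath, JSON-encoded values.

# Field names and Spark SQL types of one ad record, kept as plain data
# so the schema can be inspected without a running Spark session.
AD_FIELDS = [
    ("ad_id", "string"),
    ("timestamp", "timestamp"),
    ("clicks", "int"),
    ("views", "int"),
    ("cost", "double"),
]

def read_ads_stream(spark, bootstrap="localhost:9092"):
    """Return a streaming DataFrame of parsed ad events."""
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType

    schema = StructType()
    for name, dtype in AD_FIELDS:
        schema = schema.add(name, dtype)

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", bootstrap)
           .option("subscribe", "ads_data")
           .load())
    # Kafka exposes key/value as binary; decode the value and parse the JSON.
    return (raw.selectExpr("CAST(value AS STRING) AS json")
               .select(from_json(col("json"), schema).alias("ad"))
               .select("ad.*"))
```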

● Window-Based Aggregation:
○ Perform a window-based aggregation over a window duration
(e.g., 1 minute) and sliding interval (e.g., 30 seconds).
○ Aggregate the following:
■ Total clicks per ad_id.
■ Total views per ad_id.
■ Average cost per view for each ad_id.
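The aggregation step could look like the following sketch, assuming `ads` is the parsed streaming DataFrame from the previous step; the 2-minute watermark is an illustrative choice, not part of the assignment.

```python
# Sketch of the 1-minute window / 30-second slide aggregation.

def avg_cost_per_view(total_cost: float, total_views: int) -> float:
    """Total cost divided by total views; 0.0 when there are no views."""
    return total_cost / total_views if total_views else 0.0

def aggregate_ads(ads):
    """Windowed totals of clicks, views, and cost per ad_id."""
    from pyspark.sql.functions import col, expr, window
    from pyspark.sql.functions import sum as sum_

    return (ads
            .withWatermark("timestamp", "2 minutes")  # tolerate late events
            .groupBy(window(col("timestamp"), "1 minute", "30 seconds"),
                     col("ad_id"))
            .agg(sum_("clicks").alias("total_clicks"),
                 sum_("views").alias("total_views"),
                 sum_("cost").alias("total_cost"))
            # average cost per view = total cost / total views for the window
            .withColumn("avg_cost_per_view",
                        expr("total_cost / total_views")))
```

For the sample record above (cost 50.75, views 10), the average cost per view of a window containing only that record would be 5.075.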

● Write Aggregated Data to Cassandra:
○ For each ad_id, check if an entry already exists in the Cassandra
table.
○ If an entry exists, update the values:
■ Add new clicks/views to the existing counts.
■ Update the average cost per view.
○ If an entry doesn't exist, create a new row with the aggregated
values.
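The read-merge-write upsert can be sketched with a foreachBatch sink. The keyspace/table names (ads.ad_stats) and the use of the DataStax cassandra-driver are assumptions for illustration, not part of the assignment text.

```python
# Sketch of the upsert into Cassandra via foreachBatch.
# Assumption: keyspace "ads", table "ad_stats", DataStax cassandra-driver.

def merge_stats(old, new):
    """Combine an existing row (or None) with newly aggregated values.

    Both arguments are dicts with total_clicks, total_views, total_cost;
    avg_cost_per_view is recomputed from the merged totals.
    """
    if old is None:
        old = {"total_clicks": 0, "total_views": 0, "total_cost": 0.0}
    clicks = old["total_clicks"] + new["total_clicks"]
    views = old["total_views"] + new["total_views"]
    cost = old["total_cost"] + new["total_cost"]
    return {"total_clicks": clicks, "total_views": views, "total_cost": cost,
            "avg_cost_per_view": cost / views if views else 0.0}

def write_batch(batch_df, batch_id):
    """foreachBatch sink: upsert each aggregated row into Cassandra."""
    from cassandra.cluster import Cluster  # assumes a local Cassandra node
    session = Cluster(["127.0.0.1"]).connect("ads")
    for row in batch_df.collect():
        existing = session.execute(
            "SELECT total_clicks, total_views, total_cost "
            "FROM ad_stats WHERE ad_id = %s", (row.ad_id,)).one()
        merged = merge_stats(existing._asdict() if existing else None,
                             {"total_clicks": row.total_clicks,
                              "total_views": row.total_views,
                              "total_cost": row.total_cost})
        # Cassandra INSERT is an upsert on the primary key (ad_id).
        session.execute(
            "INSERT INTO ad_stats (ad_id, total_clicks, total_views, "
            "total_cost, avg_cost_per_view) VALUES (%s, %s, %s, %s, %s)",
            (row.ad_id, merged["total_clicks"], merged["total_views"],
             merged["total_cost"], merged["avg_cost_per_view"]))
```

The streaming query would then be started with something like `aggregated.writeStream.foreachBatch(write_batch).start()`.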

Submission:

Submit your Spark Streaming application code, along with a brief report
detailing the results and any challenges faced during the assignment.
