
Introduction to Stream Processing in Big Data

In the age of Big Data, traditional batch processing techniques are no longer sufficient to handle the massive volumes and rapid velocity of data streams. This presentation explores the concepts and techniques of stream processing, enabling real-time insights and decision-making.

By: Priyanka Arya
Understanding the Concept of Stream Data

1. Continuous Data Flow
Stream data is a continuous flow of data points, arriving at high speeds. It is not processed in batches, but rather in real-time as it arrives.

2. Time Sensitivity
Stream data is often time-sensitive, meaning that insights must be derived quickly to be actionable.

3. Unbounded Nature
Stream data is unbounded, meaning that it can continue to arrive indefinitely, requiring systems to handle continuous processing.
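The continuous, unbounded nature of a stream can be pictured with a Python generator: a minimal sketch in which the hypothetical `sensor_stream` function yields events indefinitely and a consumer processes each event as it arrives rather than waiting for a complete dataset.

```python
import itertools

def sensor_stream():
    """Simulate an unbounded stream: yields readings indefinitely."""
    for i in itertools.count():
        yield {"sensor_id": i % 3, "value": 20.0 + i}

# The stream never "ends"; a consumer handles events one by one as
# they arrive. Here we take just the first 5 events for illustration.
first_five = list(itertools.islice(sensor_stream(), 5))
print(first_five[0])  # each event is available immediately, not batched
```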
Key Characteristics of Stream Data

High Volume
Stream data often arrives at very high volumes, requiring systems to process large amounts of data in real-time.

High Velocity
Data points arrive at high speeds, requiring systems to process data quickly to keep up with the flow.

Variety
Stream data can come from diverse sources, including sensor data, social media feeds, and financial transactions.
Differences between Batch and Stream Processing

Batch Processing
Processes data in batches, typically offline, allowing for more complex calculations.

Stream Processing
Processes data continuously in real-time, focusing on speed and low latency.
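The difference can be made concrete with a small sketch: a batch computation waits for the entire dataset before producing one answer, while a streaming computation maintains an incremental result (here a running average) that is up to date after every event.

```python
readings = [3.0, 5.0, 4.0, 8.0]

# Batch: wait for the full dataset, then compute once, offline.
batch_avg = sum(readings) / len(readings)

# Stream: update an incremental running average per event, so a
# low-latency answer is available at every point in the stream.
count, running_avg = 0, 0.0
for x in readings:
    count += 1
    running_avg += (x - running_avg) / count

print(batch_avg, running_avg)  # both 5.0 once all events are seen
```

Both approaches converge to the same value here; the streaming version simply never has to wait for the data to "finish".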
Architectural Patterns for Stream Processing
1 Lambda Architecture
Combines batch and stream processing, enabling both
immediate insights and historical analysis.

2 Kappa Architecture
Focuses solely on stream processing, providing real-time
insights with a unified approach.

3 Micro-Batching
Processes data in small batches at high frequencies,
bridging the gap between batch and stream processing.
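Micro-batching, the third pattern above, can be sketched in a few lines: incoming events are grouped into small fixed-size batches (the `micro_batches` helper and batch size are illustrative), so downstream logic runs batch-style code at stream-like frequency.

```python
def micro_batches(stream, batch_size):
    """Group an incoming stream into small fixed-size batches."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch      # hand a small batch downstream
            batch = []
    if batch:                # flush the final partial batch
        yield batch

events = range(7)
batches = list(micro_batches(events, 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

In practice the batch boundary is usually a short time interval rather than a fixed count, but the principle is the same.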
Overview of Stream Processing Frameworks

Apache Kafka
A distributed streaming platform designed for high-throughput, real-time data ingestion and processing.

Apache Flink
A powerful open-source stream processing framework that excels in both speed and scalability.

Apache Spark Streaming
A micro-batching engine built on Apache Spark, providing a flexible approach to stream processing.
Fundamental Concepts: Windows, Watermarks, and Late Arrivals

Windows
Group data into time-based segments for processing, enabling insights based on specific intervals.

Watermarks
Mark a point in time beyond which the system considers data as potentially late or out of order.

Late Arrivals
Data points that arrive out of order or after the watermark threshold require handling to maintain consistency.
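These three concepts fit together, and a minimal sketch shows how (the window size, lateness allowance, and event values are all illustrative assumptions): events are assigned to tumbling windows by event time, a watermark trails the maximum event time seen so far, and events that arrive behind the watermark are set aside as late.

```python
from collections import defaultdict

WINDOW = 10           # tumbling window size (event-time seconds)
ALLOWED_LATENESS = 5  # how far the watermark trails the max event time

windows = defaultdict(list)  # window start time -> values in that window
late = []
watermark = float("-inf")

# (event_time, value); note the out-of-order events at t=4 and t=3
events = [(1, "a"), (12, "b"), (14, "c"), (4, "d"), (3, "e")]

for t, v in events:
    watermark = max(watermark, t - ALLOWED_LATENESS)
    if t < watermark:
        late.append((t, v))  # behind the watermark: handle separately
    else:
        windows[(t // WINDOW) * WINDOW].append(v)

print(dict(windows), late)
```

Real frameworks offer richer policies for late data (re-emitting updated window results, side outputs, or dropping), but the watermark-versus-event-time comparison above is the core mechanism.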
Scalability and Fault Tolerance in Stream Processing
Distributed Processing
Stream processing frameworks leverage distributed architectures to handle high volumes of data.

Fault Tolerance
Redundancy and checkpointing mechanisms ensure data integrity and system uptime even in the event of failures.

Scalability
Stream processing systems can scale horizontally by adding more nodes to handle increasing data volumes and processing demands.
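The checkpointing idea behind fault tolerance can be sketched simply (the `run` helper and dict-based checkpoint are illustrative; real systems persist offsets to durable storage): progress is recorded after each processed event, so a restart resumes from the last checkpoint instead of losing or reprocessing data.

```python
def run(stream, checkpoint, results):
    """Process events, checkpointing the last completed offset."""
    for offset in range(checkpoint["offset"], len(stream)):
        results.append(stream[offset] * 2)  # the actual processing work
        checkpoint["offset"] = offset + 1   # persist progress

stream = [1, 2, 3, 4, 5]
checkpoint = {"offset": 0}  # would live in durable storage in practice
results = []

run(stream[:3], checkpoint, results)  # process some events, then "crash"
run(stream, checkpoint, results)      # restart resumes from the checkpoint
print(results)  # [2, 4, 6, 8, 10] -- no event lost or reprocessed
```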
Best Practices for Designing Efficient Stream Processing Pipelines

Optimize Data Schema
Design data schemas that are efficient for processing and storage.

Code for Fault Tolerance
Implement robust error handling and recovery mechanisms.

Monitor Performance
Track key metrics like latency, throughput, and resource usage.

Ensure Security
Protect data confidentiality and integrity throughout the pipeline.
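For the monitoring practice, a minimal sketch of per-stage metrics (the `PipelineMetrics` class is illustrative; production pipelines would export such counters to a metrics system) shows how latency and event counts can be tracked as events flow through:

```python
import time

class PipelineMetrics:
    """Track simple latency and count metrics for a pipeline stage."""
    def __init__(self):
        self.count = 0
        self.total_latency = 0.0

    def record(self, enqueued_at):
        self.count += 1
        self.total_latency += time.monotonic() - enqueued_at

    @property
    def avg_latency(self):
        return self.total_latency / self.count if self.count else 0.0

metrics = PipelineMetrics()
for _ in range(3):
    t0 = time.monotonic()
    _ = sum(range(1000))  # stand-in for real processing work
    metrics.record(t0)

print(metrics.count, metrics.avg_latency)
```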
Challenges and Future Trends in Stream Data Processing

1. Real-Time Analytics
Advanced machine learning and AI techniques are being applied to stream processing for real-time insights.

2. Edge Computing
Stream processing is increasingly being deployed at the edge, enabling faster insights from local data sources.

3. Serverless Stream Processing
Serverless architectures are simplifying the deployment and management of stream processing applications.
