BOSS16 Tutorial Flink

The document provides an introduction to stream processing with Apache Flink, highlighting its capabilities for high throughput, low latency, and fault tolerance. It covers key concepts such as event-time processing, windowed computations, and handling node failures. Additionally, it emphasizes the importance of continuous data processing and the growing community around Apache Flink.

Introduction to Stream Processing with Apache Flink®

Kostas Kloudas
Vasia Kalavri
Jonas Traub
Who are we?
• Kostas: software engineer @ data Artisans
• Vasia: PhD student @ KTH Stockholm
• Jonas: research associate @ TU Berlin

Overview
• What is Stream Processing?
• What is Apache Flink?
• Windowed computations over streams
• Handling time
• Handling node failures
• Handling planned downtime
• Handling code upgrades

Demo instructions…

Robust Stream Processing with Apache Flink®: A Simple Walkthrough

http://data-artisans.com/robust-stream-processing-flink-walkthrough/#more-1181

Make sure you download: Apache Flink 1.0.3

Stateless stream processing

Stateful stream processing
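
The difference between the two slides can be illustrated with a minimal sketch in plain Python (not the Flink API). The sensor readings, the function names `to_fahrenheit` and `running_max`, and the choice of operations are assumptions made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

Reading = Tuple[str, float]  # (sensor_id, temperature in Celsius), hypothetical schema


def to_fahrenheit(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Stateless: each output depends only on the current element."""
    for sensor_id, celsius in stream:
        yield sensor_id, celsius * 9 / 5 + 32


def running_max(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Stateful: the operator remembers the maximum seen so far per sensor."""
    max_per_sensor: Dict[str, float] = {}
    for sensor_id, value in stream:
        best = max(value, max_per_sensor.get(sensor_id, value))
        max_per_sensor[sensor_id] = best
        yield sensor_id, best


readings = [("a", 20.0), ("b", 30.0), ("a", 25.0), ("a", 10.0)]
stateless_out = list(to_fahrenheit(readings))
stateful_out = list(running_max(readings))
```

The stateful operator is the one whose internal dictionary would have to survive a node failure, which is why state and fault tolerance are discussed together later in the deck.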

Why should you care?

Data production is and has always been a continuous process.

Stream processing enables the obvious: continuous processing on data that is continuously produced.

What is Apache Flink?

A data processing engine

Apache Flink is an open source platform for distributed stream and batch processing

The Apache Flink Ecosystem

[Figure: Apache Flink ecosystem diagram; the only label recovered is "SQL"]

What does Flink provide?
• High Throughput and Low Latency
  • Yahoo! Benchmark: https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
  • Extended by Data Artisans: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/

What does Flink provide?
• High Throughput and Low Latency
• Event-time (out-of-order) processing
• Exactly-once semantics
• Flexible windowing
• Fault-Tolerance

Time for demo…

Robust Stream Processing with Apache Flink®: A Simple Walkthrough

http://data-artisans.com/robust-stream-processing-flink-walkthrough/#more-1181

Setup:

[Figure: demo setup diagram; the only label recovered is "Sensor Data"]

Windowed computations
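
As a rough illustration of a windowed computation, the sketch below groups timestamped values into fixed, non-overlapping (tumbling) windows and sums each one. It is plain Python over a finite list rather than Flink code; the event data, the window size, and the name `tumbling_window_sums` are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, float]  # (event timestamp in seconds, value), hypothetical schema


def tumbling_window_sums(events: Iterable[Event], size: int) -> List[Tuple[int, int, float]]:
    """Assign each event to the window [start, start + size) and sum per window."""
    sums: Dict[int, float] = defaultdict(float)
    for timestamp, value in events:
        window_start = timestamp - (timestamp % size)
        sums[window_start] += value
    return [(start, start + size, total) for start, total in sorted(sums.items())]


events = [(1, 1.0), (4, 2.0), (5, 3.0), (9, 4.0), (12, 5.0)]
result = tumbling_window_sums(events, size=5)
```

A real streaming engine cannot wait for the end of the input as this batch-style sketch does; it needs a signal for when a window is complete, which is the topic of the time-handling slides.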

Handling time

Handling time

The system has to respect the same clock as the data.

Event Time vs Processing Time

Event Time:       Episode IV  Episode V  Episode VI  Episode I  Episode II  Episode III  Episode VII
Processing Time:  1977        1980       1983        1999       2002        2005         2015
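
The slide's example can be restated as a small sketch: ordering the same records by arrival (processing time) and by their own timestamp (event time) gives different sequences. The `Episode` type and its field names are assumptions introduced for illustration; the numbers and years are the ones on the slide.

```python
from typing import List, NamedTuple


class Episode(NamedTuple):
    number: int        # position in the story: plays the role of event time
    release_year: int  # when it reached the audience: plays the role of processing time


# Arrival order as listed on the slide.
arrivals: List[Episode] = [
    Episode(4, 1977), Episode(5, 1980), Episode(6, 1983),
    Episode(1, 1999), Episode(2, 2002), Episode(3, 2005),
    Episode(7, 2015),
]

by_processing_time = [e.number for e in sorted(arrivals, key=lambda e: e.release_year)]
by_event_time = [e.number for e in sorted(arrivals, key=lambda e: e.number)]
```

Here `by_processing_time` keeps the out-of-order sequence 4, 5, 6, 1, 2, 3, 7, while `by_event_time` recovers 1 through 7.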

Handling time: Watermarks
• Special events generated by the sources.
• A watermark for time T states that event time has progressed to T in that particular stream (or partition).
• No events with a timestamp smaller than T can arrive any more.
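
The three points above can be sketched in plain Python (this is not the Flink API). The sketch assumes a simple heuristic in which the watermark trails the highest timestamp seen by a fixed `max_delay`; that heuristic, the `Element` and `Watermark` classes, and both function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Union


@dataclass(frozen=True)
class Element:
    timestamp: int


@dataclass(frozen=True)
class Watermark:
    timestamp: int


def source_with_watermarks(
    timestamps: Iterable[int], max_delay: int
) -> Iterator[Union[Element, Watermark]]:
    """Emit every element, plus a watermark whenever event time advances.

    Assumption: events are at most `max_delay` time units out of order, so the
    watermark is the highest timestamp seen so far minus `max_delay`.
    """
    highest_seen: Optional[int] = None
    last_watermark: Optional[int] = None
    for ts in timestamps:
        yield Element(ts)
        highest_seen = ts if highest_seen is None else max(highest_seen, ts)
        candidate = highest_seen - max_delay
        if last_watermark is None or candidate > last_watermark:
            last_watermark = candidate
            yield Watermark(candidate)


def late_elements(stream: Iterable[Union[Element, Watermark]]) -> List[int]:
    """Return timestamps of elements smaller than an already emitted watermark."""
    current: Optional[int] = None
    late: List[int] = []
    for item in stream:
        if isinstance(item, Watermark):
            current = item.timestamp
        elif current is not None and item.timestamp < current:
            late.append(item.timestamp)
    return late


stream = list(source_with_watermarks([1, 3, 2, 7, 4, 8], max_delay=2))
watermarks = [item.timestamp for item in stream if isinstance(item, Watermark)]
late = late_elements(stream)
```

With this input the element with timestamp 4 arrives after a watermark of 5 and therefore breaks the watermark's promise. Under this heuristic, a larger `max_delay` produces fewer late elements but makes event time advance more slowly.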

Handling time: Watermarks
Sources emit elements and watermarks…

…operators always emit the lowest watermark across their input streams
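
A minimal sketch of the "lowest watermark" rule for an operator with several inputs, in plain Python: the operator tracks the latest watermark per input channel and forwards the minimum whenever that minimum advances. The channel names, the update sequence, and the function name `merged_watermarks` are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def merged_watermarks(updates: Iterable[Tuple[str, int]], channels: List[str]) -> List[int]:
    """Forward the minimum watermark over all inputs each time that minimum increases."""
    latest: Dict[str, int] = {}
    emitted: List[int] = []
    for channel, watermark in updates:
        # Watermarks on a single channel never move backwards.
        latest[channel] = max(watermark, latest.get(channel, watermark))
        if len(latest) < len(channels):
            continue  # at least one input has not reported a watermark yet
        lowest = min(latest.values())
        if not emitted or lowest > emitted[-1]:
            emitted.append(lowest)
    return emitted


updates = [("a", 5), ("b", 3), ("a", 9), ("b", 6), ("b", 12)]
out = merged_watermarks(updates, channels=["a", "b"])
```

Taking the minimum is the conservative choice: the slowest input decides how far event time has progressed, so no element from that input can later turn out to be behind the operator's own watermark.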

Handling time: Watermarks

Handling node failures

Checkpoints
Sources emit elements and checkpoint barriers…
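
The idea can be sketched with a single operator in plain Python; the real mechanism is a distributed snapshot across many operators, so this is a deliberate simplification. The `BARRIER` marker, the `process` function, and the simulated failure via `fail_at` are assumptions made up for illustration.

```python
from typing import Optional, Sequence, Tuple, Union

BARRIER = "barrier"  # hypothetical marker standing in for a checkpoint barrier
Item = Union[int, str]
Checkpoint = Tuple[int, int]  # (offset to resume from, operator state at that point)


def process(
    stream: Sequence[Item],
    checkpoint: Checkpoint = (0, 0),
    fail_at: Optional[int] = None,
) -> Tuple[Optional[int], Checkpoint]:
    """Sum the integers in `stream`, snapshotting the running sum at every barrier.

    Returns (final_sum, last_checkpoint). If `fail_at` is set, processing stops
    at that offset to simulate a node failure and final_sum is None.
    """
    offset, total = checkpoint
    last_checkpoint = checkpoint
    for position in range(offset, len(stream)):
        if position == fail_at:
            return None, last_checkpoint  # state since the last barrier is lost
        item = stream[position]
        if item == BARRIER:
            last_checkpoint = (position + 1, total)
        else:
            total += item
    return total, last_checkpoint


stream = [1, 2, BARRIER, 3, 4, BARRIER, 5, 6]
_, recovered = process(stream, fail_at=7)              # crash before the last element
final_sum, _ = process(stream, checkpoint=recovered)   # restore state, replay from offset
```

Because the checkpoint stores the state together with the position to resume from, the element 5 that was processed before the crash is replayed against the restored state rather than counted twice, and the final sum matches a failure-free run.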

Checkpoints

Handling planned downtime

Handling code upgrades

Is Apache Flink only that?

Apache Flink is an open source platform for distributed stream and batch processing

Its lively community

[Figure: Apache Flink Community Growth; three panels (Stars on Github, Contributors, Forks on Github), each plotted from Feb.15 to Aug.16]

• You can join:
  • Follow: @ApacheFlink, @dataArtisans
  • Read: flink.apache.org/blog, data-artisans.com/blog
  • Subscribe: (news | user | dev) @ flink.apache.org

Its Users

…https://flink.apache.org/poweredby.html

All of them will meet at...
http://flink-forward.org/
Further Reading
• Event-time processing:
  • The Dataflow Model: http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
  • http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/

• Checkpointing and State:
  • Distributed Snapshots: Determining Global States of Distributed Systems: http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
  • Lightweight Asynchronous Snapshots for Distributed Dataflows: https://arxiv.org/abs/1506.08603
  • Working with State in Flink: https://ci.apache.org/projects/flink/flink-docs-master/dev/state.html

• Savepoints:
  • https://ci.apache.org/projects/flink/flink-docs-master/setup/savepoints.html
