BOSS16 Tutorial Flink

The document provides an introduction to stream processing with Apache Flink, highlighting its capabilities for high throughput, low latency, and fault tolerance. It covers key concepts such as event-time processing, windowed computations, and handling node failures. Additionally, it emphasizes the importance of continuous data processing and the growing community around Apache Flink.

Introduction to Stream Processing with Apache Flink®


Kostas Kloudas
Vasia Kalavri
Jonas Traub
Who are we?
• Kostas: software engineer @ data Artisans
• Vasia: PhD student @ KTH Stockholm
• Jonas: research associate @ TU Berlin

2
Overview
• What is Stream Processing?
• What is Apache Flink?
• Windowed computations over streams
• Handling time
• Handling node failures
• Handling planned downtime
• Handling code upgrades

3
Demo instructions…

Robust Stream Processing with Apache Flink®: A Simple Walkthrough
http://data-artisans.com/robust-stream-processing-flink-walkthrough/#more-1181

Make sure you download: Apache Flink 1.0.3

4
Stateless stream processing

5
Stateful stream processing
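
The difference between the two kinds of processing can be illustrated with a minimal plain-Python sketch. It does not use Flink's API; the function names and the sample sensor readings are hypothetical, chosen only for illustration. A stateless operator looks at one element at a time, while a stateful operator remembers something about the elements it has already seen.

```python
def to_fahrenheit(readings):
    """Stateless: each output depends only on the current element."""
    for sensor, celsius in readings:
        yield sensor, celsius * 9 / 5 + 32


def running_max(readings):
    """Stateful: remembers the largest value seen so far per sensor."""
    state = {}  # sensor id -> maximum observed so far
    for sensor, value in readings:
        state[sensor] = max(value, state.get(sensor, value))
        yield sensor, state[sensor]


# Hypothetical input: (sensor id, temperature in Celsius).
readings = [("a", 10.0), ("b", 30.0), ("a", 5.0), ("a", 20.0)]
stateless_out = list(to_fahrenheit(readings))
stateful_out = list(running_max(readings))
```

In the stateful case the third output for sensor "a" is still 10.0, because the operator's state carries the earlier maximum forward. That state is what a fault-tolerant engine has to protect when a node fails.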

6
Why should you care?

Data production is and has always been a continuous process.

Stream processing enables the obvious: continuous processing on data that is continuously produced.

7
What is Apache Flink?

8
A data processing engine

Apache Flink is an open source platform for distributed stream and batch processing.

9
The Apache Flink Ecosystem

[Figure: Apache Flink ecosystem diagram; the only label that survives in this text is "SQL".]
10
What does Flink provide?
• High Throughput and Low Latency
  • Yahoo! Benchmark: https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
  • Extended by data Artisans: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/

11
What does Flink provide?
• High Throughput and Low Latency
• Event-time (out-of-order) processing
• Exactly-once semantics
• Flexible windowing
• Fault-Tolerance

12
Time for demo…

Robust Stream Processing with Apache Flink®: A Simple Walkthrough
http://data-artisans.com/robust-stream-processing-flink-walkthrough/#more-1181

13
Setup:

[Figure: demo setup; the only label that survives in this text is "Sensor Data".]

14
Windowed computations
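
A windowed computation groups stream elements into finite buckets and computes a result per bucket. The following is a minimal plain-Python sketch of one common window type, a tumbling (fixed-size, non-overlapping) window, applied to sensor-style readings. It is not Flink's windowing API: the function name, the window size, and the sample events are assumptions made for illustration, and it evaluates a finite list at once, whereas a streaming engine emits each result when its window closes.

```python
from collections import defaultdict


def tumbling_window_average(events, window_size):
    """Assign (timestamp, value) events to fixed, non-overlapping windows
    of `window_size` time units and return the average value per window,
    keyed by the window's start time."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in events:
        # Every timestamp belongs to exactly one window.
        window_start = timestamp - (timestamp % window_size)
        sums[window_start] += value
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical sensor readings: (timestamp, value).
events = [(1, 10.0), (4, 20.0), (7, 30.0), (12, 50.0)]
averages = tumbling_window_average(events, window_size=5)
```

With a window size of 5, the readings at timestamps 1 and 4 fall into the window starting at 0, the reading at 7 into the window starting at 5, and the reading at 12 into the window starting at 10.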

15
Handling time

16
Handling time

The system has to respect the same clock as the data.

17
Event Time vs Processing Time

Event Time (episode):            IV    V     VI    I     II    III   VII
Processing Time (release year):  1977  1980  1983  1999  2002  2005  2015
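
The Star Wars analogy can be written down directly. In this small plain-Python sketch the episode number stands in for event time (when something happened in the story) and the release year for processing time (when it reached the audience). The variable names are hypothetical.

```python
# (episode number, release year), taken from the slide.
arrivals = [(4, 1977), (5, 1980), (6, 1983), (1, 1999),
            (2, 2002), (3, 2005), (7, 2015)]

# Processing-time order: the order in which the system sees the elements.
by_processing_time = [ep for ep, _ in sorted(arrivals, key=lambda a: a[1])]

# Event-time order: the order in which the events actually happened.
by_event_time = [ep for ep, _ in sorted(arrivals, key=lambda a: a[0])]
```

The two orders differ, so a computation that depends on order, such as a window over time, gives different answers depending on which clock it follows.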

18
Handling time: Watermarks
• Special events generated by the sources.
• A watermark for time T states that event time has progressed to T in that particular stream (or partition).
• No events with a timestamp smaller than T can arrive any more.
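
The three points above can be sketched in plain Python. This is not Flink's API. It assumes one common heuristic, a fixed bound on how far out of order events may arrive, so the source sets the watermark to the highest timestamp seen minus that bound. The function name, the bound, and the sample timestamps are all hypothetical.

```python
def with_watermarks(timestamps, max_out_of_orderness):
    """Yield ("event", t) for each timestamp, followed by ("watermark", w)
    whenever the watermark advances. An element whose timestamp is smaller
    than the current watermark is tagged "late" instead of "event"."""
    watermark = float("-inf")
    max_seen = float("-inf")
    for t in timestamps:
        # A timestamp below the watermark violates the promise that
        # no smaller timestamps arrive any more.
        yield ("late" if t < watermark else "event"), t
        max_seen = max(max_seen, t)
        candidate = max_seen - max_out_of_orderness
        if candidate > watermark:  # watermarks never move backwards
            watermark = candidate
            yield "watermark", watermark


# Hypothetical out-of-order event timestamps.
stream = list(with_watermarks([1, 3, 2, 7, 4, 9], max_out_of_orderness=2))
```

In this run the element with timestamp 4 arrives after the watermark has reached 5, so it is flagged as late. What to do with such elements (drop, divert, or update earlier results) is a policy decision that this sketch leaves open.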

19
Handling time: Watermarks
Sources emit elements and watermarks…

…operators always emit the lowest watermark received across their inputs.
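
A minimal plain-Python sketch of this rule follows, assuming an operator that tracks the latest watermark per input channel. The class and method names are hypothetical and not part of Flink's API.

```python
class MultiInputOperator:
    """Tracks the latest watermark per input and forwards the minimum."""

    def __init__(self, num_inputs):
        self.input_watermarks = [float("-inf")] * num_inputs
        self.emitted = float("-inf")

    def on_watermark(self, input_index, watermark):
        """Record a watermark from one input. Return the new output
        watermark if it advanced, otherwise None."""
        self.input_watermarks[input_index] = max(
            self.input_watermarks[input_index], watermark)
        # Event time has only progressed as far as the slowest input.
        lowest = min(self.input_watermarks)
        if lowest > self.emitted:
            self.emitted = lowest
            return lowest
        return None


op = MultiInputOperator(num_inputs=2)
outputs = [
    op.on_watermark(0, 10),  # input 1 has sent nothing yet
    op.on_watermark(1, 4),
    op.on_watermark(1, 12),
    op.on_watermark(0, 15),
]
```

Taking the minimum is the conservative choice: the faster input may already be at time 10, but the slower one can still deliver elements with timestamps between 4 and 10, so the operator cannot promise more than 4 downstream.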

20
Handling time: Watermarks

21
Handling node failures

22
Checkpoints
Sources emit elements and checkpoint barriers…
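
The idea behind recovering from a checkpoint can be sketched in a few lines of plain Python. This is a deliberately simplified, single-process illustration with hypothetical names, not Flink's mechanism or API: it does not model how checkpoints travel through a distributed dataflow. It only shows why snapshotting the source position together with the operator state lets a job roll back and replay without counting any element twice.

```python
import copy


class CountingJob:
    """Reads a replayable log and keeps a per-key count as operator state."""

    def __init__(self, log):
        self.log = log
        self.offset = 0   # source position in the log
        self.counts = {}  # operator state

    def process_next(self):
        key = self.log[self.offset]
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def checkpoint(self):
        # Consistent snapshot: source offset and state captured together.
        return {"offset": self.offset, "counts": copy.deepcopy(self.counts)}

    def restore(self, snapshot):
        self.offset = snapshot["offset"]
        self.counts = copy.deepcopy(snapshot["counts"])


log = ["a", "b", "a", "c", "a"]
job = CountingJob(log)
job.process_next()
job.process_next()
snapshot = job.checkpoint()   # taken after two elements
job.process_next()            # work done after the checkpoint...
job.restore(snapshot)         # ...is discarded when a failure forces a rollback
while job.offset < len(log):  # replay from the checkpointed offset
    job.process_next()
final_counts = job.counts
```

Because the third element is replayed from the restored offset rather than processed on top of the lost work, the final counts are the same as in a run without any failure.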

23
Checkpoints

24
Handling planned downtime

25
Handling code upgrades

26
Is Apache Flink only that?

Apache Flink is an open source platform for distributed stream and batch processing.

27
Its lively community
[Figure: Apache Flink Community Growth. Three panels (Stars on GitHub, axis 0 to 1800; Contributors, axis 0 to 250; Forks on GitHub, axis 0 to 1200), each plotted at Feb. 2015, Dec. 2015, and Aug. 2016.]

• You can join:
  • Follow: @ApacheFlink, @dataArtisans
  • Read: flink.apache.org/blog, data-artisans.com/blog
  • Subscribe: (news | user | dev) @ flink.apache.org

28
Its Users

… https://flink.apache.org/poweredby.html
29
All of them will meet at...
http://flink-forward.org/
Further Reading
• Event-time processing:
  • The Dataflow Model: http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
  • http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
• Checkpointing and State:
  • Distributed Snapshots: Determining Global States of Distributed Systems: http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
  • Lightweight Asynchronous Snapshots for Distributed Dataflows: https://arxiv.org/abs/1506.08603
  • Working with State in Flink: https://ci.apache.org/projects/flink/flink-docs-master/dev/state.html
• Savepoints:
  • https://ci.apache.org/projects/flink/flink-docs-master/setup/savepoints.html

32