Common Flink Mistakes

The Top 5 Mistakes Deploying Apache Flink
Webinar

Robert Metzger, Decodable
[email protected] @rmetzger_

Eric Sammer, Decodable
[email protected] @esammer

Today’s Webinar
- The Top 5 Mistakes Deploying Apache Flink
- Common Stream Processing Patterns using SQL
- Q&A
Common Flink Mistakes
Robert Metzger
Staff Engineer @ decodable, Committer and PMC Chair @ Flink
#1 Mistake: Serialization is expensive

- Mistake: people use Java Maps, Sets, etc. to store state or to do network transfers
- Serialization happens when:
  - transferring data over the network (between TaskManagers, or from/to sources/sinks)
  - accessing state in RocksDB (even in-memory)
  - sending data between non-chained tasks, even locally
- Serialization costs a lot of CPU cycles
#1 Mistake: Serialization is expensive

Example:

package co.decodable.talks.flink.performance;

private static class Location {
    int lon;
    int lat;
}

DataStream<HashMap<String, Location>> s1 = ...

For an input record (start lon:11 lat:22, end lon:88 lat:99), the serialized map looks like:

2 start co.decodable.talks.flink.performance.Location 11 22 end co.decodable.talks.flink.performance.Location 88 99

- map size: 4 bytes
- 1st entry key (“start”): 5 bytes
- 1st entry value type: 46 bytes
- 1st entry value fields: 8 bytes
- 2nd entry key (“end”): 3 bytes
- 2nd entry value type: 46 bytes
- 2nd entry value fields: 8 bytes
→ ~120 bytes per record
#1 Mistake: Serialization is expensive

Example:

public record OptimizedLocation(int startLon, int startLat, int endLon, int endLat) {}

DataStream<OptimizedLocation> s2 = ...

The serialized record is just the four int fields: 11 22 88 99 → 16 bytes

→ 7.5x reduction in data
Fewer object allocations = fewer CPU cycles

Further reading: “Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can”,
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html

Disclaimer: the actual binary representation used by Kryo might differ; this is for demonstration purposes only.
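Flink’s own serializers are not runnable outside a Flink job, but the overhead of generic containers can be sketched with plain JDK serialization. This is a hypothetical demo (an assumption: JDK serialization, like a generic fallback serializer, writes class metadata per object; its exact format differs from Kryo’s), comparing a HashMap-based record against a flat fixed-width encoding of the same four ints:

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.HashMap;

public class SerializationDemo {
    static class Location implements Serializable {
        final int lon, lat;
        Location(int lon, int lat) { this.lon = lon; this.lat = lat; }
    }

    // Size of the default JDK serialized form (includes class names, headers, etc.).
    static int javaSerializedSize(Object o) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Location> map = new HashMap<>();
        map.put("start", new Location(11, 22));
        map.put("end", new Location(88, 99));

        // Flat encoding: just the four fields, 16 bytes, no type metadata.
        byte[] flat = ByteBuffer.allocate(16)
                .putInt(11).putInt(22).putInt(88).putInt(99).array();

        System.out.println("HashMap form: " + javaSerializedSize(map) + " bytes");
        System.out.println("flat form:    " + flat.length + " bytes");
    }
}
```

The exact HashMap byte count depends on the JDK, but it is always far larger than the 16-byte flat encoding, which is the point of the slide.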
#2 Mistake: Flink doesn’t always need to be distributed

- Flink’s MiniCluster allows you to spin up a full-fledged Flink cluster with everything known from distributed clusters (RocksDB, checkpointing, the web UI, SQL, …)

var clusterConfig = new MiniClusterConfiguration.Builder()
        .setNumTaskManagers(1)
        .setNumSlotsPerTaskManager(1)
        .build();
var cluster = new MiniCluster(clusterConfig);
cluster.start();
var clusterAddress = cluster.getRestAddress().get();

var env = new RemoteStreamEnvironment(clusterAddress.getHost(),
        clusterAddress.getPort());
#2 Mistake: Flink doesn’t always need to be distributed

- Use cases:
  - Local debugging and performance profiling: step through the code as it executes, sample the most frequently used code paths
  - Testing: make sure your Flink jobs work in end-to-end tests (together with Kafka’s MiniCluster, or MinIO as an S3 replacement). Check out https://www.testcontainers.org/
  - Processing small streams efficiently
#3 Advice: Deploy one job per cluster, use standalone mode

… unless you have a good reason to do something else.

- Flink’s deployment options might seem confusing. Here’s a simple framework to think about them:
- Flink has 3 execution modes:
  - Session mode
  - Per-job mode
  - Application mode (preferred)
- Flink has 2 deployment models:
  - Integrated (active): native K8s, YARN, (Mesos)
    - Flink requests resources from the resource manager as needed
  - Standalone (passive): well suited for K8s, bare metal, local deployment, DIY
    - Resources are provided to Flink from the outside world
#3 Execution Modes

- Session Mode: multiple jobs share a JobManager
- Application Mode (recommended as default): one job per JobManager, planned on the JobManager
- Per-Job Mode: one job per JobManager, planned outside the JobManager
#3 Deployment Options

Passive Deployment (“Standalone mode”):
- Flink resources managed externally → “a bunch of JVMs”
- Deployed on bare metal, Docker, Kubernetes
- Pros / cons:
  - + Reactive Mode (“autoscaling”)
  - + DIY scenarios
  - + Fast deployments
  - − Restarting failed resources must be handled from the outside

Active Deployment:
- Flink actively manages resources → Flink talks to a resource manager
- Implementations: native Kubernetes, YARN
- Pros / cons:
  - + Automatically restarts failed resources
  - + Allocates only required resources
  - − Requires a lot of K8s permissions
#4 Mistake: Inappropriate cluster sizing

- Mistake: under- or over-provisioning of clusters for a given workload
- Understand the amount of data you have incoming and outgoing
  - How much network bandwidth do you have? How much throughput does your Kafka have?
- Understand the amount of state you’ll need in Flink
  - Which state backend do you use?
  - How much memory / disk space do you have available (per instance, in your cluster)?
  - How fast is your connection to your state backup (e.g. S3)? This gives you a baseline for checkpointing times
Solution: Proper cluster sizing

- Do a back-of-the-napkin calculation of your use case in your environment
- … assuming normal operation (“baseline”). Include a buffer for spiky loads (failure recovery, …)
Example: Proper cluster sizing

● Data:
  ○ Message size: 2 KB
  ○ Throughput: 1,000,000 msg/sec
  ○ Distinct keys: 500,000,000 (aggregation in window: 4 longs per key)
  ○ Checkpoint every minute
● Hardware:
  ○ 5 machines, each running a TaskManager

Pipeline: Kafka Source → keyBy userId → Sliding Window (5m size, 1m slide) → Kafka Sink, with RocksDB as the state backend
Example: A machine’s perspective

TaskManager n:
- Kafka Source in: 2 KB * 1,000,000 msg/sec = 2 GB/s cluster-wide; 2 GB/s / 5 machines = 400 MB/s per machine
- keyBy shuffle: 400 MB/s / 5 receivers = 80 MB/s per receiver; 1 receiver is local, 4 are remote: 4 * 80 MB/s = 320 MB/s shuffle out (and likewise 320 MB/s shuffle in from the other machines)
- window → Kafka Sink out: 67 MB/s
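The per-machine arrows above can be re-derived with a few lines of arithmetic. A minimal sketch (using decimal KB/MB, as the slide’s round numbers do):

```java
// Recomputes the network figures for the example workload:
// 1,000,000 msg/sec of 2 KB messages spread across 5 TaskManagers.
public class NetworkSizing {
    static final long MSG_PER_SEC = 1_000_000L;
    static final long MSG_BYTES = 2_000L;   // 2 KB, decimal units as on the slide
    static final int MACHINES = 5;

    // Kafka source input per machine: 2 GB/s cluster-wide over 5 machines.
    static long perMachineInBytesPerSec() {
        return MSG_PER_SEC * MSG_BYTES / MACHINES;
    }

    // keyBy shuffle output per machine: 400 MB/s fans out evenly to
    // 5 receivers (80 MB/s each); 1 is local, so 4 * 80 MB/s leaves the machine.
    static long shuffleOutBytesPerSec() {
        long perReceiver = perMachineInBytesPerSec() / MACHINES;
        return (MACHINES - 1) * perReceiver;
    }

    public static void main(String[] args) {
        // prints 400 MB/s and 320 MB/s, matching the diagram
        System.out.println("Kafka in per machine   : " + perMachineInBytesPerSec() / 1_000_000 + " MB/s");
        System.out.println("shuffle out per machine: " + shuffleOutBytesPerSec() / 1_000_000 + " MB/s");
    }
}
```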
Excursion: State & Checkpointing

How much state are we checkpointing?

per machine: 40 bytes * 5 windows * 100,000,000 keys (= 500,000,000 distinct keys / 5 machines) = 20 GB

We checkpoint every minute, so: 20 GB / 60 seconds = 333 MB/s

How is the window operator accessing state on disk?

For each key-value access, we need to retrieve 40 bytes from disk, update the aggregates, and put 40 bytes back.

per machine: 40 bytes * 5 windows * 200,000 msg/sec (= 1,000,000 msg/sec / 5 machines) = 40 MB/s
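The state and checkpoint numbers above, written out as a checkable sketch (the 40 bytes per key and the 5 overlapping windows are taken from the slide):

```java
// State and checkpoint bandwidth for the example: 500M keys, 5 machines,
// a 5m window sliding by 1m (5 overlapping windows), ~40 bytes per key.
public class StateSizing {
    static final long DISTINCT_KEYS = 500_000_000L;
    static final int MACHINES = 5;
    static final int WINDOWS = 5;           // 5m size / 1m slide
    static final long BYTES_PER_KEY = 40L;  // 4 longs of aggregates + overhead
    static final long MSG_PER_SEC = 1_000_000L;

    // Checkpointed state per machine: 40 B * 5 windows * 100M keys = 20 GB.
    static long statePerMachineBytes() {
        return BYTES_PER_KEY * WINDOWS * (DISTINCT_KEYS / MACHINES);
    }

    // Checkpointing every minute means moving that state in ~60 s: ~333 MB/s.
    static long checkpointBytesPerSec() {
        return statePerMachineBytes() / 60;
    }

    // RocksDB access: each message touches 5 windows at 40 B per access,
    // at 200,000 msg/sec per machine = 40 MB/s (each way, read and write).
    static long rocksDbBytesPerSec() {
        return BYTES_PER_KEY * WINDOWS * (MSG_PER_SEC / MACHINES);
    }

    public static void main(String[] args) {
        System.out.println("state per machine: " + statePerMachineBytes() / 1_000_000_000L + " GB");
        System.out.println("checkpoint rate  : " + checkpointBytesPerSec() / 1_000_000L + " MB/s");
        System.out.println("RocksDB access   : " + rocksDbBytesPerSec() / 1_000_000L + " MB/s");
    }
}
```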


Example: A machine’s perspective

TaskManager n, including checkpointing:
- In: Kafka Source 400 MB/s + shuffle 320 MB/s
- Out: shuffle 320 MB/s + Kafka Sink 67 MB/s + checkpoints 333 MB/s

Total In: 720 MB/s — Total Out: 720 MB/s
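As a sanity check, the per-machine budget balances: everything received equals everything sent (a property of this particular workload, not a general rule):

```java
// Sums the per-machine bandwidth figures (in MB/s) from the example slide.
public class BandwidthBudget {
    static int totalInMBps()  { return 400 + 320; }        // Kafka source + shuffle in
    static int totalOutMBps() { return 320 + 67 + 333; }   // shuffle out + Kafka sink + checkpoints

    public static void main(String[] args) {
        // both sides come to 720 MB/s
        System.out.println("in=" + totalInMBps() + " MB/s, out=" + totalOutMBps() + " MB/s");
    }
}
```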
Cluster sizing: Conclusion

- This was just a back-of-the-napkin approximation! Real-world results will differ!
- Network factors we ignored:
  - Protocol overheads (Ethernet, IP, TCP, …)
  - RPC (Flink’s own RPC, Kafka, checkpoint store)
  - Checkpointing causes network bursts
  - A window emission causes bursts
  - Other systems using the network
- CPU, memory, and disk access speed have not been considered
#5 Advice: Ask for help!

- Most problems have been solved already online
- Official, old-school way: the [email protected] mailing list
  - Indexed by Google, searchable through https://lists.apache.org/
- Stack Overflow: the apache-flink tag has 6,300 questions!
- The Apache Flink Slack instance
- Global meetup communities, Flink Forward (with training)
Any Flink deployment & ops related questions?
Get Started with Decodable
● Visit http://decodable.co
● Start free: http://app.decodable.co
● Read the docs: http://docs.decodable.co
● Watch demos on our YouTube channel
● Join our community Slack channel
● Join us for future Demo Days and webinars!
Thank you.
Build real-time data apps &
services. Fast.

decodable.co 2022
