Kubernetes and real-time analytics
How to connect these two worlds with Apache Flink?

Author: Albert Lewandowski


About me

● Big Data DevOps Engineer - GetInData
● Focused on infrastructure, cloud, Big Data, AI, scalable web applications
● Certified Google Cloud Architect
● Certified Kubernetes Administrator

© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Content

● Principles in the Big Data world on Kubernetes
● Why real-time data streaming?
● Different faces of Apache Flink.
● Flink and Kubernetes - real life scenarios.
● Observability of the platform.
● Quick start on your computer.

Introduction to the jungle

What is Kubernetes?

Open-source platform for managing containerized workloads and services

Kubernetes - Operators

A method of deploying and managing applications

Automated provisioning of resources

One setup for multiple environments

Examples: pulsar-operator, postgres-operator, prometheus-operator

Kubernetes - Custom Resource Definitions

Defining custom APIs as add-ons

Dynamic registration with Kubernetes API

CRDs can be accessed with kubectl

A custom resource (an instance of a CRD) represents the desired state, and an operator makes it happen.
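The "desired state plus operator" idea can be sketched as a reconcile function. This is a minimal illustration in plain Python, not the code of any real operator: the `FlinkAppSpec` class, the `reconcile` function and the in-memory `cluster_state` dictionary are all hypothetical stand-ins for a custom resource, an operator loop and the cluster.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlinkAppSpec:
    """Hypothetical desired state, as a custom resource might declare it."""
    name: str
    task_managers: int


def reconcile(desired: FlinkAppSpec, running: Dict[str, int]) -> List[str]:
    """Compare desired and observed state and return the actions taken.

    `running` maps an application name to its current TaskManager count
    (an in-memory stand-in for the real cluster).
    """
    actions = []
    current = running.get(desired.name)
    if current is None:
        actions.append(
            f"create {desired.name} with {desired.task_managers} TaskManagers")
    elif current != desired.task_managers:
        actions.append(
            f"scale {desired.name} from {current} to {desired.task_managers}")
    # After acting, the observed state matches the desired state.
    running[desired.name] = desired.task_managers
    return actions


cluster_state: Dict[str, int] = {}
spec = FlinkAppSpec(name="wordcount", task_managers=2)
first_pass = reconcile(spec, cluster_state)   # nothing runs yet: create it
second_pass = reconcile(spec, cluster_state)  # already converged: no actions
```

Running the same reconcile twice is harmless: the second pass finds nothing to do, which is the property that lets an operator be re-run after any failure.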
What is Apache Flink?

Flink is an open-source stream processing framework that supports both batch processing and data streaming programs

State of the Flink job

A savepoint is a consistent image of the execution state of a streaming job

Flink’s Savepoints are different from Checkpoints in a similar way that backups are different from recovery logs in traditional database systems.

What is Apache Flink?

[Figures: job diagram; state of a Flink job diagram]

Apache Flink Roadmap

Source: Roadmap - Apache Flink

Perception

[Figure labels: Idempotency, CI/CD, Business logic, Data Ingestion, Monitoring, Serving, Reprocessing, Infrastructure, Explainability, Security, Testing]

Reality

[Figure labels: Idempotency, CI/CD, Data Ingestion, Monitoring, Reprocessing, Business logic, Serving, Infrastructure, Explainability, Security, Testing]

Real-time data streaming

Data Streaming vs. Batch

[Figure: events 1-6 shown as a batch and as a stream]
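The difference can be sketched with the six events from the figure. This is a plain-Python illustration, not Flink code; the function names `batch_sum` and `stream_sum` are made up for the example. The batch version computes one result after all events are collected, while the streaming version emits an updated result as each event arrives.

```python
from typing import Iterable, Iterator, List

events = [1, 2, 3, 4, 5, 6]


def batch_sum(collected: List[int]) -> int:
    """Batch: wait until all events are collected, then compute once."""
    return sum(collected)


def stream_sum(source: Iterable[int]) -> Iterator[int]:
    """Stream: update and emit the result as each event arrives."""
    total = 0
    for event in source:
        total += event
        yield total


batch_result = batch_sum(events)            # one result at the end
stream_results = list(stream_sum(events))   # one result per event
```

Both approaches end at the same total, but the streaming version has a usable intermediate answer after every event.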

Use cases

Location data
User activity

Logistics
Fraud detection

Recommendations
Industrial IoT

Use case - Kcell - Telecom company in Kazakhstan

Year   Subscribers   Events / second   TB / month
2018   10M           165K              22.5
2020   10M           500K              40

Use case - Kcell

Input -> Process -> Actions

Input:
● SMS events
● Voice usage events
● Data usage events
● Roaming events
● Location events

Use case - Kcell - some scenarios for Flink

Balance Top Up Case

If a subscriber tops up her balance too often in a short period of time, we can offer her a less expensive tariff or auto-payment services.

Fraud case in roaming

Send an email to the anti-fraud unit if a subscriber is registered in roaming but his balance at that moment is equal to 0.
This situation is impossible in the standard case.

Automatic SIM card activation

Send an email to the anti-fraud unit if a subscriber is registered in roaming but his balance at that moment is equal to 0.
This situation is impossible in the standard case.

Dealer Motivation Case

Trigger a bonus for a dealer when we discover that a purchase is attributable to him or her.
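The Balance Top Up Case can be sketched as a sliding-window rule per subscriber. This is plain Python meant only to show the logic, not the Flink job used at Kcell; the window length, the threshold, the function name and the sample data are all assumptions made up for the example. In Flink, the per-subscriber window would typically live in keyed state.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

WINDOW_SECONDS = 3600  # assumed meaning of "a short period of time"
MAX_TOP_UPS = 3        # assumed threshold for "too often"


def detect_frequent_top_ups(
        events: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (subscriber, timestamp) for each top-up that exceeds the limit.

    `events` are (subscriber_id, timestamp_in_seconds) pairs ordered by time.
    """
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    alerts = []
    for subscriber, ts in events:
        window = recent[subscriber]
        window.append(ts)
        # Drop top-ups that have fallen out of the sliding window.
        while window and window[0] <= ts - WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_TOP_UPS:
            alerts.append((subscriber, ts))
    return alerts


top_ups = [("alice", 0), ("alice", 600), ("bob", 700), ("alice", 1200),
           ("alice", 1800), ("alice", 7200)]
alerts = detect_frequent_top_ups(top_ups)
```

With this sample data, only the fourth top-up by "alice" inside one hour raises an alert; her later top-up at 7200 seconds starts a fresh window.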

Apache Flink
One tool, multiple versions

One tool, multiple languages

Java 8 or 11

Scala 2.11 or 2.12

SQL

Python

Where should I install?

YARN cluster | Kubernetes | Standalone

● CI/CD process
● Service discovery - monitoring with Prometheus
● Scalability
● Managing resources
● A/B testing

High Availability of Flink

Storage level
● High availability of the storage to/from which Flink writes/reads savepoints and checkpoints
● Performance of storage

JobManager level
● ZooKeeper
● Kubernetes (beta)

Job strategy
● Data reprocessing policy
● How to deploy a new job?

Kubernetes Operator for Apache Flink
● CRDs: Yes
● CI/CD: Kubernetes API
● Installation: Helm chart or raw Kubernetes manifests
● SQL Editor: No
● Dependencies: No
● Status: beta

Flink K8S Operator
● CRDs: Yes
● CI/CD: Kubernetes API
● Installation: Helm chart or raw Kubernetes manifests
● SQL Editor: No
● Dependencies: No
● Status: beta

Ververica Platform
● CRDs: No
● CI/CD: REST API or Web UI
● Installation: Helm chart or raw Kubernetes manifests
● SQL Editor: Yes
● Dependencies: persistent volume for the database, object storage for artifacts
● Status: production

Native Kubernetes - Apache Flink
● CRDs: No
● CI/CD: Kubernetes API
● Installation: No need to install any component
● SQL Editor: No
● Dependencies: No
● Status: beta


Flink + Kubernetes = ?
Overview in the article here
Why Flink on Kubernetes?

● Simpler deployment process
● Flexible job management
● Simple service discovery - Prometheus
● Flexible testing

Installation & Configuration

● Helm: a package manager for Kubernetes
● CI/CD tool (example: GitLab CI)
● Kubernetes API
● Flink jobs

Testing
[Figure labels: Incubating Mode (production data, savepoint, production job, standard output, incubating-mode job, separated output, dedicated TaskManagers); Blue Green Deployment; A/B Testing (proxy, Flink Job #1, Result #1, Flink Job #2, Result #2)]

Deployment process

Git Flow

Unit & Integration tests

Versioning images

Deployment process

Monitoring

Kubernetes aspects

● Resources
● Dedicated namespaces
● Network performance
● Secured access to Flink (RBAC)
● Configuration files
● Storage for savepoints and checkpoints
● Secrets

Self-healing and autoscaling

● Flink restarts
● Scale based on metrics
● External tool for fixing
● Automate manual tasks
● Re-create cluster

Job Cluster & Session Cluster

Job Cluster
● A full Flink cluster for each individual job
● Separate images for different jobs
● Long-running tasks

Session Cluster
● Standalone Flink cluster on Kubernetes
● Short-running tasks
● Ad-hoc queries

Stories from production

● Automate from the beginning
● A CI/CD pipeline is a must
● Verify JVM metrics
● Test different Flink configurations to get the best performance and no restarts
● Secure access to Flink jobs
● Get logs from Flink TaskManagers (TMs) and JobManagers (JMs)

Local Setup
How to start locally?

● Minikube, Docker Desktop, or any other local Kubernetes environment
● Ververica Platform
● A locally started Kafka cluster, or a data generator (Datagen)


Observability
Whitepaper - here
Observability

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs (a definition from control theory).

Part One: Metrics

Get metrics from environment and application - but how?

Prometheus - Kubernetes-native solution

● Open-source systems monitoring and alerting toolkit
● Joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes
● A lot of exporters, and you can easily write your own
● Mature ecosystem: Pushgateway, Blackbox exporter, Alertmanager, etc.
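Writing your own exporter mostly comes down to serving plain text that Prometheus can scrape over HTTP. The sketch below renders one metric in the Prometheus text exposition format using only plain Python. The `render_metric` helper and the metric name `flink_job_restarts_total` are hypothetical, chosen for illustration; a real exporter would normally use an official client library instead of formatting strings by hand.

```python
from typing import Dict


def render_metric(name: str, help_text: str, metric_type: str,
                  label: str, samples: Dict[str, int]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}",
             f"# TYPE {name} {metric_type}"]
    for label_value, value in samples.items():
        # Doubled braces produce the literal { and } around the label pair.
        lines.append(f'{name}{{{label}="{label_value}"}} {value}')
    return "\n".join(lines) + "\n"


# Hypothetical metric name and values, for illustration only.
exposition = render_metric(
    name="flink_job_restarts_total",
    help_text="Number of job restarts.",
    metric_type="counter",
    label="job",
    samples={"job-a": 3, "job-b": 0},
)
```

The resulting text has a `# HELP` line, a `# TYPE` line, and one sample line per label value, which is what a scrape of a metrics endpoint returns.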

Prometheus - simple or complex High Availability?

[Diagrams: simple setup; complex setup]

Example solutions: Cortex (above), Thanos, M3DB

Pull vs. push-based monitoring

Pull
● The collector takes metrics from the targets.
● Workload on the central poller increases with the number of devices polled.
● The polling protocol can potentially open up the system to remote access and denial-of-service attacks.
● Flexible: the poller can ask for any metric at any time.

Push
● Agents push metrics to the collector.
● The polling task is fully distributed among agents, resulting in linear scalability.
● Push agents are inherently secure against remote attacks since they do not listen for network connections.
● Relatively inflexible: a pre-determined, fixed set of measurements is periodically exported.

Prometheus - Stories

service discovery
simple on k8s
limited security

archived data
how far back is data required?

monitor the monitoring

Part Two: Logs analytics

1. Get logs from app or environment.
2. Save logs.
3. Query them.
4. Make your system self-healing and discover what’s happening inside your platform.

Logs analytics - which tool should I choose?

Logs analytics for developers: Loki
Logs analytics for business: Elasticsearch

ELK vs. Loki

                             ELK                                   Loki + Promtail/Fluentd
Indexing                     Keys and content of each key          Only labels
Query language               Query DSL or Lucene query syntax      LogQL
Tool for data visualisation  Kibana                                Grafana
Query performance            Faster, because all data is indexed   Slower, because only labels are indexed
Resource requirements        Higher, because all data is indexed   Lower, because only labels are indexed
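The trade-off in the comparison above can be illustrated with a toy model. This is plain Python over three made-up log lines, not how Elasticsearch or Loki are implemented internally; the function names and the sample data are assumptions. Indexing every word gives a direct lookup at the cost of a larger index, while indexing only labels keeps the index small but scans the selected lines at query time.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Made-up sample data for illustration.
logs = [
    {"labels": {"app": "flink-jobmanager"}, "line": "checkpoint completed"},
    {"labels": {"app": "flink-taskmanager"}, "line": "checkpoint failed: timeout"},
    {"labels": {"app": "flink-taskmanager"}, "line": "task restarted"},
]

# Full-text style: index every word of every line.
word_index: Dict[str, Set[int]] = defaultdict(set)
for i, entry in enumerate(logs):
    for word in entry["line"].replace(":", " ").split():
        word_index[word].add(i)

# Label-only style: index just the labels.
label_index: Dict[str, List[int]] = defaultdict(list)
for i, entry in enumerate(logs):
    label_index[entry["labels"]["app"]].append(i)


def full_text_search(word: str) -> List[int]:
    """Direct lookup in the word index."""
    return sorted(word_index.get(word, set()))


def label_then_grep(app: str, needle: str) -> List[int]:
    """Select lines by label, then scan their content for the needle."""
    return [i for i in label_index.get(app, []) if needle in logs[i]["line"]]


by_word = full_text_search("failed")
by_label = label_then_grep("flink-taskmanager", "failed")
```

Both queries find the same log line, but the word index already holds more entries than the label index even for three lines, which is the resource difference the comparison describes.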

What about alerts?

Alerts signify that a human needs to take action immediately in response to something that is either happening or about to happen, in order to improve the situation.

Quick start
Flink - Complex Event Processing

Article.
Codebase for example.

DevOps best practices

Article.

Kubernetes - first setup

● Minikube
● Kind
● Use a managed Kubernetes service from a public cloud provider such as AWS, GCP, or Azure during the free tier

Kubernetes + Flink - Operator

Requirements: Kubernetes cluster, kubectl

$ kubectl create -f https://raw.githubusercontent.com/lyft/flinkk8soperator/v0.5.0/deploy/crd.yaml
$ kubectl create -f https://raw.githubusercontent.com/lyft/flinkk8soperator/v0.5.0/deploy/namespace.yaml
$ kubectl create -f https://raw.githubusercontent.com/lyft/flinkk8soperator/v0.5.0/deploy/role.yaml
$ kubectl create -f https://raw.githubusercontent.com/lyft/flinkk8soperator/v0.5.0/deploy/role-binding.yaml
$ kubectl create -f https://raw.githubusercontent.com/lyft/flinkk8soperator/v0.5.0/deploy/flinkk8soperator.yaml

Verify that it works:
$ kubectl -n flink-operator get po

Run the example job:
$ kubectl create -f https://raw.githubusercontent.com/lyft/flinkk8soperator/v0.5.0/examples/wordcount/flink-operator-custom-resource.yaml

Verify that it is running and check its status:
$ kubectl get flinkapplication.flink.k8s.io -n flink-operator wordcount-operator-example -o yaml

Kubernetes + Ververica Platform

Requirements: Kubernetes cluster, kubectl, Helm


Install Ververica Platform locally with Helm:
$ helm repo add ververica https://charts.ververica.com
$ helm install vvp ververica/ververica-platform
$ helm install vvp ververica/ververica-platform --set acceptCommunityEditionLicense=true

Verify that Ververica Platform is up:
$ kubectl get po

Access the web user interface and REST API:
$ kubectl port-forward service/vvp-ververica-platform 8080:80

Do you want to test the Flink SQL feature? Use Flink Faker (a data generator source connector):
https://github.com/knaufk/flink-faker/
It requires changing the image used for vvp-gateway.

Q&A

Contact details

[email protected]

LinkedIn: https://www.linkedin.com/in/albert-lewandowski

Thank you for your attention!
