
Apache-Kafka Bernhard-H Oss 2018

The document provides an overview of Apache Kafka, a distributed messaging system optimized for writing, developed by LinkedIn and open-sourced in 2011. It discusses Kafka's architecture, use cases, and features such as topics, consumer groups, and replication, as well as its comparison to traditional messaging systems. Additionally, it highlights the importance of Kafka in various industries and offers insights into installation and configuration using Docker and Ansible.


Apache Kafka...

... “a system optimized for writing”

Bernhard Hopfenmüller

23 October 2018


whoami

Bernhard Hopfenmüller
IT Consultant @ ATIX AG

IRC: Fobhep
github.com/Fobhep

#atix #ossummit
whoarewe

The Linux & Open Source Company


Unterschleißheim, near Munich

over 15 years
datacenter automation, Linux
Consulting, Engineering, Support, Training

Kafka

Quora.com
What is the relation between Kafka, the writer, and Apache Kafka,
the distributed messaging system?

Jay Kreps: I thought that since Kafka was a system optimized for
writing, using a writer’s name would make sense. I had taken a lot
of lit classes in college and liked Franz Kafka. Plus the name
sounded cool for an open source project.

- developed by LinkedIn, open source since 2011
- 2014: foundation of Confluent

Messaging-Systems

Why do we need a messaging system?

- Challenge 1: Sender not available
- Challenge 2: Sending too much (DoS)
- Challenge 3: Receiver crashes during processing

Queues vs Topics

Supermarket vs Television (Source [1])

- Queue (supermarket): wait until it’s your turn
- Topic (television): choose what you want to receive
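The supermarket/television contrast can be sketched in a few lines of Python. This is only an illustration of the two delivery models, not of how Kafka or any message broker is implemented; the names `queue`, `topic`, `positions` and `read_next` are made up for this example.

```python
from collections import deque

# Queue (supermarket): each message is handed to exactly one worker.
queue = deque(["m0", "m1", "m2"])
worker_a = queue.popleft()
worker_b = queue.popleft()

# Topic (television): the log stays in place and every reader keeps
# its own position, so all readers can see every message.
topic = ["m0", "m1", "m2"]
positions = {"reader_a": 0, "reader_b": 0}


def read_next(reader):
    """Return the next message for this reader and advance its position."""
    message = topic[positions[reader]]
    positions[reader] += 1
    return message


seen_by_a = [read_next("reader_a") for _ in range(3)]
seen_by_b = [read_next("reader_b") for _ in range(2)]
```

In the queue, a message taken by one worker is gone for the other. In the topic, reading does not remove anything, so a slow reader simply has a smaller position.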

Kafka-Basic structure

Use Cases

- Messaging (as an alternative to ActiveMQ or RabbitMQ)
- Website Activity Tracking
- Metrics
- Log Aggregation
- Stream Processing
  - Apache Storm and Apache Samza
- Commit Log

Topics I

- core component of Kafka
- is filled by producers
- consists of one or more partitions

Topics II

- producer can choose the partition
- each partition has a running offset
- a message is identified by its offset within the partition
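The partition and offset model can be illustrated with a small in-memory sketch. It is not Kafka's implementation: the `Topic` class is hypothetical, and hashing the key with `zlib.crc32` is an assumption made for the example, not Kafka's actual default partitioner.

```python
import zlib


class Topic:
    """Toy in-memory topic: a list of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value, partition=None):
        # The producer may pick the partition explicitly; otherwise this
        # sketch hashes the key (an assumption, not Kafka's partitioner).
        if partition is None:
            partition = zlib.crc32(key.encode()) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        offset = len(log) - 1  # running offset within this partition
        return partition, offset

    def read(self, partition, offset):
        """A message is addressed by (partition, offset)."""
        return self.partitions[partition][offset]


topic = Topic(num_partitions=3)
first = topic.send("user-1", "login", partition=0)
second = topic.send("user-2", "click", partition=0)
other = topic.send("user-3", "logout", partition=2)
hashed = topic.send("user-4", "login")
```

Each partition counts its offsets independently, which is why `other` starts again at offset 0 even though two messages were already sent to partition 0.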

Topics III

- messages are physically stored (persisted on disk)!
- key-value principle
- clean-up policies decide when old data is removed

Topics V

- Clean-up policies:
  - default: retention time
    (delete old data after x days)
  - retention size
    (delete old data once the stored data exceeds x)
  - log compaction
    (replace the old value of a key with the new one)
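The three clean-up policies can be sketched on a toy log of `(timestamp, key, value)` records. This is a simplified illustration under stated assumptions: the function names are made up, size is counted in records rather than bytes, and real Kafka applies retention to whole log segments instead of single records.

```python
def retain_by_time(log, now, max_age):
    """Keep only records that are at most max_age old."""
    return [record for record in log if now - record[0] <= max_age]


def retain_by_size(log, max_records):
    """Drop the oldest records once the log exceeds max_records (>= 1)."""
    return log[-max_records:] if len(log) > max_records else list(log)


def compact(log):
    """Keep only the latest record per key, preserving log order."""
    latest = {}
    for index, (_, key, _) in enumerate(log):
        latest[key] = index  # later records overwrite earlier ones
    keep = set(latest.values())
    return [record for i, record in enumerate(log) if i in keep]


log = [(1, "a", "v1"), (2, "b", "v1"), (3, "a", "v2"), (4, "c", "v1")]

by_time = retain_by_time(log, now=5, max_age=2)
by_size = retain_by_size(log, max_records=3)
compacted = compact(log)
```

Retention throws away old data regardless of its key, whereas compaction keeps one current value per key. That is why a compacted topic can act as a table of the latest state.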

Topic consumption

- topics are pulled by consumers, not pushed (no DoS)
- any data that is still stored can be pulled

Consumer Groups

- parallelism allows high throughput
- never more active consumers than partitions
  (additional consumers in a group stay idle)
- Kafka features exactly-once semantics!
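Why more consumers than partitions do not help can be seen in a small assignment sketch. The round-robin strategy and the `assign` function are assumptions for illustration only; Kafka's real assignment strategies are configurable and more involved.

```python
def assign(partitions, consumers):
    """Give every partition exactly one owner, round-robin over consumers."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions, two consumers: both consumers work in parallel.
balanced = assign([0, 1, 2, 3], ["c1", "c2"])

# Three partitions, five consumers: two consumers get nothing to do.
oversized = assign([0, 1, 2], ["c1", "c2", "c3", "c4", "c5"])
idle = [consumer for consumer, owned in oversized.items() if not owned]
```

Because a partition is read by only one consumer of a group at a time, the number of partitions sets the upper limit for parallelism within that group.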

Wait but who knows what’s read?

- consumers commit their offsets
- upon failure, re-processing is possible
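Offset commits and re-processing after a failure can be sketched as follows. The `Consumer` class here is a hypothetical toy, not a Kafka client API; it only shows that whatever was read but not committed is delivered again after a restart.

```python
class Consumer:
    """Toy consumer that pulls from a list and remembers a committed offset."""

    def __init__(self, log):
        self.log = log
        self.committed = 0  # next offset to read after a (re)start

    def poll(self, max_records):
        """Pull up to max_records, starting at the committed offset."""
        start = self.committed
        return list(enumerate(self.log[start:start + max_records], start))

    def commit(self, offset):
        """Mark everything up to and including offset as processed."""
        self.committed = offset + 1


log = ["m0", "m1", "m2", "m3"]
consumer = Consumer(log)

batch = consumer.poll(2)
consumer.commit(batch[-1][0])  # offsets 0 and 1 are done

crashed = consumer.poll(2)     # processed, but the commit never happened
replayed = consumer.poll(2)    # after a restart the same records come back
```

The data stays in the log, so the consumer decides how far it has got. A crash between processing and commit leads to a repeat, not to a loss.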

Replication
implemented on partition level

Source[3]

In and Out of Sync Replica

Did somebody hear my message?

The producer decides when a message counts as successfully sent

Configuration possibilities:
- as soon as it is sent
- as soon as it is received by the first broker
- as soon as the desired number of replicas exist
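These three levels correspond to the values `0`, `1` and `all` of the Kafka producer setting `acks`. The decision logic can be sketched like this; the function `is_acknowledged` and its parameters are hypothetical and only model the rule, they are not part of any Kafka client.

```python
def is_acknowledged(acks, leader_wrote, replicas_with_copy, in_sync_replicas):
    """Decide whether the producer treats a send as successful.

    acks="0":   success as soon as the message is sent
    acks="1":   success once the leader broker has written it
    acks="all": success once every in-sync replica has a copy
    """
    if acks == "0":
        return True
    if acks == "1":
        return leader_wrote
    if acks == "all":
        return leader_wrote and replicas_with_copy >= in_sync_replicas
    raise ValueError(f"unknown acks value: {acks!r}")


# The leader has the message, but only 2 of 3 in-sync replicas do so far.
fire_and_forget = is_acknowledged("0", False, 0, 3)
leader_only = is_acknowledged("1", True, 2, 3)
fully_replicated = is_acknowledged("all", True, 2, 3)
```

The trade-off is latency against durability: the stricter the level, the longer the producer waits, and the less likely an acknowledged message is lost when a broker fails.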

ZooKeeper

- distributed, hierarchical file system
- management of znodes
- HA via ensemble (= ZooKeeper cluster)

Source[4]

Broker and ZooKeeper

- Brokers are stateless!
- Which broker is alive?
- Broker communication?
- → ZooKeeper!

Talk to Kafka - Kafka Connect

- I/O for Kafka
- connects Kafka with external systems
- open source by Confluent

Source[7]

Talk to Kafka - Schema Registry

- define standards
- version and store them
- open source by Confluent

source: confluent

TV or Netflix?

- live filtering of topics
- KSQL!
- open source by Confluent

source: confluent

Who likes Kafka?

- Zalando - microservices
- Cisco Systems - security
- Airbnb - event pipeline
- Netflix (monitoring!)
- The New York Times (Kafka as data storage! Super awesome blog post) [5][6]
- Audi - IoT
- Spotify
- Twitter
- Uber (Kafka = backbone!!!)
- https://kafka.apache.org/powered-by

Sources

1 https://www.informatik-aktuell.de/betrieb/verfuegbarkeit/apache-kafka-eine-schluesselplattform-fuer-hochskalierbare-systeme.html
2 https://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-1/ and
  https://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-2/
3 https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
4 https://www.infoq.com/articles/apache-kafka
5 https://www.confluent.io/blog/okay-store-data-apache-kafka/
6 https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
7 https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/

Install Kafka with Docker/Ansible

- Run containers as services
- No SSL/SASL yet!
- have a look at playbooks and docker-compose files
- https://github.com/confluentinc/cp-ansible
- https://docs.confluent.io/current/installation/docker/docs/installation/index.html
- Wurstmeister: https://github.com/wurstmeister/kafka-docker

Single Components
---
- name: Start zookeeper
  docker_container:
    name: zookeeper
    image: "{{ images.zookeeper }}:{{ versions.kafka }}"
    state: started
    restart_policy: unless-stopped
    ports:
      # host port : container port
      - "{{ ports.zookeeper.client }}:2181"   # client connections
      - "{{ ports.zookeeper.peer }}:2888"     # follower-to-leader traffic
      - "{{ ports.zookeeper.leader }}:3888"   # leader election
    volumes:
      - "/zookeeper/data:/var/lib/zookeeper/data"
      - "/zookeeper/log:/var/lib/zookeeper/log"
    env:
      ZOOKEEPER_SERVER_ID: "{{ zookeeper_server_id }}"
      ZOOKEEPER_CLIENT_PORT: "2181"
      ZOOKEEPER_SERVERS: "{{ lookup('template', 'sort_zookeeper.j2') }}"
      ZOOKEEPER_DATA_DIR: "/var/lib/zookeeper/data"
      ZOOKEEPER_LOG_DIR: "/var/lib/zookeeper/log"
...

{% for host in groups['zookeeper'] %}
{% if inventory_hostname == hostvars[host]['inventory_hostna
0.0.0.0
{% else %}
{{ hostvars[host]['ansible_default_ipv4']['address'] }}
{% endif %}
{% if not loop.last %}
;
{% endif %}
{% endfor %}
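What the template loop appears to compute can be restated in Python. This is a sketch under assumptions, since the template line with the condition is cut off on the slide: it assumes that the current host is replaced by `0.0.0.0`, that every other host contributes its default IPv4 address, and that the entries are joined with `;`. The function name, host names and addresses are made up for the example.

```python
def zookeeper_servers(hosts, current_host):
    """Build the server list as seen from current_host.

    hosts is a list of (inventory_hostname, ipv4_address) pairs.
    The local node is written as 0.0.0.0 (assumed from the template).
    """
    entries = [
        "0.0.0.0" if name == current_host else address
        for name, address in hosts
    ]
    return ";".join(entries)


# Hypothetical inventory with three ZooKeeper nodes.
inventory = [("zk1", "10.0.0.1"), ("zk2", "10.0.0.2"), ("zk3", "10.0.0.3")]
seen_from_zk2 = zookeeper_servers(inventory, "zk2")
```

Joining the list in one step avoids the separate "is this the last element" check that the template needs.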

Check system health

---
- name: "Check Zookeeper Health"
  command: docker run --rm -it confluentinc/zookeeper cub zk-re
  register: output
  until: output is success
  retries: 3
...

Configure via REST/uri

---
- name: create new topic
  command: "{{ 'sudo docker run --rm confluentinc/cp-kafka
    kafka-topics --create' ... }}"

- name: get information of current topic
  uri:
    url: "{{ restproxy_url ~ '/topics/' ~ topic.name }}"
  register: result

...
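The URL that the `uri` task requests can be built the same way in Python. This sketch only constructs the address and makes no network call; the function `topic_url` is hypothetical, and the base URL with port 8082 is an assumed example value for `restproxy_url`.

```python
from urllib.parse import quote


def topic_url(restproxy_url, topic_name):
    """Return the REST proxy URL that describes a single topic."""
    base = restproxy_url.rstrip("/")  # tolerate a trailing slash
    return base + "/topics/" + quote(topic_name, safe="")


plain = topic_url("http://localhost:8082", "orders")
escaped = topic_url("http://localhost:8082/", "my topic")
```

Escaping the topic name keeps the request valid even if a name contains characters that are not allowed in a URL path.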

whoami

Bernhard Hopfenmüller

IRC: Fobhep
github.com/Fobhep twitter.com/fobhep

Kafka vs MQ

- Kafka has no P2P model!
- Messages are persistent!
- Topic partitioning!
- Message sequencing: guaranteed within one partition (send order = receive order)
- Message reading: choose where to read, rewind, no FIFO!
- Load balancing: automatic distribution is easier with metadata
- HA and failover are implemented very easily
