Module 11 Kafka - Distributed Message Subscription System
Kafka
www.huawei.com
Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved. Page 2
Contents
1. Introduction to Kafka
Kafka Overview
Definition of Kafka: Kafka is a high-throughput, distributed
publish-subscribe messaging system. With Kafka, a large-scale
messaging system can be built on low-cost servers.
Kafka Overview
Application scenarios
Compared with other components, Kafka features message persistence, high throughput,
distributed processing, and real-time processing. It suits online and offline message
consumption and massive data collection scenarios, such as website activity tracking,
operation data monitoring for aggregation statistics systems, and log collection.
[Figure: Kafka in a data pipeline. Labels: Frontend Producer, Backend Producer, Flume, Kafka, Storm, Spark, Hadoop, Farmer.]
Position of Kafka in FusionInsight
[Figure: FusionInsight architecture. Visible labels: Application service layer, OpenAPI/SDK, REST/SNMP/Syslog.]
Kafka Topology
[Figure: Kafka topology. Labels: Kafka cluster with three Brokers, and three ZooKeeper nodes.]
Kafka Topics
A consumer uses offsets to record its read position.
Kafka cleans up old messages based on time and size.
[Figure: a Kafka topic. Producers (Producer 2, ..., Producer N) write at the new end; Consumer group 1 and Consumer group 2 read from the topic.]
Kafka Partition
Each topic contains one or more partitions. Each partition is an
ordered and immutable sequence of messages. Partitions ensure high
throughput capabilities of Kafka.
[Figure: a topic with three partitions. Partition 0 holds offsets 0-12, Partition 1 holds offsets 0-9, and Partition 2 holds offsets 0-12. Each partition runs from old (offset 0) to new, and writes are appended at the new end.]
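The append-only behaviour of a partition can be illustrated with a small in-memory model. This is only a sketch and not Kafka code: the `Partition` and `Topic` classes, the topic name, and the key-to-partition rule (a byte sum modulo the partition count) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical model of one partition: an ordered, append-only list."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        # New messages always go to the "new" end; the offset is the position.
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int) -> str:
        # Existing messages are never modified, only read by offset.
        return self.messages[offset]


@dataclass
class Topic:
    name: str
    partitions: list

    def send(self, key: str, message: str) -> tuple:
        # Assumption: a stable toy hash picks the partition for a key.
        index = sum(key.encode()) % len(self.partitions)
        return index, self.partitions[index].append(message)


topic = Topic("clicks", [Partition() for _ in range(3)])
first = topic.send("user-a", "page=/home")
second = topic.send("user-a", "page=/cart")
print(first, second)  # same partition, consecutive offsets
```

Because each partition is written independently, several partitions can be appended to in parallel, which is one way to picture the throughput claim above.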
Kafka Partition
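Consumer group A has two consumers that read data from four partitions.
Consumer group B has four consumers that read data from the same four partitions.
[Figure: a Kafka cluster with Server 1 hosting P0 and P3 and Server 2 hosting P1 and P2. Consumers C1-C2 (group A) and C3-C6 (group B) read from the partitions.]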
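How four partitions might be divided among the consumers of each group can be sketched as follows. The round-robin rule and the `assign` helper are assumptions for illustration; the actual assignment strategy used by a Kafka deployment may differ, and the mapping of C1-C6 to groups A and B is inferred from the consumer counts stated above.

```python
def assign(partitions, consumers):
    """Spread partitions over the consumers of ONE group, round-robin (hypothetical rule)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]
group_a = assign(partitions, ["C1", "C2"])              # two consumers
group_b = assign(partitions, ["C3", "C4", "C5", "C6"])  # four consumers

print(group_a)
print(group_b)
```

Each group receives every partition exactly once, so both groups see all the data, while inside a group a partition is read by only one consumer.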
Kafka Partition Offset
The position of a message in a partition's log is called its offset, a long integer
that uniquely identifies the message within that partition. Consumers track records
by topic, partition, and offset.
[Figure: a consumer group (C1) reading from Partition 0 (offsets 0-12), Partition 1 (offsets 0-9), and Partition 2 (offsets 0-12). Each partition runs from old to new, and writes are appended at the new end.]
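The idea of tracking a read position per topic, partition, and consumer group can be sketched with a plain dictionary. The `OffsetStore` class and `poll` function below are hypothetical names for this illustration and do not correspond to the Kafka client API.

```python
class OffsetStore:
    """Hypothetical store of the next offset to read per (group, topic, partition)."""

    def __init__(self):
        self._positions = {}

    def position(self, group, topic, partition):
        # A group that has never read starts at offset 0 in this sketch.
        return self._positions.get((group, topic, partition), 0)

    def commit(self, group, topic, partition, next_offset):
        self._positions[(group, topic, partition)] = next_offset


def poll(log, store, group, topic, partition, max_records=5):
    """Read up to max_records from the group's position, then advance the position."""
    start = store.position(group, topic, partition)
    records = log[start:start + max_records]
    store.commit(group, topic, partition, start + len(records))
    return records


log = [f"m{i}" for i in range(13)]  # one partition holding offsets 0..12
store = OffsetStore()
batch1 = poll(log, store, "g1", "t", 0)
batch2 = poll(log, store, "g1", "t", 0)
other = poll(log, store, "g2", "t", 0, max_records=2)
print(batch1, batch2, other)
```

The second group starts from offset 0 regardless of how far the first group has read, because each group keeps its own position.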
Kafka Partition Replica (1)
Kafka Cluster
Kafka Partition Replica (2)
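Follower -> Leader: the follower's ReplicaFetcherThread pulls data from the leader.
[Figure: two replica logs, one holding messages 1-10 and the other messages 1-7, each running from old to new. The producer writes to the leader and receives an ack.]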
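The pull-based catch-up between a follower and its leader can be sketched in a few lines. This is a simplified model under stated assumptions: the `Replica` class, the `fetch` function, and the batch size of two records are invented for illustration and are not the ReplicaFetcherThread implementation.

```python
class Replica:
    """Hypothetical replica holding an in-memory copy of one partition log."""

    def __init__(self):
        self.log = []

    @property
    def log_end_offset(self):
        return len(self.log)


def fetch(leader, follower, max_records=2):
    """One pull round: the follower requests records starting at its own log end."""
    start = follower.log_end_offset
    batch = leader.log[start:start + max_records]
    follower.log.extend(batch)
    return len(batch)


leader, follower = Replica(), Replica()
leader.log.extend(range(1, 11))   # leader holds messages 1..10
follower.log.extend(range(1, 8))  # follower has only pulled messages 1..7

rounds = 0
while fetch(leader, follower):    # keep pulling until nothing new is returned
    rounds += 1

print(rounds, follower.log == leader.log)
```

The follower decides where to resume from its own log end, so the leader does not need to remember how far each follower has progressed in this model.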
Kafka Partition Replica (3)
[Figure, panel 1: a single partition replicated across three brokers. Broker 1 holds the leader of Partition-0; Broker 2 and Broker 3 hold followers of Partition-0. Label: ReplicaFetcherThread.]
[Figure, panel 2: two partitions spread across Broker 1, Broker 2, and Broker 3, with leader and follower replicas of Partition-0 and Partition-1 on different brokers. Labels: ReplicaFetcherThread-1, ReplicaFetcherThread-2.]
Kafka Logs (1)
The large log file of a partition is split into multiple small segments.
Segmentation makes it easy to periodically clear or delete consumed
segment files and so reduce disk usage.
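A sketch of how a partition log split into segments can still be searched by offset is given below. It assumes that each segment is identified by the offset of its first message (its base offset) and that the file name is that base offset zero-padded to 20 digits; the helper names and the segment size are hypothetical.

```python
import bisect


def roll_segments(total_messages, segment_size):
    """Return the base offset of each segment when a new one starts every segment_size messages."""
    return list(range(0, total_messages, segment_size))


def segment_for(base_offsets, offset):
    """Find the segment whose base offset is the largest one not exceeding offset."""
    i = bisect.bisect_right(base_offsets, offset) - 1
    return base_offsets[i]


def segment_file_name(base_offset):
    # Assumption: segment files are named by their zero-padded base offset.
    return f"{base_offset:020d}.log"


bases = roll_segments(total_messages=2500, segment_size=1000)
name = segment_file_name(segment_for(bases, 1234))
print(bases, name)
```

Because whole segments are independent files, removing old data amounts to deleting the oldest files rather than rewriting one large file.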
Kafka Logs (2)
Kafka Logs (3)
Kafka Log Cleanup (1)
Log cleanup modes: delete and compact.
Threshold for deleting logs: retention time limit and size of all logs in
a partition.
Parameter            Default Value  Description                                       Range
log.cleanup.policy   delete         Outdated log cleanup policy.                      delete or compact
log.retention.hours  168            Maximum retention time of log files. Unit: hour.  1 ~ 2147483647
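The two deletion thresholds named above (retention time and total log size of a partition) can be sketched as a function over a list of segments. This is an illustrative model only: the `Segment` fields, the `retention_bytes` argument, and the rule that the size check leaves the newest segment alone are assumptions of this sketch. Only the 168-hour default comes from the table.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    last_modified_hours_ago: float


def segments_to_delete(segments, retention_hours=168, retention_bytes=None):
    """Given segments ordered oldest first, return those a 'delete' policy would remove."""
    # Time threshold: anything older than the retention period goes.
    doomed = [s for s in segments if s.last_modified_hours_ago > retention_hours]
    if retention_bytes is not None:
        remaining = [s for s in segments if s not in doomed]
        total = sum(s.size_bytes for s in remaining)
        # Size threshold: drop oldest segments until the partition fits.
        # Assumption: the newest segment is never dropped by the size check.
        for s in remaining[:-1]:
            if total <= retention_bytes:
                break
            doomed.append(s)
            total -= s.size_bytes
    return doomed


segments = [
    Segment(0, 100, 200),
    Segment(1000, 100, 100),
    Segment(2000, 100, 10),
    Segment(3000, 50, 1),
]
by_time = [s.base_offset for s in segments_to_delete(segments)]
by_both = [s.base_offset for s in segments_to_delete(segments, retention_bytes=160)]
print(by_time, by_both)
```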
Kafka Log Cleanup (2)
Kafka Data Reliability
All Kafka messages are stored on hard disks, and topic partitions are
replicated to ensure data reliability.
How is data reliability ensured during message delivery?
Message Delivery Semantics
There are three message delivery modes:
At Most Once
Messages may be lost but are never redelivered.
At Least Once
Messages are never lost but may be redelivered.
Exactly Once
Messages are never lost and are delivered only once.
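On the consumer side, the difference between at-most-once and at-least-once can be pictured as the order in which a consumer commits its offset and processes a message. The simulation below is a sketch under assumptions: the `consume` function and its single simulated crash are invented for illustration and do not use the Kafka client.

```python
def consume(messages, commit_first, crash_at):
    """Process messages with one simulated crash between the two steps of offset crash_at."""
    processed = []
    committed = 0      # offset the consumer resumes from after a restart
    crashed = False
    offset = 0
    while offset < len(messages):
        # Step 1: either commit first or process first.
        if commit_first:
            committed = offset + 1
        else:
            processed.append(messages[offset])
        # Simulated crash between step 1 and step 2, happening only once.
        if offset == crash_at and not crashed:
            crashed = True
            offset = committed  # restart from the last committed offset
            continue
        # Step 2: the remaining action.
        if commit_first:
            processed.append(messages[offset])
        else:
            committed = offset + 1
        offset += 1
    return processed


msgs = ["m0", "m1", "m2"]
at_most_once = consume(msgs, commit_first=True, crash_at=1)
at_least_once = consume(msgs, commit_first=False, crash_at=1)
print(at_most_once, at_least_once)
```

Committing before processing loses the message in flight at the crash, while processing before committing handles it twice. Exactly-once behaviour needs additional mechanisms that this sketch does not model.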
Kafka Message Delivery
Messages are delivered in different modes to ensure reliability in different application
scenarios.
Replication mode: synchronous replication (leader and followers)

Delivery mode                                            Semantics
Synchronous delivery without confirmation                At most once
Synchronous delivery with confirmation                   At least once
Asynchronous delivery without confirmation               At most once
Asynchronous delivery with confirmation but no retries   At least once
Asynchronous delivery with confirmation and retries      At least once
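Why delivery without confirmation tends toward at-most-once, while confirmation combined with retries tends toward at-least-once, can be sketched with a toy producer. Everything here is hypothetical: the `deliver` function, the global attempt counter, and the sets of dropped sends and dropped acks exist only to make the two outcomes reproducible.

```python
def deliver(messages, wait_for_ack, max_retries, drop_sends=(), drop_acks=()):
    """Send messages to a toy broker log.

    drop_sends / drop_acks hold attempt numbers (counted across all sends)
    at which the request or the confirmation is lost on the network.
    """
    broker_log = []
    attempt = 0
    for message in messages:
        retries = 0
        while True:
            attempt += 1
            arrived = attempt not in drop_sends
            if arrived:
                broker_log.append(message)
            acked = arrived and attempt not in drop_acks
            # Without confirmation the producer never learns about a failure.
            if not wait_for_ack or acked or retries >= max_retries:
                break
            retries += 1  # no confirmation seen: send the same message again
    return broker_log


msgs = ["a", "b", "c"]
no_confirmation = deliver(msgs, wait_for_ack=False, max_retries=0, drop_sends={2})
with_retries = deliver(msgs, wait_for_ack=True, max_retries=3,
                       drop_sends={2}, drop_acks={3})
print(no_confirmation, with_retries)
```

In the first run the lost send goes unnoticed, so "b" never reaches the broker. In the second run the retry after a lost confirmation stores "b" twice, which is the duplicate risk that at-least-once accepts.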
Kafka Cluster Mirroring
Write Data by Producer
[Figure: the producer creates messages from data and publishes them to the Kafka cluster. Labels: Data, Create, Message, Publish, Producer, Kafka Cluster.]
Read Data by Consumer
Overall process:
[Figure: a consumer connects to the Kafka cluster, reads messages, and processes the data. Labels: Kafka Cluster, Message, Data, Process.]
Summary
This module describes Kafka's basic concepts and application scenarios,
system architecture, and key processes.
Quiz
1. Which of the following are features of Kafka? ( )
A. High throughput
B. Distributed
C. Data persistence
Quiz
2. Which component does Kafka directly depend on to run? ( )
A. HDFS
B. ZooKeeper
C. HBase
D. Spark
Quiz
1. How is Kafka data reliability ensured?
More Information
Training materials:
http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
Exam outline:
http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
Mock exam:
http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
Authentication process:
http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40