Module 11 Kafka - Distributed Message Subscription System

The document provides an overview of Kafka, a high-throughput distributed messaging system, detailing its architecture, application scenarios, and key processes. It covers concepts such as message persistence, partitioning, and data reliability, as well as the roles of producers and consumers in data handling. Additionally, it outlines Kafka's message delivery semantics and log management strategies.

Uploaded by

Lucas Oliveira

Technical Principles of Kafka

www.huawei.com

Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.


Objectives
• Upon completion of this course, you will understand:
  • Basic concepts and application scenarios of Kafka
  • System architecture of Kafka
  • Key processes of Kafka

Contents
1. Introduction to Kafka

2. Architecture and Functions of Kafka

3. Key Processes of Kafka

Kafka Overview
• Definition of Kafka: Kafka is a high-throughput, distributed, publish-subscribe messaging system. With Kafka, a large-scale messaging system can be built on low-cost servers.

Kafka Overview
• Application scenarios
  • Compared with other components, Kafka features message persistence, high throughput, distributed processing, and real-time processing. It suits online and offline message consumption and massive data collection scenarios, such as website activity tracking, monitoring of operational data in aggregation and statistics systems, and log collection.

[Figure: application scenario diagram showing Frontend Producer, Backend Producer, Flume, Kafka, Storm, Spark, Hadoop, and Farmer.]

Position of Kafka in FusionInsight
[Figure: FusionInsight architecture. Labels: Application service layer; OpenAPI/SDK; REST/SNMP/Syslog; Data, Information, Knowledge, Wisdom; DataFarm; Porter; Miner; Farmer; Manager; System management; Service governance; Security management; Hadoop API; Plugin API; Hadoop; M/R; Hive; Kafka; Spark; Streaming; Solr; YARN/ZooKeeper; HDFS/HBase; LibrA.]

Kafka is a distributed messaging system that supports online and offline message processing and provides Java APIs for other components.

Contents
1. Introduction to Kafka

2. Architecture and Functions of Kafka

3. Key Processes of Kafka

Kafka Topology

[Figure: producers (three Front End nodes and a Service) push messages to the Kafka brokers, which work with a ZooKeeper ensemble; consumers (Hadoop Cluster, Real-time Monitoring, Other Service, Data Warehouse) pull messages from the brokers.]

Kafka Topics

• A consumer uses offsets to record and read its position in the topic.
• Kafka cleans up old messages based on time and size.
• Producers append messages at the end of a topic.

[Figure: a Kafka topic drawn as a sequence from older to newer messages. Producer 1, Producer 2, ..., Producer N write at the new end; consumer group 1 and consumer group 2 read from the topic.]

Kafka Partition
• Each topic contains one or more partitions. Each partition is an ordered and immutable sequence of messages. Partitions are what give Kafka its high throughput.

Partition 0 0 1 2 3 4 5 6 7 8 9 10 11 12

Partition 1 0 1 2 3 4 5 6 7 8 9 Writes

Partition 2 0 1 2 3 4 5 6 7 8 9 10 11 12

Old New
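
The partition model above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Kafka's implementation: the `Partition` and `Topic` classes, the topic name `clicks`, and the round-robin choice of partition are all assumptions made for the example.

```python
class Partition:
    """An ordered, append-only sequence of messages (hypothetical model)."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        # New messages always go to the end of the partition. The returned
        # offset is the message's position within this partition.
        self.messages.append(message)
        return len(self.messages) - 1


class Topic:
    """A topic made of one or more partitions (hypothetical model)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]


topic = Topic("clicks", num_partitions=3)

# Assumption: writes are spread round-robin over the partitions.
placements = []
for i in range(7):
    partition_id = i % len(topic.partitions)
    offset = topic.partitions[partition_id].append(f"msg-{i}")
    placements.append((partition_id, offset))

print(placements)
```

Each partition numbers its own messages from 0, which is why the figure shows separate offset sequences for Partition 0, 1, and 2.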

Kafka Partition
• Consumer group A has two consumers to read data from four partitions.
• Consumer group B has four consumers to read data from four partitions.

[Figure: a Kafka cluster in which Server 1 hosts P0 and P3 and Server 2 hosts P1 and P2. Consumer group A contains C1 and C2; consumer group B contains C3, C4, C5, and C6.]
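
To make the two groups concrete, the sketch below spreads four partitions over the consumers of each group. The function `assign_partitions` is hypothetical and uses a plain round-robin rule; it is not Kafka's own assignment logic, and the exact partition-to-consumer pairing in the figure may differ.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin style.

    Hypothetical helper for illustration only.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Group A: two consumers share four partitions, two each.
group_a = assign_partitions(partitions, ["C1", "C2"])

# Group B: four consumers, one partition each.
group_b = assign_partitions(partitions, ["C3", "C4", "C5", "C6"])

print(group_a)
print(group_b)
```

Both groups receive every partition once, so each group sees the full topic while the work inside a group is divided among its members.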

Kafka Partition Offset
• The location of a message in a log file is called the offset, a long integer that uniquely identifies a message within its partition. Consumers use offsets, partitions, and topics to track records.

Consumer
group C1

Partition 0 0 1 2 3 4 5 6 7 8 9 10 11 12

Partition 1 0 1 2 3 4 5 6 7 8 9 Writes

Partition 2 0 1 2 3 4 5 6 7 8 9 10 11 12

Old New
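
A rough sketch of how a consumer might track its position by topic, partition, and offset follows. The `OffsetTracker` class, the in-memory `log` dictionary, and the `max_records` parameter are assumptions for illustration and do not mirror the Kafka client API.

```python
class OffsetTracker:
    """Remembers the next offset to read for each (topic, partition).

    Hypothetical helper; real consumers store this state differently.
    """

    def __init__(self):
        self.positions = {}

    def poll(self, log, topic, partition, max_records=3):
        # Start at the saved position, or at offset 0 on the first read.
        start = self.positions.get((topic, partition), 0)
        records = log[(topic, partition)][start:start + max_records]
        # Advance the position past the records just returned.
        self.positions[(topic, partition)] = start + len(records)
        return records


# Assumed in-memory stand-in for one partition of a topic.
log = {("clicks", 0): ["a", "b", "c", "d", "e"]}

tracker = OffsetTracker()
first_batch = tracker.poll(log, "clicks", 0)
second_batch = tracker.poll(log, "clicks", 0)
third_batch = tracker.poll(log, "clicks", 0)

print(first_batch, second_batch, third_batch)
```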

Kafka Partition Replica (1)

Kafka Cluster
Broker 1: Partition-0, Partition-3
Broker 2: Partition-1, Partition-0
Broker 3: Partition-2, Partition-1
Broker 4: Partition-3, Partition-2

Kafka Partition Replica (2)
[Figure: the producer writes to the leader partition (messages 1 to 10, old to new) and receives an ack. The follower partition (messages 1 to 7) pulls data from the leader through a ReplicaFetcherThread.]
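
The pull-based replication in the figure can be approximated as follows. This is a simplified sketch under stated assumptions: the `Replica` class and `fetch_from_leader` function are hypothetical, and real brokers also track details this example leaves out.

```python
class Replica:
    """One copy of a partition's log (hypothetical model)."""

    def __init__(self, log=None):
        self.log = list(log or [])


def fetch_from_leader(leader, follower, max_records=4):
    """The follower pulls the records it is missing from the leader.

    It starts at its own log end, so it never re-copies old records.
    Returns the number of records copied.
    """
    start = len(follower.log)
    batch = leader.log[start:start + max_records]
    follower.log.extend(batch)
    return len(batch)


# Same state as the figure: leader holds 1..10, follower holds 1..7.
leader = Replica(range(1, 11))
follower = Replica(range(1, 8))

copied = fetch_from_leader(leader, follower)
print(copied, follower.log == leader.log)
```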

Kafka Partition Replica (3)
[Figure, upper part: Broker 1 holds the leaders of Partition-0 and Partition-1; Broker 2 and Broker 3 each hold followers of Partition-0 and Partition-1. ReplicaFetcherThread instances connect the followers to the leaders.]

[Figure, lower part: Broker 1 holds the leader of Partition-0, Broker 2 holds the leader of Partition-1, and Broker 3 holds the followers of Partition-0 and Partition-1. The followers use ReplicaFetcherThread-1 and ReplicaFetcherThread-2.]

Kafka Logs (1)
• A large file in a partition is split into multiple small segments. Segments make it easy to periodically clear or delete consumed files and so reduce disk usage.
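
As an illustration of why segments help, the sketch below rolls a new segment after a fixed number of messages and frees space by dropping a whole old segment. The `SegmentedLog` class and the message-count threshold are assumptions; an actual broker decides when to roll segments by its own rules.

```python
class SegmentedLog:
    """A partition log split into fixed-capacity segments (hypothetical)."""

    def __init__(self, max_segment_messages):
        self.max_segment_messages = max_segment_messages
        self.segments = []  # each: {"base_offset": int, "messages": list}
        self.next_offset = 0

    def append(self, message):
        # Roll a new segment when there is none yet or the active one is full.
        if (not self.segments
                or len(self.segments[-1]["messages"]) >= self.max_segment_messages):
            self.segments.append({"base_offset": self.next_offset, "messages": []})
        self.segments[-1]["messages"].append(message)
        self.next_offset += 1
        return self.next_offset - 1

    def drop_oldest_segment(self):
        # Deleting one whole segment frees space without rewriting the rest.
        # The newest (active) segment is always kept.
        if len(self.segments) > 1:
            return self.segments.pop(0)
        return None


log = SegmentedLog(max_segment_messages=4)
for i in range(10):
    log.append(f"msg-{i}")

base_offsets_before = [s["base_offset"] for s in log.segments]
log.drop_oldest_segment()
base_offsets_after = [s["base_offset"] for s in log.segments]

print(base_offsets_before, base_offsets_after)
```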

Kafka Logs (2)

Kafka Logs (3)

Kafka Log Cleanup (1)
• Log cleanup modes: delete and compact.
• Thresholds for deleting logs: the retention time limit and the total size of all logs in a partition.

Parameter | Default Value | Description | Range
log.cleanup.policy | delete | Cleanup policy for outdated logs. | delete or compact
log.retention.hours | 168 | Maximum retention time of log files. Unit: hour. | 1 ~ 2147483647
log.retention.bytes | -1 | Maximum size of log data in a partition. By default, the size is not restricted. Unit: byte. | -1 ~ 9223372036854775807
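
The two delete thresholds can be sketched as a small selection function. This is a simplified model, not the broker's actual algorithm: the function name `segments_to_delete`, the tuple layout of a segment, and the rule "drop oldest segments until the total fits" are assumptions. The default arguments mirror the defaults in the table (168 hours, and -1 meaning no size limit).

```python
def segments_to_delete(segments, now_hours, retention_hours=168, retention_bytes=-1):
    """Pick segments to delete under time and size thresholds.

    segments: list ordered oldest first, each a tuple
              (last_modified_hours, size_bytes). Hypothetical layout.
    Returns the sorted indices of the segments chosen for deletion.
    """
    doomed = set()

    # Time rule: a segment older than the retention time is deleted.
    for index, (modified, _size) in enumerate(segments):
        if now_hours - modified > retention_hours:
            doomed.add(index)

    # Size rule: -1 disables it; otherwise drop the oldest remaining
    # segments until the partition fits within retention_bytes.
    if retention_bytes >= 0:
        total = sum(size for i, (_m, size) in enumerate(segments) if i not in doomed)
        for index, (_modified, size) in enumerate(segments):
            if index in doomed:
                continue
            if total <= retention_bytes:
                break
            doomed.add(index)
            total -= size

    return sorted(doomed)


segments = [(0, 100), (100, 100), (190, 100), (199, 100)]

by_time_only = segments_to_delete(segments, now_hours=200)
by_time_and_size = segments_to_delete(segments, now_hours=200, retention_bytes=250)

print(by_time_only, by_time_and_size)
```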

Kafka Log Cleanup (2)

Kafka Data Reliability
• All Kafka messages are stored on disk, and topic partitions are replicated to ensure data reliability.
• How is data reliability ensured during message delivery?

Message Delivery Semantics
• There are three data delivery modes:
  • At Most Once
    • Messages may be lost.
    • Messages are never redelivered or reprocessed.
  • At Least Once
    • Messages are never lost.
    • Messages may be redelivered and reprocessed.
  • Exactly Once
    • Messages are never lost.
    • Messages are processed only once.
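
One common way to see the difference between the first two modes is the order in which a consumer commits its offset and processes a message. The simulation below is a sketch under that assumption; the `consume` function and its single simulated crash are hypothetical and not part of any Kafka API.

```python
def consume(messages, commit_first, crash_at):
    """Process messages with one simulated crash and restart.

    commit_first=True  -> commit the offset, then process (at most once).
    commit_first=False -> process, then commit the offset (at least once).
    crash_at is the offset at which the consumer crashes between the
    two steps. After the crash it resumes from the committed offset.
    """
    committed = 0
    processed = []
    crashed = False
    while committed < len(messages):
        offset = committed
        if commit_first:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True
                continue  # offset already committed: the message is lost
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == crash_at and not crashed:
                crashed = True
                continue  # offset not committed: the message is reprocessed
            committed = offset + 1
    return processed


messages = ["m0", "m1", "m2"]
at_most_once = consume(messages, commit_first=True, crash_at=1)
at_least_once = consume(messages, commit_first=False, crash_at=1)

print(at_most_once)
print(at_least_once)
```

Exactly-once behavior needs more than reordering these two steps, so it is left out of this sketch.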

Kafka Message Delivery
• Messages are delivered in different modes to ensure reliability in different application scenarios.

Delivery mode | No replicas | Synchronous replication (leader and followers) | Asynchronous replication (leader)
Synchronous delivery without confirmation | At most once | At most once | At most once
Synchronous delivery with confirmation | At least once | At least once | Messages may be lost or repeated.
Asynchronous delivery without confirmation | At most once | At most once | At most once
Asynchronous delivery with confirmation but no retries | At least once | At least once | Messages may be lost or repeated.
Asynchronous delivery with confirmation and retries | At least once | At least once | Messages may be lost or repeated.
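
The link between confirmation, retries, and repeated messages can be shown with a toy producer. This is an illustrative sketch only: the `send` function, the `acks_lost` parameter, and the list used as a broker log are assumptions, not the Kafka producer API.

```python
def send(broker_log, message, acks_lost, retries):
    """Send one message and wait for a confirmation (ack).

    The broker stores every attempt, but the ack for the first
    `acks_lost` attempts never reaches the producer (hypothetical
    failure model). Returns True once an ack is received.
    """
    for attempt in range(retries + 1):
        broker_log.append(message)   # the broker persists the message
        if attempt >= acks_lost:     # this ack reaches the producer
            return True
    return False


# With retries, a lost ack leads to a resend and a duplicate in the log.
log_with_retries = []
confirmed = send(log_with_retries, "order-1", acks_lost=1, retries=2)

# Without retries, the producer gives up after the first lost ack.
log_without_retries = []
gave_up = not send(log_without_retries, "order-1", acks_lost=1, retries=0)

print(confirmed, log_with_retries)
print(gave_up, log_without_retries)
```

In this model the message is never lost once the producer keeps retrying, at the cost of a possible duplicate, which matches the at-least-once entries in the table.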

Kafka Cluster Mirroring

Contents
1. Introduction to Kafka

2. Architecture and Functions of Kafka

3. Key Processes of Kafka
  • Kafka Write Process
  • Kafka Read Process

Write Data by Producer

[Figure: a producer creates messages from data and publishes them to the Kafka cluster.]

Contents
1. Introduction to Kafka

2. Architecture and Functions of Kafka

3. Key Processes of Kafka
  • Kafka Write Process
  • Kafka Read Process

Read Data by Consumer
• Overall process:
  • A consumer connects to the leader broker where the specified topic partition is located and pulls messages from Kafka logs.

[Figure: a consumer subscribes to the Kafka cluster, receives messages, and processes them into data.]
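
The read path described above, locating the leader broker and then pulling from its log, might look like this in outline. The `leaders` and `broker_logs` dictionaries, the broker names, and the `pull` function are hypothetical stand-ins for cluster metadata and storage.

```python
# Hypothetical cluster metadata: which broker leads each (topic, partition).
leaders = {("clicks", 0): "broker-1", ("clicks", 1): "broker-2"}

# Hypothetical per-broker logs, keyed by (topic, partition).
broker_logs = {
    "broker-1": {("clicks", 0): ["a", "b", "c"]},
    "broker-2": {("clicks", 1): ["x", "y"]},
}


def pull(topic, partition, offset, max_records=2):
    """Find the leader for the partition, then read from its log at `offset`."""
    leader = leaders[(topic, partition)]
    log = broker_logs[leader][(topic, partition)]
    return leader, log[offset:offset + max_records]


first = pull("clicks", 0, offset=0)
second = pull("clicks", 0, offset=2)
other = pull("clicks", 1, offset=0)

print(first, second, other)
```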

Summary
• This module describes the following information about Kafka: basic concepts and application scenarios, system architecture, and key processes.

Quiz
1. Which of the following are features of Kafka? ( )

A. High throughput

B. Distributed

C. Data persistence

D. Random message read

Quiz
2. Which component does Kafka directly depend on to run? ( )

A. HDFS

B. ZooKeeper

C. HBase

D. Spark

Quiz
1. How is Kafka data reliability ensured?

2. What operations on topics can be performed with the shell commands provided by the Kafka client?

More Information
• Training materials:
  • http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
• Exam outline:
  • http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
• Mock exam:
  • http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
• Authentication process:
  • http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40

Thank You
www.huawei.com
