Kafka Learning

These notes cover Kafka's three delivery methods (at least once, at most once, exactly once), topics, partitions, producers, consumers, brokers and clusters, plus schemas, good practices for replicas, consumer offsets, stream processing, Avro, MapReduce, and broker/topic configurations with producer benchmark runs.


Kafka has 3 different delivery methods

- At-least once: the producer can send a message more than once. If the message
fails or is not acked, the producer sends it again. **The consumer may have to
eliminate duplicate messages**
- At-most once: the producer sends a message once and never retries. If the
message fails or is not acked, the message is lost. => Good enough for e.g.
counting traffic to a webpage
- Exactly once: even if the producer sends a message more than once, the consumer
will only receive it once => transaction coordination on each topic ensures
single delivery

Multiple consumers can read the same message independently, which lets consumption
scale out almost without limit
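The three semantics map roughly onto producer configuration. A sketch using the standard Java producer property names (values are illustrative; pick one block, not all three):

```properties
# At-most-once: fire and forget -- no acks, no retries
acks=0
retries=0

# At-least-once: wait for acks and retry on failure (duplicates possible)
acks=all
retries=2147483647

# Exactly-once: idempotent producer plus transactions
enable.idempotence=true
transactional.id=my-txn-producer   # example id, choose your own
acks=all
```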

Schema:
Defines what content and format the data is in, so producers and consumers agree
on how to interpret messages.
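For example, an Avro schema (Avro comes up again later in these notes) pins down field names and types for every message; this record is a made-up illustration:

```json
{
  "type": "record",
  "name": "PageView",
  "fields": [
    {"name": "userId", "type": "string"},
    {"name": "url",    "type": "string"},
    {"name": "ts",     "type": "long"}
  ]
}
```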

Topics and Partitions:


- Streams of messages in Kafka are called Topics
- Partitions are parts of a topic
- A partition contains the messages
- Each message inside a partition gets an offset
- Each offset is unique inside a partition
- Messages expire after the retention period (a week by default)
- Messages are assigned round-robin to partitions unless a key is provided that
determines the partition
- The number of partitions in a topic is declared when the topic is created
- Once a partition is created and offset 0 is written, there is no going back to
0; offsets only ever increase
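The round-robin vs. keyed assignment can be sketched like this. This is a toy model, not Kafka's actual partitioner (the real default uses murmur2 hashing of the key bytes and, since 2.4, sticky batching for keyless records):

```python
from itertools import count

class SimplePartitioner:
    """Toy stand-in for a producer's partition choice."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._rr = count()  # round-robin counter for keyless messages

    def partition(self, key=None):
        if key is None:
            # No key: spread messages across partitions round-robin
            return next(self._rr) % self.num_partitions
        # Key given: the same key always lands on the same partition
        return hash(key) % self.num_partitions

p = SimplePartitioner(num_partitions=3)
print([p.partition() for _ in range(4)])                 # [0, 1, 2, 0]
print(p.partition("user-42") == p.partition("user-42"))  # True
```

The key property is the second one: keyed messages are ordered per key because they always go to the same partition.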

Producers and Consumers:


- Consumers read messages in the order they are produced.
- They keep track of the messages consumed based on the offset id.
- Messages are written to disk and replicated.
- The producer waits for acks from the brokers before moving forward. The number
of acks required can be configured (acks=0/1/all)
- Consumer groups are created to make sure a message is only consumed once within
the group. #Other consumers/consumer groups can still consume the same message#
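A toy model of that last point, per-group offset tracking (not the Kafka client API): each group commits its own offset per partition, so two groups consume the same messages independently:

```python
partition = ["m0", "m1", "m2", "m3"]       # messages at offsets 0..3
committed = {"group-a": 0, "group-b": 0}   # committed offset per group

def poll(group, max_records=2):
    """Return the next batch for this group and commit its new offset."""
    start = committed[group]
    batch = partition[start:start + max_records]
    committed[group] = start + len(batch)  # commit after processing
    return batch

print(poll("group-a"))   # ['m0', 'm1']
print(poll("group-a"))   # ['m2', 'm3']
print(poll("group-b"))   # ['m0', 'm1']  -- unaffected by group-a's progress
```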

Brokers and Clusters:


- Brokers are responsible for electing a leader and replicating the data across
the cluster
- Zookeeper behaves as the orchestrator
- Broker ID (in cluster setup)???
- Only an in-sync replica can take over as leader

Good practices for replicas:


1. Spread replicas evenly amongst brokers
2. Keep each partition's replicas on different brokers
3. Put replicas on different racks

__consumer_offsets: a Kafka-created internal topic that stores the committed
offsets of each consumer group

## Line reader and Avro


## Map Reduce
All clusters must be in a single timezone
## Stream based design pattern

Stream processing is not ideal if data is not flowing in real time 24x7
https://www.beyondthelines.net/computing/kafka-patterns/

# Broker Configs:
- auto.leader.rebalance.enable (leader.imbalance.check.interval.seconds,
leader.imbalance.per.broker.percentage)
- background.threads
- compression.type
- delete.topic.enable
- message.max.bytes
- min.insync.replicas
- replica.fetch.min.bytes
- replica.fetch.wait.max.ms
- unclean.leader.election.enable
- zookeeper.max.in.flight.requests
- broker.rack
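A sketch of how some of the configs above might look in server.properties. The values here are illustrative (mostly around the shipped defaults), not tuning advice:

```properties
# server.properties (illustrative values only)
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
background.threads=10
compression.type=producer
delete.topic.enable=true
min.insync.replicas=2
unclean.leader.election.enable=false
broker.rack=rack-1
```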

# Topic Configs:

# We can enable reading from replicas (follower fetching) from Kafka 2.4, but it
requires rack-related configuration on both brokers and clients
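Roughly, follower fetching (KIP-392) needs a rack-aware replica selector and a rack id on the broker, and a matching rack declaration on the consumer; rack names below are examples:

```properties
# broker: server.properties
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
broker.rack=us-east-1a

# consumer: client config
client.rack=us-east-1a
```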

./kafka-topics.sh --bootstrap-server localhost:9090 --create --topic Test.Default
--partitions 3 --replication-factor 1
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
Test.Default --num-records 5000000 --record-size 1024 --throughput -1 --producer-
props acks=-1 bootstrap.servers=localhost:9090,localhost:9094,localhost:9096
batch.size=8196
5000000 records sent, 156774.213777 records/sec (153.10 MB/sec), 101.09 ms avg
latency, 2088.00 ms max latency, 44 ms 50th, 340 ms 95th, 777 ms 99th, 1901 ms
99.9th.

- KAFKA_CFG_BACKGROUND_THREADS=40
./kafka-topics.sh --bootstrap-server localhost:9090 --create --topic
Test.BackgroundThread --partitions 3 --replication-factor 1
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
Test.BackgroundThread --num-records 5000000 --record-size 1024 --throughput -1 --
producer-props acks=-1
bootstrap.servers=localhost:9090,localhost:9094,localhost:9096 batch.size=8196
5000000 records sent, 156118.275205 records/sec (152.46 MB/sec), 88.40 ms avg
latency, 2214.00 ms max latency, 19 ms 50th, 420 ms 95th, 938 ms 99th, 1969 ms
99.9th.

./kafka-topics.sh --bootstrap-server localhost:9090 --create --topic
Test.Compression --partitions 3 --replication-factor 1
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
Test.Compression --num-records 6000000 --record-size 6144 --throughput -1 --
producer-props acks=all
bootstrap.servers=localhost:9090,localhost:9094,localhost:9096
compression.type=none
6000000 records sent, 36123.904994 records/sec (211.66 MB/sec), 9.07 ms avg
latency, 1340.00 ms max latency, 0 ms 50th, 43 ms 95th, 190 ms 99th, 857 ms 99.9th.
Size = 37.2GB

./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
Test.Compression --num-records 6000000 --record-size 6144 --throughput -1 --
producer-props acks=all
bootstrap.servers=localhost:9090,localhost:9094,localhost:9096
compression.type=gzip
6000000 records sent, 9275.720801 records/sec (54.35 MB/sec), 3.67 ms avg latency,
866.00 ms max latency, 1 ms 50th, 2 ms 95th, 3 ms 99th, 698 ms 99.9th. Size =
23.4GB

./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
Test.Compression --num-records 6000000 --record-size 6144 --throughput -1 --
producer-props acks=all
bootstrap.servers=localhost:9090,localhost:9094,localhost:9096
compression.type=snappy
6000000 records sent, 35531.762435 records/sec (208.19 MB/sec), 8.21 ms avg
latency, 1643.00 ms max latency, 0 ms 50th, 6 ms 95th, 250 ms 99th, 1144 ms 99.9th.
Size = 37.3GB

./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
Test.Compression --num-records 6000000 --record-size 6144 --throughput -1 --
producer-props acks=all
bootstrap.servers=localhost:9090,localhost:9094,localhost:9096 compression.type=lz4
6000000 records sent, 20745.740208 records/sec (121.56 MB/sec), 8.20 ms avg
latency, 1627.00 ms max latency, 0 ms 50th, 2 ms 95th, 316 ms 99th, 1105 ms 99.9th.
Size = 37.3 GB

./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
Test.Compression --num-records 6000000 --record-size 6144 --throughput -1 --
producer-props acks=all
bootstrap.servers=localhost:9090,localhost:9094,localhost:9096
compression.type=zstd
6000000 records sent, 26989.946245 records/sec (158.14 MB/sec), 9.63 ms avg
latency, 1792.00 ms max latency, 0 ms 50th, 2 ms 95th, 303 ms 99th, 1501 ms 99.9th.
Size = 22.4 GB

./kafka-topics.sh --bootstrap-server localhost:9090 --create --topic
Test.Compression --partitions 3 --replication-factor 3

6000000 records sent, 16809.029811 records/sec (98.49 MB/sec), 243.13 ms avg
latency, 2976.00 ms max latency, 213 ms 50th, 545 ms 95th, 592 ms 99th, 2205 ms
99.9th. Size = 111GB

6000000 records sent, 9834.517846 records/sec (57.62 MB/sec), 12.34 ms avg latency,
2899.00 ms max latency, 1 ms 50th, 2 ms 95th, 11 ms 99th, 2617 ms 99.9th. Size =
70.2GB

6000000 records sent, 14366.268003 records/sec (84.18 MB/sec), 284.33 ms avg
latency, 2821.00 ms max latency, 135 ms 50th, 754 ms 95th, 899 ms 99th, 1584 ms
99.9th. Size = 112GB

6000000 records sent, 16001.109410 records/sec (93.76 MB/sec), 255.40 ms avg
latency, 3150.00 ms max latency, 262 ms 50th, 430 ms 95th, 506 ms 99th, 2053 ms
99.9th. Size=111GB

6000000 records sent, 18951.058890 records/sec (111.04 MB/sec), 306.84 ms avg
latency, 3467.00 ms max latency, 66 ms 50th, 781 ms 95th, 874 ms 99th, 2607 ms
99.9th. Size=67.3GB

./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic
Test.Compression --num-records 6000000 --record-size 1638 --throughput -1 --
producer-props acks=all
bootstrap.servers=localhost:9090,localhost:9094,localhost:9096
compression.type=zstd batch.size=262144
