Some Special Terms in Kafka
Kafka uses a number of special terms to describe patterns, behaviors, or mechanisms. Some examples, including poison pill and tombstone record, are:
1. Poison Pill
A message that a consumer cannot process successfully, typically because of a malformed payload or a deserialization failure.
Without special handling, the consumer retries it repeatedly and blocks progress on that partition.
2. Tombstone Record
A record with a non-null key and a null value, signaling that the value for that key should be deleted.
During log compaction, a tombstone removes earlier records with the same key.
3. Compacted Topic
A Kafka topic with log compaction enabled, where only the latest message for a
given key is retained.
Useful for cases where consumers need the latest state rather than the complete
history, such as a configuration store.
4. Consumer Lag
The difference between the latest offset in a partition and the offset the
consumer has read.
Indicates how far behind a consumer is in processing messages.
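As a minimal sketch, lag for a partition is the difference between the partition's latest offset (the log end offset) and the consumer's committed offset; the numbers below are illustrative:

```python
# Illustrative sketch: consumer lag is the gap between the latest offset
# in a partition (the "log end offset") and the consumer's committed offset.
def consumer_lag(log_end_offset: int, committed_offset: int) -> int:
    """Return how many messages the consumer still has to process."""
    return log_end_offset - committed_offset

# Example: the partition holds messages up to offset 1000 and the
# consumer has committed offset 850, so it is 150 messages behind.
print(consumer_lag(1000, 850))  # 150
```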
5. Offset
A unique identifier assigned to a message within a Kafka partition.
Tracks the position of a consumer in the stream, enabling it to resume from
where it left off.
6. Dead Letter Queue (DLQ)
A special Kafka topic where messages that fail processing are sent for further
analysis or retries.
Common in error-handling strategies to isolate problematic records.
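The routing idea can be sketched in plain Python (the handler and message values are hypothetical; in a real deployment the failed record would be produced to a dedicated DLQ topic):

```python
# Hypothetical sketch of dead-letter routing: messages whose handler raises
# are diverted to a "dead letter" list instead of crashing the consumer.
def process_with_dlq(messages, handler):
    processed, dead_letters = [], []
    for msg in messages:
        try:
            processed.append(handler(msg))
        except Exception as exc:
            # In Kafka, this would be a produce() to a dedicated DLQ topic.
            dead_letters.append({"message": msg, "error": str(exc)})
    return processed, dead_letters

ok, dlq = process_with_dlq(["1", "2", "oops", "4"], int)
print(ok)        # [1, 2, 4]
print(len(dlq))  # 1 -> the unparseable "oops" record
```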
7. Idempotence
A property ensuring that producing the same message multiple times has the
same effect as producing it once.
Kafka producers can use idempotent producer mode to avoid duplicates when
retrying sends.
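A rough sketch of the mechanism behind idempotent producers, which in Kafka is based on a producer id and per-partition sequence numbers (the class and names here are illustrative, not Kafka's API):

```python
# Sketch of the idea behind idempotent producers: the broker remembers the
# highest sequence number seen per producer and drops duplicates on retry.
class DedupLog:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence appended

    def append(self, producer_id, seq, value):
        if self.last_seq.get(producer_id, -1) >= seq:
            return False  # duplicate retry; already appended
        self.last_seq[producer_id] = seq
        self.log.append(value)
        return True

broker = DedupLog()
broker.append("p1", 0, "order-created")
broker.append("p1", 0, "order-created")  # retried send: ignored
print(broker.log)  # ['order-created']
```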
8. Consumer Group
A set of consumers that cooperate to consume a topic; each partition is read
by exactly one consumer in the group, so the group shares the processing load.
9. Leader and Replica
Leader: The partition copy that handles read and write operations.
Replica: Additional copies of the partition that provide fault tolerance and are
synchronized with the leader.
10. Log Compaction
A mechanism that retains only the latest message for each key in a topic,
deleting older records.
Used to reduce storage usage while maintaining the latest state.
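The effect of compaction can be sketched as follows, assuming each record is a (key, value) pair and a None value is a tombstone:

```python
# Sketch of log compaction: scan the log and keep only the newest record
# for each key; a record with value None (a tombstone) removes the key.
def compact(log):
    latest = {}
    for key, value in log:
        latest[key] = value  # later records overwrite earlier ones
    # Drop tombstones: None marks a deleted key.
    return {k: v for k, v in latest.items() if v is not None}

log = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2"), ("user-2", None)]
print(compact(log))  # {'user-1': 'v2'}  -> user-2 was deleted by a tombstone
```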
These terms reflect Kafka's architecture and its focus on reliable, scalable, and fault-
tolerant messaging systems. Understanding them is key to using Kafka effectively in real-
world applications.
You said:
How about backpressure? You have not mentioned anything about this.
ChatGPT said:
You're absolutely right! Backpressure is another important concept, particularly in
streaming systems like Kafka. Here's an explanation:
Backpressure in Kafka
Backpressure occurs when a downstream component cannot keep up with the rate of incoming data. Kafka handles it differently from classic streaming frameworks: because the broker buffers messages durably on disk, slow consumers do not directly slow producers.
2. Consumer Lag:
o When a consumer processes messages slowly, lag builds up. While this
doesn’t immediately apply backpressure to producers, monitoring
consumer lag is crucial to prevent storage issues on the broker.
3. Disk Retention:
o If consumers are too slow and storage limits are reached, Kafka may start
deleting old messages based on retention policies. This acts as a
"backpressure signal" for system administrators to scale up resources.
Threading Model:
o Kafka Streams uses a fixed number of threads for processing, which
naturally limits throughput.
o If the downstream operator is slower, processing in upstream stages
slows down, propagating backpressure.
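The propagation idea, a slow downstream stage throttling the upstream one, can be sketched with a bounded queue; this is a generic illustration, not Kafka Streams code:

```python
import queue
import threading

# Sketch of backpressure with a bounded buffer: when the consumer is slow,
# the queue fills and q.put() blocks, naturally slowing the producer down.
q = queue.Queue(maxsize=5)
consumed = []

def producer():
    for i in range(20):
        q.put(i)   # blocks when the buffer is full -> backpressure
    q.put(None)    # sentinel: no more items

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        consumed.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(len(consumed))  # 20 -- everything arrives, just at the consumer's pace
```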
Buffer Sizes:
o Kafka Streams bounds how many records it buffers per partition; when a
buffer fills up, it pauses fetching from that partition, which acts as
backpressure on the source.
Mitigation Strategies
Scaling:
o Add partitions to distribute load across brokers.
o Increase the number of consumer instances within the consumer group.
Rate Limiting:
o Apply broker quotas on producers, or throttle consumption in the
application, so throughput matches what downstream systems can absorb.
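One common way to rate-limit on the application side is a token bucket; the sketch below is generic Python, not a Kafka API, and the rate and capacity values are arbitrary:

```python
import time

# Sketch of a token-bucket rate limiter a consumer could apply before
# handing records to a slow downstream system. Parameters are illustrative.
class TokenBucket:
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=100, capacity=10)
allowed = sum(bucket.try_acquire() for _ in range(50))
print(allowed)  # roughly 10: the burst capacity, plus a few refilled tokens
```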
You said:
Any other terms that you have missed or want me to know?
ChatGPT said:
Absolutely! Kafka's ecosystem has a rich vocabulary tied to its functionality and design.
Here are additional key terms and concepts worth knowing:
1. Partition
A partition is a subset of a Kafka topic, and it’s where messages are stored
sequentially.
Each partition has an ordered log of messages, providing scalability by allowing
parallel processing.
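Keyed partitioning can be sketched as hashing the key modulo the partition count; Kafka's Java client actually uses murmur2, so md5 here is only for illustration:

```python
import hashlib

# Sketch of keyed partitioning: records with the same key always hash to the
# same partition, preserving per-key ordering. (Kafka's Java client uses
# murmur2; md5 here is just an illustrative stand-in.)
def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p1 = partition_for("user-42", 6)
p2 = partition_for("user-42", 6)
print(p1 == p2)  # True: same key -> same partition, so per-key order holds
```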
2. Broker
A broker is a Kafka server that stores data and serves client requests for data
reads and writes.
Multiple brokers form a Kafka cluster, which allows load distribution and fault
tolerance.
3. Topic
A topic is a category or feed name to which records are sent by producers and
from which records are consumed.
Topics are divided into partitions for scalability.
4. Retention Policy
Kafka retains messages for a configurable time or until a storage size limit is
reached, even after they're consumed.
Policies include:
o Time-based Retention: Retain messages for a set period (e.g., 7 days).
o Size-based Retention: Retain messages until the total size exceeds a
configured limit.
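Both policies can be sketched together over a list of (timestamp, size) records; the limits below are arbitrary:

```python
# Sketch of time- and size-based retention applied to a partition's records,
# each modelled as (timestamp_seconds, size_bytes).
def apply_retention(records, now, max_age_sec, max_total_bytes):
    # Time-based: drop records older than the retention window.
    kept = [r for r in records if now - r[0] <= max_age_sec]
    # Size-based: drop the oldest records until under the size limit.
    while kept and sum(size for _, size in kept) > max_total_bytes:
        kept.pop(0)
    return kept

records = [(0, 100), (50, 100), (100, 100), (150, 100)]
print(apply_retention(records, now=160, max_age_sec=120, max_total_bytes=250))
# [(100, 100), (150, 100)]: the first record aged out, the second was
# evicted to respect the size limit.
```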
5. ZooKeeper
Originally used for managing Kafka’s metadata, brokers, and leader election.
Being phased out in favor of KRaft (Kafka Raft), which replaces ZooKeeper for
metadata management in newer Kafka versions.
7. Leader Election
When the broker hosting a partition's leader fails, one of the in-sync
replicas is elected as the new leader so the partition stays available.
8. Offset Commit
The act of a consumer saving its current offset (stored in the internal
__consumer_offsets topic) so it can resume from that position after a
restart or rebalance.
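Offset commits let a consumer resume where it left off; here is a toy sketch with an in-memory dict standing in for Kafka's internal __consumer_offsets topic:

```python
# Sketch of offset commits: the consumer records its position so a restart
# resumes where it left off instead of reprocessing from the beginning.
committed = {}  # (group, topic, partition) -> next offset to read

def commit(group, topic, partition, offset):
    # Kafka stores these in the internal __consumer_offsets topic.
    committed[(group, topic, partition)] = offset

def resume_position(group, topic, partition):
    # No committed offset -> start from the beginning (offset 0) here.
    return committed.get((group, topic, partition), 0)

commit("billing", "orders", 0, 120)
print(resume_position("billing", "orders", 0))  # 120
print(resume_position("billing", "orders", 1))  # 0 (never committed)
```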
9. Acknowledgments (ACKs)
Control message durability at the producer level.
Common ACK settings:
o ACKS=0: No acknowledgment; faster but unreliable.
o ACKS=1: Acknowledged by the leader broker.
o ACKS=all: Acknowledged by all in-sync replicas, ensuring durability.
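An illustrative producer configuration in confluent-kafka's dictionary style (the broker address is a placeholder; the key names follow librdkafka):

```python
# Illustrative producer configuration (confluent-kafka / librdkafka style).
# acks='all' waits for every in-sync replica, trading latency for durability.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "acks": "all",                # strongest durability guarantee
    "enable.idempotence": True,   # safe retries without duplicates
    "retries": 5,                 # retry transient send failures
}
print(producer_config["acks"])  # all
```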
Schema Registry
Part of the Confluent ecosystem, the Schema Registry stores and validates
schemas (e.g., Avro, JSON, Protobuf).
Ensures compatibility between producers and consumers by enforcing schema
evolution rules.
16. Connector
Kafka Connect provides source connectors (ingest data into Kafka) and sink
connectors (export data from Kafka).
Example: Connecting Kafka to databases, file systems, or cloud services.
17. Rebalance
The process of redistributing a topic's partitions among the consumers in a
group when membership changes, for example when a consumer joins, leaves, or
crashes.
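Assignment during a rebalance can be sketched as spreading partitions over the group's consumers; this toy version uses simple round-robin rather than Kafka's actual assignor strategies:

```python
# Sketch of partition assignment after a rebalance: partitions are spread
# as evenly as possible over the group's consumers (toy round-robin).
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

print(assign([0, 1, 2, 3, 4], ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3]}
```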
18. Watermarking
A stream-processing technique for tracking how far event time has progressed,
so that time windows can be closed and late-arriving records handled
deliberately.
Partition Skew
A situation where one partition gets disproportionately more data than others,
causing load imbalances.
Mitigation includes re-partitioning data or using message keys effectively.
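Detecting a hot partition can be sketched by comparing per-partition message counts to the mean; the 2x threshold here is an arbitrary choice:

```python
# Sketch of detecting partition skew: flag partitions whose message count
# is far above the mean. The threshold factor (2x) is arbitrary.
def skewed_partitions(counts, factor=2.0):
    mean = sum(counts.values()) / len(counts)
    return [p for p, n in counts.items() if n > factor * mean]

counts = {0: 100, 1: 120, 2: 5000, 3: 90}  # partition -> message count
print(skewed_partitions(counts))  # [2] -- partition 2 is a hot partition
```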
21. Metadata
Information about the cluster itself: which brokers exist, which topics and
partitions they host, and which replica leads each partition. Clients fetch
metadata to know where to send requests.
22. Log Segment
A partition's log is split into segments, enabling efficient log retention and
compaction.
Old segments are deleted or compacted based on the configured retention
policy.
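Segment rolling can be sketched as grouping record offsets into chunks bounded by a byte budget; the sizes below are illustrative:

```python
# Sketch of rolling a partition's log into fixed-size segments, the unit
# Kafka deletes or compacts during retention.
def split_into_segments(record_sizes, segment_bytes):
    segments, current, used = [], [], 0
    for offset, size in enumerate(record_sizes):
        if current and used + size > segment_bytes:
            segments.append(current)  # roll: start a new segment
            current, used = [], 0
        current.append(offset)
        used += size
    if current:
        segments.append(current)
    return segments

print(split_into_segments([40, 40, 40, 40, 40], segment_bytes=100))
# [[0, 1], [2, 3], [4]] -- older segments can be deleted wholesale
```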
These terms collectively provide a robust understanding of Kafka's core concepts and
advanced features, giving you the tools to design and operate Kafka-based systems
effectively!