Some Special Terms in Kafka

Uploaded by Krishnaraghava

In Kafka and similar distributed systems, there are various specialized terms that describe patterns, behaviors, or mechanisms. Some examples, including poison pill and tombstone record, are:

1. Poison Pill

 A message deliberately sent to stop a consumer or signal the end of a stream.


 Often used in testing or as part of a controlled shutdown process. For example, a
producer might send a special message (e.g., STOP) that consumers recognize as a
signal to terminate gracefully.
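To make the pattern concrete, here is a minimal stdlib-Python sketch that simulates a consumer loop stopped by a sentinel message (no Kafka client involved; `STOP` is just an arbitrary marker chosen for this example):

```python
import queue
import threading

STOP = object()  # the "poison pill": any unique sentinel works

def consumer(q: queue.Queue, processed: list) -> None:
    while True:
        msg = q.get()
        if msg is STOP:        # recognized poison pill: shut down gracefully
            break
        processed.append(msg)  # normal message processing

q = queue.Queue()
processed: list = []
t = threading.Thread(target=consumer, args=(q, processed))
t.start()
for m in ["a", "b", "c"]:
    q.put(m)
q.put(STOP)                    # signal the end of the stream
t.join()
print(processed)               # ['a', 'b', 'c']
```

Because the pill travels through the same channel as ordinary messages, the consumer is guaranteed to finish everything queued before it stops.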

2. Tombstone Record

 A special message with a null value used to represent a "delete" operation in a Kafka topic.
 Commonly used in compacted topics where Kafka retains only the latest key-
value pair for each key. The tombstone ensures the key is removed when
compacted.
 Example use case: Deleting a record in a log of key-value pairs, such as user
profiles.
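A small Python sketch of the idea (a toy model of compaction, not Kafka's actual implementation): replaying a key-value log, a `None` value acts as the tombstone that deletes its key:

```python
def compact(log):
    """Simulate log compaction: keep only the latest record per key;
    a None value (tombstone) removes the key entirely."""
    latest = {}
    for key, value in log:
        latest[key] = value            # later records overwrite earlier ones
    # drop keys whose most recent record is a tombstone
    return {k: v for k, v in latest.items() if v is not None}

log = [
    ("user1", {"name": "Ann"}),
    ("user2", {"name": "Bob"}),
    ("user1", {"name": "Anna"}),       # newer value for user1
    ("user2", None),                   # tombstone: delete user2
]
state = compact(log)
print(state)  # {'user1': {'name': 'Anna'}}
```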

3. Compacted Topic

 A Kafka topic with log compaction enabled, where only the latest message for a
given key is retained.
 Useful for cases where consumers need the latest state rather than the complete
history, such as a configuration store.

4. Consumer Lag

 The difference between the latest offset in a partition and the offset the
consumer has read.
 Indicates how far behind a consumer is in processing messages.
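The arithmetic behind lag monitoring is simple; a sketch (the offsets here are made up for illustration):

```python
def consumer_lag(log_end_offset: int, committed_offset: int) -> int:
    """Lag = newest offset the broker has written for the partition
    minus the consumer's committed position in that partition."""
    return log_end_offset - committed_offset

# hypothetical partition: broker at offset 1500, consumer committed 1342
lag = consumer_lag(1500, 1342)  # 158 messages behind
```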

5. Offset
 A unique identifier assigned to a message within a Kafka partition.
 Tracks the position of a consumer in the stream, enabling it to resume from
where it left off.

6. Dead Letter Queue (DLQ)

 A special Kafka topic where messages that fail processing are sent for further
analysis or retries.
 Common in error-handling strategies to isolate problematic records.
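A minimal sketch of the routing logic (plain Python lists stand in for the main topic and the DLQ topic; the JSON-parsing step is a hypothetical processing stage):

```python
import json

def process(record):
    # hypothetical processing step: parse the record as JSON
    return json.loads(record)

def consume(records):
    results, dlq = [], []
    for rec in records:
        try:
            results.append(process(rec))
        except Exception as exc:
            # instead of crashing or retrying forever, route the bad
            # record plus error context to a dead letter queue/topic
            dlq.append({"record": rec, "error": str(exc)})
    return results, dlq

results, dlq = consume(['{"id": 1}', "not-json", '{"id": 2}'])
print(len(results), len(dlq))  # 2 1
```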

7. Idempotence

 A property ensuring that producing the same message multiple times has the
same effect as producing it once.
 Kafka producers can use idempotent producer mode to avoid duplicates when
retrying sends.
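A toy model of the broker-side deduplication that idempotent producer mode relies on: the producer stamps each record with a sequence number, and the broker drops sequences it has already appended. (This is a simplification; the real protocol also involves producer IDs, epochs, and per-partition state.)

```python
class ToyBroker:
    """Simplified broker that deduplicates producer retries by
    tracking the highest sequence number appended per producer."""
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence appended

    def append(self, producer_id, seq, value):
        if self.last_seq.get(producer_id, -1) >= seq:
            return  # duplicate from a retry: drop it silently
        self.last_seq[producer_id] = seq
        self.log.append(value)

b = ToyBroker()
b.append("p1", 0, "order-created")
b.append("p1", 0, "order-created")  # retry after a lost ack: deduplicated
b.append("p1", 1, "order-paid")
print(b.log)  # ['order-created', 'order-paid']
```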

8. Consumer Group

 A group of consumers coordinating to read from partitions of a topic collectively.
 Each partition is assigned to exactly one consumer in the group, so the group as a whole processes each message once under normal operation (exact delivery guarantees depend on configuration).
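A sketch of the core idea behind partition assignment (a simplified round-robin; Kafka's actual assignors, such as range, round-robin, and sticky, are more involved):

```python
def assign(partitions, consumers):
    """Round-robin toy assignment: each partition goes to exactly
    one consumer in the group, spreading the load evenly."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

a = assign([0, 1, 2, 3, 4, 5], ["c1", "c2", "c3"])
print(a)  # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

Note that with more consumers than partitions, the extra consumers would simply sit idle, which is why partition count caps a group's parallelism.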

9. Leader and Replica

 Leader: The partition copy that handles read and write operations.
 Replica: Additional copies of the partition that provide fault tolerance and are
synchronized with the leader.

10. Exactly Once Semantics (EOS)

 A guarantee in Kafka that a message is processed exactly once by a consumer, even in the presence of failures.
 Achieved through idempotent producers and transactions.
11. Watermark

 A marker or timestamp that helps manage event-time processing in streaming systems.
 Helps handle late-arriving messages and establish time windows for analytics.
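A toy illustration of the mechanism (not any specific framework's API): the watermark trails the largest event timestamp seen by a fixed lateness allowance, and events that fall behind it are treated as late:

```python
def watermark_filter(events, allowed_lateness):
    """Toy event-time watermark: track the max event timestamp seen
    and classify events older than (max_ts - allowed_lateness) as late."""
    max_ts = float("-inf")
    accepted, late = [], []
    for ts, payload in events:
        max_ts = max(max_ts, ts)
        watermark = max_ts - allowed_lateness
        if ts >= watermark:
            accepted.append((ts, payload))
        else:
            late.append((ts, payload))   # too old: handle or drop
    return accepted, late

events = [(100, "a"), (105, "b"), (90, "too-late"), (101, "c")]
accepted, late = watermark_filter(events, allowed_lateness=10)
print(late)  # [(90, 'too-late')]
```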

12. Log Compaction

 A mechanism that retains only the latest message for each key in a topic,
deleting older records.
 Used to reduce storage usage while maintaining the latest state.

13. High-Water Mark (HWM)

 The maximum offset of messages successfully replicated to all in-sync replicas.


 Consumers can only read up to this mark, ensuring durability.

14. In-Sync Replica (ISR)

 A replica that is up-to-date with the leader partition.


 Critical for maintaining Kafka's fault tolerance.

These terms reflect Kafka's architecture and its focus on reliable, scalable, and fault-
tolerant messaging systems. Understanding them is key to using Kafka effectively in real-
world applications.

how about backpressure ? you have not mentioned anything about this ..

ChatGPT said:
You're absolutely right! Backpressure is another important concept, particularly in
streaming systems like Kafka. Here's an explanation:

Backpressure

 Definition: A mechanism to handle scenarios where producers generate data faster than consumers can process it.
 It ensures the system doesn't get overwhelmed, leading to stability and better
resource management.

Backpressure in Kafka

Kafka, as a distributed messaging system, indirectly supports backpressure through its design. Here’s how it handles related scenarios:

1. Consumer Pull Model:
o Kafka uses a pull-based consumption model, where consumers decide when to request data.
o This naturally prevents the broker from overwhelming consumers, as data is only sent when requested.
2. Consumer Lag:

o When a consumer processes messages slowly, lag builds up. While this
doesn’t immediately apply backpressure to producers, monitoring
consumer lag is crucial to prevent storage issues on the broker.
3. Disk Retention:

o If consumers are too slow and storage limits are reached, Kafka may start
deleting old messages based on retention policies. This acts as a
"backpressure signal" for system administrators to scale up resources.

Backpressure in Kafka Streams

In Kafka Streams (a stream-processing library), backpressure is handled more explicitly:

 Threading Model:
o Kafka Streams uses a fixed number of threads for processing, which
naturally limits throughput.
o If the downstream operator is slower, processing in upstream stages
slows down, propagating backpressure.
 Buffer Sizes:

o Each processing stage can buffer a limited number of messages. When buffers are full, upstream stages block until space becomes available.
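The blocking-buffer behavior can be illustrated with a plain bounded queue in Python (a generic simulation, not Kafka Streams code): when the queue is full, the producing side blocks until the slow consumer frees space, so the consumer's pace propagates upstream.

```python
import queue
import threading
import time

buf = queue.Queue(maxsize=2)   # bounded buffer between two stages
consumed = []

def slow_consumer():
    while True:
        item = buf.get()
        if item is None:       # sentinel: end of stream
            break
        time.sleep(0.01)       # downstream is slower than upstream
        consumed.append(item)

t = threading.Thread(target=slow_consumer)
t.start()
for i in range(5):
    buf.put(i)   # blocks while the buffer is full: upstream slows down
buf.put(None)
t.join()
print(consumed)  # [0, 1, 2, 3, 4]
```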

How Backpressure Works in Practice

Backpressure can occur when:

1. Producers Overwhelm Brokers:
o If brokers cannot keep up, producers may experience timeouts or need to retry.
o Producers can slow down based on acknowledgment (ACK) configurations.
2. Consumers Fall Behind:

o If consumers can’t keep up, messages accumulate in brokers.


o The system relies on monitoring (e.g., consumer lag) to detect
bottlenecks.
3. Network or Disk IO Limitations:

o Limited network or disk throughput in Kafka can indirectly throttle producers and consumers.

Mitigation Strategies

 Scaling:
o Add partitions to distribute load across brokers.
o Increase the number of consumer instances within the consumer group.
 Rate Limiting:
o Implement producer-side throttling to match consumer speed.
 Buffer Tuning:
o Adjust buffer sizes in producers, consumers, and brokers.
 Monitoring and Alerts:
o Monitor consumer lag, broker disk utilization, and producer retries.


Kafka’s architecture and tools like Kafka Streams provide mechanisms to manage
backpressure efficiently, ensuring system stability and reliability even under high loads.


You said:
Any other terms that you have missed or wanted me to know ?

ChatGPT said:

Absolutely! Kafka's ecosystem has a rich vocabulary tied to its functionality and design.
Here are additional key terms and concepts worth knowing:

1. Partition

 A partition is a subset of a Kafka topic, and it’s where messages are stored
sequentially.
 Each partition has an ordered log of messages, providing scalability by allowing
parallel processing.

2. Broker

 A broker is a Kafka server that stores data and serves client requests for data
reads and writes.
 Multiple brokers form a Kafka cluster, which allows load distribution and fault
tolerance.

3. Topic

 A topic is a category or feed name to which records are sent by producers and
from which records are consumed.
 Topics are divided into partitions for scalability.
4. Retention Policy

 Kafka retains messages for a configurable time or until a storage size limit is
reached, even after they're consumed.
 Policies include:
o Time-based Retention: Retain messages for a set period (e.g., 7 days).
o Size-based Retention: Retain messages until the total size exceeds a
configured limit.
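A toy sketch of how the two policies combine (the segment records are made up; real brokers operate on log segments governed by configs like `retention.ms` and `retention.bytes`):

```python
def apply_retention(segments, now, retention_ms, retention_bytes):
    """Toy retention pass: drop segments older than retention_ms,
    then drop oldest segments until total size <= retention_bytes."""
    kept = [s for s in segments if now - s["last_ts"] <= retention_ms]
    while kept and sum(s["bytes"] for s in kept) > retention_bytes:
        kept.pop(0)  # oldest segment goes first
    return kept

segments = [
    {"last_ts": 1_000, "bytes": 500},   # old: expires by time
    {"last_ts": 9_000, "bytes": 700},
    {"last_ts": 9_500, "bytes": 700},
]
kept = apply_retention(segments, now=10_000, retention_ms=5_000,
                       retention_bytes=1_000)
print(kept)  # [{'last_ts': 9500, 'bytes': 700}]
```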

5. ZooKeeper

 Originally used for managing Kafka’s metadata, brokers, and leader election.
 Being phased out in favor of KRaft (Kafka Raft), which replaces ZooKeeper for
metadata management in newer Kafka versions.

6. KRaft (Kafka Raft Metadata Mode)

 Kafka's new built-in consensus protocol to manage metadata and configuration.


 It simplifies operations by removing the dependency on ZooKeeper.

7. Leader Election

 A process where a broker is chosen as the leader for a partition.


 The leader handles all read and write requests for the partition, with replicas
acting as backups.

8. Offset Commit

 Consumers commit offsets to track the last read message.


 There are two types:
o Automatic Commit: Kafka commits offsets at regular intervals.
o Manual Commit: Application-controlled to ensure accurate processing.

9. Acknowledgments (ACKs)
 Control message durability at the producer level.
 Common ACK settings:
o acks=0: No acknowledgment; faster but unreliable.
o acks=1: Acknowledged by the leader broker.
o acks=all: Acknowledged by all in-sync replicas, ensuring durability.

10. Message Key

 Used to route messages to specific partitions within a topic.


 Ensures messages with the same key always go to the same partition, enabling
order preservation.
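The routing idea in a few lines (Kafka's default partitioner hashes the key with murmur2; CRC32 stands in here just to keep the sketch stdlib-only):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministic key-based partitioning sketch: the same key
    always maps to the same partition, preserving per-key order."""
    return zlib.crc32(key) % num_partitions

p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
assert p1 == p2  # identical keys always land on the same partition
```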

11. Schema Registry

 Part of the Confluent ecosystem, the Schema Registry stores and validates
schemas (e.g., Avro, JSON, Protobuf).
 Ensures compatibility between producers and consumers by enforcing schema
evolution rules.

12. Replication Factor

 The number of copies of a partition stored across brokers.


 Higher replication factors improve fault tolerance but increase storage costs.

13. ISR (In-Sync Replicas)

 A set of replicas that are up-to-date with the leader.


 Only replicas in the ISR are eligible to become the new leader if the current
leader fails.

14. Transactional Messages

 Kafka supports transactions to achieve atomic writes across multiple partitions or topics.
 Enables exactly-once semantics (EOS) for stream processing applications.

15. Batch Size

 Kafka producers send messages in batches to improve efficiency.


 Larger batch sizes reduce network overhead but increase latency.

16. Connector

 Kafka Connect provides source connectors (ingest data into Kafka) and sink
connectors (export data from Kafka).
 Example: Connecting Kafka to databases, file systems, or cloud services.

17. Rebalance

 Happens when consumers join or leave a consumer group, or partitions are reassigned.
 Kafka redistributes partitions among consumers, temporarily halting processing.

18. Watermarking

 A mechanism in stream processing to handle event-time semantics and late-arriving messages.
 Used in windowed computations (e.g., processing events within a 5-minute
window).

19. Out-of-Order Events

 In distributed systems, events might arrive at consumers in an order different from when they were produced.
 Kafka Streams and other libraries handle this with event-time processing.
20. Skewed Partition

 A situation where one partition gets disproportionately more data than others,
causing load imbalances.
 Mitigation includes re-partitioning data or using message keys effectively.

21. Metadata

 Metadata in Kafka includes information about brokers, topics, partitions, and offsets.
 It’s used by clients to locate leaders and fetch data.

22. Log Segment

 A partition's log is split into segments, enabling efficient log retention and
compaction.
 Old segments are deleted or compacted based on the configured retention
policy.

23. Event Sourcing

 A pattern where changes to application state are logged as an immutable sequence of events.
 Kafka is commonly used as the event store in such architectures.
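A minimal illustration of the pattern: state is never stored directly, it is rebuilt by replaying the event log (a toy bank-account example, unrelated to any specific Kafka API):

```python
def rebuild_state(events):
    """Replay an immutable event log to reconstruct current state,
    the way a Kafka-backed event store is consumed on startup."""
    balance = 0
    for event in events:
        if event["type"] == "deposited":
            balance += event["amount"]
        elif event["type"] == "withdrawn":
            balance -= event["amount"]
    return balance

events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 50},
]
balance = rebuild_state(events)
print(balance)  # 120
```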

These terms collectively provide a robust understanding of Kafka's core concepts and
advanced features, giving you the tools to design and operate Kafka-based systems
effectively!
