Kafka Interview Problems

The document outlines strategies for ensuring exactly-once semantics in Kafka-based pipelines, emphasizing the use of idempotent producers, transactional APIs, and the Outbox Pattern for cross-system consistency. It also addresses identifying and resolving consumer group lag during traffic spikes, designing a topic strategy for multi-tenant platforms, handling out-of-order messages, and implementing message replay for debugging. Key recommendations include optimizing consumer logic, using shared domain topics, and ensuring deterministic consumers for effective message processing.


Kafka

How would you ensure exactly-once semantics in a Kafka-based pipeline involving multiple services and databases?
 Kafka provides exactly-once semantics (EOS) using idempotent producers and
transactional APIs, but only within Kafka (e.g., writing to topics).
 For cross-system EOS (e.g., Kafka + DB):
 Begin a Kafka transaction.
 Produce the message to Kafka.
 Execute the local DB write (which must be idempotent).
 On success, commit the Kafka transaction.
 On failure, abort the transaction to prevent duplicate delivery.
 A better alternative for microservices is the Outbox Pattern:
 Service writes the event to both its DB and an outbox table in a single transaction.
 A CDC tool (e.g., Debezium) reads from the outbox and publishes to Kafka.
 This decouples DB consistency from Kafka publishing logic and improves reliability.
 To ensure correctness on the consumer side:
 Use idempotent consumers (deduplicate by message ID or use upserts).
 For stream processing frameworks like Kafka Streams or Flink:
 Use transactional sinks or checkpointed state to ensure atomic writes and reprocessing
guarantees.
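
The Outbox Pattern above can be sketched in a few lines, with SQLite standing in for the service's database and a callback standing in for the Kafka producer; the table and column names are illustrative, and in production a CDC tool such as Debezium plays the relay role:

```python
import json
import sqlite3

# In-memory SQLite stands in for the service's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
conn.execute("""CREATE TABLE outbox (
    event_id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT, payload TEXT, published INTEGER DEFAULT 0)""")

def place_order(order_id, total):
    # Business write and outbox event share ONE local transaction,
    # so either both are persisted or neither is.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.events", json.dumps({"orderId": order_id, "total": total})),
        )

def relay_outbox(publish):
    # The relay (CDC in production) reads unpublished rows and
    # publishes them to Kafka, then marks them as published.
    rows = conn.execute(
        "SELECT event_id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, payload)  # produce to Kafka here
        conn.execute(
            "UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,)
        )
    conn.commit()

place_order("o-1", 99.5)
published = []
relay_outbox(lambda topic, payload: published.append((topic, payload)))
print(published)
```

If the relay crashes after publishing but before marking the row, the event is published again on restart, which is why the consumer side must stay idempotent.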

A consumer group lags behind during traffic spikes. How would you identify and
resolve the bottleneck?
 First, diagnose the source of lag:
 Use kafka-consumer-groups.sh to inspect lag per partition.
 Determine whether lag is due to message processing time, limited parallelism, or
misconfigured consumer settings.
 Common bottlenecks include:
 Slow downstream systems (e.g., DB writes, HTTP calls).
 Too few partitions to allow parallelism.
 Offsets not being committed properly (causing reprocessing).
 High GC pressure or threading issues on the consumer app.
 To fix the issue:
 Optimize the consumer processing logic (e.g., async I/O, batching DB calls).
 Increase the number of partitions to allow horizontal scaling.
 Tune settings like max.poll.records, fetch.min.bytes, or increase thread pool size.
 Introduce monitoring and auto-scaling mechanisms using Prometheus/Grafana.
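
The consumer settings mentioned above might be tuned roughly as follows; the values are illustrative starting points for a lagging group, not recommendations for every workload:

```properties
# Pull more records per poll so each round-trip does more work.
max.poll.records=1000
# Headroom before the group coordinator evicts a slow consumer;
# must exceed the worst-case time to process one poll's batch.
max.poll.interval.ms=300000
# Let the broker accumulate data instead of returning tiny responses.
fetch.min.bytes=65536
fetch.max.wait.ms=100
```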
How would you design a Kafka topic strategy for a multi-tenant platform with
millions of users and dozens of data domains?
 Avoid creating a topic per user — this would overload Kafka's broker metadata.
 Instead, design shared domain topics:
 Embed tenant/user ID in the key (e.g., tenantId:userId) to maintain partition-level
ordering.
 Example: user.activity.events topic with key-based partitioning by tenant.
 Determine topic granularity based on domain context:
 Use logically grouped topics like billing.events, profile.updates, etc.
 Balance data volume, retention requirements, and consumer access patterns.
 Implement schema management with Avro/Protobuf and Schema Registry.
 Enforce access control (e.g., ACLs) when topics are shared with external consumers.
 Ensure even partition distribution using consistent hashing or composite keys.
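
Key-based partitioning with a composite tenantId:userId key can be demonstrated without a broker. CRC32 stands in here for Kafka's murmur2 partitioner, so the exact partition numbers differ from a real cluster, but the property that matters, same key always maps to the same partition, holds either way:

```python
import zlib

NUM_PARTITIONS = 12  # illustrative partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash of the composite key; Kafka's default
    # partitioner uses murmur2, CRC32 is just a stand-in.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All events for one tenant/user land on the same partition,
# preserving per-key ordering within shared domain topics.
key = "tenant-42:user-7"
p1 = partition_for(key)
p2 = partition_for(key)
print(p1 == p2)  # same key, same partition
```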

A Kafka topic has out-of-order messages. What could be the cause and how
would you fix it?
 Kafka guarantees order only within a single partition.
 Common causes of out-of-order delivery:
 Improper use of keys — same logical message stream split across partitions.
 Multiple producers with inconsistent keying or without keys.
 Retries or replays that delay certain messages.
 To resolve:
 Use a consistent partition key (e.g., user ID) to ensure message locality.
 Configure producers for in-order delivery:
 Enable enable.idempotence=true (which requires acks=all).
 With idempotence enabled, ordering is preserved for max.in.flight.requests.per.connection up to 5; restrict it to 1 only if idempotence must stay disabled.
 Add sequence numbers to the message payload to allow downstream reordering if
needed.
 If strict ordering across keys is required, use Kafka Streams with windowed or stateful
logic (with caution).
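
The downstream reordering by payload sequence number suggested above can be sketched as a small buffer that releases messages only once the next expected sequence has arrived (a simplified, single-key illustration; real systems also need a timeout or gap limit so a lost message cannot stall the buffer forever):

```python
import heapq

class ReorderBuffer:
    """Releases messages in sequence order, buffering any gaps."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self.pending = []  # min-heap of (seq, message)

    def offer(self, seq: int, message):
        # Returns every message that is now deliverable in order.
        heapq.heappush(self.pending, (seq, message))
        ready = []
        while self.pending and self.pending[0][0] == self.next_seq:
            _, msg = heapq.heappop(self.pending)
            ready.append(msg)
            self.next_seq += 1
        return ready

buf = ReorderBuffer()
out = []
for seq, msg in [(1, "b"), (0, "a"), (3, "d"), (2, "c")]:  # out of order
    out.extend(buf.offer(seq, msg))
print(out)  # delivered in sequence order
```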

How would you implement a Kafka-based system that supports message replay
for debugging or reprocessing?
 Start with long-retention or compacted Kafka topics.
 Ensure all consumers are deterministic and idempotent.
 Common replay strategies:
 Run a custom consumer with auto.offset.reset=earliest.
 Externally manage offsets (e.g., store checkpoints in DB).
 Use a Dead Letter Topic (DLT) to isolate and replay failures.
 Advanced replay architectures:
 Mirror events to a dedicated 'replay' topic.
 Use timestamp-based offset seeking with Kafka's API.
 Tooling and support:
 Persist historical data to S3/Elasticsearch using Kafka Connect.
 Use Kafka Streams to rebuild derived states from event history.
 Expose a UI (e.g., AKHQ, Kafka UI) to allow selective or targeted replay by operators.
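
Timestamp-based replay boils down to finding the earliest offset whose timestamp is at or after a target time, which is the query the broker answers for the consumer's offsetsForTimes call. The lookup itself is a binary search, sketched here over an in-memory time index (the sample entries are made up for illustration):

```python
import bisect

# (timestamp_ms, offset) pairs standing in for a partition's time
# index; in production the broker answers this query via the
# consumer client's offsetsForTimes / offsets_for_times API.
time_index = [
    (1_700_000_000_000, 0),
    (1_700_000_060_000, 250),
    (1_700_000_120_000, 512),
    (1_700_000_180_000, 801),
]

def offset_for_timestamp(target_ms: int):
    # Earliest offset whose timestamp >= target_ms, or None if the
    # target lies beyond the end of the log.
    timestamps = [ts for ts, _ in time_index]
    i = bisect.bisect_left(timestamps, target_ms)
    return time_index[i][1] if i < len(time_index) else None

# Replay everything from one minute into the log onward:
print(offset_for_timestamp(1_700_000_060_000))  # exact match -> 250
print(offset_for_timestamp(1_700_000_090_000))  # rounds forward -> 512
```

Having found the offset, the operator seeks the consumer to it and replays forward, relying on the deterministic, idempotent consumers called for above.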
