Log Compaction in Kafka
Log compaction in Apache Kafka is a mechanism that retains only the most recent value for each key in a topic while deleting older values. It is particularly useful for topics where the latest state of a key is sufficient, such as database change streams, caches, or configuration data.
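As a rough illustration, here is a minimal sketch of a producer writing several updates for the same key, using the standard Java kafka-clients producer. The topic name my-topic, the key user-42, the sample values, and the broker address localhost:9092 are placeholder assumptions; once compaction has run, only the last value written for user-42 is guaranteed to remain in the topic.

java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class LatestValueProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Three successive updates for the same key; once the log cleaner has
            // compacted these segments, only the last value is retained.
            producer.send(new ProducerRecord<>("my-topic", "user-42", "address=1 Oak St"));
            producer.send(new ProducerRecord<>("my-topic", "user-42", "address=7 Elm Ave"));
            producer.send(new ProducerRecord<>("my-topic", "user-42", "address=3 Pine Rd"));
            producer.flush();
        }
    }
}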
Tombstones:
o A message with a key and a null value (known as a tombstone) marks the deletion of that key.
o Tombstone records are eventually removed during compaction, once all older records with that key have been purged.
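A tombstone is produced like any ordinary record, just with a null value. The helper below is a hypothetical sketch (deleteKey is not a Kafka API) and assumes a producer configured with String serializers, as in the example above.

java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

final class Tombstones {
    // Hypothetical helper: publish a tombstone so compaction eventually removes
    // every record for this key.
    static void deleteKey(KafkaProducer<String, String> producer, String topic, String key) {
        // A null value marks the key for deletion. The tombstone itself is kept for
        // at least delete.retention.ms after compaction so that consumers still
        // have a chance to observe the delete.
        producer.send(new ProducerRecord<>(topic, key, null));
    }
}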
Compaction Process:
o Compaction is performed by background log cleaner threads on the broker: they recopy log segments, keeping only the most recent record for each key and discarding older duplicates.
o The active (currently written) segment is never compacted, and the offsets of retained records are preserved, so compaction never reorders messages.

Benefits:
1. State Preservation: Provides a compact view of the latest state of each key.
2. Efficient Storage: Reduces storage requirements by deleting redundant or
outdated messages.
3. System Recovery: Allows consumers to rebuild state from compacted topics
efficiently without needing a full historical log.
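To make the System Recovery benefit concrete, the sketch below rebuilds an in-memory map of the latest value per key by reading a compacted topic from the beginning. The topic name, group id, broker address, and the fixed number of polls are illustrative assumptions; a real implementation would compare its position against the end offsets to know when it has caught up.

java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class StateRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "state-rebuilder");         // hypothetical group id
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");       // start from the earliest retained record
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, String> state = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // the compacted topic
            // Poll a bounded number of times for this sketch.
            for (int i = 0; i < 10; i++) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    if (record.value() == null) {
                        state.remove(record.key());              // tombstone: key was deleted
                    } else {
                        state.put(record.key(), record.value()); // keep only the latest value per key
                    }
                }
            }
        }
        System.out.println("Rebuilt state: " + state);
    }
}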
Use Cases:
1. Database Change Streams: Storing change events for a database, where only the latest update for each primary key is relevant.
2. Configuration Management: Topics holding configuration updates for
distributed systems.
3. Cache Invalidation: Topics used to update caches with only the latest value.
To enable log compaction for a topic, set the cleanup.policy configuration to compact.
This can be done either during topic creation or through an update:
bash
# Adjust --bootstrap-server to point at your cluster.
bin/kafka-topics.sh --create --topic my-topic --partitions 3 \
  --replication-factor 2 --config cleanup.policy=compact \
  --bootstrap-server localhost:9092
You can also have a mixed retention policy (both compact and delete) by setting
cleanup.policy=compact,delete.
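For the update path, the kafka-configs.sh tool can alter the policy on an existing topic (--alter --add-config cleanup.policy=compact); the same can be done programmatically with the Java Admin client. The sketch below sets a combined compact,delete policy on an existing topic; the topic name and broker address are placeholder assumptions.

java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class SetCleanupPolicy {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // placeholder topic
            // Combined policy: compact per key, and also apply time/size-based deletion.
            AlterConfigOp setPolicy = new AlterConfigOp(
                    new ConfigEntry("cleanup.policy", "compact,delete"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setPolicy))).all().get();
        }
    }
}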
Log compaction is an advanced Kafka feature that enables efficient data storage and ensures that topics reflect the latest state for each key.