0% found this document useful (0 votes)
17 views2 pages

Log Compaction in Kafka

Uploaded by

Krishnaraghava
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views2 pages

Log Compaction in Kafka

Uploaded by

Krishnaraghava
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 2

https://chatgpt.

com/share/6747f489-211c-8009-bfa1-bc6dc51f3bfb

What is Log Compaction in Kafka

Log compaction in Apache Kafka is a mechanism to retain only the most recent value for
each key in a topic, while deleting older values. This feature is particularly useful for
topics where the latest state of a key is sufficient, such as in scenarios involving
databases, caches, or configurations.

How Log Compaction Works:

1. Key-Value Messages: Kafka topics configured for log compaction store


messages as key-value pairs. A key is required for log compaction to work.
2. Retention Policy: Instead of retaining messages based solely on time or size (like
the default retention policy), log compaction ensures that:

o The most recent message for each key is always retained.


o Older messages with the same key can be deleted, except the latest
message.
3. Special Case: Tombstone Records:

A message with a key and a null value (known as a tombstone) marks the
o
deletion of that key.
o Tombstone records are eventually removed during compaction, once all
older records with that key have been purged.
4. Compaction Process:

o Compaction is performed in the background by Kafka’s log cleaner


thread.
o The cleaner scans log segments and removes older messages with
duplicate keys, keeping only the latest version.
o This ensures the compacted topic maintains the latest state of each key.

Benefits of Log Compaction:

1. State Preservation: Provides a compact view of the latest state of each key.
2. Efficient Storage: Reduces storage requirements by deleting redundant or
outdated messages.
3. System Recovery: Allows consumers to rebuild state from compacted topics
efficiently without needing a full historical log.

Use Cases for Log Compaction:

1. Database Change Streams: Storing change events for a database, where only the
latest update for each primary key is relevant.
2. Configuration Management: Topics holding configuration updates for
distributed systems.
3. Cache Invalidation: Topics used to update caches with only the latest value.

Configuring Log Compaction:

To enable log compaction for a topic, set the cleanup.policy configuration to compact.
This can be done either during topic creation or through an update:

bash
Copy code
bin/kafka-topics.sh --create --topic my-topic --partitions 3 --
replication-factor 2 --config cleanup.policy=compact

You can also have a mixed retention policy (both compact and delete) by setting
cleanup.policy=compact,delete.

Log compaction is an advanced feature in Kafka that enables efficient data storage and
ensures that topics reflect the latest state of data keys.

You might also like