Apache Kafka
Apache Kafka was originally developed at LinkedIn in 2010 and was later donated to the Apache Software
Foundation. It is maintained today as an Apache project, with Confluent as a major contributor.
Messaging System
1. Point-to-Point System
In a point-to-point system, messages are persisted in a queue, and each message is consumed by at most one
consumer, although multiple consumers can read from the same queue. The typical example of this system is an
order processing system, where each order will be processed by one order processor, but multiple order
processors can work at the same time.
2. Publish-Subscribe System
Messages are persisted in a topic. Unlike the point-to-point system, consumers can subscribe to one or more topics and
consume all the messages in those topics. In the publish-subscribe system, message producers are called publishers
and message consumers are called subscribers.
A real-life example is Dish TV, which publishes different channels such as sports, movies, and music; anyone can
subscribe to their own set of channels and receive new content whenever it is published on those channels.
Apache Kafka as a Messaging System
Apache Kafka Architecture
Kafka is a distributed, replicated commit log. Kafka does not have the concept of a queue, which might seem strange at
first, given that it is primarily used as a messaging system. Queues have been synonymous with messaging systems for a
long time. Let’s break down “distributed, replicated commit log” a bit:
Distributed because Kafka is deployed as a cluster of nodes, for both fault tolerance and scale.
Replicated because messages are usually replicated across multiple nodes (servers).
Commit log because messages are stored in partitioned, append-only logs called topics. This concept of a log
is the principal killer feature of Kafka.
Kafka is so powerful in terms of throughput and scalability that it allows you to handle continuous streams of messages.
Apache Kafka Workflow
Following is the step-wise workflow of Pub-Sub messaging:
• Producers send messages to a topic at regular intervals.
• The Kafka broker stores all messages in the partitions configured for that particular topic and distributes
them evenly between partitions (when no message key is set). If the producer sends two messages and there are
two partitions, Kafka will store one message in the first partition and the second message in the second partition.
• The consumer subscribes to a specific topic.
• Once the consumer subscribes to a topic, Kafka provides the current offset of the topic to the consumer and
also saves the offset in ZooKeeper. (Older Kafka versions stored consumer offsets in ZooKeeper; newer versions
store them in an internal Kafka topic.)
• The consumer requests new messages from Kafka at a regular interval (e.g., 100 ms).
• Once Kafka receives messages from producers, it forwards them to the consumers.
• The consumer receives each message and processes it.
• Once the messages are processed, the consumer sends an acknowledgement to the Kafka broker.
• Once Kafka receives an acknowledgement, it advances the offset to the new value and updates it in
ZooKeeper. Since offsets are maintained in ZooKeeper, the consumer can resume from the last committed offset
even after a failure or restart.
• The above flow repeats until the consumer stops requesting messages.
• The consumer can rewind or skip to any desired offset of a topic at any time and read all the subsequent
messages (see the consumer sketch below).
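To make this flow concrete, here is a minimal sketch of a consumer poll loop using the plain Java client. It assumes a broker on localhost:9092, the topic name used later in these slides, and a hypothetical group id demo-group:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class WorkflowConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        // Commit manually, mirroring the "acknowledgement" step described above.
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("developervisits-topic"));
            while (true) {
                // Poll at a regular interval (here 100 ms) for new messages.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                // Committing advances the stored offset, like the acknowledgement above.
                consumer.commitSync();
            }
        }
    }
}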
Apache Kafka Core API
1. Topic
A topic is a unique name for a Kafka stream: a category or feed name to which records are published and in
which messages are stored. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or
many consumers that subscribe to the data written to it.
The Kafka cluster durably persists all published records—whether or not they have been consumed—
using a configurable retention period. For example, if the retention policy is set to two days, then for the
two days after a record is published, it is available for consumption, after which it will be discarded to free
up space. Kafka's performance is effectively constant with respect to data size, so storing data for a
long time is not a problem.
This is one of the biggest differences between RabbitMQ/ActiveMQ and Kafka.
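As an illustration, here is a hedged sketch of creating such a topic programmatically with the Kafka AdminClient API and setting a two-day retention. The topic name matches the one used later in these slides; the partition and replication counts are arbitrary choices for the example:
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicAdminExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 2 partitions, replication factor 1, records retained for 2 days.
            NewTopic topic = new NewTopic("developervisits-topic", 2, (short) 1)
                    .configs(Collections.singletonMap("retention.ms",
                            String.valueOf(2L * 24 * 60 * 60 * 1000)));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}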
Kafka Components
2. Kafka Producer
It publishes messages to a Kafka topic. The producer is responsible for choosing which record to assign to which
partition within the topic.
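A minimal sketch with the plain Java client follows; the broker address and record key are illustrative. Note that records with the same key always go to the same partition:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("order-1") determines which partition the record lands in.
            producer.send(new ProducerRecord<>("developervisits-topic", "order-1", "hello kafka"));
        }
    }
}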
3. Kafka Consumer
This component subscribes to one or more topics, then reads and processes messages from them.
4. Kafka Broker
The Kafka broker manages the storage of messages in the topics. A deployment with more than one broker is what we
call a Kafka cluster.
5. Kafka Zookeeper
Kafka uses ZooKeeper to provide the brokers with metadata about the processes running in the system and to
facilitate health checking, management, and coordination of the cluster.
Note that partitions for the same topic are distributed across multiple brokers in the cluster.
Kafka Use Cases
There are several use cases of Kafka that show why we actually use Apache Kafka.
Messaging
Kafka works well as a replacement for a more traditional message broker. Kafka has better throughput,
built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale
message-processing applications.
Metrics
Kafka is a good fit for operational monitoring data: aggregating statistics from distributed
applications to produce centralized feeds of operational data.
Event Sourcing
Since Kafka supports the storage of very large logs of data, it is an excellent backend for applications that use
event sourcing.
RabbitMQ Vs Kafka
Let’s see how they differ from one another:
i. Features
Apache Kafka – Kafka is distributed, and the data is shared and replicated with guaranteed durability and
availability.
RabbitMQ – It offers relatively little support for these features.
ii. Processing
Apache Kafka – It allows reliable distributed log processing. Stream-processing semantics are also built into
Kafka Streams, as sketched below.
RabbitMQ – Here, the consumer is simply FIFO-based, reading from the head of the queue and processing messages
one by one.
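For illustration, a minimal Kafka Streams topology sketch; the topic names and application id here are hypothetical:
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from one topic, transform each value, and write to another.
        builder.<String, String>stream("input-topic")
               .mapValues(value -> value.toUpperCase())
               .to("output-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}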
iii. Replay
Kafka suits applications that need access to stream history, delivered in partitioned order at least once. Kafka is a
durable message store, and clients can get a “replay” of the event stream on demand, as opposed to more traditional
message brokers, where once a message has been delivered it is removed from the queue.
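A hedged sketch of such a replay using the consumer's seek API; the partition number and offset are illustrative:
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayExample {
    // Rewind an already-configured consumer (see the earlier poll-loop sketch) to replay history.
    static void replayFromStart(KafkaConsumer<String, String> consumer) {
        TopicPartition partition = new TopicPartition("developervisits-topic", 0);
        consumer.assign(Collections.singletonList(partition));
        consumer.seekToBeginning(Collections.singletonList(partition)); // start from offset 0
        // ...or jump to an arbitrary offset instead:
        // consumer.seek(partition, 42L);
    }
}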
Implementation of Kafka
Add the spring-kafka dependency:
<dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
</dependency>
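Spring Boot can auto-configure the producer from application.properties; alternatively, a minimal explicit configuration sketch, assuming a broker on localhost:9092, might look like this:
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        // Broker address and serializers for String keys and values.
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}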
Implementation of Kafka
Define the KafkaSender class to send messages to the Kafka topic named developervisits-topic:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaSender {

    private static final String kafkaTopic = "developervisits-topic";

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    // Publish the given message to developervisits-topic.
    public void send(String message) {
        kafkaTemplate.send(kafkaTopic, message);
    }
}
Implementation of Kafka
Define a controller that accepts a message and triggers sending it to the Kafka topic
using the KafkaSender class.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import com.hcl.service.KafkaSender;
@RestController
@RequestMapping(value = "/developervisits-kafka/")
public class ApacheKafkaWebController {
@Autowired
KafkaSender kafkaSender;
    @GetMapping(value = "/producer")
    public String producer(@RequestParam("message") String message) {
        kafkaSender.send(message);
        // Return a simple confirmation so the endpoint has a response body.
        return "Message sent to the Kafka topic";
    }
}
Implementation of Kafka
Finally, define the Spring Boot application class with the @SpringBootApplication annotation:
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class SpringBootHelloWorldApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBootHelloWorldApplication.class, args);
    }
}
Implementation of Kafka
We are done with the required Java code. Now let's start Apache Kafka. As explained in detail
in Getting started with Apache Kafka, download Kafka from the link below, then start ZooKeeper and the broker:
https://kafka.apache.org/downloads
zookeeper-server-start.bat c:\shareData\development\appachekafka\kafka_2.12-2.0.0\config\zookeeper.properties
kafka-server-start.bat c:\shareData\development\appachekafka\kafka_2.12-2.0.0\config\server.properties
Implementation of Kafka
Next, start the Spring Boot application by running it as a Java application.
PRODUCER
kafka-console-producer.bat --broker-list localhost:9092 --topic developervisits-topic
-OR-
http://localhost:8080/developervisits-kafka/producer?message="test"
CONSUMER
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic developervisits-topic --from-beginning