LIST OF EXPERIMENTS:
1. Install Apache Kafka on a single node.
2. Demonstrate setting up a single-node, single-broker Kafka cluster and show basic operations
such as creating topics and producing/consuming messages.
3. Extend the cluster to multiple brokers on a single node.
4. Write a simple Java program to create a Kafka producer and produce messages to a topic.
5. Implement sending messages both synchronously and asynchronously in the producer.
6. Develop a Java program to create a Kafka consumer and subscribe to a topic and consume
messages.
7. Write a script to create a topic with specific partition and replication factor settings.
8. Simulate fault tolerance by shutting down one broker and observing the cluster behavior.
9. Implement operations such as listing topics, modifying configurations, and deleting topics.
10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with external
systems.
11. Implement a simple word count stream processing application using Kafka Streams.
12. Implement Kafka integration with the Hadoop ecosystem.
1. Install Apache Kafka on a single node.
Apache Kafka can be run on all platforms supported by Java. To set up Kafka on an Ubuntu system, you need to install Java first. Since Oracle Java is now commercially licensed, use the open-source OpenJDK instead (for example, OpenJDK 11).
Download the Apache Kafka binary files from the official download website, or fetch them directly with wget, then extract the archive and move it to /usr/local/kafka:
wget https://downloads.apache.org/kafka/3.4.0/kafka_2.12-3.4.0.tgz
tar -xzf kafka_2.12-3.4.0.tgz
sudo mv kafka_2.12-3.4.0 /usr/local/kafka
Step 3 — Creating Systemd Unit Files
Now create systemd unit files for the ZooKeeper and Kafka services, which will let you manage both services with systemctl. Create the ZooKeeper unit file first:
sudo nano /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Next, create the systemd unit file for the Kafka service:
sudo nano /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64"
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
Reload the systemd configuration to pick up the new unit files:
sudo systemctl daemon-reload
First, you need to start the ZooKeeper service and then start Kafka. Use the systemctl command to manage both services:
sudo systemctl start zookeeper
Now start the Kafka server and view its running status:
sudo systemctl start kafka
sudo systemctl status kafka
Kafka provides multiple pre-built shell scripts to work with. First, create a topic named "myTopic" with a single partition and a single replica:
cd /usr/local/kafka
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic myTopic
The replication factor describes how many copies of the data will be kept. As we are running a single broker, keep this value at 1. The partitions setting controls how many partitions the topic's data is split across; with a single broker, keep this value at 1 as well.
You can create multiple topics by running the same command as above.
After that, you can list the topics created on Kafka by running the command below:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
To set up a Kafka cluster, you will need to follow these general steps:
1. Install Kafka on all nodes of the cluster. You can download Kafka from the Apache
Kafka website.
2. Configure the server.properties file on each node to specify the broker ID, the
ZooKeeper connection string, and other properties.
3. Start the ZooKeeper service on each node. This is required for Kafka to function.
4. Start the Kafka brokers on each node by running the kafka-server-start command and
specifying the location of the server.properties file.
5. Test the cluster by creating a topic, producing and consuming messages, and verifying
that they are replicated across all nodes.
1. Install Kafka on all nodes of the cluster. You can download Kafka from the Apache
Kafka website.
2. Configure the server.properties file on each node to specify the broker ID, the
ZooKeeper connection string, and other properties. For example, here is a configuration
for a simple Kafka cluster with three brokers:
# Broker 1
broker.id=1
listeners=PLAINTEXT://localhost:9092
num.partitions=3
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181

# Broker 2
broker.id=2
listeners=PLAINTEXT://localhost:9093
num.partitions=3
log.dirs=/tmp/kafka-logs-2
zookeeper.connect=localhost:2181

# Broker 3
broker.id=3
listeners=PLAINTEXT://localhost:9094
num.partitions=3
log.dirs=/tmp/kafka-logs-3
zookeeper.connect=localhost:2181
In this example, each broker has a unique broker.id and listens on a different port for client
connections. The num.partitions property specifies the default number of partitions for new
topics, and log.dirs specifies the directory where Kafka should store its data on disk.
zookeeper.connect specifies the ZooKeeper connection string, which should point to the
ZooKeeper ensemble.
3. Start the ZooKeeper service on each node. This is required for Kafka to function. You
can start ZooKeeper by running the following command:
bin/zookeeper-server-start.sh config/zookeeper.properties
This will start a single-node ZooKeeper instance using the default configuration.
4. Start the Kafka brokers on each node by running the kafka-server-start command and
specifying the location of the server.properties file. For example:
bin/kafka-server-start.sh config/server.properties
This will start the Kafka broker on the default port (9092) using the configuration in
config/server.properties.
5. Test the cluster by creating a topic, producing and consuming messages, and verifying
that they are replicated across all nodes. You can use the kafka-topics, kafka-console-producer,
and kafka-console-consumer command-line tools to perform these tasks. For example:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic my-topic

bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic my-topic

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic my-topic --from-beginning
These commands will create a topic with three partitions and three replicas, produce messages
to the topic, and consume them from all three brokers. You can verify that the messages are
replicated across all nodes by stopping one of the brokers and observing that the other brokers
continue to serve messages.
Example: server.properties
broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=c:/kafka/kafka-logs-1
auto.create.topics.enable=false (optional)
Creating new Broker-1
Follow these steps to add a new broker. Copy config/server.properties to a new file (for example server-1.properties) and make the following changes in it:
1. Change broker.id to 1.
2. Change the listener port to 9093.
3. Change log.dirs to a new directory, e.g. c:/kafka/kafka-logs-1.
Repeat the same steps for Broker-2 (broker.id=2, a different port such as 9094, and log.dirs c:/kafka/kafka-logs-2), then start both new brokers with their properties files.
Open a console producer and send a test message to the replicated topic:
.\bin\windows\kafka-console-producer.bat --bootstrap-server localhost:9092 --topic test-topic-replicated
message sent: Hi
Instantiate a new Consumer to receive the messages.
.\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test-topic-replicated --from-beginning
message received: Hi
The message we sent is now received by the console consumer. The interesting part is that we now have three Kafka log folders; let's check what they contain.
Log directories
• Close the producer console and you will see that the kafka-logs-1 and kafka-logs-2 directories have been created alongside the original log directory.
• Each broker gets its own folder, and that is where it persists the messages produced to it. So there is one directory per broker, three in total.
Conclusion: we have successfully set up a Kafka cluster with 3 brokers, created a topic in the cluster, and produced and consumed messages through the Kafka cluster.
To write Kafka producers and consumers in Java, first create a project with one of the following build tools:
o Maven
o Gradle
The project will eventually contain:
• Complete Kafka Producer
• Complete Kafka Consumer
• Kafka dependencies
• Logging dependencies
Follow these steps to create a Java project with the above dependencies.
The build tool Maven uses a pom.xml file, a default XML file that carries the project information such as the GroupId, ArtifactId, and Version values. Define all the necessary project dependencies in this file. Go to the pom.xml file.
pom.xml
<project>
...
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>3.0.0</version>
</dependency>
</dependencies>
</project>
If the version number appears red in color, it means the 'Auto-Import' option has not been
enabled. In that case, go to View > Tool Windows > Maven. A Maven Projects window will
appear on the right side of the screen. Click the 'Refresh' button there to import the Maven
projects. When the color changes to black, the missing dependency has been downloaded.
Add another dependency for logging. This will enable us to print diagnostic logs
while our application runs.
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.32</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.32</version>
</dependency>
Now, we have set all the required dependencies. Let's try the Simple Hello
World example.
While creating the java package, follow the package naming conventions. Finally,
create the sample application program as shown below.
package io.conduktor.demos.kafka;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HelloWorld {
    private static final Logger log = LoggerFactory.getLogger(HelloWorld.class);

    public static void main(String[] args) {
        log.info("Hello World");
    }
}
Run the application (the green 'play' button next to the main method) and verify that it runs,
prints the message, and exits with code 0. This means that your Java application has run
successfully.
Expand the 'External Libraries' on the Project panel and verify that it displays the
dependencies that we added for the project in pom.xml.
We have created a sample Java project that includes all the needed dependencies.
This will form the basis for creating Java producers and consumers next.
For the Kafka producer and consumer programs, make sure the Kafka client dependency is present in the pom.xml:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>3.1.0</version>
</dependency>
import java.util.Properties;
import java.util.concurrent.ExecutionException;
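Below is a minimal sketch of the producer for experiments 4 and 5. It sends one record synchronously by blocking on the returned Future and one asynchronously through a callback. The broker address (localhost:9092) and the topic name (myTopic) are assumptions carried over from the earlier steps; adjust them to your setup.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class SimpleProducer {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // Producer configuration; adjust the broker address to your setup
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Synchronous send: block on the future until the broker acknowledges the record
            RecordMetadata metadata =
                    producer.send(new ProducerRecord<>("myTopic", "key1", "synchronous message")).get();
            System.out.printf("Sync send -> partition=%d, offset=%d%n",
                    metadata.partition(), metadata.offset());

            // Asynchronous send: return immediately and handle the result in a callback
            producer.send(new ProducerRecord<>("myTopic", "key2", "asynchronous message"),
                    (meta, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        } else {
                            System.out.printf("Async send -> partition=%d, offset=%d%n",
                                    meta.partition(), meta.offset());
                        }
                    });
            producer.flush(); // ensure the async record is delivered before the program exits
        }
    }
}

Blocking on get() gives an immediate delivery result at the cost of throughput, while the callback form keeps the producer pipeline full and reports failures asynchronously.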
OUTPUT
6. Develop a Java program to create a Kafka consumer, subscribe to a topic, and consume messages.
PROGRAM
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        // Consumer configuration; adjust the broker address, group id, and topic to your setup
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("myTopic"));
        try {
            while (true) {
                // Poll the broker for new records and print each one
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                records.forEach(record ->
                        System.out.printf("Consumed message: key=%s, value=%s%n",
                                record.key(), record.value()));
            }
        } finally {
            consumer.close();
        }
    }
}
OUTPUT
7. Write a script to create a topic with specific partition and replication factor settings.
Below is a script written in Scala for creating a Kafka topic with specific partition and replication
factor settings. This script can be executed in IntelliJ IDEA with the Kafka dependencies
included in the project.
Program
import java.util.Properties
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}
import scala.collection.JavaConverters._
object KafkaTopicCreator {
def main(args: Array[String]): Unit = {
// Kafka broker properties
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
// Create AdminClient
val adminClient = AdminClient.create(props)
// Define the topic with the desired partition count and replication factor
// (the replication factor must not exceed the number of brokers)
val newTopic = new NewTopic("my-topic", 3, 1.toShort)
// Create topic
val results = adminClient.createTopics(List(newTopic).asJava)
results.values().asScala.foreach { case (topicName, future) =>
try {
future.get()
println(s"Topic $topicName created successfully.")
} catch {
case e: Exception =>
println(s"Failed to create topic $topicName: ${e.getMessage}")
}
}
// Close AdminClient
adminClient.close()
}
}
1. Create a new Scala project (or a Maven/Gradle project with Scala support) in IntelliJ IDEA.
2. Add the Kafka client dependency (org.apache.kafka:kafka-clients) to the project.
3. Create a new Scala file (e.g., KafkaTopicCreator.scala) and paste the script into it.
4. Make sure your Kafka broker is running on localhost:9092.
5. Run the KafkaTopicCreator object in IntelliJ IDEA.
OUTPUT
We should see the output indicating whether the topic creation was successful or not.
8. Simulate fault tolerance by shutting down one broker and observing the cluster behavior.
To simulate fault tolerance by shutting down one broker and observing the cluster behavior in
IntelliJ, you'll need to set up a Kafka cluster and create a sample producer and consumer
application. Then, you'll shut down one of the brokers to observe the behavior. Here's a step-
by-step example:
1. Set Up Kafka Cluster:
• Ensure you have Kafka installed and configured with multiple brokers. You can
refer to the Kafka documentation for detailed instructions.
2. Create a Topic:
• Let's assume we have a topic named "test_topic" with a replication factor of 3
and 3 partitions. Run the following command in your Kafka installation
directory:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic test_topic
3. Create IntelliJ Project:
• Create a new Maven or Gradle project in IntelliJ.
• Add Kafka dependencies to your pom.xml or build.gradle.
4. Producer Application:
• Create a Java class for the producer application. This application will send
messages to the Kafka topic.
Producer program
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class FaultToleranceProducer {
    public static void main(String[] args) {
        String topic = "test_topic";
        // List all three brokers so the client can fail over if one of them goes down
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093,localhost:9094");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            for (int i = 0; i < 10; i++) {
                String message = "Message " + i;
                producer.send(new ProducerRecord<>(topic, Integer.toString(i), message));
                System.out.println("Sent message: " + message);
                Thread.sleep(1000);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }
}
5. Consumer Application:
• Create a Java class for the consumer application. This application will consume
messages from the Kafka topic.
Consumer code
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class FaultToleranceConsumer {
    public static void main(String[] args) {
        String topic = "test_topic";
        // List all three brokers so the client keeps working when one of them is shut down
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093,localhost:9094");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fault-tolerance-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(topic));
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("Received message: offset = %d, key = %s, value = %s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        } finally {
            consumer.close();
        }
    }
}
6. Run Applications:
• Run the producer application and then the consumer application in IntelliJ.
7. Observe Behavior:
• While both producer and consumer are running, shut down one of the Kafka
brokers in your Kafka cluster. You can do this by stopping the Kafka process
associated with that broker.
Output:
Observe how the consumer continues to receive messages without interruption despite the
broker shutdown. Kafka automatically handles the fault tolerance by reassigning partitions to
the remaining brokers.
We can monitor the logs in IntelliJ to see how Kafka handles the failure and reassignment of
partitions.
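To make the failover visible from code as well, the following illustrative helper (not part of the steps above) uses the Java AdminClient to print each partition's current leader and in-sync replicas for test_topic; run it before and after stopping a broker and compare the output. The broker list is assumed to match the three-broker setup described earlier.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import java.util.Collections;
import java.util.Properties;

public class PartitionLeaderCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092,localhost:9093,localhost:9094"); // adjust to your brokers
        try (AdminClient admin = AdminClient.create(props)) {
            // Describe the topic and print the leader and in-sync replicas of every partition
            TopicDescription description = admin.describeTopics(Collections.singletonList("test_topic"))
                    .all().get().get("test_topic");
            description.partitions().forEach(p ->
                    System.out.printf("Partition %d: leader=%s, isr=%s%n",
                            p.partition(), p.leader(), p.isr()));
        }
    }
}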
9. Implement operations such as listing topics, modifying configurations, and deleting topics.
To implement operations such as listing topics, modifying configurations, and deleting topics
from IntelliJ IDEA, you would typically interact with Apache Kafka's command-line tools.
Here's a step-by-step guide on how to perform these operations using the Kafka command-line
tools (kafka-topics.sh and kafka-configs.sh) from within IntelliJ IDEA:
1. Setting up Kafka in IntelliJ IDEA:
• First, make sure you have Apache Kafka installed and running on your local
machine or on a server accessible from IntelliJ IDEA.
• Open your IntelliJ IDEA project.
2. Create a new Kotlin/Java file:
• Right-click on your project folder in the project explorer.
• Select "New" -> "Kotlin File/Java Class" to create a new Kotlin/Java file.
3. List Topics:
• To list topics, you can use the Kafka command-line tool kafka-topics.sh.
• Execute the following Kotlin/Java code to list topics:
Listing Topic
import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
    val runtime = Runtime.getRuntime()
    // Run kafka-topics.sh and print every topic name it returns
    val process = runtime.exec("/path/to/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092")
    BufferedReader(InputStreamReader(process.inputStream)).forEachLine { println(it) }
    process.waitFor()
}
4. Modify Topic Configuration:
• To modify topic configurations, you can use the Kafka command-line tool kafka-configs.sh.
• Execute the following Kotlin/Java code to modify a topic configuration:
Modify Topic Configuration
import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
    val topicName = "your_topic_name"
    val configKey = "compression.type"
    val configValue = "gzip"
    // Run kafka-configs.sh to alter the topic configuration and print its output
    val process = Runtime.getRuntime().exec("/path/to/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name $topicName --alter --add-config $configKey=$configValue")
    BufferedReader(InputStreamReader(process.inputStream)).forEachLine { println(it) }
    process.waitFor()
}
Replace your_topic_name with the name of the topic you want to modify, and
/path/to/kafka/bin/kafka-configs.sh with the actual path to kafka-configs.sh script.
5. Delete Topics:
• To delete topics, you can use the Kafka command-line tool kafka-topics.sh.
• Execute the following Kotlin/Java code to delete topics:
Delete Topics
import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
    val topicName = "your_topic_name"
    // Run kafka-topics.sh to delete the topic and print its output
    val process = Runtime.getRuntime().exec("/path/to/kafka/bin/kafka-topics.sh --delete --bootstrap-server localhost:9092 --topic $topicName")
    BufferedReader(InputStreamReader(process.inputStream)).forEachLine { println(it) }
}
Replace your_topic_name with the name of the topic you want to delete, and
/path/to/kafka/bin/kafka-topics.sh with the actual path to kafka-topics.sh script.
6. Run the code:
• Run the Kotlin/Java file in IntelliJ IDEA.
• You should see the output in the console showing the list of topics,
configuration modification status, or topic deletion status.
Make sure you have appropriate permissions and the Kafka server is running when executing
these commands. Additionally, replace placeholders such as /path/to/kafka/bin/ and
localhost:9092 with the actual paths and addresses of your Kafka setup.
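As an alternative to shelling out to the command-line tools, the same three operations can be performed directly with the Java AdminClient. The sketch below is illustrative only (it assumes a broker at localhost:9092 and a topic named your_topic_name): it lists all topics, sets compression.type=gzip on the topic, and finally deletes it.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collections;
import java.util.Properties;

public class TopicAdminOperations {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // adjust to your broker
        String topicName = "your_topic_name";

        try (AdminClient admin = AdminClient.create(props)) {
            // 1. List topics
            admin.listTopics().names().get().forEach(System.out::println);

            // 2. Modify a topic configuration (set compression.type=gzip)
            ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topicName);
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("compression.type", "gzip"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Collections.singletonMap(resource, Collections.singleton(op)))
                    .all().get();

            // 3. Delete the topic
            admin.deleteTopics(Collections.singletonList(topicName)).all().get();
        }
    }
}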
10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with external systems.
Kafka Connect is a framework that provides scalable and reliable streaming data integration
between Apache Kafka and other systems. It simplifies the process of building and managing
connectors to move data in and out of Kafka.
To demonstrate how to use Kafka Connect and connectors to integrate with external systems,
let's walk through an example of setting up a simple connector to move data from a CSV file
to a Kafka topic. We'll use IntelliJ IDEA as our IDE.
Step 1: Setup Kafka and Kafka Connect
Ensure you have Apache Kafka installed and running on your local machine. Additionally,
you'll need to have Kafka Connect installed. You can find installation instructions in the
Apache Kafka documentation.
Step 2: Create a Kafka Connector Configuration File
Create a JSON configuration file for your Kafka connector. For this example, let's call it csv-source-connector.json:
Program
{
"name": "csv-source-connector",
"config": {
"connector.class": "FileStreamSource",
"tasks.max": "1",
"file": "<path_to_your_csv_file>",
"topic": "csv-topic"
}
}
Step 3: Run the Connector
bin/connect-standalone.sh config/connect-standalone.properties csv-source-connector.properties
This command assumes you're using the standalone mode of Kafka Connect, which reads the connector configuration from a .properties file (the JSON form above is what you would submit to the Connect REST API when running in distributed mode). Adjust the paths as necessary for your setup.
OUTPUT
11. Implement a simple word count stream processing application using Kafka Streams.
To implement a simple word count stream processing application using Kafka Streams in
IntelliJ IDEA, you'll first need to set up a Kafka cluster. You can use Docker to set up a local
Kafka cluster quickly. Then, create a Maven project in IntelliJ IDEA and add the necessary
Kafka Streams dependency (org.apache.kafka:kafka-streams) to the pom.xml.
Program:
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import java.util.Arrays;
import java.util.Properties;
public class WordCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        StreamsBuilder builder = new StreamsBuilder();
        // Read lines from the input topic (topic name is illustrative), split into words, and count them
        KStream<String, Long> wordCounts = builder
                .stream("word-count-input", Consumed.with(Serdes.String(), Serdes.String()))
                .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                .groupBy((key, word) -> word, Grouped.with(Serdes.String(), Serdes.String()))
                .count()
                .toStream();
        wordCounts.to("word-count-output",
                Produced.with(Serdes.String(), Serdes.Long()));
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
Output
12. Implement Kafka integration with the Hadoop ecosystem.
Procedure
Integrating Kafka with the Hadoop ecosystem allows for efficient ingestion, storage, and
analysis of streaming data. The common components involved in this integration include
Kafka, Flume, HDFS (Hadoop Distributed File System), HBase, Hive, and Spark. Here's a
high-level overview and some steps to set up Kafka integration with Hadoop:
1. Kafka Setup
First, set up Kafka by downloading it from the official website and installing it on your
system. Start the Kafka server and the ZooKeeper instance that Kafka relies on.
wget https://archive.apache.org/dist/kafka/2.8.0/kafka_2.12-2.8.0.tgz
tar -xzf kafka_2.12-2.8.0.tgz
cd kafka_2.12-2.8.0
2. Start ZooKeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
3. Start Kafka:
bin/kafka-server-start.sh config/server.properties
2. Kafka to HDFS via Flume
Apache Flume can be used to collect, aggregate, and move large amounts of log data from
different sources to a centralized data store. Flume can be configured to act as a Kafka
consumer, reading data from Kafka topics and writing it to HDFS or HBase.
Example configuration:
agent.sources = kafka-source
agent.sinks = hdfs-sink
agent.channels = mem-channel
agent.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.kafka-source.kafka.bootstrap.servers = localhost:9092
agent.sources.kafka-source.kafka.topics = my-topic
agent.sources.kafka-source.kafka.consumer.group.id = flume-consumer-group
agent.sources.kafka-source.kafka.consumer.auto.offset.reset = earliest
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode_host:8020/user/flume/events
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.channels.mem-channel.type = memory
agent.channels.mem-channel.capacity = 10000
agent.channels.mem-channel.transactionCapacity = 1000
agent.sources.kafka-source.channels = mem-channel
agent.sinks.hdfs-sink.channel = mem-channel
3. Kafka to HBase
HBase can be used to store real-time data from Kafka. You can write a custom consumer (a
sketch is shown below) or use existing tools like Apache Storm or Spark Streaming to process
the data and write it to HBase.
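For the custom-consumer route, the following is one possible, illustrative sketch. It assumes an HBase table named events with a column family cf already exists, a Kafka topic named my-topic, and the kafka-clients and hbase-client libraries on the classpath; it consumes records from Kafka and writes each one as a row into HBase.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaToHBaseConsumer {
    public static void main(String[] args) throws Exception {
        // Kafka consumer configuration (adjust broker address, group id, and topic)
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "kafka-to-hbase");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Configuration hbaseConf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection hbase = ConnectionFactory.createConnection(hbaseConf);
             Table table = hbase.getTable(TableName.valueOf("events"));
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Use topic-partition-offset as the row key and store the value in cf:value
                    Put put = new Put(Bytes.toBytes(record.topic() + "-" + record.partition() + "-" + record.offset()));
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(record.value()));
                    table.put(put);
                }
            }
        }
    }
}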
1. Spark Streaming Setup: Use Spark Streaming to read data from Kafka and write it
to HBase.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
val conf = new SparkConf().setAppName("KafkaToHBase").setMaster("local[*]")
val ssc = new StreamingContext(conf, Seconds(5))
// Build a direct stream from Kafka with KafkaUtils.createDirectStream here and
// write each processed batch to HBase (stream-processing logic omitted in this fragment)
ssc.start()
ssc.awaitTermination()
Hive can be used to query data stored in HDFS. You can create an external table in Hive to
point to the data location.
• Kafka Monitoring: Use tools like Kafka Manager, Burrow, or Grafana with Prometheus for
monitoring.
• HDFS and HBase Monitoring: Use Hadoop’s built-in monitoring tools or third-party tools.
• Log Management: Regularly check logs for any issues.