LIST OF EXPERIMENTS:
1. Install Apache Kafka on a single node.
2. Demonstrate setting up a single-node, single-broker Kafka cluster and show basic operations
such as creating topics and producing/consuming messages.
3. Extend the cluster to multiple brokers on a single node.
4. Write a simple Java program to create a Kafka producer and produce messages to a topic.
5. Implement sending messages both synchronously and asynchronously in the producer.
6. Develop a Java program to create a Kafka consumer and subscribe to a topic and consume
messages.
7. Write a script to create a topic with specific partition and replication factor settings.
8. Simulate fault tolerance by shutting down one broker and observing the cluster behavior.
9. Implement operations such as listing topics, modifying configurations, and deleting topics.
10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with external
systems.
11. Implement a simple word count stream processing application using Kafka Streams.
12. Implement Kafka integration with the Hadoop ecosystem.
1. Install Apache Kafka on a single node.
Apache Kafka can be run on all platforms supported by Java. To set up Kafka on an Ubuntu system, you need to install Java first. Since Oracle Java is now commercially licensed, use the open-source OpenJDK instead (for example, OpenJDK 11).
Download the Apache Kafka binary files from the official download website, or fetch them directly with wget, then extract the archive and move it to /usr/local/kafka:
wget https://downloads.apache.org/kafka/3.4.0/kafka_2.12-3.4.0.tgz
tar -xzf kafka_2.12-3.4.0.tgz
sudo mv kafka_2.12-3.4.0 /usr/local/kafka
Step 3 — Creating Systemd Unit Files
Now create systemd unit files for the ZooKeeper and Kafka services, which will let you manage both services with systemctl. Create the ZooKeeper unit file first:
sudo nano /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Next, create the systemd unit file for the Kafka service:
sudo nano /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64"
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
Reload the systemd configuration to pick up the new unit files:
sudo systemctl daemon-reload
First, you need to start the ZooKeeper service and then start Kafka. Use the systemctl command to manage both services:
sudo systemctl start zookeeper
Now start the Kafka server and view its running status:
sudo systemctl start kafka
sudo systemctl status kafka
Kafka provides multiple pre-built shell scripts to work with. First, create a topic named "myTopic" with a single partition and a single replica:
cd /usr/local/kafka
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic myTopic
The replication factor describes how many copies of the data will be kept. As we are running a single broker, keep this value at 1. The partitions setting controls how many partitions the topic's data is split across; with a single broker, keep this value at 1 as well.
You can create multiple topics by running the same command as above.
After that, you can list the topics created on Kafka by running the command below:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
To set up a Kafka cluster, you will need to follow these general steps:
1. Install Kafka on all nodes of the cluster. You can download Kafka from the Apache
Kafka website.
2. Configure the server.properties file on each node to specify the broker ID, the
ZooKeeper connection string, and other properties.
3. Start the ZooKeeper service on each node. This is required for Kafka to function.
4. Start the Kafka brokers on each node by running the kafka-server-start command and
specifying the location of the server.properties file.
5. Test the cluster by creating a topic, producing and consuming messages, and verifying
that they are replicated across all nodes.
1. Install Kafka on all nodes of the cluster. You can download Kafka from the Apache
Kafka website.
2. Configure the server.properties file on each node to specify the broker ID, the
ZooKeeper connection string, and other properties. For example, here is a configuration
for a simple Kafka cluster with three brokers:
# Broker 1
broker.id=1
listeners=PLAINTEXT://localhost:9092
num.partitions=3
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181

# Broker 2
broker.id=2
listeners=PLAINTEXT://localhost:9093
num.partitions=3
log.dirs=/tmp/kafka-logs-2
zookeeper.connect=localhost:2181

# Broker 3
broker.id=3
listeners=PLAINTEXT://localhost:9094
num.partitions=3
log.dirs=/tmp/kafka-logs-3
zookeeper.connect=localhost:2181
In this example, each broker has a unique broker.id and listens on a different port for client
connections. The num.partitions property specifies the default number of partitions for new
topics, and log.dirs specifies the directory where Kafka should store its data on disk.
zookeeper.connect specifies the ZooKeeper connection string, which should point to the
ZooKeeper ensemble.
3. Start the ZooKeeper service on each node. This is required for Kafka to function. You
can start ZooKeeper by running the following command:
bin/zookeeper-server-start.sh config/zookeeper.properties
This will start a single-node ZooKeeper instance using the default configuration.
4. Start the Kafka brokers on each node by running the kafka-server-start command and
specifying the location of the server.properties file. For example:
bin/kafka-server-start.sh config/server.properties
This will start the Kafka broker on the default port (9092) using the configuration in
config/server.properties.
5. Test the cluster by creating a topic, producing and consuming messages, and verifying
that they are replicated across all nodes. You can use the kafka-topics, kafka-console-producer,
and kafka-console-consumer command-line tools to perform these tasks. For example:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic my-topic

bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic my-topic

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic my-topic --from-beginning
These commands will create a topic with three partitions and three replicas, produce messages
to the topic, and consume them from all three brokers. You can verify that the messages are
replicated across all nodes by stopping one of the brokers and observing that the other brokers
continue to serve messages.
Example: server.properties
broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=c:/kafka/kafka-logs-1
auto.create.topics.enable=false (optional)
Creating new Broker-1
Follow these steps to add a new broker. Copy config/server.properties to a new file (for example server-1.properties) and make the following changes in it:
1. Change broker.id to 1.
2. Change the listener port to 9093.
3. Change log.dirs to a new directory, e.g. c:/kafka/kafka-logs-1.
Repeat the same steps for Broker-2 (broker.id=2, a different port such as 9094, and log.dirs c:/kafka/kafka-logs-2), then start both new brokers with their properties files.
Open a console producer and send a test message to the replicated topic:
.\bin\windows\kafka-console-producer.bat --bootstrap-server localhost:9092 --topic test-topic-replicated
message sent: Hi
Instantiate a new Consumer to receive the messages.
.\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test-topic-replicated --from-beginning
message received: Hi
The message we sent is now received by the console consumer. The interesting part is that we now have three Kafka log folders; let's check what they contain.
Log directories
• Close the producer console and you will see that the kafka-logs-1 and kafka-logs-2 directories have been created alongside the original log directory.
• Each broker gets its own folder, and that is where it persists the messages produced to it. So there is one directory per broker, three in total.
Conclusion: we have successfully set up a Kafka cluster with 3 brokers, created a topic in the cluster, and produced and consumed messages through the Kafka cluster.
To write Kafka producers and consumers in Java, first create a project with one of the following build tools:
o Maven
o Gradle
The project will eventually contain:
• Complete Kafka Producer
• Complete Kafka Consumer
• Kafka dependencies
• Logging dependencies
Follow these steps to create a Java project with the above dependencies.
The build tool Maven uses a pom.xml file, a default XML file that carries the project information such as the GroupId, ArtifactId, and Version values. Define all the necessary project dependencies in this file. Go to the pom.xml file.
pom.xml
<project>
...
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>3.0.0</version>
</dependency>
</dependencies>
</project>
If the version number appears red in color, it means the 'Auto-Import' option has not been
enabled. In that case, go to View > Tool Windows > Maven. A Maven Projects window will
appear on the right side of the screen. Click the 'Refresh' button there to import the Maven
projects. When the color changes to black, the missing dependency has been downloaded.
Add another dependency for logging. This will enable us to print diagnostic logs
while our application runs.
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.32</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.32</version>
</dependency>
Now, we have set all the required dependencies. Let's try the Simple Hello
World example.
While creating the java package, follow the package naming conventions. Finally,
create the sample application program as shown below.
package io.conduktor.demos.kafka;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HelloWorld {
    private static final Logger log = LoggerFactory.getLogger(HelloWorld.class);

    public static void main(String[] args) {
        log.info("Hello World");
    }
}
Run the application (the green 'play' button next to the main method) and verify that it runs,
prints the message, and exits with code 0. This means that your Java application has run
successfully.
Expand the 'External Libraries' on the Project panel and verify that it displays the
dependencies that we added for the project in pom.xml.
We have created a sample Java project that includes all the needed dependencies.
This will form the basis for creating Java producers and consumers next.
For the Kafka producer and consumer programs, make sure the Kafka client dependency is present in the pom.xml:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>3.1.0</version>
</dependency>
import java.util.Properties;
import java.util.concurrent.ExecutionException;
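Below is a minimal sketch of the producer for experiments 4 and 5. It sends one record synchronously by blocking on the returned Future and one asynchronously through a callback. The broker address (localhost:9092) and the topic name (myTopic) are assumptions carried over from the earlier steps; adjust them to your setup.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class SimpleProducer {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // Producer configuration; adjust the broker address to your setup
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Synchronous send: block on the future until the broker acknowledges the record
            RecordMetadata metadata =
                    producer.send(new ProducerRecord<>("myTopic", "key1", "synchronous message")).get();
            System.out.printf("Sync send -> partition=%d, offset=%d%n",
                    metadata.partition(), metadata.offset());

            // Asynchronous send: return immediately and handle the result in a callback
            producer.send(new ProducerRecord<>("myTopic", "key2", "asynchronous message"),
                    (meta, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        } else {
                            System.out.printf("Async send -> partition=%d, offset=%d%n",
                                    meta.partition(), meta.offset());
                        }
                    });
            producer.flush(); // ensure the async record is delivered before the program exits
        }
    }
}

Blocking on get() gives an immediate delivery result at the cost of throughput, while the callback form keeps the producer pipeline full and reports failures asynchronously.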
OUTPUT
6. Develop a Java program to create a Kafka consumer, subscribe to a topic, and consume messages.
PROGRAM
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        // Consumer configuration; adjust the broker address, group id, and topic to your setup
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("myTopic"));
        try {
            while (true) {
                // Poll the broker for new records and print each one
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                records.forEach(record ->
                        System.out.printf("Consumed message: key=%s, value=%s%n",
                                record.key(), record.value()));
            }
        } finally {
            consumer.close();
        }
    }
}
OUTPUT
7. Write a script to create a topic with specific partition and replication factor settings.
Below is a script written in Scala for creating a Kafka topic with specific partition and replication
factor settings. This script can be executed in IntelliJ IDEA with the Kafka dependencies
included in the project.
Program
import java.util.Properties
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}
import scala.collection.JavaConverters._
object KafkaTopicCreator {
def main(args: Array[String]): Unit = {
// Kafka broker properties
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
// Create AdminClient
val adminClient = AdminClient.create(props)
// Define the topic with the desired partition count and replication factor
// (the replication factor must not exceed the number of brokers)
val newTopic = new NewTopic("my-topic", 3, 1.toShort)
// Create topic
val results = adminClient.createTopics(List(newTopic).asJava)
results.values().asScala.foreach { case (topicName, future) =>
try {
future.get()
println(s"Topic $topicName created successfully.")
} catch {
case e: Exception =>
println(s"Failed to create topic $topicName: ${e.getMessage}")
}
}
// Close AdminClient
adminClient.close()
}
}
1. Create a new Scala project (or a Maven/Gradle project with Scala support) in IntelliJ IDEA.
2. Add the Kafka client dependency (org.apache.kafka:kafka-clients) to the project.
3. Create a new Scala file (e.g., KafkaTopicCreator.scala) and paste the script into it.
4. Make sure your Kafka broker is running on localhost:9092.
5. Run the KafkaTopicCreator object in IntelliJ IDEA.
OUTPUT
We should see the output indicating whether the topic creation was successful or not.
8. Simulate fault tolerance by shutting down one broker and observing the cluster behavior.
To simulate fault tolerance by shutting down one broker and observing the cluster behavior in
IntelliJ, you'll need to set up a Kafka cluster and create a sample producer and consumer
application. Then, you'll shut down one of the brokers to observe the behavior. Here's a step-
by-step example:
1. Set Up Kafka Cluster:
• Ensure you have Kafka installed and configured with multiple brokers. You can
refer to the Kafka documentation for detailed instructions.
2. Create a Topic:
• Let's assume we have a topic named "test_topic" with a replication factor of 3
and 3 partitions. Run the following command in your Kafka installation
directory:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic test_topic
3. Create IntelliJ Project:
• Create a new Maven or Gradle project in IntelliJ.
• Add Kafka dependencies to your pom.xml or build.gradle.
4. Producer Application:
• Create a Java class for the producer application. This application will send
messages to the Kafka topic.
Producer program
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class FaultToleranceProducer {
    public static void main(String[] args) {
        String topic = "test_topic";
        // List all three brokers so the client can fail over if one of them goes down
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093,localhost:9094");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            for (int i = 0; i < 10; i++) {
                String message = "Message " + i;
                producer.send(new ProducerRecord<>(topic, Integer.toString(i), message));
                System.out.println("Sent message: " + message);
                Thread.sleep(1000);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }
}
5. Consumer Application:
• Create a Java class for the consumer application. This application will consume
messages from the Kafka topic.
Consumer code
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class FaultToleranceConsumer {
    public static void main(String[] args) {
        String topic = "test_topic";
        // List all three brokers so the client keeps working when one of them is shut down
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093,localhost:9094");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fault-tolerance-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(topic));
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("Received message: offset = %d, key = %s, value = %s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        } finally {
            consumer.close();
        }
    }
}
6. Run Applications:
• Run the producer application and then the consumer application in IntelliJ.
7. Observe Behavior:
• While both producer and consumer are running, shut down one of the Kafka
brokers in your Kafka cluster. You can do this by stopping the Kafka process
associated with that broker.
Output:
Observe how the consumer continues to receive messages without interruption despite the
broker shutdown. Kafka automatically handles the fault tolerance by reassigning partitions to
the remaining brokers.
We can monitor the logs in IntelliJ to see how Kafka handles the failure and reassignment of
partitions.
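To make the failover visible from code as well, the following illustrative helper (not part of the steps above) uses the Java AdminClient to print each partition's current leader and in-sync replicas for test_topic; run it before and after stopping a broker and compare the output. The broker list is assumed to match the three-broker setup described earlier.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import java.util.Collections;
import java.util.Properties;

public class PartitionLeaderCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092,localhost:9093,localhost:9094"); // adjust to your brokers
        try (AdminClient admin = AdminClient.create(props)) {
            // Describe the topic and print the leader and in-sync replicas of every partition
            TopicDescription description = admin.describeTopics(Collections.singletonList("test_topic"))
                    .all().get().get("test_topic");
            description.partitions().forEach(p ->
                    System.out.printf("Partition %d: leader=%s, isr=%s%n",
                            p.partition(), p.leader(), p.isr()));
        }
    }
}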
9. Implement operations such as listing topics, modifying configurations, and deleting topics.
To implement operations such as listing topics, modifying configurations, and deleting topics
from IntelliJ IDEA, you would typically interact with Apache Kafka's command-line tools.
Here's a step-by-step guide on how to perform these operations using the Kafka command-line
tools (kafka-topics.sh and kafka-configs.sh) from within IntelliJ IDEA:
1. Setting up Kafka in IntelliJ IDEA:
• First, make sure you have Apache Kafka installed and running on your local
machine or on a server accessible from IntelliJ IDEA.
• Open your IntelliJ IDEA project.
2. Create a new Kotlin/Java file:
• Right-click on your project folder in the project explorer.
• Select "New" -> "Kotlin File/Java Class" to create a new Kotlin/Java file.
3. List Topics:
• To list topics, you can use the Kafka command-line tool kafka-topics.sh.
• Execute the following Kotlin/Java code to list topics:
Listing Topic
import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
    val runtime = Runtime.getRuntime()
    // Run kafka-topics.sh and print every topic name it returns
    val process = runtime.exec("/path/to/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092")
    BufferedReader(InputStreamReader(process.inputStream)).forEachLine { println(it) }
    process.waitFor()
}
4. Modify Topic Configuration:
• To modify topic configurations, you can use the Kafka command-line tool kafka-configs.sh.
• Execute the following Kotlin/Java code to modify a topic configuration:
Modify Topic Configuration
import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
    val topicName = "your_topic_name"
    val configKey = "compression.type"
    val configValue = "gzip"
    // Run kafka-configs.sh to alter the topic configuration and print its output
    val process = Runtime.getRuntime().exec("/path/to/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name $topicName --alter --add-config $configKey=$configValue")
    BufferedReader(InputStreamReader(process.inputStream)).forEachLine { println(it) }
    process.waitFor()
}
Replace your_topic_name with the name of the topic you want to modify, and
/path/to/kafka/bin/kafka-configs.sh with the actual path to kafka-configs.sh script.
5. Delete Topics:
• To delete topics, you can use the Kafka command-line tool kafka-topics.sh.
• Execute the following Kotlin/Java code to delete topics:
Delete Topics
import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
    val topicName = "your_topic_name"
    // Run kafka-topics.sh to delete the topic and print its output
    val process = Runtime.getRuntime().exec("/path/to/kafka/bin/kafka-topics.sh --delete --bootstrap-server localhost:9092 --topic $topicName")
    BufferedReader(InputStreamReader(process.inputStream)).forEachLine { println(it) }
}
Replace your_topic_name with the name of the topic you want to delete, and
/path/to/kafka/bin/kafka-topics.sh with the actual path to kafka-topics.sh script.
6. Run the code:
• Run the Kotlin/Java file in IntelliJ IDEA.
• You should see the output in the console showing the list of topics,
configuration modification status, or topic deletion status.
Make sure you have appropriate permissions and the Kafka server is running when executing
these commands. Additionally, replace placeholders such as /path/to/kafka/bin/ and
localhost:9092 with the actual paths and addresses of your Kafka setup.
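As an alternative to shelling out to the command-line tools, the same three operations can be performed directly with the Java AdminClient. The sketch below is illustrative only (it assumes a broker at localhost:9092 and a topic named your_topic_name): it lists all topics, sets compression.type=gzip on the topic, and finally deletes it.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collections;
import java.util.Properties;

public class TopicAdminOperations {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // adjust to your broker
        String topicName = "your_topic_name";

        try (AdminClient admin = AdminClient.create(props)) {
            // 1. List topics
            admin.listTopics().names().get().forEach(System.out::println);

            // 2. Modify a topic configuration (set compression.type=gzip)
            ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topicName);
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("compression.type", "gzip"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Collections.singletonMap(resource, Collections.singleton(op)))
                    .all().get();

            // 3. Delete the topic
            admin.deleteTopics(Collections.singletonList(topicName)).all().get();
        }
    }
}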
10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with external systems.
Kafka Connect is a framework that provides scalable and reliable streaming data integration
between Apache Kafka and other systems. It simplifies the process of building and managing
connectors to move data in and out of Kafka.
To demonstrate how to use Kafka Connect and connectors to integrate with external systems,
let's walk through an example of setting up a simple connector to move data from a CSV file
to a Kafka topic. We'll use IntelliJ IDEA as our IDE.
Step 1: Setup Kafka and Kafka Connect
Ensure you have Apache Kafka installed and running on your local machine. Additionally,
you'll need to have Kafka Connect installed. You can find installation instructions in the
Apache Kafka documentation.
Step 2: Create a Kafka Connector Configuration File
Create a JSON configuration file for your Kafka connector. For this example, let's call it csv-source-connector.json:
Program
{
"name": "csv-source-connector",
"config": {
"connector.class": "FileStreamSource",
"tasks.max": "1",
"file": "<path_to_your_csv_file>",
"topic": "csv-topic"
}
}
Step 3: Run the Connector
bin/connect-standalone.sh config/connect-standalone.properties csv-source-connector.properties
This command assumes you're using the standalone mode of Kafka Connect, which reads the connector configuration from a .properties file (the JSON form above is what you would submit to the Connect REST API when running in distributed mode). Adjust the paths as necessary for your setup.
OUTPUT
11. Implement a simple word count stream processing application using Kafka Streams.
To implement a simple word count stream processing application using Kafka Streams in
IntelliJ IDEA, you'll first need to set up a Kafka cluster. You can use Docker to set up a local
Kafka cluster quickly. Then, create a Maven project in IntelliJ IDEA and add the necessary
Kafka Streams dependency (org.apache.kafka:kafka-streams) to the pom.xml.
Program:
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import java.util.Arrays;
import java.util.Properties;
public class WordCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        StreamsBuilder builder = new StreamsBuilder();
        // Read lines from the input topic (topic name is illustrative), split into words, and count them
        KStream<String, Long> wordCounts = builder
                .stream("word-count-input", Consumed.with(Serdes.String(), Serdes.String()))
                .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                .groupBy((key, word) -> word, Grouped.with(Serdes.String(), Serdes.String()))
                .count()
                .toStream();
        wordCounts.to("word-count-output",
                Produced.with(Serdes.String(), Serdes.Long()));
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
Output
12. Implement Kafka integration with the Hadoop ecosystem.
Procedure
Integrating Kafka with the Hadoop ecosystem allows for efficient ingestion, storage, and
analysis of streaming data. The common components involved in this integration include
Kafka, Flume, HDFS (Hadoop Distributed File System), HBase, Hive, and Spark. Here's a
high-level overview and some steps to set up Kafka integration with Hadoop:
1. Kafka Setup
First, set up Kafka by downloading it from the official website and installing it on your
system. Start the Kafka server and the ZooKeeper instance that Kafka relies on.
wget https://archive.apache.org/dist/kafka/2.8.0/kafka_2.12-2.8.0.tgz
tar -xzf kafka_2.12-2.8.0.tgz
cd kafka_2.12-2.8.0
2. Start ZooKeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
3. Start Kafka:
bin/kafka-server-start.sh config/server.properties
2. Kafka to HDFS via Flume
Apache Flume can be used to collect, aggregate, and move large amounts of log data from
different sources to a centralized data store. Flume can be configured to act as a Kafka
consumer, reading data from Kafka topics and writing it to HDFS or HBase.
Example configuration:
agent.sources = kafka-source
agent.sinks = hdfs-sink
agent.channels = mem-channel
agent.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.kafka-source.kafka.bootstrap.servers = localhost:9092
agent.sources.kafka-source.kafka.topics = my-topic
agent.sources.kafka-source.kafka.consumer.group.id = flume-consumer-group
agent.sources.kafka-source.kafka.consumer.auto.offset.reset = earliest
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode_host:8020/user/flume/events
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.channels.mem-channel.type = memory
agent.channels.mem-channel.capacity = 10000
agent.channels.mem-channel.transactionCapacity = 1000
agent.sources.kafka-source.channels = mem-channel
agent.sinks.hdfs-sink.channel = mem-channel
3. Kafka to HBase
HBase can be used to store real-time data from Kafka. You can write a custom consumer (a
sketch is shown below) or use existing tools like Apache Storm or Spark Streaming to process
the data and write it to HBase.
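For the custom-consumer route, the following is one possible, illustrative sketch. It assumes an HBase table named events with a column family cf already exists, a Kafka topic named my-topic, and the kafka-clients and hbase-client libraries on the classpath; it consumes records from Kafka and writes each one as a row into HBase.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaToHBaseConsumer {
    public static void main(String[] args) throws Exception {
        // Kafka consumer configuration (adjust broker address, group id, and topic)
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "kafka-to-hbase");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Configuration hbaseConf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection hbase = ConnectionFactory.createConnection(hbaseConf);
             Table table = hbase.getTable(TableName.valueOf("events"));
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Use topic-partition-offset as the row key and store the value in cf:value
                    Put put = new Put(Bytes.toBytes(record.topic() + "-" + record.partition() + "-" + record.offset()));
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(record.value()));
                    table.put(put);
                }
            }
        }
    }
}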
1. Spark Streaming Setup: Use Spark Streaming to read data from Kafka and write it
to HBase.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
val conf = new SparkConf().setAppName("KafkaToHBase").setMaster("local[*]")
val ssc = new StreamingContext(conf, Seconds(5))
// Build a direct stream from Kafka with KafkaUtils.createDirectStream here and
// write each processed batch to HBase (stream-processing logic omitted in this fragment)
ssc.start()
ssc.awaitTermination()
Hive can be used to query data stored in HDFS. You can create an external table in Hive to
point to the data location.
• Kafka Monitoring: Use tools like Kafka Manager, Burrow, or Grafana with Prometheus for
monitoring.
• HDFS and HBase Monitoring: Use Hadoop’s built-in monitoring tools or third-party tools.
• Log Management: Regularly check logs for any issues.