Lab - Confluent Kafka KSQL II
1. Prerequisite
2. Install KSQL DB – 60 Minutes (D)
4. Installing Confluent Kafka (Local) – 60 Minutes
5. Workflow using KSQL - CLI – 90 Minutes (D)
6. Errors
   I. LEADER_NOT_AVAILABLE
   java.util.concurrent.ExecutionException
7. Annexure Code:
   II. DumpLogSegments
   III. Data Generator – JSON
   IV. Resources
https://docs.confluent.io/current/ksql/docs/tutorials/examples.html#ksql-examples
1. Prerequisite
JDK : 1.11
Confluent : 7.3.3
Kafka : 3.11
Schema Registry & KSQL DB
#vi /opt/kafka/config/server.properties
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.topic.replication.factor=1
To enable Schema Registry integration, add the following lines at the end of the configuration file.
#------ Schema Registry -------
# Uncomment and complete the following to enable KSQL's integration with the Confluent Schema Registry:
ksql.schema.registry.url=http://kafka0:8081
#/opt/ksqldb/bin/ksql-server-start /opt/ksqldb/etc/ksqldb/ksql-server.properties
Run this command to connect to the ksqlDB server and enter an interactive command-line interface
(CLI) session.
#/opt/ksqldb/bin/ksql http://0.0.0.0:8088
#show topics;
This quick start demonstrates both the basic and the most powerful capabilities of Confluent Platform, including using Control Center for topic management and event stream processing using KSQL. In this quick start you create Apache Kafka® topics, use Kafka Connect to generate mock data to those topics, and create KSQL streaming queries on those topics. You then go to Control Center to monitor and analyze the streaming queries.
You need to install Java before installing ZooKeeper and Kafka.
Installing Java
#tar -xvf jdk-8u45-linux-x64.tar.gz -C /opt
Set JAVA_HOME and add it to the PATH variable, for example:
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin
Set the environment variable for the Confluent Platform directory (<path-to-confluent>).
export CONFLUENT_HOME=/apps/confluent
# vi ~/.bashrc
export PATH=/apps/confluent/bin:${PATH};
After decompressing the file, you should have the following directories:
Install the Kafka Connect Datagen source connector using the Confluent Hub client. This
connector generates mock data for demonstration purposes and is not suitable for
production. Confluent Hub is an online library of pre-packaged and ready-to-install
extensions or add-ons for Confluent Platform and Kafka.
#confluent-hub install --no-prompt confluentinc/kafka-connect-datagen:latest
Start Confluent Platform using the Confluent CLI confluent local services start command. This command starts all of the Confluent Platform components, including Kafka, ZooKeeper, Schema Registry, HTTP REST Proxy for Kafka, Kafka Connect, KSQL, and Control Center.
#export CONFLUENT_CURRENT=/opt/data/ckafka
#confluent local services start
Name the connector datagen-pageviews . After naming the connector, new fields appear. Scroll down and specify the following configuration values (an equivalent JSON form for the Connect REST API is sketched after this list):
• Tasks max : 1
• In the Key converter class field, type org.apache.kafka.connect.storage.StringConverter .
• In the kafka.topic field, type pageviews .
• In the max.interval field, type 100 .
• In the iterations field, type 1000000000 .
• In the quickstart field, type pageviews .
• Click Continue.
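If you prefer the Kafka Connect REST API over Control Center, a roughly equivalent connector configuration is sketched below. It assumes Connect listens on localhost:8083; the property names mirror the Control Center fields above.
{
  "name": "datagen-pageviews",
  "config": {
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "kafka.topic": "pageviews",
    "max.interval": "100",
    "iterations": "1000000000",
    "quickstart": "pageviews"
  }
}
Save it as datagen-pageviews.json and post it to the Connect REST API:
#curl -X POST -H "Content-Type: application/json" --data @datagen-pageviews.json http://localhost:8083/connectors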
Run another instance of the Kafka Connect Datagen connector to produce Kafka data to
the users topic in AVRO format.
Click Add connector.
Find the DatagenConnector tile and click Connect.
Name the connector datagen-users . After naming the connector, new fields appear. Scroll down and specify the following configuration values:
• Tasks max : 1
• In the Key converter class field, type org.apache.kafka.connect.storage.StringConverter .
• In the kafka.topic field, type users .
• In the max.interval field, type 1000 .
• In the iterations field, type 1000000000 .
• In the quickstart field, type users .
• Click Continue.
• Review the connector configuration and click Launch.
If there is any issue, verify the status and configuration as shown below:
#confluent local services status
If you are unable to connect on port 8088, verify that the KSQL listener IP and port are specified correctly in the configuration file:
/apps/confluent/etc/ksqldb/ksql-server.properties
listeners=http://localhost:8088 or
listeners=http://0.0.0.0:8088
To get started, you must start a Kafka cluster, including ZooKeeper and a Kafka broker.
KSQL will then query messages from this Kafka cluster. KSQL is installed in the Confluent
Platform by default.
Create and produce data to the Kafka topics pageviews and users . These steps use the KSQL datagen tool that is included with Confluent Platform.
1. Create the pageviews topic and produce data using the data generator. The following example continuously generates data with values in JSON format.
ksql-datagen bootstrap-server=kafka0:9092 quickstart=pageviews format=json topic=pageviews maxInterval=500
2. Produce Kafka data to the users topic using the data generator. The following example
continuously generates data with a value in JSON format.
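For example (the maxInterval of 1000 is an assumption, chosen to match the max.interval used for the users connector earlier):
ksql-datagen bootstrap-server=kafka0:9092 quickstart=users format=json topic=users maxInterval=1000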
Tip
You can also produce Kafka data using the kafka-console-producer CLI provided with
Confluent Platform.
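For example, a minimal invocation (assuming the same broker address used elsewhere in this lab) is:
#kafka-console-producer --bootstrap-server kafka0:9092 --topic pageviews
Each line you then type is published as a message to the pageviews topic.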
$ LOG_DIR=./ksql_logs ksql
Important
By default KSQL attempts to store its logs in a directory called logs that is relative to the
location of the ksql executable. For example, if ksql is installed at /usr/local/bin/ksql ,
then it would attempt to store its logs in /usr/local/logs . If you are running ksql from the
default Confluent Platform location, <path-to-confluent>/bin , you must override this
default behavior by using the LOG_DIR variable.
After KSQL is started, your terminal should resemble this.
• Use the SHOW TOPICS statement to list the available topics in the Kafka cluster.
• Use the PRINT statement to see a topic's messages as they arrive.
In the KSQL CLI, run the following statement:
SHOW TOPICS;
PRINT 'users';
Format:JSON
{"ROWTIME":1540254230041,"ROWKEY":"User_1","registertime":1516754966866,"userid":"User_1","regionid":"Region_9","gender":"MALE"}
{"ROWTIME":1540254230081,"ROWKEY":"User_3","registertime":1491558386780,"userid":"User_3","regionid":"Region_2","gender":"MALE"}
{"ROWTIME":1540254230091,"ROWKEY":"User_7","registertime":1514374073235,"userid":"User_7","regionid":"Region_2","gender":"OTHER"}
^C{"ROWTIME":1540254232442,"ROWKEY":"User_4","registertime":1510034151376,"userid":"User_4","regionid":"Region_8","gender":"FEMALE"}
Topic printing ceased
Press CTRL+C to stop printing messages.
PRINT 'pageviews';
Format:STRING
10/23/18 12:24:03 AM UTC , 9461 , 1540254243183,User_9,Page_20
10/23/18 12:24:03 AM UTC , 9471 , 1540254243617,User_7,Page_47
10/23/18 12:24:03 AM UTC , 9481 , 1540254243888,User_4,Page_27
These examples query messages from Kafka topics called pageviews and users using the
following schemas:
1. Create a stream, named pageviews_original , from the pageviews Kafka topic, specifying a value_format of JSON .
CREATE STREAM pageviews_original (viewtime BIGINT, userid VARCHAR, pageid VARCHAR) WITH (kafka_topic='pageviews', value_format='JSON');
Tip
You can run DESCRIBE pageviews_original; to see the schema for the stream. Notice
that KSQL created two additional columns, named ROWTIME , which corresponds with
the Kafka message timestamp, and ROWKEY , which corresponds with the Kafka
message key.
2. Create a table, named users_original , from the users Kafka topic, specifying
the value_format of JSON .
CREATE TABLE users_original (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR PRIMARY KEY) WITH (kafka_topic='users', value_format='JSON');
Message
---------------
Table created
---------------
Tip
You can run DESCRIBE users_original; to see the schema for the Table.
3. Optional: Show all streams and tables.
ksql> SHOW STREAMS;
Write Queries
SET 'auto.offset.reset'='earliest';
Note: By default KSQL reads the topics for streams and tables from the latest offset.
1. Use SELECT to create a query that returns data from a STREAM. This query includes
the LIMIT keyword to limit the number of rows returned in the query result. Note that
exact data output may vary because of the randomness of the data generation.
SELECT pageid FROM pageviews_original EMIT CHANGES LIMIT 3;
Page_24
Page_73
Page_78
LIMIT reached
Query terminated
2. Create a persistent query by using the CREATE STREAM keywords to precede
the SELECT statement. The results from this query are written to
the PAGEVIEWS_ENRICHED Kafka topic. The following query enriches
the pageviews_original STREAM by doing a LEFT JOIN with
the users_original TABLE on the user ID.
CREATE STREAM pageviews_enriched AS
  SELECT users_original.userid AS userid, pageid, regionid, gender
  FROM pageviews_original
  LEFT JOIN users_original
    ON pageviews_original.userid = users_original.userid
  EMIT CHANGES;
Message
----------------------------
Stream created and running
----------------------------
Tip
You can run DESCRIBE pageviews_enriched; to describe the stream.
3. Use SELECT to view query results as they come in. To stop viewing the query results,
press <ctrl-c> . This stops printing to the console but it does not terminate the actual
query. The query continues to run in the underlying KSQL application.
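For example (this particular query is an assumption; any SELECT against the stream will do):
SELECT * FROM pageviews_enriched EMIT CHANGES;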
4. Create a new persistent query where a condition limits the stream's content, using WHERE . Results from this query are written to a Kafka topic called PAGEVIEWS_FEMALE .
CREATE STREAM pageviews_female AS
SELECT * FROM pageviews_enriched
WHERE gender = 'FEMALE';
Message
----------------------------
Stream created and running
----------------------------
Tip
You can run DESCRIBE pageviews_female; to describe the stream.
5. Create another persistent query that filters on regionid , using LIKE . Results from this query are written to the Kafka topic pageviews_enriched_r8_r9 (the statement also appears in the SHOW QUERIES listing later in this lab).
CREATE STREAM pageviews_female_like_89
  WITH (kafka_topic='pageviews_enriched_r8_r9') AS
  SELECT * FROM pageviews_female
  WHERE regionid LIKE '%_8' OR regionid LIKE '%_9';
Message
----------------------------
Stream created and running
----------------------------
6. Verify the above 2 streams:
7. Create a new persistent query that counts the pageviews for each gender and region combination in a tumbling window of 30 seconds when the count is greater than one. Results from this query are written to the PAGEVIEWS_REGIONS Kafka topic in Avro format. KSQL will register the Avro schema with the configured Schema Registry when it writes the first message to the PAGEVIEWS_REGIONS topic.
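The statement for this step, as it also appears in the SHOW QUERIES listing below, is:
CREATE TABLE pageviews_regions
  WITH (VALUE_FORMAT='avro') AS
  SELECT gender, regionid, COUNT(*) AS numusers
  FROM pageviews_enriched
  WINDOW TUMBLING (size 30 second)
  GROUP BY gender, regionid
  HAVING COUNT(*) > 1;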
Message
---------------------------
Table created and running
---------------------------
Tip
You can run DESCRIBE pageviews_regions; to describe the table.
8. Optional: View results from the above queries using SELECT .
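For example (the exact query is an assumption):
SELECT gender, regionid, numusers FROM pageviews_regions EMIT CHANGES LIMIT 5;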
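9. Optional: Show all persistent queries. The listing below is the output of the following statement (the statement itself is assumed from the output shown):
SHOW QUERIES;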
-------------------------------------------------------------------------------------------------------
CSAS_PAGEVIEWS_FEMALE_1 | PAGEVIEWS_FEMALE | CREATE STREAM pageviews_female AS SELECT * FROM pageviews_enriched WHERE gender = 'FEMALE';
CTAS_PAGEVIEWS_REGIONS_3 | PAGEVIEWS_REGIONS | CREATE TABLE pageviews_regions WITH (VALUE_FORMAT='avro') AS SELECT gender, regionid, COUNT(*) AS numusers FROM pageviews_enriched WINDOW TUMBLING (size 30 second) GROUP BY gender, regionid HAVING COUNT(*) > 1;
CSAS_PAGEVIEWS_FEMALE_LIKE_89_2 | PAGEVIEWS_FEMALE_LIKE_89 | CREATE STREAM pageviews_female_like_89 WITH (kafka_topic='pageviews_enriched_r8_r9') AS SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9';
CSAS_PAGEVIEWS_ENRICHED_0 | PAGEVIEWS_ENRICHED | CREATE STREAM pageviews_enriched AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid;
-------------------------------------------------------------------------------------------------------
For detailed information on a Query run: EXPLAIN <Query ID>;
10. Optional: Examine query run-time metrics and details. Observe that information
including the target Kafka topic is available, as well as throughput figures for the
messages being processed.
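The run-time metrics shown below are typically obtained with DESCRIBE EXTENDED (the exact statement is an assumption, based on the output that follows):
DESCRIBE EXTENDED pageviews_regions;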
Name : PAGEVIEWS_REGIONS
Type : TABLE
Key field : KSQL_INTERNAL_COL_0|+|KSQL_INTERNAL_COL_1
Key format : STRING
Timestamp field : Not set - using <ROWTIME>
Value format : AVRO
Kafka topic : PAGEVIEWS_REGIONS (partitions: 4, replication: 1)
Field | Type
--------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
GENDER | VARCHAR(STRING)
REGIONID | VARCHAR(STRING)
NUMUSERS | BIGINT
--------------------------------------
For query topology and execution plan please run: EXPLAIN <QueryId>
6. Errors
I. LEADER_NOT_AVAILABLE
{test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
Solution: update the following in /opt/kafka/config/server.properties.
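A common cause of LEADER_NOT_AVAILABLE is a broker advertising a hostname or address that clients cannot resolve. A minimal sketch of the usual fix, assuming the broker runs on host kafka0 and port 9092 (adjust both to your environment):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka0:9092
Restart the broker after saving the change.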
java.util.concurrent.ExecutionException:
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for my-kafka-topic-6: 30037 ms has passed since batch creation plus linger time
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
    at com.tos.kafka.MyKafkaProducer.runProducer(MyKafkaProducer.java:97)
    at com.tos.kafka.MyKafkaProducer.main(MyKafkaProducer.java:18)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for my-kafka-topic-6: 30037 ms has passed since batch creation plus linger time.
Solution:
Update the broker properties (typically the listener settings shown in the LEADER_NOT_AVAILABLE fix above) in /opt/kafka/config/server.properties on all brokers. They should be set to your hostname; then restart the broker.
If the hostname is to be changed, also update it in the following files (Kafka server.properties, Control Center, KSQL, and Connect properties):
/apps/confluent/etc/confluent-control-center/control-center-dev.properties
/apps/confluent/etc/ksql/ksql-server.properties
/tmp/confluent.8A2Ii7O4/connect/connect.properties
If the broker does not start with the hostname, use the IP address instead and restart the broker.
7. Annexure Code:
II. DumpLogSegments
/opt/kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files \
  /tmp/kafka-logs/my-kafka-connect-0/00000000000000000000.log | head -n 4
III. Data Generator – JSON
Configuration
The generator runs a Simulation which you get to define. The Simulation can specify one or many Workflows that will be run as part of your Simulation. The Workflows then generate Events, and these Events are then sent somewhere. You will also need to define Producers that are used to send the Events generated by your Workflows to some destination. These destinations could be a log file, or something more complicated like a Kafka queue.
You define the configuration for the json-data-generator using two configuration files. The
first is a Simulation Config. The Simulation Config defines the Workflows that should be
run and different Producers that events should be sent to. The second is a Workflow
configuration (of which you can have multiple). The Workflow defines the frequency of
Events and Steps that the Workflow uses to generate the Events. It is the Workflow that
defines the format and content of your Events as well.
For our example, we are going to pretend that we have a programmable Jackie Chan robot. We can command Jackie Chan through a programmable interface that happens to take JSON as input via a Kafka queue, and you can command him to perform different fighting moves in different martial arts styles. A Jackie Chan command might look like this:
{
"timestamp":"2015-05-20T22:05:44.789Z",
"style":"DRUNKEN_BOXING",
"action":"PUNCH",
"weapon":"CHAIR",
"target":"ARMS",
"strength":8.3433
}
SIMULATION CONFIG
Let’s take a look at our example Simulation Config:
{
    "workflows": [{
        "workflowName": "jackieChan",
        "workflowFilename": "jackieChanWorkflow.json"
    }],
    "producers": [{
        "type": "kafka",
        "broker.server": "192.168.59.103",
        "broker.port": 9092,
        "topic": "jackieChanCommand",
        "flatten": false,
        "sync": false
    }]
}
You can find the full configuration options for each on the github page. We used a Kafka
producer because that is how you command our Jackie Chan robot.
WORKFLOW CONFIG
The Simulation Config above specifies that it will use a Workflow called
jackieChanWorkflow.json. This is where the meat of your configuration would live. Let’s
take a look at the example Workflow config and see how we are going to control Jackie
Chan:
{
    "eventFrequency": 400,
    "varyEventFrequency": true,
    "repeatWorkflow": true,
    "timeBetweenRepeat": 1500,
    "varyRepeatFrequency": true,
    "steps": [{
        "config": [{
            "timestamp": "now()",
            "style": "random('KUNG_FU','WUSHU','DRUNKEN_BOXING')",
            "action": "random('KICK','PUNCH','BLOCK','JUMP')",
            "weapon": "random('BROAD_SWORD','STAFF','CHAIR','ROPE')",
            "target": "random('HEAD','BODY','LEGS','ARMS')",
            "strength": "double(1.0,10.0)"
        }],
        "duration": 0
    }]
}
• At the top are the properties that define how often events should be generated and if /
when this workflow should be repeated. So this is like saying we want Jackie Chan to
do a martial arts move every 400 milliseconds (he’s FAST!), then take a break for 1.5
seconds, and do another one.
• Next, are the Steps that this Workflow defines. Each Step has a config and a
duration. The duration specifies how long to run this step. The config is where it gets
interesting!
You’ll notice that the values for each of the object properties look a bit funny. These are
special Functions that we have created that allow us to generate values for each of the
properties. For instance, the “random(‘KICK’,’PUNCH’,’BLOCK’,’JUMP’)” function will
randomly choose one of the values and output it as the value of the “action” property in the
command message. The “now()” function will output the current date in an ISO8601 date
formatted string. The “double(1.0,10.0)” will generate a random double between 1 and 10
to determine the strength of the action that Jackie Chan will perform. If we wanted to, we
could make Jackie Chan perform combo moves by defining a number of Steps that will be
executed in order.
There are many more Functions available in the generator with everything from random
string generation, counters, random number generation, dates, and even support for
randomly generating arrays of data. We also support the ability to reference other
randomly generated values. For more info, please check out the full documentation on the
github page.
Once we have defined the Workflow, we can run it using the json-data-generator. To do
this, do the following:
1. If you have not already, go ahead and download the most recent release of the json-
data-generator.
2. Unpack the file you downloaded to a directory.
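3. Run the generator, passing the Simulation Config file. The exact jar name depends on the release you downloaded; a typical invocation looks like this (the version is a placeholder):
java -jar json-data-generator-<version>.jar jackieChanSimConfig.json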
You will see logging in your console showing the events as they are being generated. The
jackieChanSimConfig.json generates events like these:
{"timestamp":"2015-05-
20T22:21:18.036Z","style":"WUSHU","action":"BLOCK","weapon":"CHAIR","target":"B
ODY","strength":4.7912}
{"timestamp":"2015-05-
20T22:21:19.247Z","style":"DRUNKEN_BOXING","action":"PUNCH","weapon":"BROA
D_SWORD","target":"ARMS","strength":3.0248}
{"timestamp":"2015-05-
20T22:21:20.947Z","style":"DRUNKEN_BOXING","action":"BLOCK","weapon":"ROPE"
,"target":"HEAD","strength":6.7571}
{"timestamp":"2015-05-
20T22:21:22.715Z","style":"WUSHU","action":"KICK","weapon":"BROAD_SWORD","tar
get":"ARMS","strength":9.2062}
{"timestamp":"2015-05-
20T22:21:23.852Z","style":"KUNG_FU","action":"PUNCH","weapon":"BROAD_SWOR
D","target":"HEAD","strength":4.6202}
{"timestamp":"2015-05-
20T22:21:25.195Z","style":"KUNG_FU","action":"JUMP","weapon":"ROPE","target":"A
RMS","strength":7.5303}
{"timestamp":"2015-05-
60 Kafka – Dev Ops
20T22:21:26.492Z","style":"DRUNKEN_BOXING","action":"PUNCH","weapon":"STAF
F","target":"HEAD","strength":1.1247}
{"timestamp":"2015-05-
20T22:21:28.042Z","style":"WUSHU","action":"BLOCK","weapon":"STAFF","target":"A
RMS","strength":5.5976}
{"timestamp":"2015-05-
20T22:21:29.422Z","style":"KUNG_FU","action":"BLOCK","weapon":"ROPE","target":"
ARMS","strength":2.152}
{"timestamp":"2015-05-
20T22:21:30.782Z","style":"DRUNKEN_BOXING","action":"BLOCK","weapon":"STAFF
","target":"ARMS","strength":6.2686}
{"timestamp":"2015-05-
20T22:21:32.128Z","style":"KUNG_FU","action":"KICK","weapon":"BROAD_SWORD","
target":"BODY","strength":2.3534}
IV. Resources
https://developer.ibm.com/hadoop/2017/04/10/kafka-security-mechanism-saslplain/
https://sharebigdata.wordpress.com/2018/01/21/implementing-sasl-plain/
https://developer.ibm.com/code/howtos/kafka-authn-authz
https://github.com/confluentinc/kafka-streams-examples/tree/4.1.x/
https://github.com/spring-cloud/spring-cloud-stream-samples/blob/master/kafka-streams-samples/kafka-streams-table-join/src/main/java/kafka/streams/table/join/KafkaStreamsTableJoin.java