Lab - Confluent Kafka KSQL II
1. Prerequisite
2. Install KSQL DB – 60 Minutes (D)
4. Installing Confluent Kafka (Local) – 60 Minutes
5. Workflow using KSQL - CLI – 90 Minutes (D)
6. Errors
   I. LEADER_NOT_AVAILABLE
   java.util.concurrent.ExecutionException
7. Annexure Code:
   II. DumpLogSegments
   III. Data Generator – JSON
   IV. Resources
https://docs.confluent.io/current/ksql/docs/tutorials/examples.html#ksql-examples
1. Prerequisite
JDK : 1.11
Confluent : 7.3.3
Kafka : 3.11
Schema Registry & KSQL DB
#vi /opt/kafka/config/server.properties
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.topic.replication.factor=1
To enable Schema Registry integration, add the following lines at the end of the configuration file.
#------ Schema Registry -------
# Uncomment and complete the following to enable KSQL's integration with the Confluent Schema Registry:
ksql.schema.registry.url=http://kafka0:8081
#/opt/ksqldb/bin/ksql-server-start /opt/ksqldb/etc/ksqldb/ksql-server.properties
Run this command to connect to the ksqlDB server and enter an interactive command-line interface
(CLI) session.
#/opt/ksqldb/bin/ksql http://0.0.0.0:8088
#show topics;
This quick start demonstrates both the basic and the most powerful capabilities of Confluent Platform, including using Control Center for topic management and event stream processing using KSQL. In this quick start you create Apache Kafka® topics, use Kafka Connect to generate mock data to those topics, and create KSQL streaming queries on those topics. You then go to Control Center to monitor and analyze the streaming queries.
You need to install Java before installing ZooKeeper and Kafka.
Installing Java
#tar -xvf jdk-8u45-linux-x64.tar.gz -C /opt
Set JAVA_HOME and add it to the PATH variable, for example:
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin
Set the environment variable for the Confluent Platform directory (<path-to-confluent>).
export CONFLUENT_HOME=/apps/confluent
# vi ~/.bashrc
export PATH=/apps/confluent/bin:${PATH};
After decompressing the file, you should have the following directories:
Install the Kafka Connect Datagen source connector using the Confluent Hub client. This
connector generates mock data for demonstration purposes and is not suitable for
production. Confluent Hub is an online library of pre-packaged and ready-to-install
extensions or add-ons for Confluent Platform and Kafka.
#confluent-hub install --no-prompt confluentinc/kafka-connect-datagen:latest
Start Confluent Platform using the Confluent CLI confluent local services start command. This command starts all of the Confluent Platform components, including Kafka, ZooKeeper, Schema Registry, HTTP REST Proxy for Kafka, Kafka Connect, KSQL, and Control Center.
#export CONFLUENT_CURRENT=/opt/data/ckafka
#confluent local services start
Name the connector datagen-pageviews . After naming the connector, new fields appear. Scroll down and specify the following configuration values (an equivalent JSON form for the Connect REST API is sketched after this list):
• Tasks max : 1
• In the Key converter class field, type org.apache.kafka.connect.storage.StringConverter .
• In the kafka.topic field, type pageviews .
• In the max.interval field, type 100 .
• In the iterations field, type 1000000000 .
• In the quickstart field, type pageviews .
• Click Continue.
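If you prefer the Kafka Connect REST API over Control Center, a roughly equivalent connector configuration is sketched below. It assumes Connect listens on localhost:8083; the property names mirror the Control Center fields above.
{
  "name": "datagen-pageviews",
  "config": {
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "kafka.topic": "pageviews",
    "max.interval": "100",
    "iterations": "1000000000",
    "quickstart": "pageviews"
  }
}
Save it as datagen-pageviews.json and post it to the Connect REST API:
#curl -X POST -H "Content-Type: application/json" --data @datagen-pageviews.json http://localhost:8083/connectors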
Run another instance of the Kafka Connect Datagen connector to produce Kafka data to
the users topic in AVRO format.
Click Add connector.
Find the DatagenConnector tile and click Connect.
Name the connector datagen-users . After naming the connector, new fields appear. Scroll down and specify the following configuration values:
• Tasks max : 1
• In the Key converter class field, type org.apache.kafka.connect.storage.StringConverter .
• In the kafka.topic field, type users .
• In the max.interval field, type 1000 .
• In the iterations field, type 1000000000 .
• In the quickstart field, type users .
• Click Continue.
• Review the connector configuration and click Launch.
If there is any issue, verify the status and configuration as shown below:
#confluent local services status
If you are unable to connect on port 8088, verify that the KSQL listener IP and port are specified correctly in the configuration file:
/apps/confluent/etc/ksqldb/ksql-server.properties
listeners=http://localhost:8088 or
listeners=http://0.0.0.0:8088
To get started, you must start a Kafka cluster, including ZooKeeper and a Kafka broker.
KSQL will then query messages from this Kafka cluster. KSQL is installed in the Confluent
Platform by default.
Create and produce data to the Kafka topics pageviews and users . These steps use the KSQL datagen tool that is included with Confluent Platform.
1. Create the pageviews topic and produce data using the data generator. The following example continuously generates data with values in JSON format.
ksql-datagen bootstrap-server=kafka0:9092 quickstart=pageviews format=json topic=pageviews maxInterval=500
2. Produce Kafka data to the users topic using the data generator. The following example
continuously generates data with a value in JSON format.
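For example (the maxInterval of 1000 is an assumption, chosen to match the max.interval used for the users connector earlier):
ksql-datagen bootstrap-server=kafka0:9092 quickstart=users format=json topic=users maxInterval=1000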
Tip
You can also produce Kafka data using the kafka-console-producer CLI provided with
Confluent Platform.
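For example, a minimal invocation (assuming the same broker address used elsewhere in this lab) is:
#kafka-console-producer --bootstrap-server kafka0:9092 --topic pageviews
Each line you then type is published as a message to the pageviews topic.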
$ LOG_DIR=./ksql_logs ksql
Important
By default KSQL attempts to store its logs in a directory called logs that is relative to the
location of the ksql executable. For example, if ksql is installed at /usr/local/bin/ksql ,
then it would attempt to store its logs in /usr/local/logs . If you are running ksql from the
default Confluent Platform location, <path-to-confluent>/bin , you must override this
default behavior by using the LOG_DIR variable.
After KSQL is started, your terminal should resemble this.
• Use the SHOW TOPICS statement to list the available topics in the Kafka cluster.
• Use the PRINT statement to see a topic's messages as they arrive.
In the KSQL CLI, run the following statement:
SHOW TOPICS;
PRINT 'users';
Format:JSON
{"ROWTIME":1540254230041,"ROWKEY":"User_1","registertime":1516754966866,"userid":"User_1","regionid":"Region_9","gender":"MALE"}
{"ROWTIME":1540254230081,"ROWKEY":"User_3","registertime":1491558386780,"userid":"User_3","regionid":"Region_2","gender":"MALE"}
{"ROWTIME":1540254230091,"ROWKEY":"User_7","registertime":1514374073235,"userid":"User_7","regionid":"Region_2","gender":"OTHER"}
^C{"ROWTIME":1540254232442,"ROWKEY":"User_4","registertime":1510034151376,"userid":"User_4","regionid":"Region_8","gender":"FEMALE"}
Topic printing ceased
Press CTRL+C to stop printing messages.
PRINT 'pageviews';
Format:STRING
10/23/18 12:24:03 AM UTC , 9461 , 1540254243183,User_9,Page_20
10/23/18 12:24:03 AM UTC , 9471 , 1540254243617,User_7,Page_47
10/23/18 12:24:03 AM UTC , 9481 , 1540254243888,User_4,Page_27
These examples query messages from Kafka topics called pageviews and users using the
following schemas:
1. Create a stream, named pageviews_original , from the pageviews Kafka topic, specifying a value_format of JSON .
CREATE STREAM pageviews_original (viewtime BIGINT, userid VARCHAR, pageid VARCHAR) WITH (kafka_topic='pageviews', value_format='JSON');
Tip
You can run DESCRIBE pageviews_original; to see the schema for the stream. Notice
that KSQL created two additional columns, named ROWTIME , which corresponds with
the Kafka message timestamp, and ROWKEY , which corresponds with the Kafka
message key.
2. Create a table, named users_original , from the users Kafka topic, specifying
the value_format of JSON .
CREATE TABLE users_original (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR PRIMARY KEY) WITH (kafka_topic='users', value_format='JSON');
Message
---------------
Table created
---------------
Tip
You can run DESCRIBE users_original; to see the schema for the Table.
3. Optional: Show all streams and tables.
ksql> SHOW STREAMS;
Write Queries
SET 'auto.offset.reset'='earliest';
Note: By default KSQL reads the topics for streams and tables from the latest offset.
1. Use SELECT to create a query that returns data from a STREAM. This query includes
the LIMIT keyword to limit the number of rows returned in the query result. Note that
exact data output may vary because of the randomness of the data generation.
SELECT pageid FROM pageviews_original EMIT CHANGES LIMIT 3;
Page_24
Page_73
Page_78
LIMIT reached
Query terminated
2. Create a persistent query by using the CREATE STREAM keywords to precede
the SELECT statement. The results from this query are written to
the PAGEVIEWS_ENRICHED Kafka topic. The following query enriches
the pageviews_original STREAM by doing a LEFT JOIN with
the users_original TABLE on the user ID.
CREATE STREAM pageviews_enriched AS
  SELECT users_original.userid AS userid, pageid, regionid, gender
  FROM pageviews_original
  LEFT JOIN users_original
    ON pageviews_original.userid = users_original.userid
  EMIT CHANGES;
Message
----------------------------
Stream created and running
----------------------------
Tip
You can run DESCRIBE pageviews_enriched; to describe the stream.
3. Use SELECT to view query results as they come in. To stop viewing the query results,
press <ctrl-c> . This stops printing to the console but it does not terminate the actual
query. The query continues to run in the underlying KSQL application.
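For example (this particular query is an assumption; any SELECT against the stream will do):
SELECT * FROM pageviews_enriched EMIT CHANGES;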
4. Create a new persistent query where a condition limits the stream's content, using WHERE . Results from this query are written to a Kafka topic called PAGEVIEWS_FEMALE .
CREATE STREAM pageviews_female AS
SELECT * FROM pageviews_enriched
WHERE gender = 'FEMALE';
Message
----------------------------
Stream created and running
----------------------------
Tip
You can run DESCRIBE pageviews_female; to describe the stream.
5. Create another persistent query that filters on regionid , using LIKE . Results from this query are written to the Kafka topic pageviews_enriched_r8_r9 (the statement also appears in the SHOW QUERIES listing later in this lab).
CREATE STREAM pageviews_female_like_89
  WITH (kafka_topic='pageviews_enriched_r8_r9') AS
  SELECT * FROM pageviews_female
  WHERE regionid LIKE '%_8' OR regionid LIKE '%_9';
Message
----------------------------
Stream created and running
----------------------------
6. Verify the above 2 streams:
7. Create a new persistent query that counts the pageviews for each gender and region combination in a tumbling window of 30 seconds when the count is greater than one. Results from this query are written to the PAGEVIEWS_REGIONS Kafka topic in Avro format. KSQL will register the Avro schema with the configured Schema Registry when it writes the first message to the PAGEVIEWS_REGIONS topic.
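The statement for this step, as it also appears in the SHOW QUERIES listing below, is:
CREATE TABLE pageviews_regions
  WITH (VALUE_FORMAT='avro') AS
  SELECT gender, regionid, COUNT(*) AS numusers
  FROM pageviews_enriched
  WINDOW TUMBLING (size 30 second)
  GROUP BY gender, regionid
  HAVING COUNT(*) > 1;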
Message
---------------------------
Table created and running
---------------------------
Tip
You can run DESCRIBE pageviews_regions; to describe the table.
8. Optional: View results from the above queries using SELECT .
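For example (the exact query is an assumption):
SELECT gender, regionid, numusers FROM pageviews_regions EMIT CHANGES LIMIT 5;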
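9. Optional: Show all persistent queries. The listing below is the output of the following statement (the statement itself is assumed from the output shown):
SHOW QUERIES;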
-------------------------------------------------------------------------------------------------------
CSAS_PAGEVIEWS_FEMALE_1 | PAGEVIEWS_FEMALE | CREATE STREAM pageviews_female AS SELECT * FROM pageviews_enriched WHERE gender = 'FEMALE';
CTAS_PAGEVIEWS_REGIONS_3 | PAGEVIEWS_REGIONS | CREATE TABLE pageviews_regions WITH (VALUE_FORMAT='avro') AS SELECT gender, regionid, COUNT(*) AS numusers FROM pageviews_enriched WINDOW TUMBLING (size 30 second) GROUP BY gender, regionid HAVING COUNT(*) > 1;
CSAS_PAGEVIEWS_FEMALE_LIKE_89_2 | PAGEVIEWS_FEMALE_LIKE_89 | CREATE STREAM pageviews_female_like_89 WITH (kafka_topic='pageviews_enriched_r8_r9') AS SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9';
CSAS_PAGEVIEWS_ENRICHED_0 | PAGEVIEWS_ENRICHED | CREATE STREAM pageviews_enriched AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid;
-------------------------------------------------------------------------------------------------------
For detailed information on a Query run: EXPLAIN <Query ID>;
10. Optional: Examine query run-time metrics and details. Observe that information
including the target Kafka topic is available, as well as throughput figures for the
messages being processed.
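The run-time metrics shown below are typically obtained with DESCRIBE EXTENDED (the exact statement is an assumption, based on the output that follows):
DESCRIBE EXTENDED pageviews_regions;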
Name : PAGEVIEWS_REGIONS
Type : TABLE
Key field : KSQL_INTERNAL_COL_0|+|KSQL_INTERNAL_COL_1
Key format : STRING
Timestamp field : Not set - using <ROWTIME>
Value format : AVRO
Kafka topic : PAGEVIEWS_REGIONS (partitions: 4, replication: 1)
Field | Type
--------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
GENDER | VARCHAR(STRING)
REGIONID | VARCHAR(STRING)
NUMUSERS | BIGINT
--------------------------------------
For query topology and execution plan please run: EXPLAIN <QueryId>
6. Errors
I. LEADER_NOT_AVAILABLE
{test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
Solution: update the following in /opt/kafka/config/server.properties.
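A common cause of LEADER_NOT_AVAILABLE is a broker advertising a hostname or address that clients cannot resolve. A minimal sketch of the usual fix, assuming the broker runs on host kafka0 and port 9092 (adjust both to your environment):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka0:9092
Restart the broker after saving the change.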
java.util.concurrent.ExecutionException:
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for my-kafka-topic-6: 30037 ms has passed since batch creation plus linger time
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
    at com.tos.kafka.MyKafkaProducer.runProducer(MyKafkaProducer.java:97)
    at com.tos.kafka.MyKafkaProducer.main(MyKafkaProducer.java:18)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for my-kafka-topic-6: 30037 ms has passed since batch creation plus linger time.
Solution:
Update the broker properties (typically the listener settings shown in the LEADER_NOT_AVAILABLE fix above) in /opt/kafka/config/server.properties on all brokers. They should be set to your hostname; then restart the broker.
If the hostname is to be changed, also update it in the following files (Kafka server.properties, Control Center, KSQL, and Connect properties):
/apps/confluent/etc/confluent-control-center/control-center-dev.properties
/apps/confluent/etc/ksql/ksql-server.properties
/tmp/confluent.8A2Ii7O4/connect/connect.properties
If the broker does not start with the hostname, use the IP address instead and restart the broker.
7. Annexure Code:
II. DumpLogSegments
/opt/kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files \
  /tmp/kafka-logs/my-kafka-connect-0/00000000000000000000.log | head -n 4
III. Data Generator – JSON
Configuration
The generator runs a Simulation which you get to define. The Simulation can specify one or many Workflows that will be run as part of your Simulation. The Workflows then generate Events, and these Events are then sent somewhere. You will also need to define Producers that are used to send the Events generated by your Workflows to some destination. These destinations could be a log file, or something more complicated like a Kafka queue.
You define the configuration for the json-data-generator using two configuration files. The
first is a Simulation Config. The Simulation Config defines the Workflows that should be
run and different Producers that events should be sent to. The second is a Workflow
configuration (of which you can have multiple). The Workflow defines the frequency of
Events and Steps that the Workflow uses to generate the Events. It is the Workflow that
defines the format and content of your Events as well.
For our example, we are going to pretend that we have a programmable Jackie Chan robot. We can command Jackie Chan through a programmable interface that happens to take JSON as input via a Kafka queue, and you can command him to perform different fighting moves in different martial arts styles. A Jackie Chan command might look like this:
{
"timestamp":"2015-05-20T22:05:44.789Z",
"style":"DRUNKEN_BOXING",
"action":"PUNCH",
"weapon":"CHAIR",
"target":"ARMS",
"strength":8.3433
}
SIMULATION CONFIG
Let’s take a look at our example Simulation Config:
{
    "workflows": [{
        "workflowName": "jackieChan",
        "workflowFilename": "jackieChanWorkflow.json"
    }],
    "producers": [{
        "type": "kafka",
        "broker.server": "192.168.59.103",
        "broker.port": 9092,
        "topic": "jackieChanCommand",
        "flatten": false,
        "sync": false
    }]
}
You can find the full configuration options for each on the github page. We used a Kafka
producer because that is how you command our Jackie Chan robot.
WORKFLOW CONFIG
The Simulation Config above specifies that it will use a Workflow called
jackieChanWorkflow.json. This is where the meat of your configuration would live. Let’s
take a look at the example Workflow config and see how we are going to control Jackie
Chan:
{
    "eventFrequency": 400,
    "varyEventFrequency": true,
    "repeatWorkflow": true,
    "timeBetweenRepeat": 1500,
    "varyRepeatFrequency": true,
    "steps": [{
        "config": [{
            "timestamp": "now()",
            "style": "random('KUNG_FU','WUSHU','DRUNKEN_BOXING')",
            "action": "random('KICK','PUNCH','BLOCK','JUMP')",
            "weapon": "random('BROAD_SWORD','STAFF','CHAIR','ROPE')",
            "target": "random('HEAD','BODY','LEGS','ARMS')",
            "strength": "double(1.0,10.0)"
        }],
        "duration": 0
    }]
}
• At the top are the properties that define how often events should be generated and if /
when this workflow should be repeated. So this is like saying we want Jackie Chan to
do a martial arts move every 400 milliseconds (he’s FAST!), then take a break for 1.5
seconds, and do another one.
• Next, are the Steps that this Workflow defines. Each Step has a config and a
duration. The duration specifies how long to run this step. The config is where it gets
interesting!
You’ll notice that the values for each of the object properties look a bit funny. These are
special Functions that we have created that allow us to generate values for each of the
properties. For instance, the “random(‘KICK’,’PUNCH’,’BLOCK’,’JUMP’)” function will
randomly choose one of the values and output it as the value of the “action” property in the
command message. The “now()” function will output the current date in an ISO8601 date
formatted string. The “double(1.0,10.0)” will generate a random double between 1 and 10
to determine the strength of the action that Jackie Chan will perform. If we wanted to, we
could make Jackie Chan perform combo moves by defining a number of Steps that will be
executed in order.
There are many more Functions available in the generator with everything from random
string generation, counters, random number generation, dates, and even support for
randomly generating arrays of data. We also support the ability to reference other
randomly generated values. For more info, please check out the full documentation on the
github page.
Once we have defined the Workflow, we can run it using the json-data-generator. To do
this, do the following:
1. If you have not already, go ahead and download the most recent release of the json-
data-generator.
2. Unpack the file you downloaded to a directory.
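3. Run the generator, passing the Simulation Config file. The exact jar name depends on the release you downloaded; a typical invocation looks like this (the version is a placeholder):
java -jar json-data-generator-<version>.jar jackieChanSimConfig.json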
You will see logging in your console showing the events as they are being generated. The
jackieChanSimConfig.json generates events like these:
{"timestamp":"2015-05-
20T22:21:18.036Z","style":"WUSHU","action":"BLOCK","weapon":"CHAIR","target":"B
ODY","strength":4.7912}
{"timestamp":"2015-05-
20T22:21:19.247Z","style":"DRUNKEN_BOXING","action":"PUNCH","weapon":"BROA
D_SWORD","target":"ARMS","strength":3.0248}
{"timestamp":"2015-05-
20T22:21:20.947Z","style":"DRUNKEN_BOXING","action":"BLOCK","weapon":"ROPE"
,"target":"HEAD","strength":6.7571}
{"timestamp":"2015-05-
20T22:21:22.715Z","style":"WUSHU","action":"KICK","weapon":"BROAD_SWORD","tar
get":"ARMS","strength":9.2062}
{"timestamp":"2015-05-
20T22:21:23.852Z","style":"KUNG_FU","action":"PUNCH","weapon":"BROAD_SWOR
D","target":"HEAD","strength":4.6202}
{"timestamp":"2015-05-
20T22:21:25.195Z","style":"KUNG_FU","action":"JUMP","weapon":"ROPE","target":"A
RMS","strength":7.5303}
{"timestamp":"2015-05-
60 Kafka – Dev Ops
20T22:21:26.492Z","style":"DRUNKEN_BOXING","action":"PUNCH","weapon":"STAF
F","target":"HEAD","strength":1.1247}
{"timestamp":"2015-05-
20T22:21:28.042Z","style":"WUSHU","action":"BLOCK","weapon":"STAFF","target":"A
RMS","strength":5.5976}
{"timestamp":"2015-05-
20T22:21:29.422Z","style":"KUNG_FU","action":"BLOCK","weapon":"ROPE","target":"
ARMS","strength":2.152}
{"timestamp":"2015-05-
20T22:21:30.782Z","style":"DRUNKEN_BOXING","action":"BLOCK","weapon":"STAFF
","target":"ARMS","strength":6.2686}
{"timestamp":"2015-05-
20T22:21:32.128Z","style":"KUNG_FU","action":"KICK","weapon":"BROAD_SWORD","
target":"BODY","strength":2.3534}
IV. Resources
https://developer.ibm.com/hadoop/2017/04/10/kafka-security-mechanism-saslplain/
https://sharebigdata.wordpress.com/2018/01/21/implementing-sasl-plain/
https://developer.ibm.com/code/howtos/kafka-authn-authz
https://github.com/confluentinc/kafka-streams-examples/tree/4.1.x/
https://github.com/spring-cloud/spring-cloud-stream-samples/blob/master/kafka-streams-samples/kafka-streams-table-join/src/main/java/kafka/streams/table/join/KafkaStreamsTableJoin.java