Exercise Book
Version 7.1.1-v1.0.0
Table of Contents
Preamble
Lab 01 Exploring Apache Kafka
Lab 02 Fundamentals of Apache Kafka
Lab 03 How Kafka Works
Lab 04 Integrating Kafka into your Environment
Lab 05 The Confluent Platform
Preamble
Copyright & Trademarks
3. Clone the GitHub repository with the sample solutions into the folder ~/confluent-fundamentals and navigate into that folder:

$ cd ~/confluent-fundamentals
5. Run a script to add entries to your /etc/hosts file. This allows you to refer to containers in your cluster via hostnames like kafka and zookeeper. If you are prompted for a password, enter the password of your lab VM user.
This step isn’t needed if you are following the Running Labs in Docker for
Desktop appendix.
$ ~/confluent-fundamentals/update-hosts.sh
Done!
Docker Basics
This exercise book relies heavily on Docker (containers). However, you do NOT need any previous knowledge of Docker to successfully complete the exercises; some basic knowledge of Docker is definitely a plus, though.
Don't worry if the terms broker, ZooKeeper and Kafka client don't make much sense to you yet; we will introduce them in detail in the next module.
Lab 01 Exploring Apache Kafka
For more info about our source of IoT data please refer to the following URL:
https://digitransit.fi/en/developers/apis/4-realtime-api/vehicle-positions/
1. Navigate to the project folder and execute the start.sh script:
Starting the cluster will take a moment, so be patient. You will observe an output similar to this (shortened for readability):
...
Creating network "explore_confluent" with the default driver
Creating explore_producer_1 ... done
Creating explore_kafka_1 ... done
Creating explore_zookeeper_1 ... done
Waiting kafka to launch on 9092...
kafka not yet ready...
kafka not yet ready...
...
kafka is now ready!
Connection to kafka port 9092 [tcp/XmlIpcRegSvc] succeeded!
c5eaba41ac19d739d53e6e44b2909a36563ad22d428d5eac9cc5aaa2dbbea18b
2. Next use the tool kafka-console-consumer installed on your lab VM to read the data that is streaming into the topic vehicle-positions:

$ kafka-console-consumer \
    --bootstrap-server kafka:9092 \
    --topic vehicle-positions
After a short moment you should see records being output in your terminal window at a fast cadence. These records are live data coming from an MQTT source. Each record corresponds to a vehicle position (bus, tram or train) of the Finnish public transport provider. The data looks like this (shortened):
route":"1040","occu":0}}
ail
...
gm
2@
91
a.
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project
folder:
$ ./stop.sh
Conclusion
In this exercise we created a simple real-time data pipeline powered by Kafka. We wrote data originating from a public IoT data source into a simple Kafka cluster, and then consumed this data with the Kafka command-line tool kafka-console-consumer.
Lab 02 Fundamentals of Apache Kafka
Prerequisites
1. Run the Kafka cluster by navigating to the project folder and executing the start.sh
script:
$ cd ~/confluent-fundamentals/labs/fundamentals
$ ./start.sh
_confluent-metrics
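2. Create the topic vehicle-positions. A minimal sketch of the command, assuming 6 partitions and a replication factor of 1 (we only have a single broker):

$ kafka-topics \
    --bootstrap-server kafka:9092 \
    --create \
    --topic vehicle-positions \
    --partitions 6 \
    --replication-factor 1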
3. To verify the details of the topic just created we can use the --describe parameter:

$ kafka-topics \
    --bootstrap-server kafka:9092 \
    --describe \
    --topic vehicle-positions

giving us this (shortened):

segment.bytes=536870912,retention.bytes=536870912
We can see a line for each partition created. For each partition we get the leader broker,
the replica placement and the ISR list (In-sync Replica list). In our case the list looks
simple since we only have one single replica, placed on broker 101.
4. Try to create another topic called test-topic with 3 partitions and replication factor 1.
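A minimal sketch of the command, assuming the same single-broker setup as above:

$ kafka-topics \
    --bootstrap-server kafka:9092 \
    --create \
    --topic test-topic \
    --partitions 3 \
    --replication-factor 1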
5. List all the topics in your Kafka cluster:

$ kafka-topics \
    --bootstrap-server kafka:9092 \
    --list

You should see this:
_confluent-metrics
test-topic
vehicle-positions
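6. Delete the topic test-topic again. A minimal sketch, assuming the standard --delete option of kafka-topics:

$ kafka-topics \
    --bootstrap-server kafka:9092 \
    --delete \
    --topic test-topic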
7. Double check that the topic is gone by listing all topics in the cluster.
1. Use the tool kafka-topics to create a topic called sample-topic with 3 partitions and a replication factor of 1.
If you forgot how to do this, then have a look at how we created the topic test-topic above.
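2. Produce some data to the new topic with the tool kafka-console-producer. A minimal sketch, assuming the broker is reachable at kafka:9092:

$ kafka-console-producer \
    --broker-list kafka:9092 \
    --topic sample-topic

3. At the > prompt, enter a first value terminated with <Enter>: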
>hello
4. Type a few more lines at the prompt (each terminated with <Enter>):
>world
>Kafka
>is
>cool!
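5. In a second terminal window, consume the topic from the beginning. A minimal sketch, assuming the same broker address:

$ kafka-console-consumer \
    --bootstrap-server kafka:9092 \
    --topic sample-topic \
    --from-beginning

You should see the values you just produced, for example: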
hello
Kafka
is
world
cool!
Notice how the order of the items is scrambled compared to the order in which you entered them. Please take a moment to reflect on why this happens. Discuss your findings with your peers.
7. So far we have produced and consumed data without a key. Let's now run the producer again, this time providing a key with each value:

$ kafka-console-producer \
    --broker-list kafka:9092 \
    --topic sample-topic \
    --property parse.key=true \
    --property key.separator=,

The last two parameters tell the producer to expect a key and to use the comma (,) as a separator between key and value at the input. Enter the following key,value pairs at the prompt:
>1,apples
>2,pears
>3,walnuts
>4,peanuts
>5,oranges
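8. Consume the topic again, this time also outputting the keys. A minimal sketch, assuming the print.key property of kafka-console-consumer:

$ kafka-console-consumer \
    --bootstrap-server kafka:9092 \
    --topic sample-topic \
    --from-beginning \
    --property print.key=true

The output should look similar to this: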
null hello
null Kafka
1 apples
5 oranges
null is
4 peanuts
null world
null cool!
2 pears
3 walnuts
Notice how null is output as the key for the values we entered earlier without defining a key.
10. Once again the question is: "Why is the order of the items read from the topic not the order in which you produced the messages?"
1. Connect to ZooKeeper using the tool zookeeper-shell:

$ zookeeper-shell zookeeper
WATCHER::
2. From within the zookeeper-shell application, type ls / to view the directory structure
in ZooKeeper. Note the / is required.
ls /
3. List the content of the node /brokers:

ls /brokers

4. Now list the IDs of all brokers registered in the cluster:

ls /brokers/ids
[101]

Note the output [101], indicating that we have a single broker with ID 101 in our cluster.
5. Try to find out what can be found in other nodes of the ZooKeeper data tree. E.g., to find out something about the cluster itself use:
get /cluster/id
{"version":"1","id":"Rslk7ZJnRsGFfeHwfwhzmw"}
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project
folder:
$ ./stop.sh
Conclusion
In this exercise we created a simple Kafka cluster. We then used the tool kafka-topics to list, create, describe and delete topics. Next we used the tools kafka-console-producer and kafka-console-consumer to produce data to and consume data from Kafka. Finally, we used the ZooKeeper shell to analyze some of the data stored by Kafka in ZooKeeper.
Lab 03 How Kafka Works
Prerequisites
1. Run the Kafka cluster by navigating to the project folder and executing the start.sh
script:
$ cd ~/confluent-fundamentals/labs/advanced-topics
$ ./start.sh
This will start our mini Kafka cluster, create the topic vehicle-positions, and then
start the producer that is writing data from our IoT source to the topic.
Navigate to the consumer project folder and open it in Visual Studio Code:

$ cd ~/confluent-fundamentals/labs/advanced-topics/consumer
$ code .
4. To set a breakpoint at line 15, click in the left margin next to that line, and then click on the Debug link right above line 14 to start debugging (or use the menu Run → Start Debugging):
The application should start and code execution should stop at line 15:
6. If you let the app run, your output in the DEBUG CONSOLE should look similar to this:
* Starting VP Consumer *
...
offset = 4728, key =
/hfp/v1/journey/ongoing/bus/0012/01806/2551/1/Westendinas./14:11/2213...,
value = {"VP":{..."oday":"2019-06-21","lat":60.181576,"odo":14370,"oper":12,"desi":"551","veh":1806,"tst":"2019-06-21T11:46:03Z","dir":"1","tsi":1561117563,"hdg":207,"start":"14:11","dl":34,"jrn":86,"line":842,"spd":0.48,"drst":0,"acc":0.39}}
offset = 4729, key =
/hfp/v1/journey/ongoing/bus/0022/00625/4624/2/Tikkurila/14:37/4700210/4/60;25/30/26/22,
value = {"VP":{"long":25.062562,"oday":"2019-06-21","lat":60.322748,"odo":3694,"oper":22,"desi":"624","veh":625,"tst":"2019-06-21T11:46:03Z","dir":"2","tsi":1561117563,"hdg":236,"start":"14:37","dl":-60,"jrn":152,"line":809,"spd":3.62,"drst":0,"acc":0.23}}
...
7. Stop the application with the Stop button on the debug toolbar.
Navigate to the consumer project folder and build a Docker image for the consumer application:

$ cd ~/confluent-fundamentals/labs/advanced-topics/consumer
$ ./build-image.sh

Please be patient, this takes a moment or two. You should see something like this in your terminal (shortened for readability):
---> a8f70d1286d9
...
---> d6a3e36f38d3
Step 13/13 : CMD java -classpath "lib/*" clients.VehiclePositionConsumer
---> Running in c842fd617a64
Removing intermediate container c842fd617a64
---> 04963927cc8b
Successfully built 04963927cc8b
Successfully tagged sample-consumer:1.0
1. Run the consumer as a Docker container:

$ ./run-consumer.sh
97d34f47f2e477f66f360cd6b0e83bb...

2. Use the tool kafka-consumer-groups to inspect the lag of our consumer group vp-consumer:
$ kafka-consumer-groups \
--bootstrap-server kafka:9092 \
--group vp-consumer \
--describe
The (shortened) output shows, among other details, the lag per partition and the consumer instance assigned to it:

7689 consumer-vp-consumer-1-0b9d07d9-322b-46c7-a11c-...
3830 consumer-vp-consumer-1-0b9d07d9-322b-46c7-a11c-...
3. Repeat the above command a few times and observe how the LAG behaves.
Alternatively, use the watch utility to run the command repeatedly:

$ watch kafka-consumer-groups \
    --bootstrap-server kafka:9092 \
    --group vp-consumer \
    --describe
4. Scale up the consumer group by starting a second instance of the consumer:

$ ./run-consumer.sh

5. Describe the consumer group again:

$ kafka-consumer-groups \
    --bootstrap-server kafka:9092 \
    --group vp-consumer \
    --describe

The output now shows the six partitions of the topic distributed over two consumer instances. Each line lists GROUP, TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, CONSUMER-ID, HOST and CLIENT-ID:
vp-consumer vehicle-positions 0 14878 14972 94 consumer-vp-consumer-1-0b9... /172.21.0.5 consumer-vp-consumer-1
vp-consumer vehicle-positions 1 16895 16971 76 consumer-vp-consumer-1-0b9... /172.21.0.5 consumer-vp-consumer-1
vp-consumer vehicle-positions 2 15186 15278 92 consumer-vp-consumer-1-0b9... /172.21.0.5 consumer-vp-consumer-1
vp-consumer vehicle-positions 3 16271 16330 59 consumer-vp-consumer-1-84f... /172.21.0.6 consumer-vp-consumer-1
vp-consumer vehicle-positions 4 14009 14066 57 consumer-vp-consumer-1-84f... /172.21.0.6 consumer-vp-consumer-1
vp-consumer vehicle-positions 5 17284 17339 55 consumer-vp-consumer-1-84f... /172.21.0.6 consumer-vp-consumer-1
6. Scale up the consumer group further and observe how the consumer lag behaves.
7. What happens if you scale the consumer group to more than 6 instances?
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project
folder:
$ cd ~/confluent-fundamentals/labs/advanced-topics
$ ./stop.sh
Conclusion
In this lab we have built and run a simple Kafka consumer. We then scaled the consumer group up and analyzed the effect of the scaling using the tool kafka-consumer-groups.
Lab 04 Integrating Kafka into your Environment
Prerequisites
1. Run a simple Kafka cluster, including Confluent Schema Registry, by navigating to the
project folder and executing the start.sh script:
$ cd ~/confluent-fundamentals/labs/ecosystem
$ ./start.sh
2. Let’s create the topic called stations-avro we need for our sample:
1. Define an Avro schema describing our station records and store it in an environment variable:

$ export SCHEMA='{
"type":"record",
"name":"station",
"fields":[
{"name":"city","type":"string"},
{"name":"country","type":"string"}
]
}'
2. Now let’s use this schema and provide it as an argument to the kafka-avro-console-
91
producer tool:
a.
irz
fm
$ kafka-avro-console-producer \
--broker-list kafka:9092 \
--topic stations-avro \
--property schema.registry.url=http://schema-registry:8081 \
--property value.schema="$SCHEMA"
We pass the information about the location of the Schema Registry and the
schema itself as properties to the tool.
3. Now let’s add some data. Enter the following values to the producer:
$ kafka-avro-console-consumer \
--bootstrap-server kafka:9092 \
--topic stations-avro \
--from-beginning \
--property schema.registry.url=http://schema-registry:8081
{"city":"Pretoria","country":"South Africa"}
{"city":"Cairo","country":"Egypt"}
{"city":"Nairobi","country":"Kenya"}
{"city":"Addis Ababa","country":"Ethiopia"}
In this exercise we will use the MQTT source connector from Confluent Hub.
1. Create the topic vehicle-positions.
If you forgot how to do this, then have a look at how we created the topic vehicle-positions in the exercise Fundamentals of Apache Kafka. Alternatively, simply type kafka-topics at the command line and hit <Enter>; a description of all options will be output.
2. Kafka Connect is running as part of our cluster. It already has the MQTT Connector installed from Confluent Hub. Create a Kafka Connect MQTT Source connector using the Connect REST API:
If you are curious, you can find the Dockerfile we use to install the requested MQTT connector in the subfolder connect of the project folder.
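A minimal sketch of such a REST call, assuming the connector class io.confluent.connect.mqtt.MqttSourceConnector and the public HSL MQTT broker as the source (the server URI and MQTT topic pattern used in the lab may differ):

$ curl -s -X POST -H "Content-Type: application/json" \
    --data '{
        "name": "mqtt-source",
        "config": {
            "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
            "tasks.max": "1",
            "mqtt.server.uri": "tcp://mqtt.hsl.fi:1883",
            "mqtt.topics": "/hfp/v1/journey/#",
            "kafka.topic": "vehicle-positions",
            "confluent.topic.bootstrap.servers": "kafka:9092",
            "confluent.topic.replication.factor": "1"
        }
    }' \
    http://connect:8083/connectors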
3. Verify that the connector has been created:

$ curl -s http://connect:8083/connectors
["mqtt-source"]

and query its status:

$ curl -s http://connect:8083/connectors/mqtt-source/status
{
"name": "mqtt-source",
"connector": {
"state": "RUNNING",
"worker_id": "connect:8083"
},
"tasks": [
{
"id": 0,
"state": "RUNNING",
"worker_id": "connect:8083"
}
],
"type": "source"
}
Both the state of the connector and the state of the task should be RUNNING.
4. Use kafka-console-consumer to verify that vehicle positions are flowing into the topic:

$ kafka-console-consumer \
    --bootstrap-server kafka:9092 \
    --topic vehicle-positions \
    --from-beginning \
    --max-messages 5
{"VP":{"desi":"I","dir":"1","oper":90,"veh":1034,"tst":"2019-10-
ail
18T09:10:27.163Z","tsi":1571389827,"spd":0.00,"hdg":174,"lat":60.2613
gm
66,"long":24.854879,"acc":0.00,"dl":0,"odo":37563,"drst":0,"oday":"20
2@
19-10-
91
a.
18","jrn":9048,"line":279,"start":"11:26","loc":"GPS","stop":4150501,
irz
fm
"route":"3001I","occu":0}}
Processed a total of 5 messages
demonstrating that the MQTT source connector did indeed import vehicle positions from the source.
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project
folder:
$ ./stop.sh
Conclusion
In this lab we have defined an Avro schema and used it to serialize and deserialize the value part of Kafka records, with the schema managed by the Confluent Schema Registry. We also used a Kafka Connect MQTT source connector to import live vehicle positions into a Kafka topic.
Lab 05 The Confluent Platform
Prerequisites
1. Run the application by navigating to the project folder and executing the start.sh
script:
$ cd ~/confluent-fundamentals/labs/confluent-platform
$ ./start.sh
This will start the Kafka cluster, ksqlDB Server, Confluent Control Center and the
producer. It will take approximately 2 minutes for Control Center to start serving.
Once everything is up, open Confluent Control Center in your browser; by default it is served on port 9021.
If your cluster is shown as unhealthy then you may have to wait a few moments until the cluster has stabilized.
2. Select controlcenter.cluster and you’ll see a view called Cluster overview showing a set of
cluster metrics that are indicators for the overall health of the Kafka cluster:
Notice the Control Center Cluster tabs on the left with the items Brokers, Topics, Connect and so on. We are currently on the Cluster overview tab.
3. From the list of tabs on the left select Topics. You should see this:
4. Click on the topic vehicle-positions and you will be transferred to the topic Overview
page:
Here you see that our topic has 6 partitions and that they all reside on broker 101 (the
only one we have). Notice that this is a tabbed view and we are on the Overview tab.
5. Switch to the tab Messages to inspect the records flowing into the topic.
Notice that you have 2 display modes for the records, tabular and card view. Experiment with them. Also notice the Topic Summary on the left side of the view.
6. Now click on the tab Schema. You will not see anything meaningful here since we are not using Avro, Protobuf, or JSON as the data format in our sample. If we used one of these, then this view would show the schema details and version(s).
3. From the tabs on the left select ksqlDB, then click on the ksqlDB application named ksqldb.
Notice that this is a tabbed view. Currently we are on the ksqlDB Editor tab.
4. Now enter PRINT 'vehicle-positions'; in the ksqlDB editor field and then click the Run button. You should get a raw output of the content of the topic vehicle-positions.
The single quotes are vital to prevent the hyphen from causing an error.
5. Now let’s define a stream from the topic vehicle-positions. In the ksqlDB editor field
enter the following query:
seq INTEGER
ail
>
gm
) WITH(KAFKA_TOPIC='vehicle-positions', VALUE_FORMAT='JSON');
and then click the Run button. ksqlDB will create a stream called VEHICLE_POSITIONS.
6. Now we want to query the data from the stream we just created. In the ksqlDB editor field enter a query that counts the incoming records per line designation.
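A minimal sketch of such a query, assuming the stream declares a desi column holding the line designation:

SELECT desi, COUNT(*)
FROM VEHICLE_POSITIONS
GROUP BY desi
EMIT CHANGES;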
Run the query and we will see an output similar to this (it may take a few seconds to populate the table in the lower right corner of the screen):
717A | 1
N | 1
736 | 1
18 | 2
203N | 2
172 | 1
975 | 2
841 | 2
55 | 1
443 | 1
8. Now let’s try to filter the data and only show records for the bus number 600:
Rautatientori - Lentoasema. In the ksqlDB editor field enter the following query:
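A minimal sketch of such a filter query, again assuming the desi column carries the line designation:

SELECT *
FROM VEHICLE_POSITIONS
WHERE desi = '600'
EMIT CHANGES;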
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project folder from your lab VM terminal window:
$ cd ~/confluent-fundamentals/labs/confluent-platform
$ ./stop.sh
Conclusion
In this lab we used Confluent Control Center to monitor our Kafka cluster. We used it to inspect the topic vehicle-positions, specifically the messages that flow into the topic and their schema. We also observed the consumer lag. Finally, we used ksqlDB to do simple stream processing on the data in the topic.