Slide 5-6 Kafka
Trong-Hop Do
Kafka – A distributed event streaming platform
What is event streaming?
What can I use event streaming for?
• To process payments and financial transactions in real-time, such as in stock exchanges, banks, and insurance companies.
• To track and monitor cars, trucks, fleets, and shipments in real-time, such as in logistics and the automotive industry.
• To continuously capture and analyze sensor data from IoT devices or other equipment, such as in factories and wind
parks.
• To collect and immediately react to customer interactions and orders, such as in retail, the hotel and travel industry, and mobile applications.
• To monitor patients in hospital care and predict changes in condition to ensure timely treatment in emergencies.
• To connect, store, and make available data produced by different divisions of a company.
• To serve as the foundation for data platforms, event-driven architectures, and microservices.
What is a stream?
Store data in Kafka?
Tutorial 1: Kafka installation on Windows
• Open C:\kafka\config\server.properties
• Open C:\kafka\config\zookeeper.properties
• By default, Apache Kafka runs on port 9092 and Apache ZooKeeper on port 2181.
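With the two properties files edited, the servers can be started with the scripts that ship with Kafka. A minimal sketch, assuming Kafka was extracted to C:\kafka and the commands are run from that folder (start ZooKeeper before the broker):

```shell
REM Start ZooKeeper first (leave this window open)
bin\windows\zookeeper-server-start.bat config\zookeeper.properties

REM Then, in a second terminal, start the Kafka broker
bin\windows\kafka-server-start.bat config\server.properties
```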
Tutorial 2: Run Apache Kafka on Windows
• Install Kafka-Python
• Create a topic
!./kafka_2.13-3.3.1/bin/kafka-topics.sh --create --bootstrap-server 127.0.0.1:9092 --replication-factor 1 --partitions 1 --topic TestTopic
Run Kafka on Colab
• Open terminal using Xterm and run consumer (it will be empty at first)
• Open terminal using Xterm and run producer, write some lines and they will appear on the consumer’s terminal
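The consumer and producer here are the stock console clients bundled with Kafka; the paths match the kafka_2.13-3.3.1 install used above, and TestTopic is the topic created earlier:

```shell
# Terminal 1: consume messages (empty until the producer sends something)
./kafka_2.13-3.3.1/bin/kafka-console-consumer.sh \
    --bootstrap-server 127.0.0.1:9092 --topic TestTopic --from-beginning

# Terminal 2: produce messages; each line typed here appears in terminal 1
./kafka_2.13-3.3.1/bin/kafka-console-producer.sh \
    --bootstrap-server 127.0.0.1:9092 --topic TestTopic
```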
• Use kafka-python on Colab
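A minimal sketch of the kafka-python side of this exchange: the JSON (de)serializers below are self-contained, and the commented lines show how they would plug into KafkaProducer/KafkaConsumer against the local broker (the helper names are illustrative; host and topic match the earlier steps):

```python
from json import dumps, loads

# JSON (de)serializers to pass to kafka-python's KafkaProducer /
# KafkaConsumer via value_serializer / value_deserializer.
def serialize(value):
    return dumps(value).encode('utf-8')

def deserialize(raw):
    return loads(raw.decode('utf-8'))

# With the Colab broker running, usage would look like:
#
# from kafka import KafkaProducer, KafkaConsumer
# producer = KafkaProducer(bootstrap_servers='127.0.0.1:9092',
#                          value_serializer=serialize)
# producer.send('TestTopic', value={'hello': 'colab'})
# producer.flush()
#
# consumer = KafkaConsumer('TestTopic',
#                          bootstrap_servers='127.0.0.1:9092',
#                          auto_offset_reset='earliest',
#                          value_deserializer=deserialize)

print(deserialize(serialize({'number': 1})))
```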
Tutorial 5: Test Kafka and Spark Structured Streaming on Colab
• Start Kafka
• Install PySpark
# Pin the PySpark version explicitly (3.3.0 was the latest at the time of writing)
!pip install pyspark==3.3.0
from time import sleep
from json import dumps
from kafka import KafkaProducer

topic_name = 'RandomNumber'
kafka_server = 'localhost:9092'
# JSON-serialize each message before sending it to the broker
producer = KafkaProducer(bootstrap_servers=kafka_server,
                         value_serializer=lambda v: dumps(v).encode('utf-8'))
for e in range(1000):
    data = {'number': e}
    producer.send(topic_name, value=data)
    print(str(data) + " sent")
    sleep(5)
producer.flush()
• Open another Jupyter Notebook
kafka_server = 'localhost:9092'
• Write stream
query1 = stream_writer1.start()
query2 = stream_writer2.start()
• View streaming result
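The stream_writer1/query1 objects above presumably come from a PySpark pipeline along these lines. This is a sketch, not the exact notebook code, assuming the RandomNumber topic fed by the producer above and Spark 3.3.0 with the matching Kafka connector package:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, LongType

# The Kafka source needs the spark-sql-kafka package on the classpath
spark = (SparkSession.builder
         .appName("KafkaStreamDemo")
         .config("spark.jars.packages",
                 "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0")
         .getOrCreate())

# Schema of the JSON messages produced earlier: {"number": <int>}
schema = StructType([StructField("number", LongType())])

stream_df = (spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("subscribe", "RandomNumber")
             .option("startingOffsets", "earliest")
             .load()
             .select(from_json(col("value").cast("string"), schema).alias("v"))
             .select("v.number"))

# Write stream 1: in-memory table, queryable with spark.sql
stream_writer1 = (stream_df.writeStream
                  .queryName("numbers")
                  .outputMode("append")
                  .format("memory"))
query1 = stream_writer1.start()

# Write stream 2: echo each micro-batch to the console
stream_writer2 = (stream_df.writeStream
                  .outputMode("append")
                  .format("console"))
query2 = stream_writer2.start()

# View streaming result
spark.sql("SELECT * FROM numbers").show()
```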
Tutorial 7: Kafka and MongoDB on Windows
Tutorial 8
https://towardsdatascience.com/make-a-mock-real-time-stream-of-data-with-python-and-kafka-7e5e23123582
Tutorial 8: streaming from CSV
• sendStream.py
• processStream.py
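The heart of sendStream.py in the linked article is replaying CSV rows while preserving the time gaps between consecutive timestamps. A stdlib-only sketch of that replay logic (the column names and the helper are illustrative; the real script calls producer.send() where the print is):

```python
import csv
import io

def replay_csv(csv_text, time_col='timestamp'):
    """Yield (delay_seconds, row) pairs that replay the rows in order,
    preserving the gaps between consecutive timestamps."""
    reader = csv.DictReader(io.StringIO(csv_text))
    prev = None
    for row in reader:
        t = float(row[time_col])
        delay = 0.0 if prev is None else max(0.0, t - prev)
        prev = t
        yield delay, row

sample = "timestamp,value\n0.0,10\n0.5,11\n2.0,12\n"
for delay, row in replay_csv(sample):
    # sendStream.py would sleep(delay) here and then
    # producer.send(topic, value=row) instead of printing
    print(delay, row['value'])
```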
https://medium.com/@kevin.michael.horan/distributed-video-streaming-with-python-and-kafka-551de69fe1dd
Tutorial 9: Video streaming using Kafka
• Producer.py
• consumer.py
• Run consumer.py
• Stream video from webcam
• Stream a video entitled Countdow1.mp4
Tutorial 10
https://towardsdatascience.com/real-time-anomaly-detection-with-apache-kafka-and-python-3a40281c01c9
Tutorial 10: real-time anomaly detection
• Producer.py
• train.py
• detector.py
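In the linked article, train.py fits an anomaly-detection model and detector.py applies it to messages as they arrive. As a simplified, self-contained stand-in for that detector logic, the sketch below flags a reading that deviates too far from a recent window of values (the function name, window, and threshold are illustrative, not the article's code):

```python
import statistics

def is_anomaly(window, new_value, k=3.0):
    """Flag new_value when it lies more than k population standard
    deviations from the mean of the recent window of readings."""
    mean = statistics.fmean(window)
    stdev = statistics.pstdev(window)
    return stdev > 0 and abs(new_value - mean) > k * stdev

# In detector.py each value would arrive from a KafkaConsumer loop;
# here we simply replay a short list against a fixed window.
window = [10, 12, 11, 13, 12, 11, 10, 12]
for value in [12, 20, 11]:
    print(value, is_anomaly(window, value))
```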
Tutorial 11: Tensorflow-IO and Kafka
https://www.tensorflow.org/io/tutorials/kafka
• Just follow https://www.tensorflow.org/io/tutorials/kafka
Tutorial 12: Spotify Recommendation System
https://www.analyticsvidhya.com/blog/2021/06/spotify-recommendation-system-using-pyspark-and-kafka-streaming/
Tutorial 13: Order book simulation
https://github.com/rongpenl/order-book-simulation
Tutorial 14: Create your own data stream
https://aiven.io/blog/create-your-own-data-stream-for-kafka-with-python-and-faker
Tutorial 15: Bigmart sale prediction
• Dataset: https://www.kaggle.com/datasets/brijbhushannanda1979/bigmart-sales-data
• Use train set to train some simple prediction model using Spark MLlib
• Stream data from test set to Kafka server (remember to set the time interval)
• Create Spark streaming dataframe from Kafka and apply the trained model to get the real-time prediction
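One way the four steps above could fit together in PySpark; this is a sketch under assumptions (two numeric Bigmart feature columns, linear regression as the "simple model"), not a complete solution:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("BigmartDemo").getOrCreate()

# 1) Train a simple MLlib model on the train set
#    (the chosen feature columns are an assumption; adjust to taste)
train = spark.read.csv("Train.csv", header=True, inferSchema=True)
assembler = VectorAssembler(inputCols=["Item_Weight", "Item_MRP"],
                            outputCol="features", handleInvalid="skip")
lr = LinearRegression(featuresCol="features", labelCol="Item_Outlet_Sales")
model = Pipeline(stages=[assembler, lr]).fit(train)

# 2) Rows from Test.csv are streamed to Kafka by a separate producer,
#    with a sleep() between rows (same pattern as the CSV tutorial).

# 3) Read the Kafka stream, rebuild the columns from the JSON value,
#    then apply the trained model for real-time predictions:
# stream_df = spark.readStream.format("kafka")...  (parse JSON as before)
# predictions = model.transform(stream_df)
# query = predictions.writeStream.format("console").start()
```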