Data Streaming Architecture Based On Apache Kafka and Github For Tracking Students
Abstract — Data streaming architecture can be used to collect data and gain insights into the dynamics of individual or collaborative software development activity in higher education courses. Streaming architecture in this context merits further investigation: existing implementations of data streaming architecture rarely use code versioning platforms, such as GitHub, as data sources. The goal of this paper is to investigate the implementation of a custom data streaming architecture that could be used to track students' analytics in real time in higher education software development courses. The solution is based on the Apache Kafka and GitHub platforms. The architecture developed in the paper could also be considered when planning to integrate an LMS (Learning Management System) as a visual web interface for students' analytics.

Keywords — data streaming architecture, Apache Kafka, GitHub, higher education, software development, Learning Management System (LMS)

I. INTRODUCTION

Data streaming architecture is based on the concept of events [1]. Event Streaming Platforms, which are based on the data streaming architecture, provide the infrastructure that enables software to react to given events in real time [1]. Apache Kafka is an open-source streaming platform in which producers, i.e., applications that send messages, publish messages to a Kafka broker [2]. The Kafka broker stores these messages, which consumers later read [2].

Git is a Version Control System (VCS) that tracks changes to user files [3]. The GitHub platform relies on Git and its commands to perform version control of user files.

The aim of this paper is to propose a data streaming architecture that could be used to collect and process data on students' activity on version control platforms in higher education software development courses.

After considering different use cases of data streaming architecture and taking good architectural practices into account, a possible architecture is proposed. The purpose of this paper is to investigate the integration of a version control platform (GitHub) with data streaming platforms such as Kafka to process events generated during higher education software engineering courses. This use case is lacking in practice and could benefit further related work in higher education.

The solution presents a data streaming architecture based on Apache Kafka and the GitHub platform. In addition, the GitHub webhook concept is described, as well as the flow of communication between the Kafka producer and the Kafka consumer. In the proposed solution, this communication starts with a GitHub webhook event.

The Kafka producer and the Kafka consumer are implemented using the Java Spring Boot framework. Java Spring Boot is an open-source, microservice-based Java web framework [4]. The microservice architecture provides developers with a fully enclosed application, including embedded application servers [4].

The integration of Learning Management Systems (LMS), such as Moodle LMS, into the architecture to provide a unified dashboard of students' analytics is also considered.

II. LITERATURE REVIEW

As the authors state in [5], the GitHub platform provides insights into social coding activities. This, together with the platform's widespread use in the software development industry, is the reason to consider GitHub both as a collaborative software development platform [6] in higher education and as a data source in a data streaming architecture.

The GitHub data analysis in [7] demonstrates what can be done with data generated through events on the GitHub platform. The analytics considered include the number of commits per contributor and Social Network Analysis (SNA) [7]. These analytics could also be considered when implementing the Kafka consumer.
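To make the producer-to-consumer flow concrete, the following is a minimal, dependency-free Java sketch of a GitHub webhook event passing through a producer to a consumer that counts commits per contributor. It is not the paper's Spring Boot implementation and does not use the Kafka client API: an in-memory `LinkedBlockingQueue` stands in for a Kafka topic, and `CommitStreamSketch`, `PushEvent`, `TopicRecord`, `onWebhook` and `consumeAvailable` are hypothetical names. The push payload is assumed to be already deserialized into a repository name and a list of commit authors.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Dependency-free sketch of the GitHub webhook -> producer -> consumer flow.
// ASSUMPTION: an in-memory LinkedBlockingQueue stands in for a Kafka topic;
// a real deployment would use Kafka clients instead.
public class CommitStreamSketch {

    // Hypothetical, already-deserialized view of a GitHub "push" webhook event:
    // the repository name plus the author of each pushed commit.
    record PushEvent(String repository, List<String> commitAuthors) {}

    // Hypothetical message on the stand-in topic, keyed by repository name.
    record TopicRecord(String key, PushEvent value) {}

    private final BlockingQueue<TopicRecord> topic = new LinkedBlockingQueue<>();

    // Consumer-side state: running commit count per contributor, sorted by name.
    private final Map<String, Integer> commitsPerContributor = new TreeMap<>();

    // Producer side: invoked when the webhook endpoint receives a push event.
    public void onWebhook(PushEvent event) {
        topic.add(new TopicRecord(event.repository(), event));
    }

    // Consumer side: drains all records currently on the topic, updates the
    // running counts, and returns a snapshot of them.
    public Map<String, Integer> consumeAvailable() {
        TopicRecord record;
        while ((record = topic.poll()) != null) {
            for (String author : record.value().commitAuthors()) {
                commitsPerContributor.merge(author, 1, Integer::sum);
            }
        }
        return new TreeMap<>(commitsPerContributor);
    }

    public static void main(String[] args) {
        CommitStreamSketch pipeline = new CommitStreamSketch();
        pipeline.onWebhook(new PushEvent("team-a/project", List.of("ana", "marko", "ana")));
        pipeline.onWebhook(new PushEvent("team-b/project", List.of("ivan")));
        System.out.println(pipeline.consumeAvailable()); // prints {ana=2, ivan=1, marko=1}
    }
}
```

Keying each record by repository mirrors a common Kafka choice: records with the same key go to the same partition, so one repository's events keep their order for the consumer. Because the consumer keeps running state, later pushes add to the earlier counts instead of replacing them.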
2022 International conference on E-business technologies (EBT) 133
Some research papers dealing with the integration of code versioning platforms into the curriculum of university software development courses have relied on the GitHub platform and its functionality to gain insight into students' activities [8][9][10]. However, using GitHub as a data source for real-time stream processing and analytics in an educational environment is lacking in practice.

Another use of data streaming is implemented in the CERN HSE (occupational Health & Safety and Environmental protection) Unit [16], which deals with the implementation of the CERN Safety Policy. Researchers developed the REMUS (Radiation and Environmental Unified Supervision) system, which uses the open-source Apache Kafka streaming platform to stream real-time data to their Web Interfaces and Data Visualization Tools [16].