Machine Learning Based Education Data Mining Through Student Session Streams
Corresponding Author:
Shashirekha Hanumanthappa
Department of Computer Science and Engineering, Visvesvaraya Technological University
Ring Road, Hanchya Sathagally Layout, Mysore, Karnataka 570019, India
Email: [email protected]
1. INTRODUCTION
The wide usage of the internet and the growth of information technology have changed the way
academics and industries learn: instruction has moved from the conventional offline mode to the online mode, namely
the e-learning platform [1]. During the COVID-19 pandemic period in particular, all classes moved to
an online model, highlighting the significance of the e-learning platform. However, significant challenges
exist in providing a reliable and accurate model to predict student performance [2]. Designing an effective
assessment model for understanding student behavior using session streams of the e-learning platform will
aid in improving students’ academic performance by providing personalized content.
Personalized content delivery that improves student performance according to individual behavior
in the e-learning platform is a major challenge of the current century [3]. Adaptive personalization techniques
for understanding learner profiles have been emphasized [4], [5]. Recently, data mining (DM) and machine
learning (ML) have been used for building student performance prediction models. DM has been used for
establishing useful insight from student session stream data of the e-learning platform, as shown in Figure 1;
alongside, it improves decision-making performance by establishing behavior patterns from data [6]–[9]. Both ML
and DM methodologies are very promising in different fields such as business and network security,
including education. Recently, a new field has emerged, namely education data mining (EDM), for enhancing
learning style, understanding behavior, and improving student performance [10]–[13]. EDM data is
composed of different information such as administration data, student session stream activity, and student
academic performance data. Prior work has provided an EDM dataset collected from different databases and e-
learning systems, and constructed different ML models and an ensemble learning mechanism for
predicting student performance during the course. The outcomes show that the ensemble model outperforms the other
models in terms of prediction accuracy [14]–[16]. However, when data is imbalanced, these models fail to
establish the features affecting the predictive model, thus providing poor classification accuracy. The objective
of this paper is to build an effective model for predicting student grades during the course
through an ensemble-based ML model that works well for student session stream e-learning data [17]–[19].
Existing models construct ensemble learning by combining multiple ML models. However, these models are
effective at addressing binary classification problems; when put forth under multi-label classification
problems with data imbalance, they exhibit poor accuracy [20], [21]. The aforementioned
limitations motivate this research work to develop an improved student performance prediction model
through an improved ensemble methodology [22], [23]. This paper presents effective student performance
prediction through an improved ensemble-based ML model. First, the paper briefly details the ensemble
algorithm, namely XGBoost. Then, it discusses the limitations of standard XGBoost when data is imbalanced.
To address them, a modified XGBoost-based student performance prediction model is presented [24], [25]. The
modified XGBoost (MXGB) encompasses an improved cross-validation mechanism for establishing the features
affecting the accuracy of the student performance prediction model. Finally, an ensemble-based ML model is
constructed for building an effective student performance predictive model. The research significance is as
follows: i) the proposed student performance prediction model employs an efficient ensemble-based
predictive model through MXGB, which works well even when data is imbalanced; ii) the MXGB
encompasses an improved cross-validation mechanism to study which features impact the accuracy of the
student prediction model; and iii) the proposed student performance prediction model achieves better receiver
operating characteristic (ROC) performance, such as accuracy, sensitivity, specificity,
precision, and F-measure, in comparison with state-of-the-art student performance prediction models.
Section 2 presents the ML model for EDM of student session streams. Section 3 presents the outcomes achieved
using the proposed MXGB-based student performance prediction model over existing ensemble-based
student performance prediction models. The last section discusses the significance of the MXGB-based
student performance prediction model over the existing ensemble-based student performance
prediction models.
Int J Reconfigurable & Embedded Syst, Vol. 13, No. 2, July 2024: 383-394
Int J Reconfigurable & Embedded Syst ISSN: 2089-4864 385
where 𝑗 = 1, 2, 3, …, 𝑚 indexes the rows considered, 𝑏𝑗 ∈ {−1, 1} defines the 𝑗th row output, and 𝑎𝑗 defines the 𝑛-dimensional
vector of self-determining features observed for row 𝑗. In general, EDM data has diverse
features that are multi-dimensional, but with fewer rows 𝑚. Thus, for studying and designing the
student performance prediction model 𝐺̂ that forecasts the real estimation of the actual 𝐺, the mapping is defined as (2):

g: A \to B \qquad (2)
In this work, the feature selection process during XGBoost training is modified through minimization of the
objective function, and an effective student performance prediction model is designed as shown in Figure 2,
where 𝑔𝑙 defines a distinct regression tree and 𝑔𝑙(𝑦𝑗) defines the respective prediction outcome provided by the
𝑙th tree for the 𝑗th sample. The regression tree 𝑔𝑙 and its function can be learned through
the minimization of the objective in (4).
In this work, 𝑚 defines the training loss function measuring the variance between the predicted value 𝑧̂𝑗
and the actual value 𝑧𝑗. To avoid the over-fitting problem, the parameter 𝛽 is used for penalizing the
complexity of the predictive model as (5):
Machine learning based education data mining through student … (Shashirekha Hanumanthappa)
\beta(g_l) = \delta U + \frac{1}{2}\,\mu \|x\|^2 \qquad (5)
where 𝛿 and 𝜇 define the regularization parameters, 𝑈 defines the leaf size, and 𝑥 defines the scores of the
different leaves. The ensemble tree is constructed through a summation process. Let 𝑧̂𝑗^(𝑢) define the
prediction outcome of the 𝑗th sample at the 𝑢th iteration; 𝑔𝑢 must be added to minimize (6):
G^{(u)} = \sum_{j=1}^{o} m\left(z_j,\ \hat{z}_j^{(u-1)} + g_u(y_j)\right) + \beta(g_u) \qquad (6)

Equation (6) is simplified by eliminating the constant terms through a second-order Taylor expansion as (7):

G^{(u)} = \sum_{j=1}^{o} \left[h_j\, g_u(y_j) + \frac{1}{2}\, i_j\, g_u(y_j)^2\right] + \beta(g_u) \qquad (7)
h_j = \partial_{\hat{z}^{(u-1)}}\, m\left(z_j,\ \hat{z}_j^{(u-1)}\right) \qquad (8)

i_j = \partial^2_{\hat{z}^{(u-1)}}\, m\left(z_j,\ \hat{z}_j^{(u-1)}\right) \qquad (9)
therefore, the predictive model objective is expressed using (10):

G^{(u)} = \sum_{j=1}^{o} \left[h_j\, g_u(y_j) + \frac{1}{2}\, i_j\, g_u(y_j)^2\right] + \delta U + \frac{1}{2}\,\mu \sum_{k=1}^{U} x_k^2 \qquad (10)

\mathcal{O}^{(u)} = \sum_{k=1}^{U} \left[\left(\sum_{j \in J_k} h_j\right) x_k + \frac{1}{2}\left(\sum_{j \in J_k} i_j + \mu\right) x_k^2\right] + \delta U \qquad (11)

where J_k defines the sample set of leaf k, so that (11) is equivalently represented as (12):

G^{(u)} = \sum_{k=1}^{U} \left[H_k\, x_k + \frac{1}{2}\left(I_k + \mu\right) x_k^2\right] + \delta U \qquad (12)
where r defines the size of the tree, which is fixed; the optimal weight x_k^* of leaf k is obtained through (14):

x_k^* = -\frac{H_k}{I_k + \mu} \qquad (14)

G^* = -\frac{1}{2} \sum_{k=1}^{U} \frac{H_k^2}{I_k + \mu} + \delta U \qquad (15)

H_k = \sum_{j \in J_k} h_j \qquad (16)

I_k = \sum_{j \in J_k} i_j \qquad (17)
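The quantities in (8)–(17) can be made concrete with a short numeric sketch. The logistic loss used below for h_j and i_j is an illustrative choice of the loss m, not necessarily the one used in this work; the leaf statistics are likewise example values:

```python
import math

# Per-sample derivatives (8)-(9), leaf weight (14), and structure score (15),
# sketched for a logistic loss (an illustrative choice, not the paper's).
def grad_hess_logistic(z, z_hat_raw):
    """h_j and i_j of (8), (9) for the logistic loss; z in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-z_hat_raw))  # predicted probability
    return p - z, p * (1.0 - p)             # (first derivative, second derivative)

def leaf_weight(H_k, I_k, mu):
    return -H_k / (I_k + mu)                # x_k* in (14)

def structure_score(leaves, mu, delta):
    """leaves: one (H_k, I_k) pair per leaf, as defined in (16) and (17)."""
    U = len(leaves)                         # number of leaves
    return -0.5 * sum(H * H / (I + mu) for H, I in leaves) + delta * U  # (15)

# Example: two leaves; a smaller structure score indicates a better tree.
h, i = grad_hess_logistic(1, 0.0)           # p = 0.5 -> h = -0.5, i = 0.25
score = structure_score([(-2.0, 4.0), (3.0, 5.0)], mu=1.0, delta=0.1)
```

Here `mu` and `delta` play the roles of the regularization parameters 𝜇 and 𝛿 of (5).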
Int J Reconfigurable & Embedded Syst, Vol. 13, No. 2, July 2024: 383-394
Int J Reconfigurable & Embedded Syst ISSN: 2089-4864 387
G^* defines the quality of tree r, where a smaller value indicates a better tree structure. Although
XGBoost is efficient in obtaining high prediction accuracy, poor feature selection under unknown
environments or when data is imbalanced exhibits degradation of prediction accuracy. To address this
research problem, effective feature selection within the training data is modeled in the next sub-section.
CV(\sigma) = \frac{1}{M} \sum_{k=1}^{K} \sum_{j \in G_k} P\left(b_j,\ \hat{g}_\sigma^{-k(j)}(y_j, \sigma)\right) \qquad (18)
However, (18) does not identify which features affect the accuracy of the predictive model. To address
this, an effective cross-validation with selection of the features of high importance
affecting prediction accuracy is modeled as (19):

CV(\sigma) = \frac{1}{SM} \sum_{s=1}^{S} \sum_{k=1}^{K} \sum_{j \in G_k} P\left(b_j,\ \hat{g}_\sigma^{-k(j)}(y_j, \sigma)\right) \qquad (19)
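The repeated cross-validation of (19) can be sketched as follows; `fit` and `loss` are placeholder callables standing in for the learner 𝑔̂ and the loss P, and the fold construction shown is one reasonable choice, not necessarily the paper's:

```python
import random

def repeated_cv_loss(data, fit, loss, K=5, S=3, seed=0):
    """Average held-out loss over S independent K-fold partitions, as in (19).
    data: list of (features, label) pairs; fit(train) returns a predictor."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(S):                        # outer sum over s = 1..S
        idx = list(range(len(data)))
        rng.shuffle(idx)                      # a fresh random partition
        folds = [idx[k::K] for k in range(K)]
        for k in range(K):                    # inner sum over folds
            train = [data[j] for f in range(K) if f != k for j in folds[f]]
            model = fit(train)                # g_hat trained without fold k
            for j in folds[k]:                # held-out samples G_k
                y, b = data[j]
                total += loss(b, model(y))
                count += 1
    return total / count                      # the 1/(S*M) normalization
```

The single-partition K-fold estimate (18) is recovered as the special case S = 1.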
Using (19), the ideal 𝜎̂ for optimizing the student prediction model is attained as (20).
In (19), 𝑀 defines the size of the training dataset considered, 𝑃(∙) defines the loss function, and 𝑔̂𝜎^(𝑗)(∙)
defines a function to compute coefficients. Equation (19) is executed iteratively for constructing the best student
performance prediction model (i.e., optimization of the training error is done in the first phase; the parameter
is then passed onto the second phase to understand and update the feature importance characteristic into the
predictive model). The optimization process to obtain effective features is achieved through minimization
of the objective function employing a gradient descent mechanism. The effective features are selected
employing the ranking method 𝑟(∙) for constructing the student performance prediction model through (21):
r(a) = \begin{cases} 0 & \text{if } n_j \text{ is not selected} \\ 1 & \text{if } n_j \text{ is selected as optimal for the prediction model}, \; j = 1, 2, 3, \ldots, n \end{cases} \qquad (21)
The ideal feature with the maximum score considering varied 𝐾-fold instances is obtained as (23).
Then, the number of occurrences with which a particular feature is selected across the 𝐾 feature subsets
having maximum score is computed, and the final feature subset is obtained as (24):
where 𝐹𝑠(∙) depicts whether the 𝑛th feature is selected or not, and is mathematically represented as (25):

F_s(a) = \begin{cases} 0 & \text{if } q_j \text{ is chosen fewer than } \frac{K}{2} \text{ times}, \; j = 1, 2, 3, \ldots, n \\ 1 & \text{if } q_j \text{ is chosen greater than or equal to } \frac{K}{2} \text{ times}, \; j = 1, 2, 3, \ldots, n \end{cases} \qquad (25)
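The vote of (25) can be sketched directly; the fold subsets and feature names below are illustrative, not taken from the paper's dataset:

```python
def majority_vote_features(fold_subsets, K):
    """Keep a feature only if it appears in at least K/2 of the K per-fold
    feature subsets, as in (25)."""
    counts = {}
    for subset in fold_subsets:
        for feat in subset:
            counts[feat] = counts.get(feat, 0) + 1
    return {f for f, c in counts.items() if c >= K / 2}

fold_subsets = [{"session_time", "clicks"},
                {"session_time", "idle_time"},
                {"session_time", "clicks"},
                {"clicks"}]
selected = majority_vote_features(fold_subsets, K=4)  # needs >= 2 votes
```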
The aforementioned equation is used for generating the subset of 𝑛′ selected features, where the count
describes how many times a feature is selected. The educational process mining (EPM) training data
utilized is reduced to a subset through the selected features for building an effective student prediction model.
To reduce randomness during the training process, the 𝐾-folds are built by iterating 𝑆 times in the first phase.
In the second phase, a subset of features is selected to reduce variance. Therefore, the proposed MXGB-based
student performance prediction model significantly improves overall prediction accuracy in comparison with
state-of-the-art ML-based student performance prediction schemes.
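Under simplifying assumptions, the two-phase procedure described above can be sketched as follows. The gradient-based importance ranking is replaced here by a simple absolute-covariance score and the base learner is omitted, so this illustrates only the control flow (repeated K-folds, then a majority vote over per-fold subsets), not the paper's implementation:

```python
import random

def rank_features(rows, labels, top):
    """Score each feature by |covariance| with the label and keep the top ones.
    This stand-in ranking replaces the paper's gradient-based importance."""
    m_y = sum(labels) / len(labels)
    scores = []
    for f in range(len(rows[0])):
        m_x = sum(r[f] for r in rows) / len(rows)
        cov = sum((r[f] - m_x) * (y - m_y) for r, y in zip(rows, labels))
        scores.append((abs(cov), f))
    return {f for _, f in sorted(scores, reverse=True)[:top]}

def two_phase_select(rows, labels, K=5, S=3, top=1, seed=0):
    rng = random.Random(seed)
    votes = {}
    for _ in range(S):                      # phase 1: S repeated K-fold splits
        order = list(range(len(rows)))
        rng.shuffle(order)
        for k in range(K):
            train = [order[p] for p in range(len(order)) if p % K != k]
            for f in rank_features([rows[i] for i in train],
                                   [labels[i] for i in train], top):
                votes[f] = votes.get(f, 0) + 1
    # phase 2: keep features chosen in at least half of the S * K subsets
    return {f for f, c in votes.items() if c >= S * K / 2}

rows = [[float(i % 2), 0.5] for i in range(20)]   # feature 0 mirrors the label
labels = [i % 2 for i in range(20)]
selected = two_phase_select(rows, labels)          # -> {0}
```

Because feature 0 is perfectly informative and feature 1 is constant, feature 0 is chosen in every fold and survives the vote.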
where TP defines true positives, FP defines false positives, TN defines true negatives, and FN defines false
negatives. The sensitivity is computed as (27):

\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (27)

\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (30)
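For concreteness, these evaluation metrics can be computed from the confusion-matrix counts as follows; the count values used in the example are illustrative, not results from the paper:

```python
# Confusion-matrix metrics used in the evaluation: sensitivity (27),
# specificity, precision, F-measure (30), and accuracy.
def metrics(TP, FP, TN, FN):
    sensitivity = TP / (TP + FN)             # (27), also called recall
    specificity = TN / (TN + FP)
    precision = TP / (TP + FP)
    f_measure = (2 * precision * sensitivity) / (precision + sensitivity)  # (30)
    accuracy = (TP + TN) / (TP + FP + TN + FN)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f_measure": f_measure,
            "accuracy": accuracy}

m = metrics(TP=40, FP=10, TN=45, FN=5)       # illustrative counts
# m["accuracy"] == 0.85, m["precision"] == 0.8
```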
The proposed MXGB-based student performance prediction model is much more efficient than the RF-based and other
ML-based student performance prediction models in terms of sensitivity. Moreover, the MXGB-based model brings a
tradeoff between higher sensitivity and specificity, thus attaining much better student performance
prediction accuracies.
Further, performance is validated considering different metrics such as specificity, recall,
accuracy, precision, and F-measure using different predictive models, as shown in Figure 2. From Figure 2,
we can see that the factor analysis based XGBoost (FA-XGB) predictive model achieves much better
performance in comparison with the XGBoost and ensemble-based predictive models. Figure 5 shows the ROC
performance of different ML-based student performance prediction models.
Figure 7. Accuracy performance using ML-based student performance prediction model for different sessions
Figure 8. Sensitivity performance using ML-based student performance prediction model for different sessions
Figure 9. Specificity performance using ML-based student performance prediction model for different sessions
Figure 10. Precision performance using ML-based student performance prediction model for different sessions
Figure 11. F-measure performance using ML-based student performance prediction model for different sessions
4. CONCLUSION
Predicting the performance of a student by analyzing the student session stream is a challenging
task. ML algorithms have been used by various existing student performance prediction models to achieve
improved prediction outcomes. However, these models tend to achieve higher accuracy on specific student
data, and when adapted to new data they exhibit poor performance. To address such issues, recent work has
used ensemble-based ML models for choosing the best model to perform prediction tasks. However, when
data is imbalanced, existing ensemble-based models exhibit poor performance. This paper presented an
efficient ensemble machine-learning model by modifying XGBoost so that it works well even when training data
is imbalanced. An effective cross-validation scheme is presented to identify which features impact the
accuracy of the prediction model. The cross-validation scheme employs an effective feature ranking
mechanism to improve prediction accuracy by optimizing the prediction error. The experiment is conducted
using standard student session stream data. The proposed MXGB model significantly improves accuracy,
sensitivity, specificity, precision, and F-measure performance in comparison with RF-based, LR-based,
ensemble-based, and XGBoost-based student performance prediction models. In future work, the performance of the
MXGB model will be tested using more diverse datasets; alongside this, reducing training
errors under multi-class classification will be considered.
REFERENCES
[1] A. Moubayed, M. Injadat, A. B. Nassif, H. Lutfiyya, and A. Shami, “E-Learning: challenges and research opportunities using
machine learning & data analytics,” IEEE Access, vol. 6, pp. 39117–39138, 2018, doi: 10.1109/ACCESS.2018.2851790.
[2] F. Essalmi, L. J. B. Ayed, M. Jemni, S. Graf, and Kinshuk, “Generalized metrics for the analysis of E-learning personalization
strategies,” Computers in Human Behavior, vol. 48, pp. 310–322, Jul. 2015, doi: 10.1016/j.chb.2014.12.050.
[3] J. Yang, J. Ma, and S. K. Howard, “Usage profiling from mobile applications: a case study of online activity for Australian
primary schools,” Knowledge-Based Systems, vol. 191, Mar. 2020, doi: 10.1016/j.knosys.2019.105214.
[4] A. Wakjira and S. Bhattacharya, “Predicting student engagement in the online learning environment,” International Journal of
Web-Based Learning and Teaching Technologies, vol. 16, no. 6, pp. 1–21, Oct. 2021, doi: 10.4018/IJWLTT.287095.
[5] M. Hussain, W. Zhu, W. Zhang, and S. M. R. Abidi, “Student engagement predictions in an e-learning system and their impact on
student course assessment scores,” Computational Intelligence and Neuroscience, vol. 2018, pp. 1–21, Oct. 2018, doi:
10.1155/2018/6347186.
[6] G. Kaur and W. Singh, “Prediction of student performance using weka tool,” Research Cell : An International Journal of
Engineering Sciences, vol. 17, no. January, pp. 2229–6913, 2016.
[7] Y. Chen, Y. Mao, H. Liang, S. Yu, Y. Wei, and S. Leng, “Data poison detection schemes for distributed machine learning,” IEEE
Access, vol. 8, pp. 7442–7454, 2020, doi: 10.1109/ACCESS.2019.2962525.
[8] A. J. Stimpson and M. L. Cummings, “Assessing intervention timing in computer-based education using machine learning
algorithms,” IEEE Access, vol. 2, pp. 78–87, 2014, doi: 10.1109/ACCESS.2014.2303071.
[9] E. Alyahyan and D. Düştegör, “Predicting academic success in higher education: literature review and best practices,”
International Journal of Educational Technology in Higher Education, vol. 17, no. 1, 2020, doi: 10.1186/s41239-020-0177-7.
[10] M. Injadat, F. Salo, A. B. Nassif, A. Essex, and A. Shami, “Bayesian optimization with machine learning algorithms towards
anomaly detection,” in 2018 IEEE Global Communications Conference (GLOBECOM), IEEE, Dec. 2018, pp. 1–6. doi:
10.1109/GLOCOM.2018.8647714.
[11] L. Yang, A. Moubayed, I. Hamieh, and A. Shami, “Tree-based intelligent intrusion detection system in internet of vehicles,” in
2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6. doi: 10.1109/GLOBECOM38437.2019.9013892.
[12] A. Moubayed, M. Injadat, A. Shami, and H. Lutfiyya, “DNS typo-squatting domain detection: a data analytics & machine
learning based approach,” in 2018 IEEE Global Communications Conference (GLOBECOM), IEEE, Dec. 2018, pp. 1–7. doi:
10.1109/GLOCOM.2018.8647679.
[13] A. Namoun and A. Alshanqiti, “Predicting student performance using data mining and learning analytics techniques: a systematic
literature review,” Applied Sciences, vol. 11, no. 1, Dec. 2020, doi: 10.3390/app11010237.
[14] S. Ayouni, F. Hajjej, M. Maddeh, and S. Al-Otaibi, “A new ML-based approach to enhance student engagement in online
environment,” PLOS ONE, vol. 16, no. 11, Nov. 2021, doi: 10.1371/journal.pone.0258788.
[15] S. M. Aslam, A. K. Jilani, J. Sultana, and L. Almutairi, “Feature evaluation of emerging e-learning systems using machine
learning: an extensive survey,” IEEE Access, vol. 9, pp. 69573–69587, 2021, doi: 10.1109/ACCESS.2021.3077663.
[16] S. S. Khanal, P. W. C. Prasad, A. Alsadoon, and A. Maag, “A systematic review: machine learning based recommendation
systems for e-learning,” Education and Information Technologies, vol. 25, no. 4, pp. 2635–2664, Jul. 2020, doi: 10.1007/s10639-
019-10063-9.
[17] S. Helal et al., “Predicting academic performance by considering student heterogeneity,” Knowledge-Based Systems, vol. 161, pp.
134–146, Dec. 2018, doi: 10.1016/j.knosys.2018.07.042.
[18] L. Juhaňák, J. Zounek, and L. Rohlíková, “Using process mining to analyze students’ quiz-taking behavior patterns in a learning
management system,” Computers in Human Behavior, vol. 92, pp. 496–506, Mar. 2019, doi: 10.1016/j.chb.2017.12.015.
[19] Q. Liu et al., “Exploiting cognitive structure for adaptive learning,” in Proceedings of the 25th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, New York, NY, USA: ACM, Jul. 2019, pp. 627–635. doi:
10.1145/3292500.3330922.
[20] F. Wang et al., “Neural cognitive diagnosis for intelligent education systems,” Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 34, no. 04, pp. 6153–6161, Apr. 2020, doi: 10.1609/aaai.v34i04.6080.
[21] B. Kehrwald, “Understanding social presence in text‐based online learning environments,” Distance Education, vol. 29, no. 1, pp.
89–106, May 2008, doi: 10.1080/01587910802004860.
[22] M. Injadat, A. Moubayed, A. B. Nassif, and A. Shami, “Systematic ensemble model selection approach for educational data
mining,” Knowledge-Based Systems, vol. 200, Jul. 2020, doi: 10.1016/j.knosys.2020.105992.
[23] M. Injadat, A. Moubayed, A. B. Nassif, and A. Shami, “Multi-split optimized bagging ensemble model selection for multi-class
educational data mining,” Applied Intelligence, vol. 50, no. 12, pp. 4506–4528, Dec. 2020, doi: 10.1007/s10489-020-01776-3.
[24] K. Abe, “Data mining and machine learning applications for educational big data in the university,” in 2019 IEEE Intl Conf on
Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big
Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), IEEE, Aug.
2019, pp. 350–355. doi: 10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00071.
[25] T. Chen and C. Guestrin, “XGBoost: a scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, New York, NY, USA: ACM, Aug. 2016, pp. 785–794. doi:
10.1145/2939672.2939785.
BIOGRAPHIES OF AUTHORS
Dr. Chetana Prakash holds a Ph.D. in computer science and engineering and is
currently working as a professor in the Department of Computer Science and Engineering,
Bapuji Institute of Engineering and Technology, Davangere. She has more than 30 years of
teaching experience. Her fields of interest are speech signal processing, data mining, image
processing, fuzzy techniques, IoT, and data analytics. She can be contacted at email:
[email protected].