Machine Learning Based Education Data Mining Through Student Session Streams
Corresponding Author:
Shashirekha Hanumanthappa
Department of Computer Science and Engineering, Visvesvaraya Technological University
Ring Road, Hanchya Sathagally Layout, Mysore, Karnataka 570019, India
Email: [email protected]
1. INTRODUCTION
The wide usage of the internet and the growth of information technology have changed the way
academics and industries learn: instruction has moved from the conventional offline mode to the online mode, namely
the e-learning platform [1]. During the COVID-19 pandemic period in particular, all classes moved to
an online model, highlighting the significance of the e-learning platform. However, significant challenges
exist in providing a reliable and accurate model to predict student performance [2]. Designing an effective
assessment model for understanding student behavior using session streams of the e-learning platform will
aid in improving students’ academic performance by providing personalized content.
Personalized content delivery that improves student performance according to individual behavior
in the e-learning platform is a major challenge of the current century [3]. Adaptive personalization techniques
for understanding learner profiles have been emphasized [4], [5]. Recently, data mining (DM) and machine
learning (ML) have been used for building student performance prediction models. DM has been used for
establishing useful insight from student session stream data of the e-learning platform, as shown in Figure 1;
alongside, it improves decision-making performance by establishing behavior patterns from data [6]–[9]. Both ML
and DM methodologies are very promising in different fields such as business and network security,
including education. Recently, a new field has emerged, namely education data mining (EDM), for enhancing
learning style, understanding behavior, and improving student performance [10]–[13]. EDM data is
composed of different information such as administration data, student session stream activity, and student
academic performance data. Prior work has provided an EDM dataset collected from different databases and e-
learning systems, and constructed different ML models and an ensemble learning mechanism for
predicting student performance during the course. The outcomes show that the ensemble model outperforms the other
models in terms of prediction accuracy [14]–[16]. However, when data is imbalanced, these models fail to
establish the features affecting the predictive model, thus providing poor classification accuracy. The objective
of this paper is to build an effective model for predicting student grades during the course
through an ensemble-based ML model that works well for student session stream e-learning data [17]–[19].
Existing models construct ensemble learning by combining multiple ML models. However, these models are
effective at addressing binary classification problems; when put forth under multi-label classification
problems with data imbalance, they exhibit poor accuracy [20], [21]. The aforementioned
limitations motivate this research work to develop an improved student performance prediction model
through an improved ensemble methodology [22], [23]. This paper presents effective student performance
prediction through an improved ensemble-based ML model. First, the paper briefly details the ensemble
algorithm, namely XGBoost. Then, it discusses the limitations of standard XGBoost when data is imbalanced.
To address them, a modified XGBoost-based student performance prediction model is presented [24], [25]. The
modified XGBoost (MXGB) encompasses an improved cross-validation mechanism for establishing the features
affecting the accuracy of the student performance prediction model. Finally, an ensemble-based ML model is
constructed for building an effective student performance predictive model. The research significance is as
follows: i) the proposed student performance prediction model employs an efficient ensemble-based
predictive model through MXGB, which works well even when data is imbalanced; ii) the MXGB
encompasses an improved cross-validation mechanism to study which features impact the accuracy of the
student prediction model; and iii) the proposed student performance prediction model achieves better receiver
operating characteristic (ROC) performance, such as accuracy, sensitivity, specificity,
precision, and F-measure, in comparison with state-of-the-art student performance prediction models.
Section 2 presents the ML model for EDM of student session streams. Section 3 presents the outcomes achieved
using the proposed MXGB-based student performance prediction model over existing ensemble-based
student performance prediction models. The last section discusses the significance of the MXGB-based
student performance prediction model over the existing ensemble-based student performance
prediction models.
Int J Reconfigurable & Embedded Syst, Vol. 13, No. 2, July 2024: 383-394
Int J Reconfigurable & Embedded Syst ISSN: 2089-4864 385
where 𝑗 = 1, 2, 3, …, 𝑚 indexes the rows considered, 𝑏𝑗 ∈ {−1, 1} defines the 𝑗th row output, and 𝑎𝑗 defines the 𝑛-dimensional
vector of self-determining features observed for row 𝑗. In general, EDM data has diverse
features that are multi-dimensional, but with fewer rows 𝑚. Thus, for studying and designing the
student performance prediction model 𝐺̂ that forecasts the real estimation of the actual 𝐺, the mapping is defined as (2):

g: A \to B \qquad (2)
In this work, the feature selection process during XGBoost training is modified through minimization of the
objective function, and an effective student performance prediction model is designed as shown in Figure 2,
where 𝑔𝑙 defines a distinct regression tree and 𝑔𝑙(𝑦𝑗) defines the respective prediction outcome provided by the
𝑙th tree for the 𝑗th sample. The regression tree 𝑔𝑙 and its function can be learned through
the minimization of the objective in (4).
In this work, 𝑚 defines the training loss function measuring the variance between the predicted value 𝑧̂𝑗
and the actual value 𝑧𝑗. To avoid the over-fitting problem, the parameter 𝛽 is used for penalizing the
complexity of the predictive model as (5):
Machine learning based education data mining through student … (Shashirekha Hanumanthappa)
\beta(g_l) = \delta U + \frac{1}{2}\,\mu \|x\|^2 \qquad (5)
where 𝛿 and 𝜇 define the regularization parameters, 𝑈 defines the leaf size, and 𝑥 defines the scores of the
different leaves. The ensemble tree is constructed through a summation process. Let 𝑧̂𝑗^(𝑢) define the
prediction outcome of the 𝑗th sample at the 𝑢th iteration; 𝑔𝑢 must be added to minimize (6):
G^{(u)} = \sum_{j=1}^{o} m\left(z_j,\ \hat{z}_j^{(u-1)} + g_u(y_j)\right) + \beta(g_u) \qquad (6)

Equation (6) is simplified by eliminating the constant terms through a second-order Taylor expansion as (7):

G^{(u)} = \sum_{j=1}^{o} \left[h_j\, g_u(y_j) + \frac{1}{2}\, i_j\, g_u(y_j)^2\right] + \beta(g_u) \qquad (7)
h_j = \partial_{\hat{z}^{(u-1)}}\, m\left(z_j,\ \hat{z}_j^{(u-1)}\right) \qquad (8)

i_j = \partial^2_{\hat{z}^{(u-1)}}\, m\left(z_j,\ \hat{z}_j^{(u-1)}\right) \qquad (9)
therefore, the predictive model objective is expressed using (10):

G^{(u)} = \sum_{j=1}^{o} \left[h_j\, g_u(y_j) + \frac{1}{2}\, i_j\, g_u(y_j)^2\right] + \delta U + \frac{1}{2}\,\mu \sum_{k=1}^{U} x_k^2 \qquad (10)

\mathcal{O}^{(u)} = \sum_{k=1}^{U} \left[\left(\sum_{j \in J_k} h_j\right) x_k + \frac{1}{2}\left(\sum_{j \in J_k} i_j + \mu\right) x_k^2\right] + \delta U \qquad (11)

where J_k defines the sample set of leaf k, so that (11) is equivalently represented as (12):

G^{(u)} = \sum_{k=1}^{U} \left[H_k\, x_k + \frac{1}{2}\left(I_k + \mu\right) x_k^2\right] + \delta U \qquad (12)
where r defines the size of the tree, which is fixed; the optimal weight x_k^* of leaf k is obtained through (14):

x_k^* = -\frac{H_k}{I_k + \mu} \qquad (14)

G^* = -\frac{1}{2} \sum_{k=1}^{U} \frac{H_k^2}{I_k + \mu} + \delta U \qquad (15)

H_k = \sum_{j \in J_k} h_j \qquad (16)

I_k = \sum_{j \in J_k} i_j \qquad (17)
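The quantities in (8)–(17) can be made concrete with a short numeric sketch. The logistic loss used below for h_j and i_j is an illustrative choice of the loss m, not necessarily the one used in this work; the leaf statistics are likewise example values:

```python
import math

# Per-sample derivatives (8)-(9), leaf weight (14), and structure score (15),
# sketched for a logistic loss (an illustrative choice, not the paper's).
def grad_hess_logistic(z, z_hat_raw):
    """h_j and i_j of (8), (9) for the logistic loss; z in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-z_hat_raw))  # predicted probability
    return p - z, p * (1.0 - p)             # (first derivative, second derivative)

def leaf_weight(H_k, I_k, mu):
    return -H_k / (I_k + mu)                # x_k* in (14)

def structure_score(leaves, mu, delta):
    """leaves: one (H_k, I_k) pair per leaf, as defined in (16) and (17)."""
    U = len(leaves)                         # number of leaves
    return -0.5 * sum(H * H / (I + mu) for H, I in leaves) + delta * U  # (15)

# Example: two leaves; a smaller structure score indicates a better tree.
h, i = grad_hess_logistic(1, 0.0)           # p = 0.5 -> h = -0.5, i = 0.25
score = structure_score([(-2.0, 4.0), (3.0, 5.0)], mu=1.0, delta=0.1)
```

Here `mu` and `delta` play the roles of the regularization parameters 𝜇 and 𝛿 of (5).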
Int J Reconfigurable & Embedded Syst, Vol. 13, No. 2, July 2024: 383-394
Int J Reconfigurable & Embedded Syst ISSN: 2089-4864 387
G^* defines the quality of tree r, where a smaller value indicates a better tree structure. Although
XGBoost is efficient in obtaining high prediction accuracy, poor feature selection under unknown
environments or when data is imbalanced exhibits degradation of prediction accuracy. To address this
research problem, effective feature selection within the training data is modeled in the next sub-section.
CV(\sigma) = \frac{1}{M} \sum_{k=1}^{K} \sum_{j \in G_k} P\left(b_j,\ \hat{g}_\sigma^{-k(j)}(y_j, \sigma)\right) \qquad (18)
However, (18) does not identify which features affect the accuracy of the predictive model. To address
this, an effective cross-validation with selection of the features of high importance
affecting prediction accuracy is modeled as (19):

CV(\sigma) = \frac{1}{SM} \sum_{s=1}^{S} \sum_{k=1}^{K} \sum_{j \in G_k} P\left(b_j,\ \hat{g}_\sigma^{-k(j)}(y_j, \sigma)\right) \qquad (19)
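The repeated cross-validation of (19) can be sketched as follows; `fit` and `loss` are placeholder callables standing in for the learner 𝑔̂ and the loss P, and the fold construction shown is one reasonable choice, not necessarily the paper's:

```python
import random

def repeated_cv_loss(data, fit, loss, K=5, S=3, seed=0):
    """Average held-out loss over S independent K-fold partitions, as in (19).
    data: list of (features, label) pairs; fit(train) returns a predictor."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(S):                        # outer sum over s = 1..S
        idx = list(range(len(data)))
        rng.shuffle(idx)                      # a fresh random partition
        folds = [idx[k::K] for k in range(K)]
        for k in range(K):                    # inner sum over folds
            train = [data[j] for f in range(K) if f != k for j in folds[f]]
            model = fit(train)                # g_hat trained without fold k
            for j in folds[k]:                # held-out samples G_k
                y, b = data[j]
                total += loss(b, model(y))
                count += 1
    return total / count                      # the 1/(S*M) normalization
```

The single-partition K-fold estimate (18) is recovered as the special case S = 1.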
Using (19), the ideal 𝜎̂ for optimizing the student prediction model is attained as (20).
In (19), 𝑀 defines the size of the training dataset considered, 𝑃(∙) defines the loss function, and 𝑔̂𝜎^(𝑗)(∙)
defines a function to compute coefficients. Equation (19) is executed iteratively for constructing the best student
performance prediction model (i.e., optimization of the training error is done in the first phase; the parameter
is then passed onto the second phase to understand and update the feature importance characteristic into the
predictive model). The optimization process to obtain effective features is achieved through minimization
of the objective function employing a gradient descent mechanism. The effective features are selected
employing the ranking method 𝑟(∙) for constructing the student performance prediction model through (21):
r(a) = \begin{cases} 0 & \text{if } n_j \text{ is not selected} \\ 1 & \text{if } n_j \text{ is selected as optimal for the prediction model}, \; j = 1, 2, 3, \ldots, n \end{cases} \qquad (21)
The ideal feature with the maximum score considering varied 𝐾-fold instances is obtained as (23).
Then, the number of occurrences with which a particular feature is selected across the 𝐾 feature subsets
having maximum score is computed, and the final feature subset is obtained as (24):
where 𝐹𝑠(∙) depicts whether the 𝑛th feature is selected or not, and is mathematically represented as (25):

F_s(a) = \begin{cases} 0 & \text{if } q_j \text{ is chosen fewer than } \frac{K}{2} \text{ times}, \; j = 1, 2, 3, \ldots, n \\ 1 & \text{if } q_j \text{ is chosen greater than or equal to } \frac{K}{2} \text{ times}, \; j = 1, 2, 3, \ldots, n \end{cases} \qquad (25)
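The vote of (25) can be sketched directly; the fold subsets and feature names below are illustrative, not taken from the paper's dataset:

```python
def majority_vote_features(fold_subsets, K):
    """Keep a feature only if it appears in at least K/2 of the K per-fold
    feature subsets, as in (25)."""
    counts = {}
    for subset in fold_subsets:
        for feat in subset:
            counts[feat] = counts.get(feat, 0) + 1
    return {f for f, c in counts.items() if c >= K / 2}

fold_subsets = [{"session_time", "clicks"},
                {"session_time", "idle_time"},
                {"session_time", "clicks"},
                {"clicks"}]
selected = majority_vote_features(fold_subsets, K=4)  # needs >= 2 votes
```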
The aforementioned equation is used for generating the subset of 𝑛′ selected features, where the count
describes how many times a feature is selected. The educational process mining (EPM) training data
utilized is reduced to a subset through the selected features for building an effective student prediction model.
To reduce randomness during the training process, the 𝐾-folds are built by iterating 𝑆 times in the first phase.
In the second phase, a subset of features is selected to reduce variance. Therefore, the proposed MXGB-based
student performance prediction model significantly improves overall prediction accuracy in comparison with
state-of-the-art ML-based student performance prediction schemes.
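Under simplifying assumptions, the two-phase procedure described above can be sketched as follows. The gradient-based importance ranking is replaced here by a simple absolute-covariance score and the base learner is omitted, so this illustrates only the control flow (repeated K-folds, then a majority vote over per-fold subsets), not the paper's implementation:

```python
import random

def rank_features(rows, labels, top):
    """Score each feature by |covariance| with the label and keep the top ones.
    This stand-in ranking replaces the paper's gradient-based importance."""
    m_y = sum(labels) / len(labels)
    scores = []
    for f in range(len(rows[0])):
        m_x = sum(r[f] for r in rows) / len(rows)
        cov = sum((r[f] - m_x) * (y - m_y) for r, y in zip(rows, labels))
        scores.append((abs(cov), f))
    return {f for _, f in sorted(scores, reverse=True)[:top]}

def two_phase_select(rows, labels, K=5, S=3, top=1, seed=0):
    rng = random.Random(seed)
    votes = {}
    for _ in range(S):                      # phase 1: S repeated K-fold splits
        order = list(range(len(rows)))
        rng.shuffle(order)
        for k in range(K):
            train = [order[p] for p in range(len(order)) if p % K != k]
            for f in rank_features([rows[i] for i in train],
                                   [labels[i] for i in train], top):
                votes[f] = votes.get(f, 0) + 1
    # phase 2: keep features chosen in at least half of the S * K subsets
    return {f for f, c in votes.items() if c >= S * K / 2}

rows = [[float(i % 2), 0.5] for i in range(20)]   # feature 0 mirrors the label
labels = [i % 2 for i in range(20)]
selected = two_phase_select(rows, labels)          # -> {0}
```

Because feature 0 is perfectly informative and feature 1 is constant, feature 0 is chosen in every fold and survives the vote.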
where TP defines true positives, FP defines false positives, TN defines true negatives, and FN defines false
negatives. The sensitivity is computed as (27):

\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (27)

\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (30)
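For concreteness, these evaluation metrics can be computed from the confusion-matrix counts as follows; the count values used in the example are illustrative, not results from the paper:

```python
# Confusion-matrix metrics used in the evaluation: sensitivity (27),
# specificity, precision, F-measure (30), and accuracy.
def metrics(TP, FP, TN, FN):
    sensitivity = TP / (TP + FN)             # (27), also called recall
    specificity = TN / (TN + FP)
    precision = TP / (TP + FP)
    f_measure = (2 * precision * sensitivity) / (precision + sensitivity)  # (30)
    accuracy = (TP + TN) / (TP + FP + TN + FN)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f_measure": f_measure,
            "accuracy": accuracy}

m = metrics(TP=40, FP=10, TN=45, FN=5)       # illustrative counts
# m["accuracy"] == 0.85, m["precision"] == 0.8
```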
The proposed MXGB-based student performance prediction model is much more efficient than the RF-based and other
ML-based student performance prediction models in terms of sensitivity. Moreover, the MXGB-based model brings a
tradeoff between higher sensitivity and specificity, thus attaining much better student performance
prediction accuracies.
Further, performance is validated considering different metrics such as specificity, recall,
accuracy, precision, and F-measure using different predictive models, as shown in Figure 2. From Figure 2,
we can see that the factor analysis based XGBoost (FA-XGB) predictive model achieves much better
performance in comparison with the XGBoost and ensemble-based predictive models. Figure 5 shows the ROC
performance of different ML-based student performance prediction models.
Figure 7. Accuracy performance using ML-based student performance prediction model for different sessions
Figure 8. Sensitivity performance using ML-based student performance prediction model for different sessions
Figure 9. Specificity performance using ML-based student performance prediction model for different sessions
Figure 10. Precision performance using ML-based student performance prediction model for different sessions
Figure 11. F-measure performance using ML-based student performance prediction model for different sessions
4. CONCLUSION
Predicting the performance of a student by analyzing the student session stream is a challenging
task. ML algorithms have been used by various existing student performance prediction models to achieve
improved prediction outcomes. However, these models tend to achieve higher accuracy on specific student
data, and when adapted to new data they exhibit poor performance. To address such issues, recent work has
used ensemble-based ML models for choosing the best model to perform prediction tasks. However, when
data is imbalanced, existing ensemble-based models exhibit poor performance. This paper presented an
efficient ensemble machine-learning model by modifying XGBoost so that it works well even when training data
is imbalanced. An effective cross-validation scheme is presented to identify which features impact the
accuracy of the prediction model. The cross-validation scheme employs an effective feature ranking
mechanism to improve prediction accuracy by optimizing the prediction error. The experiment is conducted
using standard student session stream data. The proposed MXGB model significantly improves accuracy,
sensitivity, specificity, precision, and F-measure performance in comparison with RF-based, LR-based,
ensemble-based, and XGBoost-based student performance prediction models. In future work, the performance of the
MXGB model will be tested using more diverse datasets; alongside this, reducing training
errors under multi-class classification will be considered.
REFERENCES
[1] A. Moubayed, M. Injadat, A. B. Nassif, H. Lutfiyya, and A. Shami, “E-Learning: challenges and research opportunities using
machine learning & data analytics,” IEEE Access, vol. 6, pp. 39117–39138, 2018, doi: 10.1109/ACCESS.2018.2851790.
[2] F. Essalmi, L. J. B. Ayed, M. Jemni, S. Graf, and Kinshuk, “Generalized metrics for the analysis of E-learning personalization
strategies,” Computers in Human Behavior, vol. 48, pp. 310–322, Jul. 2015, doi: 10.1016/j.chb.2014.12.050.
[3] J. Yang, J. Ma, and S. K. Howard, “Usage profiling from mobile applications: a case study of online activity for Australian
primary schools,” Knowledge-Based Systems, vol. 191, Mar. 2020, doi: 10.1016/j.knosys.2019.105214.
[4] A. Wakjira and S. Bhattacharya, “Predicting student engagement in the online learning environment,” International Journal of
Web-Based Learning and Teaching Technologies, vol. 16, no. 6, pp. 1–21, Oct. 2021, doi: 10.4018/IJWLTT.287095.
[5] M. Hussain, W. Zhu, W. Zhang, and S. M. R. Abidi, “Student engagement predictions in an e-learning system and their impact on
student course assessment scores,” Computational Intelligence and Neuroscience, vol. 2018, pp. 1–21, Oct. 2018, doi:
10.1155/2018/6347186.
[6] G. Kaur and W. Singh, “Prediction of student performance using weka tool,” Research Cell : An International Journal of
Engineering Sciences, vol. 17, no. January, pp. 2229–6913, 2016.
[7] Y. Chen, Y. Mao, H. Liang, S. Yu, Y. Wei, and S. Leng, “Data poison detection schemes for distributed machine learning,” IEEE
Access, vol. 8, pp. 7442–7454, 2020, doi: 10.1109/ACCESS.2019.2962525.
[8] A. J. Stimpson and M. L. Cummings, “Assessing intervention timing in computer-based education using machine learning
algorithms,” IEEE Access, vol. 2, pp. 78–87, 2014, doi: 10.1109/ACCESS.2014.2303071.
[9] E. Alyahyan and D. Düştegör, “Predicting academic success in higher education: literature review and best practices,”
International Journal of Educational Technology in Higher Education, vol. 17, no. 1, 2020, doi: 10.1186/s41239-020-0177-7.
[10] M. Injadat, F. Salo, A. B. Nassif, A. Essex, and A. Shami, “Bayesian optimization with machine learning algorithms towards
anomaly detection,” in 2018 IEEE Global Communications Conference (GLOBECOM), IEEE, Dec. 2018, pp. 1–6. doi:
10.1109/GLOCOM.2018.8647714.
[11] L. Yang, A. Moubayed, I. Hamieh, and A. Shami, “Tree-based intelligent intrusion detection system in internet of vehicles,” in
2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6. doi: 10.1109/GLOBECOM38437.2019.9013892.
[12] A. Moubayed, M. Injadat, A. Shami, and H. Lutfiyya, “DNS typo-squatting domain detection: a data analytics & machine
learning based approach,” in 2018 IEEE Global Communications Conference (GLOBECOM), IEEE, Dec. 2018, pp. 1–7. doi:
10.1109/GLOCOM.2018.8647679.
[13] A. Namoun and A. Alshanqiti, “Predicting student performance using data mining and learning analytics techniques: a systematic
literature review,” Applied Sciences, vol. 11, no. 1, Dec. 2020, doi: 10.3390/app11010237.
[14] S. Ayouni, F. Hajjej, M. Maddeh, and S. Al-Otaibi, “A new ML-based approach to enhance student engagement in online
environment,” PLOS ONE, vol. 16, no. 11, Nov. 2021, doi: 10.1371/journal.pone.0258788.
[15] S. M. Aslam, A. K. Jilani, J. Sultana, and L. Almutairi, “Feature evaluation of emerging e-learning systems using machine
learning: an extensive survey,” IEEE Access, vol. 9, pp. 69573–69587, 2021, doi: 10.1109/ACCESS.2021.3077663.
[16] S. S. Khanal, P. W. C. Prasad, A. Alsadoon, and A. Maag, “A systematic review: machine learning based recommendation
systems for e-learning,” Education and Information Technologies, vol. 25, no. 4, pp. 2635–2664, Jul. 2020, doi: 10.1007/s10639-
019-10063-9.
[17] S. Helal et al., “Predicting academic performance by considering student heterogeneity,” Knowledge-Based Systems, vol. 161, pp.
134–146, Dec. 2018, doi: 10.1016/j.knosys.2018.07.042.
[18] L. Juhaňák, J. Zounek, and L. Rohlíková, “Using process mining to analyze students’ quiz-taking behavior patterns in a learning
management system,” Computers in Human Behavior, vol. 92, pp. 496–506, Mar. 2019, doi: 10.1016/j.chb.2017.12.015.
[19] Q. Liu et al., “Exploiting cognitive structure for adaptive learning,” in Proceedings of the 25th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, New York, NY, USA: ACM, Jul. 2019, pp. 627–635. doi:
10.1145/3292500.3330922.
[20] F. Wang et al., “Neural cognitive diagnosis for intelligent education systems,” Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 34, no. 04, pp. 6153–6161, Apr. 2020, doi: 10.1609/aaai.v34i04.6080.
[21] B. Kehrwald, “Understanding social presence in text‐based online learning environments,” Distance Education, vol. 29, no. 1, pp.
89–106, May 2008, doi: 10.1080/01587910802004860.
[22] M. Injadat, A. Moubayed, A. B. Nassif, and A. Shami, “Systematic ensemble model selection approach for educational data
mining,” Knowledge-Based Systems, vol. 200, Jul. 2020, doi: 10.1016/j.knosys.2020.105992.
[23] M. Injadat, A. Moubayed, A. B. Nassif, and A. Shami, “Multi-split optimized bagging ensemble model selection for multi-class
educational data mining,” Applied Intelligence, vol. 50, no. 12, pp. 4506–4528, Dec. 2020, doi: 10.1007/s10489-020-01776-3.
[24] K. Abe, “Data mining and machine learning applications for educational big data in the university,” in 2019 IEEE Intl Conf on
Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big
Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), IEEE, Aug.
2019, pp. 350–355. doi: 10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00071.
[25] T. Chen and C. Guestrin, “XGBoost: a scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, New York, NY, USA: ACM, Aug. 2016, pp. 785–794. doi:
10.1145/2939672.2939785.
BIOGRAPHIES OF AUTHORS
Dr. Chetana Prakash holds a Ph.D. in computer science and engineering and is
currently working as a professor in the Department of Computer Science and Engineering,
Bapuji Institute of Engineering and Technology, Davangere. She has more than 30 years of
teaching experience. Her fields of interest are speech signal processing, data mining, image
processing, fuzzy techniques, IoT, and data analytics. She can be contacted at email:
[email protected].