
Computers in Human Behavior 158 (2024) 108301

Contents lists available at ScienceDirect

Computers in Human Behavior


journal homepage: www.elsevier.com/locate/comphumbeh

AI student success predictor: Enhancing personalized learning in campus management systems
Muhammad Shoaib a, Nasir Sayed a,b, Jaiteg Singh c,**, Jana Shafi d, Shakir Khan e,f, Farman Ali g,*

a Department of Computer Science, CECOS University of IT and Emerging Sciences, Khyber Pakhtunkhwa, Pakistan
b Department of Computer Science, Islamia College Peshawar, Khyber Pakhtunkhwa, Pakistan
c Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
d Department of Computer Engineering and Information, College of Engineering in Wadi Alddawasir, Prince Sattam Bin Abdulaziz University, Wadi Alddawasir, 11991, Saudi Arabia
e College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 11432, Saudi Arabia
f University Centre for Research and Development, Department of Computer Science and Engineering, Chandigarh University, Mohali, 140413, India
g Department of Applied AI, School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul, 03063, South Korea

A R T I C L E  I N F O

Handling editor: Paul Kirschner

Keywords:
Learning management system
Predictive analytics
Machine learning
AI-driven educational systems
Student success prediction and educational technology

A B S T R A C T

Campus Management Systems (CMSs) are vital tools in managing educational institutions, handling tasks like student enrollment, scheduling, and resource allocation. The increasing adoption of CMSs for online and mixed-learning environments highlights their importance. However, inherent limitations in conventional CMS platforms hinder personalized student guidance and effective identification of academic challenges. Addressing this crucial gap, our study introduces an AI Student Success Predictor empowered by advanced machine learning algorithms, capable of automating grading processes, predicting student risks, and forecasting retention or dropout outcomes. Central to our approach is the creation of a standardized dataset, meticulously curated by integrating student information from diverse relational databases. A Convolutional Neural Network (CNN) feature learning block is developed to extract the hidden patterns in the student data. The classification model is an ensemble incorporating SVM, Random Forest, and KNN classifiers, subsequently refined by a Bayesian averaging model. The proposed ensemble model can predict student grades, retention, and risk levels of dropout. The accuracy achieved by the proposed model is assessed using test data, reaching 93% accuracy for student grade prediction and student risk prediction, and 92% accuracy for the more complex domain of retention and dropout forecasting. The proposed AI system integrates with existing CMS infrastructure, enabling real-time data retrieval and swift, accurate predictions, enhancing academic decision-making efficiency. Our study's AI Student Success Predictor bridges the gap between traditional CMS limitations and the growing demands of modern education.

1. Introduction

Education is essential to everyone's existence because it enables the development of the knowledge, skills, and values required for success in life. Education equips individuals with the knowledge, skills, and resources necessary to make enlightened decisions, solve issues, and contribute to their communities (Maguvhe, 2023). It helps people comprehend their place in the world and the world around them, and it can also provide a feeling of purpose and meaning in life. In addition to personal benefits, education also has societal advantages (Jones-Khosla & Gomes, 2023). Education is crucial for the economic development and progress of a nation, as it fosters a more knowledgeable and involved population. This can lead to stronger economies and more successful civilizations. Overall, education is a crucial component of a successful and full life, and it is necessary for the growth of both individuals and civilizations (Gare, 2023).

* Corresponding author.
** Corresponding author.
E-mail addresses: [email protected] (M. Shoaib), [email protected] (N. Sayed), [email protected] (J. Singh), [email protected]
(J. Shafi), [email protected] (S. Khan), [email protected], [email protected] (F. Ali).

https://doi.org/10.1016/j.chb.2024.108301
Received 22 February 2024; Received in revised form 22 April 2024; Accepted 12 May 2024
Available online 13 May 2024
0747-5632/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Retention is the capacity of a student to remain enrolled in an educational institution or program. In contrast, achievement refers to a student's academic development and the attainment of their educational objectives. Many elements can influence both student retention and achievement (Rothes et al., 2022; Yadusky et al., 2021). Academically unprepared students may struggle to complete their coursework and may be more prone to drop out (academic preparedness). Financial obstacles, such as the expense of tuition or the necessity to work while attending school, may make it more difficult for students to remain enrolled and make academic progress (financial considerations). Students who feel supported by their professors, friends, and school community are more likely to fulfil their academic goals and remain in school (Allen et al., 2021). Students with personal or social difficulties, such as mental health concerns or family conflicts, may find it difficult to concentrate on their schoolwork and may be more likely to drop out. Schools and educational programs may provide academic support services, such as tutoring and study skills workshops, and financial aid or other resources to help students overcome obstacles to success, in order to increase retention and achievement. Moreover, providing a supportive and inclusive learning atmosphere can help students feel more connected to their school and increase their motivation to succeed (Oldehinkel & Ormel, 2023).

Computer science and machine learning are applied sciences that have significantly influenced the evolution of educational technologies (Alshurideh et al., 2023). Computer science is the study of computers and computational systems, and it involves a vast array of topics, such as algorithms, data structures, programming languages, and computer hardware. It is essential to the development of educational technologies because it provides the fundamental knowledge and tools required to design and implement educational software and hardware systems (Aminizadeh et al., 2023). Machine learning is a subfield of computer science that entails the creation of algorithms and models that can learn from data and make inferences based on that data. In the context of educational technologies, machine learning can be used to evaluate student data to personalize learning experiences and offer students real-time feedback. The convergence of computer science and machine learning with educational technologies has resulted in the creation of novel and effective teaching and learning tools (Zheng et al., 2023). Learning analytics is the collection, analysis, and reporting of data on learners and their settings for the purpose of optimizing learning and its environments (Shoaib, Sayed, et al., 2022a,b,c). Learning analytics can be used to monitor student progress, identify problem areas, and provide individualized assistance to help students achieve. It can also be utilized to uncover trends and patterns in student behavior and performance and to improve educational programs and resources (Kew & Tasir, 2022). Learning analytics is an expanding discipline that is enhancing the efficacy and efficiency of education by offering insights into student learning and assisting educators in making data-driven decisions. The practice of identifying patterns and insights from massive databases is known as data mining (Shu & Ye, 2023). Data mining can be utilized in the field of education to evaluate student data in order to comprehend and improve learning outcomes.

Data mining can assist educators in identifying trends and patterns in student data, such as exam performance and course content engagement (Aulakh et al., 2023). This can help educators comprehend how pupils are learning and identify potential areas of difficulty. Data mining can be used to discover the strengths and weaknesses of individual students and to give them customized, individualized learning experiences (Ismail et al., 2023). It can also be used to evaluate student data from earlier iterations of a course in order to identify areas in which the course could be updated. Finally, by examining the influence of educational interventions, such as new teaching techniques or resources, on student learning outcomes, data mining can be used to assess the effectiveness of those interventions. The use of data mining in education can assist educators in gaining a deeper understanding of student learning and in identifying ways to enhance the efficacy of educational programs. A learning management system (LMS) is a software application or web-based technology that is used to organize, implement, and evaluate a particular learning process. A learning management system often allows an instructor to develop and deliver course content, monitor student progress, and manage student interactions (Laparra et al., 2023).

The core problem we aim to address revolves around the limitations of traditional Campus Management Systems (CMSs) in providing personalized guidance to students and effectively identifying and mitigating potential academic challenges. Conventional CMSs often struggle to adapt to the dynamic landscape of online and mixed-learning environments, hindering their ability to meet the diverse needs of students across different academic departments and demographics. Our study seeks to bridge this gap by introducing an AI Student Success Predictor empowered by advanced machine learning algorithms. By leveraging student data from various relational databases, our approach aims to automate grading processes, predict student risks, and forecast retention or dropout outcomes. Through the integration of AI-based systems into existing CMS infrastructure, we endeavor to enhance the efficiency and efficacy of academic decision-making, ultimately revolutionizing the educational landscape and meeting the growing demands of modern education.

An artificial intelligence (AI) integrated Campus Management System (CMS) serves as a dynamic educational platform that elevates the realms of teaching and learning. Powered by a spectrum of AI technologies, encompassing machine learning, natural language processing, and predictive analytics, the CMS orchestrates the automation and enhancement of diverse aspects within the educational landscape (Salah et al., 2023). This AI-enhanced CMS creates tailored learning paths for each student, leveraging their strengths, weaknesses, and preferences, thus optimizing their educational journey (Dawson). Harnessing the prowess of predictive analytics, the AI-infused CMS adapts test complexity based on a student's performance. This intelligent system encompasses automated grading, offering educators respite from routine assessments and enabling them to focus on more strategic educational facets. One of the significant considerations in adopting an AI model for educational data mining is the alignment of the model with the unique characteristics of the data (Shoaib et al., 2023a). Structured data such as grades and test scores may harmonize well with linear models, while unstructured data like text or audio may necessitate more sophisticated models like neural networks. Tailoring the model to desired outcomes is imperative, with decision tree models excelling in forecasting and clustering models being well-suited for pattern identification (Przegalinska & Jemielniak, 2023).

Developing a machine learning model for predicting student performance involves a sequence of meticulous stages. The process commences with the collection of relevant student performance and attribute data, followed by data cleansing and preprocessing to prepare it for machine learning (Bhatt, 2023). The division of data into training and test sets further refines the model's efficacy, enhancing its performance evaluation. This sophisticated AI-enhanced CMS addresses a multitude of educational challenges. The AI-driven individualized learning paths empower each student to progress at an optimal pace while ensuring necessary support. Automated test difficulty adjustments foster a balanced challenge, preventing student disengagement. Furthermore, advanced CMS capabilities encompass automated grading and real-time tailored feedback, freeing educators from mundane tasks and bolstering the teaching and learning process.
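To make the staged workflow above concrete, the following is a minimal sketch of such a pipeline in Python with pandas and scikit-learn. The file name and column names are hypothetical placeholders for illustration, not the authors' actual schema.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical student-performance table; columns are illustrative only.
    df = pd.read_csv("student_records.csv")

    # Basic cleansing: drop duplicate records and fill missing numeric values.
    df = df.drop_duplicates()
    df["gpa"] = df["gpa"].fillna(df["gpa"].median())

    X = df[["gpa", "attendance_rate", "credit_hours"]]
    y = df["final_grade"]

    # Hold out a test set so model evaluation reflects unseen students.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)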


The paper in (Wang et al., 2022) introduces a novel fuzzy rule-based path planning algorithm for mobile robots in complex terrain. By integrating spatial point-taking methods with Dijkstra's and fuzzy logic algorithms, it addresses issues in existing methods related to motion laws, particularly angular and linear acceleration. Another paper addresses the reachable set problem for impulse switched singular (ISS) systems with mixed time-varying delays using Lyapunov theory. It introduces a real-bounded lemma to analyze impulse switching points and establish a condition (Zhang et al., 2023), presented as linear matrix inequalities (LMIs), ensuring that the system's reachable states remain within a closed bounded region. Simulation results confirm the effectiveness of the proposed approach. When considering the ideal model, CNNs present a potent solution for image and video data analysis (Shoaib et al., 2022a,b,c, pp. 1–15). Their applications span from image classification to object recognition and facial detection. Within the educational sphere, CNNs can gauge performance metrics, like grades and test scores, to predict student success or identify individuals in need of additional support (Talebi et al., 2023). The AI-infused CMS harnesses the prowess of machine learning algorithms (Singh et al., 2023), fostering an educational experience attuned to each student's unique learning styles and preferences. By predicting student performance and identifying potential pitfalls, educators and administrators can proactively provide timely interventions, promoting student success. While AI-based learning management systems promise a host of benefits, including enhanced accessibility through natural language processing and speech recognition, it is crucial to align these solutions with specific educational needs (Saadati et al., 2023). A thorough analysis of the application's requirements should steer the selection of the most fitting AI-based CMS, ensuring a harmonious educational ecosystem.

Below are the major contributions of this research study.

• The study's foundational contribution lies in the meticulous curation and integration of diverse student data from various relational databases. This comprehensive dataset is harmonized to ensure accuracy and consistency, forming the bedrock for subsequent analyses.
• An innovative feature learning block, based on CNN, is introduced. This intricate process unveils latent patterns within student data, enabling the extraction of meaningful insights that significantly bolster the performance of predictive models.
• A distinctive contribution is the development of an ensemble classification model, combining Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbors (KNN) classifiers. Augmented by Bayesian averaging, this model showcases a novel approach to enhancing predictive accuracy.
• The study introduces a holistic prediction framework encompassing student grade prediction, risk assessment, and progression outcomes (retention or dropout). This multifaceted approach demonstrates the capability to address complex educational challenges in a unified manner.
• Rigorous evaluation of the models using unseen test data results in noteworthy accuracy rates. Specifically, the models achieve 93% accuracy for grade and risk prediction, and 92% accuracy for retention/dropout prediction, underscoring the reliability of the AI-based system.

This article starts by introducing the proposed model and its unique contributions. Section 2, the literature review, examines what other research has found, pointing out the gaps this study aims to fill. Section 3 explains in detail how the study was done, including how the data was collected and processed, and how the machine learning models were built using an ensemble classification approach. Section 4 discusses the results of the experiments, giving insights into how well the model performs, and wraps up with a discussion of the proposed model and results, highlighting the significance of the AI-based LMS.

2. Literature review

Learning analytics is the measurement, collection, analysis, and reporting of data about learners and their contexts, for the purposes of understanding and optimizing learning and the environments in which it occurs. In the context of smart education, learning analytics can be used to collect data about a student's strengths, weaknesses, and learning preferences, and use this information to personalize the learning experience. For example, a learning management system might recommend specific resources or activities based on a student's progress or interests. It can also be used to track student engagement with course content and use this information to identify factors that influence engagement. For example, an instructor might use analytics to identify when students are most likely to drop out of a course and intervene to provide support. Tracking students' progress over time and providing feedback on their learning are also goals of learning analytics. This can help instructors to identify areas where a student is struggling and provide targeted support.

Learning analytics can be used to collect data about student learning outcomes and use this information to evaluate the effectiveness of different teaching strategies. This can help instructors to improve their teaching and create more effective learning experiences. Learning analytics in education can significantly improve the effectiveness of smart education by providing insights into student learning and engagement and helping educators to tailor instruction to the needs of individual learners.

There are many different theoretical frameworks that have been proposed to guide the development and implementation of learning analytics. Learning analytics should be aligned with theories of learning and consider how data can be used to support and enhance learning processes. It should be based on robust and reliable data sources, and consider issues of data quality, privacy, and ethics. It should provide timely and actionable feedback to learners, instructors, and other stakeholders, and consider how this feedback can be used to support learning and improve educational outcomes. Personalization in learning analytics refers to the use of data and technology to tailor the learning experience to the needs and preferences of individual learners. This can involve adapting the content, pacing, or delivery of instruction to meet the unique needs of each learner and may be based on data such as a learner's past performance, interests, or learning style. Personalized learning can improve student engagement by providing learners with content and activities that are relevant and meaningful to them. It can enhance motivation by providing learners with a sense of ownership and control over their own learning, and by making the learning experience more relevant and enjoyable. Personalized learning has been shown to improve learning outcomes, particularly for students who may struggle with traditional approaches to instruction. Personalization in learning analytics also presents some challenges, including the need for robust and reliable data sources, and the potential for biased algorithms or personalized recommendations to perpetuate existing inequalities. Personalization in learning analytics has the potential to significantly improve the effectiveness of education, but it is important to carefully consider the potential benefits and challenges, and to ensure that personalized approaches are fair and equitable for all learners.

Educational data mining is the application of data mining techniques to the field of education, with the goal of discovering patterns and relationships in educational data that can be used to improve teaching and learning. Educational data mining typically involves the analysis of data from educational technologies, such as learning management systems, student response systems, and intelligent tutoring systems. In student modeling, educational data mining can be used to create models of student knowledge, skills, and learning strategies, which can be used to personalize instruction and support learning.

Educational data mining can be used to predict student performance, identify at-risk students, and forecast the impact of interventions. This can help educators intervene early and provide targeted support to students who may be struggling. Educational data mining can also be used to evaluate the effectiveness of educational interventions, such as new teaching methods or technologies; this can help educators to identify what works and what doesn't, and make informed decisions about how to improve the learning environment. Finally, educational data mining can be used to identify patterns and trends in student data, which can inform the design of new learning resources or the improvement of existing ones.


This can help educators to create more effective and engaging learning materials. Educational data mining has the potential to significantly improve the performance of a learning environment by providing insights into student learning and helping educators tailor instruction to the needs of individual learners.

2.1. Students at risk

Student retention and performance are the two primary areas of focus for EDM and LA. Some journals connected to learning analytics, educational data mining, and online learning focus on student retention and identifying predictors of student success, where success is defined as passing or finishing a course. In the fields of LA and EDM, these are considered typical areas of study. Tinto (1975) is a highly cited paper in the literature on student retention that presents a theoretical framework for comprehending student behavior. Tinto's findings are robust and align with the social constructivist mechanism on which Moodle (Doolittle & Camp, 1999) is founded. According to Tinto (1975), after a student is accepted into a degree or program, social characteristics become more significant than their background. Therefore, social integration is one of the most important factors to consider when attempting to predict which students will drop out of a degree program or field of study. In the LA and EDM literature, using various data sources to forecast at-risk students is commonplace. The authors of (Romero et al., 2008) researched early student performance using Moodle data, employing multiple classification algorithms to predict students' final grades. An early warning predictive model for students at risk of dropping out was assessed using a large number of variables as predictors, including demographics and prior achievement, and reported high precision (Márquez-Vera et al., 2016). Weekly predictive models were established for many courses (He et al., 2015); that study also reported the efficacy of early intervention as opposed to intervention in the latter weeks of the course. Coldwell et al. (2008) added demographic information to student-generated learning management system activity data.

The study discovered a relationship between demographic information, such as gender and nationality, and performance. OU Analyze is another example of merging demographic and activity data (Item, 2015). The project's technologies automatically generate teacher dashboards. Chaplot et al. (2015) investigated the phenomenon of MOOC dropout using neural networks and sentiment analysis. It was determined to what extent the content of various MOOCs and the interaction between instructors and students caused dropout (Hone & El Said, 2016). Student performance was also investigated (Xing et al., 2015); the objective of that work was to enhance teachers' analysis and comprehension of performance data. In a recent study (Burgos et al., 2017), for instance, the authors utilized various classification methods to predict academic failure. Other research-based prediction approaches include homework submission (Dragulescu et al., 2015) and categorization of student learning styles (Abdullah et al., 2015; II & Bower, 2011). Although demographic input variables have been included in certain studies (Jayaprakash et al., 2014a,b; Márquez-Vera et al., 2016; Palmer, 2013), variables based on student-generated activities in the LMS appear to be the subject of the majority of research. In 2005, variables such as the number of content pages read, the number of original articles, and course length were studied (Morris et al., 2005). Other studies have expanded the variable lists to include specific LMS activities such as assignments, quizzes, forums, and wikis (Zacharis, 2015). Other LMS elements, such as message systems, have also been evaluated as predictors (Macfadyen & Dawson, 2010). There is considerable variation amongst studies, making it difficult to draw solid conclusions. The predictive effectiveness of LMS activity-based variables varies by course and by activity type.

2.2. Learning management system

The ubiquitous educational setting entails the utilization of a learning management system (LMS), with Moodle serving as an exemplar of an open-source platform within this domain. Martin Dougiamas disseminated the prototype of Moodle (Palmer, 2013) in 1999 as an integral component of his doctoral dissertation at Curtin University (Dougiamas & Taylor, 2000). A second paper, with Peter Charles Taylor (Dougiamas & Taylor, 2002), employed Moodle for constructivist graduate courses. Social constructivist pedagogy informs Moodle's design and growth; see https://docs.moodle.org/35/en/Philosophy. The premise of constructivist learning theory (Staver, 1998) is that humans generate knowledge via experience. Social constructivism (Doolittle & Camp, 1999) promotes peer interaction. According to constructivist learning theory, humans gain knowledge more efficiently when they make things. Therefore, the majority of Moodle Type 2 activities adhere to these learning theories. Assessments in Moodle vary significantly based on the Moodle activity being evaluated. It is possible to automatically grade quizzes and provide students with comments depending on their responses. The instructor often grades assignments manually. Consequently, instructors possess the capability to furnish personalized feedback on student submissions and authorize select students to revise their work. Within Moodle courses, adaptive exercises incorporate both content and question sections, with students' responses to queries undergoing automatic grading. Moodle also enables other sorts of assessments that demand greater student participation, such as students evaluating the information supplied by other students in a workshop, or evaluating the contributions of other students in grading activities.

In the second case, contributions can be things like forum posts, glossary entries, or database activity entries. All grades are kept in the Moodle grade book, which gives teachers different reports and summaries. Moodle's backend systems can be accessed via the web or mobile apps. When Moodle runs online courses, student learning actions are logged and saved in a database with granular activity records. Moodle provides a relatively rudimentary reporting capability by default. Its reporting features enable users, typically instructors, to access course activity logs and student or activity group statistics, and to build aggregated data graphs. The Moodle LMS does not include analysis tools to extract data from activity logs and visualizations; nevertheless, a number of Moodle-compatible plugins can be added independently and are simple to install.

Moodle possesses two notable educational data mining tools: CVLA (Dragulescu et al., 2015), which includes a predictive job submission model, and MDM (Luna et al., 2017), an open-source educational data mining application that facilitates the discovery of new knowledge. Some of these tools are open source, while others are equipped with predictive analytics. To effectively oversee the entire process from hypothesis testing to obtaining actionable findings, it is necessary to establish a comprehensive framework. At present, the popularity of Moodle has surged, boasting a user base of 144,413,576 users and offering 15 million courses across over 100,000 Moodle sites worldwide. There are, however, other LMSs with significant market share. Blackboard Learn provides a variety of exams, assignments, and learning modules. Canvas is a contemporary LMS developed by Instructure that shares features with Moodle and Blackboard Learn. It delivers the majority of functionality through external LTI programs that are also compatible with other LMSs. Google Classroom is yet another well-known LMS, created by Google and released in 2014. It offers assignments, quizzes, and a variety of grading techniques.

2.3. Supervised learning

Machine learning falls under the umbrella of computer science and encompasses various techniques for modeling data. Within the domain of supervised learning, algorithms are employed to discern a function that delineates the correlation between a dependent variable and a set of independent variables. Notably, regression analysis and least squares represent early instances of supervised learning algorithms (Gauss et al., 1857; Legendre, 1805). The linear discriminant (Kemp, 2003) is an early classifier. Another prominent classifier, based on Bayes' theorem, is Naive Bayes (Rish, 2001), which assumes that the features are entirely independent and independently linked with the labels. Unfortunately, this assumption also represents the greatest shortcoming of these models (Rish, 2001). Logistic regression (Cox, 1958) was another early classifier; it is characterized as a linearly separable model employing a sigmoid activation function. As a classification technique, it attempts to identify a function that applies to the overall training data instances in order to more precisely estimate the dependent variable. The decision tree (Quinlan, 1986) is another classification method that has been popular for decades. Every terminal node within the tree symbolizes a composite of the edges (input features) encompassing all branches of the tree leading up to the root node. Decision trees are closely connected to random forests (Ho, 1995), which attempt to circumvent the overfitting issue associated with decision trees by generating many trees and pooling their results. SVMs (Cortes & Vapnik, 1995) are an extensively used supervised model for classification and regression analysis. An SVM fits a separating hyperplane and categorizes samples accordingly. Artificial neural networks (ANNs) (Rosenblatt, 1958) are well-known models for supervised and unsupervised learning that resemble the human nervous system in passing and forwarding messages for task completion and learning. ANNs enhance their performance during the model training phase by tuning their parameters and backpropagating the discrepancies between the labels computed by the ANN (Rumelhart et al., 1986) and the actual values to the preceding layers of the network, where the weights are modified.

CNNs are frequently employed in computer vision (Shoaib et al., 2023b); inspired by the visual cortex of animals, CNNs analyze nearby characteristics of each feature to discover patterns. For time series problems, recurrent neural networks (RNNs) (Shoaib, Hussain, et al., 2022, pp. 1–18) and long short-term memory units (LSTMs) (Ullah et al., 2021) are utilized. This form of ANN is employed in systems for natural language processing and speech recognition. Instead of a single feature vector, such a network receives a sequence of vectors as input. The output is computed by merging each input vector with the internal state generated by the preceding input vector. This network type has also been employed to predict school dropout (Fei & Yeung, 2015).

In machine learning, data preprocessing is an essential task. Data preparation approaches include filling in missing values, which is essential for sparse datasets, and ensuring that there are no outliers that could impair the prediction model's performance (Shoaib, Sayed, et al., 2022a,b,c). Data cleaning involves the thorough examination of a dataset to rectify any inaccuracies present in its samples and features, thereby ensuring their alignment with the topic under investigation. Furthermore, normalization and data standardization are additional techniques commonly employed in machine learning to ensure that all features fall within a specified range of values, thereby enhancing the efficacy of machine learning algorithms (Starbuck, 2023).
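As a concrete illustration of the imputation and standardization steps just described, here is a minimal, generic sketch using scikit-learn; the toy feature matrix is hypothetical and not drawn from the study's dataset.

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    # Toy feature matrix with one missing entry (np.nan).
    X = np.array([[72.0, 3.1], [64.0, np.nan], [88.0, 3.8]])

    # Fill missing values with the column median, then standardize each
    # feature to zero mean and unit variance.
    X_imputed = SimpleImputer(strategy="median").fit_transform(X)
    X_scaled = StandardScaler().fit_transform(X_imputed)
    print(X_scaled)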


2.4. Predictive model's portability

Portability stands as a crucial component of any predictive model, ensuring its capability to furnish dependable predictions utilizing data from diverse origins. Both the data utilized for generating predictions and the data employed for training the supervised learning algorithm can stem from the same dataset. However, portability requires leveraging data collected from various sources; in this particular case, different courses are offered through different LMSs. Examining issues related to the expansion of learning analytics techniques and solutions in higher education (Jayaprakash et al., 2014a,b) is the goal of the Open Academic Analytics Initiative (OAAI). The program includes a component that addresses the transferability of predictive models across institutions. The results of this project have been published in several articles (Lauria et al., 2012; Jayaprakash, 2014a,b; Jayaprakash et al., 2014a,b, 2016). According to these investigations, the predictive performance of datasets from different institutions is high. Subsequent research (Gašević et al., 2016) has contended that the transferability of predictive models lacking specific teaching conditions can yield highly variable outcomes when applied to different courses, posing potential threats to result validity, such as the overestimation or underestimation of certain predictors. Recent studies (Conijn et al., 2017) corroborate these findings. Scholars in the realms of learning analytics and educational data mining are increasingly mindful of the transferability of predictive models across diverse courses. Applying a model trained on data from one course to data from another poses significant considerations; in (Gitinabard et al., 2019), the authors report doing so with at least 60% accuracy across two different courses. Another recently published example of evaluating the portability of a predictive model is (Moreno-Marcos et al., 2019), in which the authors report high accuracy. They concluded that predictive models can be successfully transferred to different educational contexts if certain context-relevant conditions are met, such as the course having the same or similar users (Moreno-Marcos et al., 2019). An earlier study of predictive variables across courses is (Morris et al., 2005).

3. Methodology

In this phase, software frameworks are developed for educational data mining and learning analytics prediction models by providing a set of standardized tools and libraries that are used to build and deploy these models. The schematic representation of the proposed framework for student analysis and future prediction, encompassing Final Grade Prediction, Risk of Dropout Prediction, and Progression Prediction Model Assessment, is depicted in Fig. 1. One advantage of using a software framework is that it can save time and effort by providing pre-built components that can be easily reused and customized for specific applications. For example, a software framework might include tools for data preprocessing, feature extraction, model training and evaluation, and visualization, which can be used to build and test prediction models more efficiently (Aldoseri et al., 2023). Another advantage of using a software framework is that it can help to ensure the quality and reliability of prediction models by providing a consistent and well-documented development process. A software framework can also facilitate collaboration and sharing of prediction models by providing a common platform that can be used by multiple researchers and developers. In short, a software framework provides a solid foundation for the development of educational data mining and learning analytics prediction models by saving time and effort, ensuring quality and reliability, and facilitating collaboration and sharing.

This section outlines the research design and procedures employed in the development and evaluation of the AI-based Learning Management System (LMS) for the specific context of Pakistan's university environment. The research aimed to address the unique challenges faced by students in engineering, computer science, management science, biotechnology, and pharmacy departments by utilizing a mixed-method research approach, combining elements of experimental and quasi-experimental designs. The methodology encompasses multiple phases, starting with data collection from various universities, followed by rigorous data preprocessing and the development of a CNN feature learning block with an ensemble classification model. The models were fine-tuned and evaluated to predict student success and identify those at risk of academic hazards. Subsequently, the AI-based LMS was integrated into the university environment, offering individualized learning experiences, predictive analytics, and automated feedback to support student retention and academic achievement. User feedback and iterative improvement played a pivotal role in refining the system to meet the diverse needs of different departments and provide valuable insights for future research and implementation of best practices in AI-driven educational technologies.


Fig. 1. Proposed framework for student analysis and predictive modelling.

3.1. Dataset

The dataset used in this research study was meticulously curated to comprehensively address the diverse academic challenges faced by students in the departments of Engineering, Computer Science, Management Science, Biotechnology, and Pharmacy within esteemed universities across Pakistan. The dataset was sourced from multiple educational institutions, representing a diverse student population, and encompassed comprehensive student records spanning multiple academic years. These records comprised a rich set of academic indicators, including course grades, test scores, course enrollments, student interactions within the Learning Management System (LMS), and essential demographic details. The dataset included a wide array of student-specific details, such as age, gender, ethnicity, and prior educational background. This information offered valuable insights into the diverse composition of the student population across the selected departments. By understanding these demographic aspects, the study aimed to address any potential disparities in academic performance and tailor the AI-based LMS to cater to the specific needs of individual students.

Detailed information regarding the subjects or courses undertaken by students within their respective departments was also incorporated into the dataset, as can be seen in Table 1. This included course codes, course titles, credit hours, and the academic semesters during which the courses were completed. Understanding the subject specifics allowed the AI-based LMS to offer tailored learning experiences, identifying areas of strength and weakness for each student and providing relevant learning resources accordingly.

Table 1. Attributes of the student dataset.
Attribute No. | Attribute Name | Type/Value
1 | Student ID | Numeric
2 | Department | Char/String
3 | Age | Numeric
4 | Gender | Char
5 | Prior Education | String
6 | Campus Distance | Numeric (kilometers)
7 | Subject Code | Char
8 | Subject Name | Char
9 | Credit Hours | Numeric
10 | Semester | Numeric
11 | Subject Percentage | Numeric
12 | GPA | Numeric
13 | CGPA | Numeric
14 | Prerequisite Subject Name | Char
15 | Prerequisite Subject GPA | Numeric
16 | Final Grade | Char
17 | Risk of Dropout | Char (No Risk, Low, Moderate, High)
18 | Progression | Char (Retention, Dropout)

Table 2 provides an insightful overview of various student statistics across the academic departments. The table includes information on the number of students enrolled, the average age of students, the count of male and female students, and the average CGPA (Cumulative Grade Point Average) achieved by students in each department. This table enables a quick comparison of key metrics related to student demographics and academic performance among departments. The information presented can be valuable in identifying trends, assessing the distribution of students, and understanding the academic achievements of students in each discipline.

Table 2. Overview of student statistics by department.
Department | Number of Students | Average Age | Male Students | Female Students | Average CGPA
Engineering | 350 | 21.5 | 240 | 110 | 2.7
Computer Science | 250 | 20.8 | 180 | 70 | 2.6
Management Science | 300 | 21.3 | 150 | 150 | 2.8
Biotechnology | 200 | 21.9 | 100 | 100 | 2.9
Pharmacy | 180 | 23.1 | 90 | 90 | 2.7

Table 3 lists the three predicted attributes: "Final Grade," "Student Risk," and "Progression Status." For each attribute, the corresponding classes or categories are listed. For "Final Grade," the classes represent the different possible grades a student can achieve, ranging from "A" to "F." For "Student Risk," the classes represent the different levels of risk a student may have, categorized as "Low," "High," and "Moderate." For "Progression Status," the classes indicate whether a student is "Retained" in the educational program or at risk of "Dropout."

Table 3. Predicted attributes and their corresponding classes/categories by the proposed AI-LMS.
Predicted Attribute | Classes/Categories
Final Grade | A, B+, B, C+, C, D+, D, F
Student Risk | Low, High, Moderate
Progression Status | Retained, Dropout

These predicted attributes and their classes are essential for machine learning models to make accurate predictions and classifications based on the input data. The models can use these classes as target labels to learn patterns and make predictions for new data instances, aiding educational institutions in identifying at-risk students, understanding academic performance, and improving overall student outcomes.
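To illustrate how the classes in Table 3 can serve as target labels, here is a minimal sketch of encoding them for a classifier; the integer mapping shown is an illustrative assumption, not the authors' exact encoding.

    from sklearn.preprocessing import LabelEncoder

    # Example target values drawn from the classes in Table 3.
    final_grades = ["A", "B+", "C", "F", "B", "D+"]

    # Map each class string to an integer label a classifier can consume.
    encoder = LabelEncoder().fit(final_grades)
    y = encoder.transform(final_grades)
    print(list(encoder.classes_))  # alphabetically ordered class names
    print(y)                       # integer labels aligned with the input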


3.2. Data collection, integration, and transformation

The data utilized in this research study was meticulously curated to comprehensively address the diverse academic challenges faced by students across various departments within esteemed universities in Pakistan. This section outlines the process of data collection, integration, and transformation, which was executed with precision using Talend software. The software's capabilities facilitated the harmonization of data from disparate sources and its subsequent preparation for analysis. The initial phase involved the identification of data sources, including SQL Server for Engineering, Oracle for Computer Science and Management Science, and MySQL for Biotechnology and Pharmacy. These sources represented a wide range of academic departments, each contributing to the comprehensive dataset. In addition, a flat file structure in the .csv format was created to represent unrelated database sources.

Additionally, Fig. 2 illustrates the seamless conversion process from relational databases to a flat database structure utilizing the Talend application, emphasizing the efficiency of data integration. Talend's versatile connectivity features were employed to extract data from SQL Server, Oracle, MySQL, and the .csv file. This step ensured the seamless extraction of data, regardless of the source type, and facilitated its transition into a unified environment for further processing.

Fig. 2. Relational Databases to Flat Database Conversion using the Talend Application.

Upon extraction, the collected data underwent a series of transformation processes to ensure compatibility and harmonization. Talend's data transformation capabilities allowed for mapping, merging, and standardizing data elements, fostering a cohesive dataset structure that accommodated diverse data formats. Data quality and integrity were paramount. Talend's data quality tools were utilized to identify and rectify discrepancies, inconsistencies, and missing values. Through the cleansing operations, the dataset's reliability was fortified, setting the stage for accurate and meaningful analyses.

Talend's cross-platform integration capabilities played a crucial role in harmonizing data originating from different database management systems. The software seamlessly integrated data from SQL Server, Oracle, MySQL, and the .csv file, enriching the dataset with attributes such as course codes, titles, credit hours, semesters, and student demographics.

The integrated and transformed data was stored in a central repository, enabling easy access and retrieval for subsequent analysis and modelling. Aggregation processes were applied to consolidate student records, course details, and performance indicators into a unified dataset. The successful execution of these steps using Talend software facilitated the creation of a unified and standardized dataset, representative of diverse academic departments. The streamlined data collection and integration process ensured data integrity and reliability, laying the foundation for subsequent analyses and modelling in the research study.

3.3. Outlier detection using the Z-score method

In the realm of student data analysis, identifying and handling outliers is crucial for obtaining accurate insights into students' academic performance and well-being. One widely used method for outlier detection is the Z-score method, which leverages statistical measures to identify data points that deviate significantly from the mean of the dataset. The Z-score is a measure of how many standard deviations a data point is away from the mean. By calculating the Z-score for each student attribute, such as test scores, grades, or academic progress indicators, we can identify students whose data points lie far from the average, indicating potential outliers. A Z-score greater than a predefined threshold suggests that the data point is an outlier and requires further examination.


Fig. 3 shows the visualization of outlier removal using a box plot. In the context of our student dataset, applying the Z-score outlier detection method allows us to pinpoint students who exhibit extraordinary academic performance or experience challenges that differ significantly from their peers. These outliers may be students who consistently achieve exceptionally high grades (positive outliers) or students who face academic struggles and deviations from the norm (negative outliers). Detecting such outliers can provide valuable insights to educational institutions and instructors, enabling them to offer tailored support and interventions based on individual student needs. Table 4 shows the dataset statistics before and after removal of the outlier samples.

Fig. 3. Box plot - outlier detection using Z-score.

Table 4. Example statistics before and after Z-score outlier removal.
Statistic | Before Removal | After Removal
Mean | 60.25 | 64.12
Median | 61.50 | 65.00
Standard Deviation | 15.63 | 8.28

3.4. Features learning block

The proposed features learning block is a multi-level architecture designed to extract and learn meaningful features from input data. The model is composed of various layers, summarized in Table 5, which perform specific operations to process the data and gradually capture hierarchical representations. At the start of the model is the Input Layer, which receives the raw input data. The subsequent First Level consists of three convolutional layers - Convolutional1, Convolutional2, and Convolutional3. These layers apply filters to the input data, convolving them over the input to detect different patterns and features. Each convolutional layer captures varying levels of abstraction, from low-level to high-level features. Following the First Level, the Second Level introduces MaxPooling1, MaxPooling2, and MaxPooling3. These layers perform down-sampling on the feature maps generated by the convolutional layers, effectively reducing the spatial dimensions. This process retains the most salient features while reducing computation and preventing overfitting. The Third Level incorporates Batch Normalization layers - BatchNorm1, BatchNorm2, and BatchNorm3. Batch normalization normalizes the activations of the previous layer, which enhances the stability and convergence of the training process. This normalization step aids in faster and more efficient learning while promoting better generalization. Moving on to the Fourth Level, the Dropout Layer is introduced to prevent overfitting. Dropout randomly deactivates a fraction of neurons during training, forcing the network to learn more robust and less reliant features. This regularization technique improves the model's ability to generalize to unseen data. The Fifth Level concludes the learnable features block with a Fully Connected Layer (Dense Layer). This layer takes the learned features from the previous layers and feeds the classification block.

Table 5. Model details and parameters.
Model Details | Parameters Details
Model Architecture | Multi-level CNN
Model Input | Input Layer
First Level | Conv1, Conv2, Conv3
Second Level | MaxPooling1, MaxPooling2, MaxPooling3
Third Level | BatchNorm1, BatchNorm2, BatchNorm3
Fourth Level | Dropout Layer
Fifth Level | Fully Connected Layer (Dense Layer)
Total Number of Layers | 8
Number of Parameters | Depends on the specific configuration of the model (number of filters, kernel size, dropout rate, etc.)
Learning Paradigm | Supervised Learning
Objective Function | Categorical Cross-Entropy (for classification tasks); Mean Squared Error (for regression tasks)
Optimization Method | Gradient Descent (e.g., Stochastic Gradient Descent)
Regularization | Dropout
Activation Functions | ReLU, Softmax, Linear
Loss Function | Categorical CE-Loss

The proposed features learning block (FLB) employs a multi-level CNN architecture to extract and learn meaningful features from input data. At the final level of the FLB, a Fully Connected Layer (Dense Layer) is utilized to output the learned features as a feature vector. This feature vector contains high-dimensional representations of the input data, capturing both low-level and high-level features that are crucial for subsequent classification tasks.

The output feature vector from the FLB serves as valuable input to an ensemble classification model, which comprises three popular machine learning classifiers: SVM, KNN, and Random Forest. Ensemble learning combines the predictions of multiple classifiers to make more robust and accurate decisions, leveraging the diverse perspectives of individual models.
dataset. block.
The proposed features learning block (FLB) employs a multi-level
CNN architecture to extract and learn meaningful features from input
3.4. Features learning block data. At the final level of the FLB, a Fully Connected Layer (Dense Layer)
is utilized to output the learned features as a feature vector. This feature
The proposed features learning block is a multi-level architecture vector contains high-dimensional representations of the input data,
designed to extract and learn meaningful features from input data. The capturing both low-level and high-level features that are crucial for
model is composed of various layers that can be seen in Table 5 which subsequent classification tasks.
perform specific operations to process the data and gradually capture The output feature vector from the FLB serves as valuable input to an
hierarchical representations. At the start of the model is the Input Layer, ensemble classification model, which comprises three popular machine
which receives the raw input data. The subsequent First Level consists of learning classifiers: SVM, KNN, and Random Forest. Ensemble learning
three convolutional layers - Convolutional1, Convolutional2, and Con­ combines the predictions of multiple classifiers to make more robust and
volutional3. These layers apply filters to the input data, convolving them accurate decisions, leveraging the diverse perspectives of individual
over the input to detect different patterns and features. Each convolu­ models.
tional layer captures varying levels of abstraction from low-level to high-
level features. Following the First Level, the Second Level introduces
3.5. Ensemble bagging-based classification
MaxPooling1, MaxPooling2, and MaxPooling3. These layers perform
down-sampling on the feature maps generated by the convolutional
In this section, we present the details of our ensemble classification
layers, effectively reducing the spatial dimensions. This process retains
approach using a bagging-based technique. Bagging, short for Bootstrap
the most salient features while reducing computation and preventing
Aggregating, is a powerful ensemble learning method that combines
overfitting. The Third Level incorporates Batch Normalization layers -
multiple base classifiers to achieve improved predictive accuracy and
BatchNorm1, BatchNorm2, and BatchNorm3. Batch normalization
robustness. Fig. 4 depict our proposed ensemble classifier consists of
three base classifiers: SVM, KNN, and Random Forest. The bagging
Table 5 approach allows us to harness the diversity of these classifiers and
Model details and parameters.
leverage their unique decision-making capabilities for a more compre­
Model Details Parameters Details hensive classification.
Model Architecture Multi-level CNN
Model Input Input Layer 3.5.1. Base classifiers
First Level Conv1, Conv2, Conv3 In this subsection, we provide a brief overview of the base classifiers
Second Level MaxPooling1, MaxPooling2, MaxPooling3
used in our ensemble.
Third Level BatchNorm1, BatchNorm2, BatchNorm3
Fourth Level Dropout Layer
Fifth Level Fully Connected Layer (Dense Layer) 3.5.1.1. Support Vector Machine. SVM is a powerful and widely used
Total Number of 8 classifier for both binary and multi-class classification tasks. It seeks to
Layers
find an optimal hyperplane that maximally separates different classes in
Number of Depends on the specific configuration of the model (Number
Parameters of filters, kernel size, dropout rate, etc.) the feature space, making it highly effective in capturing complex de­
Learning Paradigm Supervised Learning cision boundaries. The best hyperplane selection and kernel trick is
Objective Function Categorical Cross-Entropy (for classification tasks) Mean applied using the below equation (1). Table 6 outlines the parameter
Squared Error (for regression tasks)
settings employed for training the SVM classification model.
Optimization Gradient Descent (e.g., Stochastic Gradient Descent)
Method m

Regularization Dropout f(x) = β0 + (αi yi )K(x, xi ) (1)
Activation ReLU, Softmax, Linear i=1
Functions
Loss Function Categorical CE-Loss
3.5.1.2. K-nearest neighbors (KNN). KNN is a non-parametric and

8
M. Shoaib et al. Computers in Human Behavior 158 (2024) 108301

C

Entropy(S) = − pi log2 (pi ) (3)
i=1

where.

• H(x) is the entropy of the random variable X,


• P(xi ) is the probability of the event xi occurring,
• n is the total number od events/outcomes in the probability
distribution.
• log2 denotes the base 2 logarithm.
∑V
|Si |
Information Grain(S, A) = Entropy(S) − .Entropy(Si ) (4)
i=1
||S||

where.

• IG(S, A) is the information gain by splitting S on Feature A ,


• H(S) is the entropy of the original dataset S,
• Values (A) is the set of all possible values of feature A,
• |Sv | is the number of instances in S for which feature A has values v,
• |S| is the total number of instances in S,
• H(Sv ) is the entropy of the subset of S for which feature A has value v..

Fig. 4. Proposed Ensemble Classification followed by Bayesian Averaging Voting.

3.5.2. Bayesian averaging
Bayesian averaging is an ensemble learning technique used to combine the predictions of multiple models in a principled and probabilistic manner (Lipman et al., 2023). Unlike traditional averaging methods, Bayesian averaging takes into account both the predictions of the models and their uncertainties. It assigns weights to each model's prediction based on its performance and reliability, leading to a more robust and accurate ensemble prediction. The ensemble prediction is obtained by calculating the weighted average of the individual model predictions. The Bayesian averaging approach not only improves the overall accuracy of the ensemble but also provides a measure of uncertainty for the final prediction. This uncertainty estimation is valuable in situations where the reliability of predictions is critical. Equation (5) is used to make the final decision of the ensemble learning classification model. It combines the predicted probabilities P_i of the individual models, weighted by their uncertainties U_i, into final class probabilities; the class with the maximum probability is selected as the final class label for the test sample.

\text{Bayesian Averaging} = \frac{\sum_{i=1}^{N} P_i / U_i^2}{\sum_{i=1}^{N} 1 / U_i^2}    (5)
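Equation (5) is an inverse-variance weighting of the base models. A small self-contained sketch follows, in which the uncertainty values U_i are illustrative assumptions, since the paper does not specify how they are estimated (for example, from validation error).

```python
import numpy as np

def bayesian_average(probs, uncertainties):
    """Equation (5): inverse-variance weighted average of model predictions.

    probs: shape (n_models, n_classes), per-model class probabilities P_i
    uncertainties: shape (n_models,), one uncertainty U_i per model
    """
    w = 1.0 / np.asarray(uncertainties) ** 2           # weight w_i = 1 / U_i^2
    return (w[:, None] * probs).sum(axis=0) / w.sum()

# Three base models (SVM, KNN, RF) voting on one test sample, four classes.
probs = np.array([[0.70, 0.20, 0.05, 0.05],   # SVM
                  [0.40, 0.40, 0.10, 0.10],   # KNN
                  [0.60, 0.25, 0.10, 0.05]])  # Random Forest
uncertainties = np.array([0.2, 0.5, 0.3])     # illustrative U_i values
final = bayesian_average(probs, uncertainties)
predicted_class = int(np.argmax(final))       # class with maximum probability
```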
3.6. Performance metrics

The performance of the proposed model is evaluated using four key metrics: accuracy, precision, recall, and F1-score. These metrics provide a comprehensive understanding of the model's effectiveness in handling various aspects of classification tasks. However, it is important to note that the dataset exhibits class imbalance, meaning that not all classes have an equal number of samples for model training. In such scenarios, accuracy alone may not be the most informative metric, as it can be biased towards the majority class. Instead, the F1-score, which balances precision and recall, becomes crucial. The F1-score considers both false positives and false negatives, making it a robust measure for evaluating model performance, particularly in situations with imbalanced data distributions. Therefore, in this study, while accuracy is still considered, the F1-score takes precedence as a more reliable indicator of the model's overall performance.

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}    (6)


F1\text{-}Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}    (7)

True Positive (TP) refers to instances correctly predicted as positive, while True Negative (TN) denotes instances correctly predicted as negative. False Positive (FP) indicates instances incorrectly predicted as positive, and False Negative (FN) signifies instances incorrectly predicted as negative. These metrics are essential in assessing classification model performance.
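These quantities follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), and can be computed per class with scikit-learn, as in this illustrative sketch (the labels below are made up, not results from the study):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

y_true = ["A", "B+", "B", "A", "F", "B", "B+", "A"]
y_pred = ["A", "B",  "B", "A", "F", "B", "B+", "B+"]

# Equation (6): fraction of correct predictions (multi-class generalization).
print(accuracy_score(y_true, y_pred))

# Per-class precision, recall, and F1 (equation (7)); weighted averaging
# suits the imbalanced setting described above.
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(p, r, f1)
print(confusion_matrix(y_true, y_pred))
```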
3.7. Hardware and simulation environment

To facilitate the implementation, simulation, and analysis of the proposed AI-based LMS and ensemble classifier, the research examines the hardware resources utilized in the study. The computational infrastructure incorporates a Core i7 9th gen processor, an RTX 2070 Super Nvidia card, and 32 GB of DDR4 RAM. These high-performance hardware components are complemented by the utilization of Python, TensorFlow, NumPy, Matplotlib, and Seaborn libraries, which collectively offer a versatile and efficient environment for conducting experiments, simulations, and data analysis. The research brings forth a comprehensive examination of the AI-based LMS and ensemble classifier, substantiating their efficacy in enhancing educational support and predictive capabilities. Additionally, the combination of state-of-the-art hardware and powerful software libraries ensures the successful implementation and evaluation of these cutting-edge models, signifying their potential to revolutionize the realm of educational technology.
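As a minimal sanity check of that environment (not code from the study; library versions are not specified in the paper), the stack can be verified as follows:

```python
import matplotlib
import numpy as np
import seaborn as sns
import tensorflow as tf

print("TensorFlow:", tf.__version__)
print("NumPy:", np.__version__)
# Expects the RTX 2070 Super to appear if GPU drivers are installed.
print("GPUs visible:", tf.config.list_physical_devices("GPU"))
```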
4. Experimental results

The Experimental Results section presents a comprehensive evaluation of the performance and effectiveness of the developed predictive models for Final Grade, Risk of Dropout, and Progression. This empirical analysis aims to provide a thorough understanding of the models' capabilities in accurately classifying students based on these critical attributes. Through rigorous experimentation and meticulous assessment, this section unveils valuable insights into the predictive power, generalization, and interpretability of each model. By examining their outcomes, comparing their performances, and delving into their interpretability, this section contributes to a deeper comprehension of the models' potential applications within educational contexts. The ensuing discussion will shed light on the models' classification accuracy, robustness, feature importance, and real-world implications, further highlighting their significance in enhancing student support systems and overall academic outcomes.

4.1. Performance analysis for Final Grade Prediction

In this subsection, we meticulously detail the outcomes of our experimental investigations and provide a comprehensive performance analysis of the model tailored to forecast students' final grades, utilizing the predefined class labels. Our aim is to shed light on the accuracy, precision, and robustness of the predictive model, enabling a profound understanding of its efficacy in classifying students based on their final grade outcomes.

The experimental assessment encompasses a rigorous evaluation of the model's predictive capabilities against ground truth final grades. By examining the achieved classification accuracy, precision, recall, and F1-score, we ascertain the model's proficiency in assigning correct final grade labels to students across diverse academic departments. Additionally, we employ ROC curves and AUC scores to gauge the model's ability to differentiate between different grade categories effectively.

The confusion matrix, as depicted in Fig. 5, provides a comprehensive overview of the classification performance of the developed model across multiple class labels. In this matrix, each row represents the true or actual class, while each column corresponds to the predicted class assigned by the model. The cells of the matrix contain counts that indicate the number of instances classified into a particular combination of true and predicted classes. The diagonal elements of the matrix represent instances where the true class and the predicted class match, indicating accurate predictions. For instance, the entry at the intersection of the "A" row and "A" column (15) signifies that 15 instances from class "A" were correctly classified as "A." Similarly, the cell where the true class is "B+" and the predicted class is also "B+" (30) signifies that 30 instances from class "B+" were accurately classified.

On the other hand, the off-diagonal elements indicate instances of misclassification. For instance, the value at the intersection of the "A" row and "B+" column (1) implies that 1 instance from class "A" was misclassified as "B+." These misclassification counts provide insights into the types of errors made by the model and help assess its performance in distinguishing between different classes.

Table 7
Distribution of Predicted vs. Actual Final Grades by Department.

Class     Accuracy   Precision   Recall   F1-Score
A         0.93       0.83        0.94     0.88
B+        0.93       0.83        0.94     0.88
B         0.94       0.97        0.94     0.96
C+        0.93       0.97        0.93     0.95
C         0.97       0.97        0.99     0.98
D+        0.94       0.92        0.92     0.92
D         0.88       0.94        0.86     0.90
F         0.92       0.93        0.90     0.91
Average   0.93       0.92        0.92     0.92

Fig. 5. Confusion Matrix of Proposed Grade Prediction ensemble model.
The performance analysis of the model demonstrates its effectiveness in predicting final grades across diverse classes, as illustrated in Table 7. Notably, the model achieves strong accuracy for most classes, with scores ranging from 0.88 to 0.97. Precision and recall scores indicate the model's ability to make accurate positive predictions and capture actual positive instances, respectively. Overall, the model maintains a balanced F1-score, emphasizing its consistent predictive capabilities across different academic categories. These results offer valuable insights into the model's performance and its potential to enhance educational decision-making and outcomes.
Fig. 6 illustrates the performance analysis of the Grade Prediction Model through a Box Plot, providing a visual representation of key metrics and insights into the model's predictive capabilities.

Fig. 6. Performance Analysis using the Box Plot for Grade Prediction Model.

The Receiver Operating Characteristic (ROC) curve is a significant visual tool employed to evaluate the performance of classification models. In our study, we utilize the ROC curve to analyze the predictive accuracy of the grade prediction model for different classes. The Area Under the Curve (AUC) values associated with each class provide an insightful measure of the model's ability to discriminate between positive and negative instances. As revealed by the AUC scores, the grade prediction model demonstrates varying levels of discriminative power for different classes. Notably, classes B, C, and C+ exhibit notably high AUC values of 0.96, 0.98, and 0.94, respectively, indicating the model's strong ability to distinguish between the corresponding class labels. Classes A, B+, D+, and F also demonstrate respectable AUC values of 0.88, 0.89, 0.91, and 0.88, respectively, signifying their reliable predictive capabilities. The ROC curve's visualization depicted in Fig. 7, coupled with the AUC scores, provides a comprehensive assessment of the model's performance across different grade categories, facilitating valuable insights for educators and stakeholders to make informed decisions and interventions to optimize student outcomes.

Fig. 7. Proposed grade prediction model ROC-AUC curve.
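One-vs-rest AUC values like those reported here can be computed per class with scikit-learn. The snippet below is a sketch that reuses the synthetic split and classifier from the earlier SVM example rather than the study's real data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# One-vs-rest ROC-AUC per grade class from predicted class probabilities.
classes = np.unique(y_train)
y_test_bin = label_binarize(y_test, classes=classes)
probs = svm_clf.predict_proba(X_test)        # stand-in for the ensemble output
for k, c in enumerate(classes):
    auc = roc_auc_score(y_test_bin[:, k], probs[:, k])
    print(f"class {c}: AUC = {auc:.2f}")
```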

4.2. Performance evaluation for Risk of Dropout Prediction

This subsection delves into an in-depth performance evaluation of the predictive model specifically designed to assess the risk of dropout for individual students. The analysis aims to ascertain the model's proficiency in accurately categorizing students into distinct risk levels, namely "No Risk," "Low," "Moderate," and "High." By examining the model's performance across these risk categories, we gain valuable insights into its ability to identify students who are more susceptible to dropout, thereby enabling timely interventions to mitigate potential attrition. The detailed performance analysis of the dropout prediction model can be seen in Table 8. In Fig. 8, the Confusion Matrix for the Proposed Student at Risk Prediction ensemble model is presented. This matrix provides values for True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN), crucial metrics used to calculate the model's performance, including accuracy, precision, recall, and F1-score.

Table 8
Class-wise performance analysis for Risk Prediction using the proposed model.

Class      Accuracy   Precision   Recall   F-Measure
No Risk    0.97       0.81        0.84     0.83
Low        0.93       0.88        0.88     0.88
Moderate   0.89       0.96        0.91     0.94
High       0.94       0.91        0.92     0.92
Average    0.93       0.89        0.88     0.89

Fig. 8. Confusion Matrix of Proposed Student at Risk Prediction ensemble model.

In the evaluation of our Risk Prediction ensemble model, as depicted in Fig. 9 through a Box Plot, and further validated by the ROC-AUC Curve presented in Fig. 10, we observe a comprehensive performance analysis, providing valuable insights into the model's robustness and predictive accuracy.

Fig. 9. Performance Analysis using the Box Plot for Risk Prediction ensemble model.

Fig. 10. Proposed Risk Prediction ensemble model ROC-AUC Curve.

4.3. Progression Prediction Model Assessment

In this subsection, we embark on a comprehensive evaluation of the progression prediction model, aiming to discern the model's efficacy and resilience in foreseeing the academic trajectory of students. The focus centers on gauging the precision and dependability of the model's predictions, particularly in distinguishing between two pivotal outcomes: student retention and the potential occurrence of dropout. With the class labels of "Retention" and "Dropout," our analysis delves into the model's prowess in accurately categorizing students into these fundamental trajectories. By meticulously scrutinizing the model's performance against these crucial benchmarks, we gain valuable insights into its ability to anticipate and flag students who are likely to continue their academic journey and those who may be at risk of discontinuing their studies.
Table 9
Class-wise performance analysis for the proposed Progression Prediction Model.

Class       Accuracy   Precision   Recall   F-Measure
Retention   0.92       0.85        0.90     0.87
Dropout     0.92       0.95        0.92     0.93
Average     0.92       0.90        0.91     0.90

Fig. 11. Confusion matrix of proposed student progression prediction model assessment.

Throughout this evaluation, where the results can be seen in Table 9, we shed light on the accuracy metrics and robustness indicators that underpin the progression prediction model's predictive prowess. In assessing the efficacy of our Student Progression Prediction Model, the Confusion Matrix presented in Fig. 11 offers a detailed breakdown of model performance, aiding in the nuanced understanding of classification outcomes. Our exploration unveils the intricate interplay between the model's predictive power and the complex dynamics underlying student progression. By scrutinizing the performance outcomes against the backdrop of real-world data, we glean a deeper understanding of the model's role in facilitating informed decision-making and proactive interventions to enhance student outcomes and academic success.
In Fig. 12, a Box Plot is employed for the performance analysis of our Progression Prediction Model, while Fig. 13 showcases the ROC-AUC Curve, providing a comprehensive evaluation of the model's predictive capabilities. These visualizations serve as valuable references in our assessment of the proposed model's performance.

Fig. 12. Performance Analysis using the Box Plot for Progression Prediction Model.

Fig. 13. Proposed progression prediction model ROC-AUC curve.

4.4. Deployment of AI module

The deployment of the proposed AI Student Success Predictor marks a significant advancement in enhancing the efficiency and effectiveness of the university's campus management system. This AI-powered system is seamlessly integrated with the university's relational database, forming a cohesive ecosystem for data-driven decision-making. By accessing and analyzing relevant student subject data, the AI module employs a sophisticated ensemble model trained on comprehensive learning datasets. At its core, the AI module harnesses the power of machine learning algorithms to predict not only student grades but also two crucial aspects: Risk Prediction and Progression Prediction. Leveraging the ensemble model's learning capabilities, the system provides insights into students' academic performance, enabling timely interventions to mitigate potential risks and enhance overall retention rates.

One of the notable features of this deployment is its platform independence. The AI Student Success Predictor is designed to seamlessly integrate with various relational databases, making it adaptable to the diverse technological landscapes of educational institutions. This adaptability ensures that universities utilizing different database systems can harness the benefits of predictive analytics and proactive student support offered by the AI module. In practice, as students engage with the campus management system, the AI module orchestrates a seamless exchange of information. Relevant student data, including subject information and academic history, is retrieved from the database. By processing this data through the ensemble model, the AI system generates accurate predictions for student grades, risk of dropout, and progression status. These predictions equip educators and administrators with actionable insights to provide personalized guidance and support to students, fostering an environment conducive to academic success.

The deployment of the AI Student Success Predictor represents a pivotal step towards data-driven and student-centric education management. By seamlessly connecting with the university's relational database and harnessing the capabilities of ensemble model training, this AI module empowers educational institutions to proactively address academic challenges, optimize student outcomes, and enrich the overall learning experience.

4.4.1. Early Semester Grade, risk, and progression prediction
In this scenario, the AI Student Success Predictor is deployed at the start or middle of the semester, prior to the final exams. The system is seamlessly integrated with the university's campus management system, leveraging its relational database to provide holistic insights into student performance. This encompasses early grade prediction, risk assessment, and progression outlook.

Fig. 14 depicts Early Semester Grade, Risk, and Progression Prediction using the integrated proposed model in the university's Learning Management System (LMS). Panel (a) features a green background indicating students at no risk and retention, (b) showcases an orange background signifying students at risk but with retention in the current semester, and (c) exhibits a red background denoting students at high risk of dropout in the current semester. The AI system accesses the campus management database to extract the student's enrolled courses, forming the basis for prediction. For each enrolled course, the system identifies prerequisite courses and retrieves the student's grades in these subjects. Utilizing an ensemble of historical data, past grades, and interaction patterns, the AI model predicts the student's upcoming grades, risk level (no risk, low, moderate, high), and progression forecast (retention or dropout). Predicted grades, risk assessment, and progression outlook are presented to both the student and the university, enabling tailored interventions and academic support.

Fig. 14. Early Semester Grade, Risk, and Progression Prediction using the integrated model in the university's LMS. (a) Green: No risk, (b) Orange: Risk with retention, (c) Red: High risk of dropout.
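As a sketch of this early-semester flow, the snippet below uses sqlite3 as a stand-in database driver. The table names, columns, and the three-output model.predict interface are hypothetical, since the paper does not publish its schema.

```python
import sqlite3  # stand-in for the university's relational database driver

def predict_for_student(conn, model, student_id):
    """Hypothetical flow: fetch prerequisite grades for each enrolled
    course, build a feature row, and return grade/risk/progression."""
    rows = conn.execute(
        """SELECT p.course_id, g.grade_points
           FROM enrollment e
           JOIN prerequisite p ON p.of_course = e.course_id
           JOIN grade g ON g.course_id = p.course_id
                       AND g.student_id = e.student_id
           WHERE e.student_id = ?""",
        (student_id,),
    ).fetchall()
    features = [grade for _, grade in rows]
    # Assumed interface: the ensemble returns the three predictions at once.
    grade_pred, risk_pred, progression_pred = model.predict(features)
    return {"grade": grade_pred, "risk": risk_pred,
            "progression": progression_pred}
```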
4.4.2. Validation of Predicted Performance and Early Warning
In this scenario, the AI system's predictions are rigorously tested against the student's actual outcomes, serving as a vital checkpoint to assess its accuracy and reliability in real-world conditions.

As the semester concludes and final grades are available, the AI system retrieves the student's actual course grades, which provide a baseline for comparison. The predicted performance, encompassing grades, risk categorization, and progression forecast, is matched against the actual outcomes to gauge the AI model's efficacy. The comparison not only validates the predictive accuracy but also identifies instances where early risk and progression alerts could have prompted timely interventions. The university gains actionable insights into the AI model's effectiveness, fostering continuous enhancement of student success initiatives and data-driven strategies. In Fig. 15, the Validation of Predicted Performance and Early Warning using the integrated model in the university's LMS is demonstrated. Panel (a) represents instances of no risk, (b) signifies risks with retention, and (c) denotes high-risk scenarios of dropout. This visualization provides a valuable reference in our discussion on the model's performance assessment.

Fig. 15. Validation of Predicted Performance and Early Warning using the integrated model in the university's LMS. (a) Green: No risk, (b) Orange: Risk with retention, (c) Red: High risk of dropout.

These scenarios illuminate the multifaceted capabilities of the AI Student Success Predictor. By harnessing the power of data and predictive analytics, the system not only forecasts academic performance but also enables early risk identification and progression anticipation. Seamlessly integrated with the university's database application, this AI-driven solution empowers educators to proactively enhance student outcomes, ensure timely support, and foster a culture of continuous improvement.

established academic infrastructure, the AI Student Success Predictor
4.5. Discussion

The development of the proposed AI Student Success Predictor is a significant stride in revolutionizing the educational landscape. Designed with the power of Python, this advanced system seamlessly integrates with the existing Oracle relational database, enriching the university's Campus Management System (CMS) with predictive analytics and insights. This integration serves as a pivotal bridge between cutting-edge technology and established educational infrastructure.

4.5.1. Leveraging python for advanced insights
At the heart of this innovation lies Python, a versatile and powerful programming language renowned for its wide array of libraries and frameworks that cater to data analysis, machine learning, and artificial intelligence. Python's inherent flexibility allows the AI Student Success Predictor to efficiently process, analyze, and model complex student data, including enrollment records, course details, grading, and marks data.

4.5.2. Integration with oracle relational database
The proposed AI system seamlessly interfaces with the university's Oracle relational database, extracting relevant data to fuel its predictive algorithms. This integration enables the AI system to draw upon comprehensive historical data, ensuring that predictions are grounded in the context of each student's academic journey. This database acts as the foundation for informed predictions, catering to the unique characteristics and patterns of the student body.
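A hedged sketch of this extraction step with the python-oracledb driver follows; the DSN, credentials, and table names are placeholders, not details from the study.

```python
import oracledb  # python-oracledb, the maintained Oracle driver

# Hypothetical connection and schema for the CMS database described above.
conn = oracledb.connect(user="cms_reader", password="***",
                        dsn="campus-db.example.edu/CMSPDB")
with conn.cursor() as cur:
    cur.execute("""
        SELECT s.student_id, e.course_id, g.grade, g.marks
        FROM student s
        JOIN enrollment e ON e.student_id = s.student_id
        LEFT JOIN grade g ON g.enrollment_id = e.enrollment_id
    """)
    records = cur.fetchall()  # raw rows feeding the standardized dataset
```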
4.5.3. Holistic data analysis and prediction
Upon extracting data from the Oracle database, the AI system performs in-depth data analysis and employs machine learning techniques to predict various facets of student success. This encompasses early-grade prediction, risk assessment, and progression forecasts. By analyzing past performance, student interactions, and prerequisite course grades, the AI model generates accurate predictions that provide invaluable insights into potential outcomes.

4.5.4. Targeted dissemination of insights
The outcomes of the AI analysis are seamlessly integrated into the existing CMS, offering a comprehensive view to students, administrative staff, and teaching faculty. Through intuitive interfaces, students gain access to early grade predictions, enabling them to make informed decisions and take timely corrective actions if necessary. Meanwhile, administrators and teaching staff can leverage these predictions to identify students at risk and tailor interventions to meet individual needs.

4.5.5. Empowering student success
The proposed system transforms the way educational institutions engage with student data, shifting from reactive measures to proactive interventions. By predicting student grades, risk levels, and progression trajectories, the AI-powered CMS contributes to a personalized educational experience that nurtures student success. This, in turn, enhances student retention rates, academic achievements, and overall satisfaction.

4.5.6. Driving continuous improvement
The integration of AI-driven insights with the existing CMS opens new avenues for continuous improvement. The university gains access to data-driven decision-making, facilitating the refinement of academic programs, support services, and teaching methodologies. The iterative feedback loop formed by this integration ensures that the system evolves alongside the needs of the student body and educational objectives.

The integration of Python-based AI capabilities with the Oracle relational database presents a transformative leap in advancing student-centric education. By marrying the power of predictive analytics with established academic infrastructure, the AI Student Success Predictor facilitates data-driven decision-making, early intervention strategies, and personalized educational experiences. As technology and education converge, this integration heralds a future where AI-driven insights are harnessed to foster student success, enhance learning outcomes, and shape the next generation of educational excellence.

4.6. Limitations of the study

The integration of an AI-based Learning Management System (LMS) into the university environment presented challenges in compatibility, user acceptance, scalability, and sustainability. Addressing these challenges required an iterative refinement process based on user feedback. Thorough testing and debugging ensured compatibility with existing systems, while user training programs facilitated adoption. Scalability and sustainability were maintained through continuous monitoring and optimization. User feedback, gathered through surveys and testing sessions, guided iterative refinements to the LMS interface and functionality, ensuring it met the evolving needs of faculty and students. Through this iterative approach, the AI-based LMS integration successfully enhanced teaching and learning outcomes within the university environment.

5. Conclusion

This research revolutionizes traditional CMSs through the introduction of an innovative AI Student Success Predictor. By harnessing advanced machine learning algorithms, the study not only streamlines grading processes but also proactively identifies student risks and anticipates pivotal decisions such as retention or dropout outcomes. The meticulous curation of a standardized dataset, coupled with the introduction of a groundbreaking CNN feature learning block and the development of an ensemble classification model, featuring SVM, Random Forest, and KNN classifiers, showcases the pioneering nature of this research. Moreover, the practical integration of the AI-based system into the existing CMS infrastructure marks a pivotal advancement. This seamless integration empowers administrators, teaching faculty, and students with real-time access to predictive insights. The machine learning predictive model becomes an integral component of the LMS, offering a personalized and dynamic educational experience. Administrators gain a proactive tool for decision-making, teaching faculty obtain valuable insights into student risks, and students themselves are equipped with the knowledge to identify and improve areas of academic challenge. The rigorous evaluation of the tripartite model using unseen test data attests to its commendable accuracy, achieving 93% for both student grade prediction and risk assessment. Furthermore, the model demonstrates a robust 92% accuracy in predicting the intricate domain of retention and dropout outcomes. These compelling results underscore the potential of the AI Student Success Predictor to not only enhance academic decision-making but also empower students and educators alike. The system's ability to provide valuable insights positions it as a catalyst for educational excellence, fostering a proactive approach to student success and institutional effectiveness.

CRediT authorship contribution statement

Muhammad Shoaib: Writing – original draft, Software, Methodology, Investigation, Data curation, Conceptualization. Nasir Sayed: Writing – original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Jaiteg Singh: Writing – review & editing, Writing – original draft, Methodology, Investigation. Jana Shafi: Writing – review & editing, Visualization, Validation, Resources, Methodology, Conceptualization. Shakir Khan: Writing – review & editing, Writing – original draft, Methodology, Investigation, Data curation, Conceptualization. Farman Ali: Writing – original draft,


Supervision, Methodology, Investigation, Conceptualization.

Declaration of competing interest

The authors declare that they have no conflict of interest.

Statement

Authors used ChatGPT, an AI language model, to improve the readability and language quality of the manuscript. After using this service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Data availability

Data will be made available on request.

References

Abdullah, M., Alqahtani, A., Aljabri, J., Altowirgi, R., & Fallatah, R. (2015). Learning style classification based on student's behavior in Moodle learning management system. https://doi.org/10.14738/tmlai.31.868
Aldoseri, A., Al-Khalifa, K. N., & Hamouda, A. M. (2023). Re-thinking data strategy and integration for artificial intelligence: Concepts, opportunities, and challenges. Applied Science, 13(12). https://doi.org/10.3390/app13127082
Allen, K.-A., Slaten, C. D., Arslan, G., Roffey, S., Craig, H., & Vella-Brodrick, D. A. (2021). School belonging: The importance of student and teacher relationships. In The Palgrave handbook of positive education (pp. 525–550). Cham: Springer International Publishing.
Alshurideh, M., Al Kurdi, B., Salloum, S. A., Arpaci, I., & Al-Emran, M. (2023). Predicting the actual use of m-learning systems: A comparative approach using PLS-SEM and machine learning algorithms. Interactive Learning Environments, 31(3), 1214–1228.
Aminizadeh, S., Heidari, A., Toumaj, S., Darbandi, M., Navimipour, N. J., Rezaei, M., … Unal, M. (2023). The applications of machine learning techniques in medical data processing based on distributed computing and the internet of things. Computer Methods and Programs in Biomedicine, 107745.
Aulakh, K., Roul, R. K., & Kaushal, M. (2023). E-Learning enhancement through educational data mining with covid-19 outbreak period in backdrop: A review. International Journal of Educational Development, Article 102814.
Bhatt, R. (2023). An analytical review of deep learning algorithms for stress prediction in teaching professionals. Innov. Eng. with AI Appl., 23–39.
Burgos, C., Campanario, M., Peña, D., Lara, J., Lizcano, D., & Martínez, M. (2017). Data mining for modeling students' performance: A tutoring action plan to prevent academic dropout. Computers & Electrical Engineering, 66. https://doi.org/10.1016/j.compeleceng.2017.03.005
Chaplot, D. S., Rhim, E., & Kim, J. (2015). Predicting student attrition in MOOCs using sentiment analysis and neural networks (Vol. 1432, pp. 7–12). CEUR Workshop Proc.
Coldwell, J., Craig, A., Paterson, T., & Mustard, J. (2008). Online students: Relationships between participation, demographics and academic performance. Electron. J. e-Learning, 6(1), 19–28. [Online]. Available: http://iucontent.iu.edu.sa/Scholars/InformationTechnology/OnlineStudentsRelationshipsbetweenParticipation,DemographicsandAcademicPerformance.pdf
Conijn, R., Snijders, C., Kleingeld, A., & Matzat, U. (2017). Predicting student performance from LMS data: A comparison of 17 blended courses using Moodle LMS. IEEE Trans. Learn. Technol., 10(1), 17–29. https://doi.org/10.1109/TLT.2016.2616312
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
Cox, D. R. (1958). The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Stat. Methodol., 20(2), 215–232.
Dawson, S. "The concept of personalized and adaptive learning has long been touted but seldom enacted in education at scale. Data Analytics and Adaptive Learning brings together a compelling set of experts that provide novel and research-informed insights into contem."
Doolittle, P. E., & Camp, W. G. (1999). Constructivism: The career and technical education perspective. Journal of Vocational and Technical Education, 16(1), 23–46.
Dougiamas, M., & Taylor, P. C. (2000). Improving the effectiveness of tools for Internet-based education.
Dougiamas, M., & Taylor, P. (2002). Interpretive analysis of an internet-based course constructed using a new courseware tool called Moodle. 2nd Conf. HERDSA (The High. Educ. Res. Dev. Soc. Australas.), 1–9. [Online]. Available: http://online.dimitra.gr/sektrainers/file.php/1/MartinDougiamas.pdf
Dragulescu, B., Bucos, M., & Vasiu, R. (2015). Cvla: Integrating multiple analytics techniques in a custom Moodle report (Vol. 538).
Fei, M., & Yeung, D.-Y. (2015). Temporal models for predicting student dropout in massive open online courses. In 2015 IEEE international conference on data mining workshop (ICDMW) (pp. 256–263). https://doi.org/10.1109/ICDMW.2015.174
Gare, A. (2023). Challenging the dominant grand narrative in global education and culture. In Field environmental philosophy: Education for biocultural conservation (pp. 309–326). Springer.
Gašević, D., Dawson, S., Rogers, T., & Gasevic, D. (2016). Learning analytics should not promote one size fits all: The effects of instructional conditions in predicting academic success. The Internet and Higher Education, 28, 68–84. https://doi.org/10.1016/j.iheduc.2015.10.002
Gauss, C. F., Davis, C. H., & Project, M. of A. (1857). Theory of the motion of the heavenly bodies moving about the sun in conic sections: A translation of Gauss's "Theoria motus," with an appendix. Boston: Little, Brown and Company.
Gitinabard, N., Xu, Y., Heckman, S., Barnes, T., & Lynch, C. F. (2019). How widely can prediction models be generalized? Performance prediction in blended courses. IEEE Trans. Learn. Technol., 12(2), 184–197. https://doi.org/10.1109/TLT.2019.2911832
He, J., Bailey, J., Rubinstein, B. I. P., & Zhang, R. (2015). Identifying at-risk students in massive open online courses. Proc. Natl. Conf. Artif. Intell., 3, 1749–1755. https://doi.org/10.1609/aaai.v29i1.9471
Ho, T. K. (1995). Random decision forests. Proceedings of 3rd International Conference on Document Analysis and Recognition, 1, 278–282. https://doi.org/10.1109/ICDAR.1995.598994
Hone, K. S., & El Said, G. R. (2016). Exploring the factors affecting MOOC retention: A survey study. Computer Education, 98, 157–168. https://doi.org/10.1016/j.compedu.2016.03.016
II, I., & Bower, B. (2011). Student characteristics that predict persistence in community college online courses. Amer. Jrnl. Distance Educ., 25, 178–191. https://doi.org/10.1080/08923647.2011.590107
Ismail, H., Hussein, N., Harous, S., & Khalil, A. (2023). Survey of personalized learning software systems: A taxonomy of environments, learning content, and user models. Education in Science, 13(7), 741.
Item, J. (2015). OU analyse: Analysing at-risk students at the open.
Jayaprakash, S. M., & Lauria, E. J. M. (2014a). Open academic early alert system: Technical demonstration. In Proceedings of the Fourth international conference on learning analytics and knowledge (pp. 267–268).
Jayaprakash, S. M., Lauría, E. J. M., Gandhi, P., & Mendhe, D. (2016). Benchmarking student performance and engagement in an early alert predictive system using interactive radar charts. In Proceedings of the sixth international conference on learning analytics & knowledge (pp. 526–527).
Jayaprakash, S., Moody, E., Lauria, E., Regan, J., & Baron, J. (2014b). Early alert of academically at-risk students: An open source analytics initiative. J. Learn. Anal., 1, 6–47. https://doi.org/10.18608/jla.2014.11.3
Jones-Khosla, L. A., & Gomes, J. F. S. (2023). Purpose: From theory to practice. Global Business and Organizational Excellence, 43(1), 90–103.
Kemp, F. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. Oxford University Press.
Kew, S. N., & Tasir, Z. (2022). Developing a learning analytics intervention in e-learning to enhance students' learning performance: A case study. Education and Information Technologies, 27(5), 7099–7134.
Laparra, V., Pérez-Suay, A., Piles, M., Muñoz-Marí, J., Amorós, J., Fernandez-Moran, R., … Adsuara, J. E. (2023). Assessing the impact of using short videos for teaching at higher education: Empirical evidence from log-files in a learning management system. IEEE Rev. Iberoam. Tecnol. del Aprendiz., 1. https://doi.org/10.1109/RITA.2023.3301411
Lauria, E. J. M., Baron, J. D., Devireddy, M., Sundararaju, V., & Jayaprakash, S. M. (2012). Mining academic data to improve college student retention: An open source perspective. In Proceedings of the 2nd international conference on learning analytics and knowledge (pp. 139–142).
Legendre, A. M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. F. Didot.
Lipman, E., Moser, S., & Rodriguez, A. (2023). Explaining differences in voting patterns across voting domains using hierarchical bayesian models (Vol. 2016). [Online]. Available: http://arxiv.org/abs/2312.15049
Luna, J. M., Castro, C., & Romero, C. (2017). MDM tool: A data mining framework integrated into Moodle. Computer Applications in Engineering Education, 25(1), 90–102. https://doi.org/10.1002/cae.21782
Macfadyen, L. P., & Dawson, S. (2010). Mining LMS data to develop an 'early warning system' for educators: A proof of concept. Computer Education, 54(2), 588–599. https://doi.org/10.1016/j.compedu.2009.09.008
Maguvhe, M. O. (2023). Supporting students experiencing barriers to learning in inclusive education settings: A critical requirement for educational success. In Using african epistemologies in shaping inclusive education knowledge (pp. 375–393). Springer.
Márquez-Vera, C., Cano, A., Romero, C., Noaman, A. Y. M., Mousa Fardoun, H., & Ventura, S. (2016). Early dropout prediction using data mining: A case study with high school students. Expert Systems, 33(1), 107–124. https://doi.org/10.1111/exsy.12135
Moreno-Marcos, P. M., Laet, T. D., Muñoz-Merino, P. J., Van Soom, C., Broos, T., Verbert, K., & Kloos, C. D. (2019). Generalizing predictive models of admission test success based on online interactions. Sustainable Times, 11(18), 1–19. https://doi.org/10.3390/su11184940
Morris, L., Finnegan, C., & Wu, S.-S. (2005). Tracking student behavior, persistence, and achievement in online courses. The Internet and Higher Education, 8, 221–231. https://doi.org/10.1016/j.iheduc.2005.06.009
Oldehinkel, A. J., & Ormel, J. (2023). Annual research review: Stability of psychopathology: Lessons learned from longitudinal population surveys. Journal of Child Psychology and Psychiatry, 64(4), 489–502.
Palmer, S. (2013). Modelling engineering student academic performance using academic analytics. International Journal of Engineering Education, 29, 132–138.
Przegalinska, A., & Jemielniak, D. (2023). Strategizing AI in business and education: Emerging technologies and business strategy.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. https://doi.org/10.1007/BF00116251
Rish, I. (2001). An empirical study of the naïve Bayes classifier. IJCAI 2001 Work. Empir. Methods Artif. Intell., 3.
Romero, C., Ventura, S., & García, E. (2008). Data mining in course management systems: Moodle case study and tutorial. Computer Education, 51(1), 368–384. https://doi.org/10.1016/j.compedu.2007.05.016
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408. https://doi.org/10.1037/h0042519
Rothes, A., Lemos, M. S., & Gonçalves, T. (2022). The influence of students' self-determination and personal achievement goals in learning and engagement: A mediation model for traditional and nontraditional students. Education in Science, 12(6), 369.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Saadati, Z., Zeki, C. P., & Vatankhah Barenji, R. (2023). On the development of blockchain-based learning management system as a metacognitive tool to support self-regulation learning in online higher education. Interactive Learning Environments, 31(5), 3148–3171.
Salah, M., Al Halbusi, H., & Abdelfattah, F. (2023). May the force of text data analysis be with you: Unleashing the power of generative AI for social psychology research. Comput. Hum. Behav. Artif. Humans, Article 100006.
Sharma, S., & Guleria, K. (2022). A deep learning based model for the detection of pneumonia from chest X-ray images using VGG-16 and neural networks. Procedia Computer Science, 218, 357–366. https://doi.org/10.1016/j.procs.2023.01.018
Shoaib, M., et al. (2022a). A deep learning-based model for plant lesion segmentation, subtype identification, and survival probability estimation (pp. 1–15), December. https://doi.org/10.3389/fpls.2022.1095547
Shoaib, M., Hussain, T., Shah, B., & Park, S. H. (2022b). Deep learning-based segmentation and classification of leaf images for detection of tomato plant disease (pp. 1–18), October. https://doi.org/10.3389/fpls.2022.1031748
Shoaib, M., Sayed, N., Amara, N., Latif, A., Azam, S., & Muhammad, S. (2022c). Prediction of an educational institute learning environment using machine learning and data mining. Education and Information Technologies. https://doi.org/10.1007/s10639-022-10970-4
Shoaib, M., Shah, B., Hussain, T., Yang, B., Ullah, A., Khan, J., & Ali, F. (2023a). A deep learning-assisted visual attention mechanism for anomaly detection in videos. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-023-17770-z
Shoaib, M., Shah, B., El-Sappagh, S., Ali, A., Ullah, A., Alenezi, F., Gechev, T., Hussain, T., & Ali, F. (2023b). An advanced deep learning models-based plant disease detection: A review of recent research. Frontiers of Plant Science, 14(March), 1–22. https://doi.org/10.3389/fpls.2023.1158933
Shoaib, M., Shah, B., Sayed, N., Ali, F., Ullah, R., & Hussain, I. (2023c). Deep learning for plant bioinformatics: An explainable gradient-based approach for disease detection. Frontiers of Plant Science, 14(October), 1–17. https://doi.org/10.3389/fpls.2023.1283235
Shu, X., & Ye, Y. (2023). Knowledge Discovery: Methods from data mining and machine learning. Social Science Research, 110, Article 102817.
Singh, J., Ali, F., Gill, R., Shah, B., & Kwak, D. (2023). A survey of EEG and machine learning-based methods for neural rehabilitation. IEEE Access, 11, 114155–114171. https://doi.org/10.1109/ACCESS.2023.3321067
Starbuck, C. (2023). Data preparation. In The fundamentals of people analytics: With applications in R (pp. 79–95). Cham: Springer International Publishing.
Staver, J. R. (1998). Constructivism: Sound theory for explicating the practice of science and science teaching. J. Res. Sci. Teach. Off. J. Natl. Assoc. Res. Sci. Teach., 35(5), 501–520.
Talebi, K., Torabi, Z., & Daneshpour, N. (2023). Ensemble models based on CNN and LSTM for dropout prediction in MOOC. Expert Systems with Applications, Article 121187.
Tinto, V. (1975). Dropout from higher education: A theoretical synthesis of recent research. Review of Educational Research, 45(1), 89–125. https://doi.org/10.3102/00346543045001089
Ullah, W., Ullah, A., Haq, I. U., Muhammad, K., Sajjad, M., & Baik, S. W. (2021). CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimedia Tools and Applications, 80(11), 16979–16995. https://doi.org/10.1007/s11042-020-09406-3
Wang, J., Xu, Z., Zheng, X., & Liu, Z. (2022). A fuzzy logic path planning algorithm based on geometric landmarks and kinetic constraints. Information Technology and Control, 51(3), 499–514. https://doi.org/10.5755/j01.itc.51.3.30016
Xing, W., Guo, R., Petakovic, E., & Goggins, S. (2015). Participation-based student final performance prediction model through interpretable Genetic Programming: Integrating learning analytics, educational data mining and theory. Computers in Human Behavior, 47, 168–181. https://doi.org/10.1016/j.chb.2014.09.034
Yadusky, K., Kheang, S., & Hoggan, C. (2021). Helping underprepared students succeed: Minimizing threats to identity. Community College Journal of Research and Practice, 45(6), 423–436.
Zacharis, N. (2015). A multivariate approach to predicting student outcomes in web-enabled blended learning courses. The Internet and Higher Education, 27. https://doi.org/10.1016/j.iheduc.2015.05.002
Zhang, X., Feng, Z., & Zhang, X. (2023). On reachable set problem for impulse switched singular systems with mixed delays. IET Control Theory & Applications, 17(5), 628–638. https://doi.org/10.1049/cth2.12390
Zheng, L., Wang, C., Chen, X., Song, Y., Meng, Z., & Zhang, R. (2023). Evolutionary machine learning builds smart education big data platform: Data-driven higher education. Applied Soft Computing, 136, Article 110114.
