AI Student

Handling editor: Paul Kirschner

Keywords: Learning management system; Predictive analytics; Machine learning; AI-driven educational systems; Student success prediction and educational technology

Abstract

Campus Management Systems (CMSs) are vital tools in managing educational institutions, handling tasks like student enrollment, scheduling, and resource allocation. The increasing adoption of CMS for online and mixed-learning environments highlights their importance. However, inherent limitations in conventional CMS platforms hinder personalized student guidance and effective identification of academic challenges. Addressing this crucial gap, our study introduces an AI Student Success Predictor empowered by advanced machine learning algorithms, capable of automating grading processes, predicting student risks, and forecasting retention or dropout outcomes. Central to our approach is the creation of a standardized dataset, meticulously curated by integrating student information from diverse relational databases. A Convolutional Neural Network (CNN) feature learning block is developed to extract the hidden patterns in the student data. The classification model is an ensemble incorporating SVM, Random Forest, and KNN classifiers, subsequently refined by a Bayesian averaging model. The proposed ensemble model is able to predict student grades, retention, and risk levels of dropout. The accuracy of the proposed model is assessed using test data, reaching 93% for student grade prediction and student risk prediction, and 92% for the more complex task of retention and dropout forecasting. The proposed AI system seamlessly integrates with existing CMS infrastructure, enabling real-time data retrieval and swift, accurate predictions, enhancing academic decision-making efficiency. Our study's AI Student Success Predictor bridges the gap between traditional CMS limitations and the growing demands of modern education.

* Corresponding author.
** Corresponding author.
E-mail addresses: [email protected] (M. Shoaib), [email protected] (N. Sayed), [email protected] (J. Singh), [email protected] (J. Shafi), [email protected] (S. Khan), [email protected], [email protected] (F. Ali).
https://doi.org/10.1016/j.chb.2024.108301
Received 22 February 2024; Received in revised form 22 April 2024; Accepted 12 May 2024
Available online 13 May 2024
0747-5632/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
1. Introduction

Education is essential to everyone's existence because it enables the development of the knowledge, skills, and values required for success in life. Education equips individuals with the knowledge, skills, and resources necessary to make enlightened decisions, solve issues, and contribute to their communities (Maguvhe, 2023). It helps people comprehend their place in the world and the world around them, and it can also provide a feeling of purpose and meaning in life. In addition to personal benefits, education also has societal advantages (Jones-Khosla & Gomes, 2023). Education is crucial for the economic development and progress of a nation, as it fosters a more knowledgeable and involved population. This can lead to stronger economies and more successful civilizations. Overall, education is a crucial component of a successful and full life, and it is necessary for the growth of both individuals and civilizations (Gare, 2023).
Retention is the capacity of a student to remain enrolled in an educational institution or program. In contrast, achievement refers to a student's academic development and the attainment of their educational objectives. There are many elements that can influence both student retention and achievement (Rothes et al., 2022; Yadusky et al., 2021). Academically unprepared students may struggle to complete their coursework and may be more prone to drop out (academic preparedness). Financial obstacles, such as the expense of tuition or the necessity to work while attending school, may make it more difficult for students to remain enrolled and make academic progress (financial considerations). Students who feel supported by their professors, friends, and school community are more likely to fulfil their academic goals and remain in school (Allen et al., 2021). Students with personal or social difficulties, such as mental health concerns or family conflicts, may find it difficult to concentrate on their schoolwork and may be more likely to drop out. Schools and educational programs may provide academic support services, such as tutoring and study skills workshops, and financial aid or other resources to help students overcome obstacles to success in order to increase retention and achievement. Moreover, providing a supportive and inclusive learning atmosphere can help students feel more connected to their school and increase their motivation to succeed (Oldehinkel & Ormel, 2023).

Computer science and machine learning are applied sciences that have significantly influenced the evolution of educational technologies (Alshurideh et al., 2023). Computer science is the study of computers and computational systems, and it involves a vast array of topics, such as algorithms, data structures, programming languages, and computer hardware. It is essential to the development of educational technologies because it provides the fundamental knowledge and tools required to design and implement educational software and hardware systems (Aminizadeh et al., 2023). Machine learning is a subfield of computer science that entails the creation of algorithms and models that can learn from data and make inferences based on that data. In the context of educational technologies, machine learning can be used to evaluate student data to personalize learning experiences and offer students real-time feedback. The convergence of computer science and machine learning with educational technologies has resulted in the creation of novel and effective teaching and learning tools (Zheng et al., 2023). Learning analytics is the collection, analysis, and reporting of data on learners and their settings for the purpose of optimizing learning and its environments (Shoaib, Sayed, et al., 2022a,b,c). Learning analytics can be used to monitor student progress, identify problem areas, and provide individualized assistance to help students achieve. It can also be utilized to uncover trends and patterns in student behavior and performance and to improve educational programs and resources (Kew & Tasir, 2022). Learning analytics is an expanding discipline that is enhancing the efficacy and efficiency of education by offering insights into student learning and assisting educators in making data-driven decisions. The practice of identifying patterns and insights from massive databases is known as data mining (Shu & Ye, 2023). Data mining can be utilized in the field of education to evaluate student data in order to comprehend and improve learning outcomes.

Data mining can assist educators in identifying trends and patterns in student data, such as exam performance and course content engagement (Aulakh et al., 2023). This can help educators comprehend how pupils are learning and identify potential areas of difficulty. Data mining can be used to discover the strengths and weaknesses of individual students and to give them customized, individualized learning experiences (Ismail et al., 2023). Data mining can be used to evaluate student data from earlier iterations of a course in order to identify areas in which the course could be updated. By examining the influence of educational interventions, such as new teaching techniques or resources, on student learning outcomes, data mining can also be used to assess the effectiveness of those interventions. The use of data mining in education can assist educators in gaining a deeper understanding of student learning and in identifying ways to enhance the efficacy of educational programs. A learning management system (LMS) is a software application or web-based technology that is used to organize, implement, and evaluate a particular learning process. A learning management system often allows an instructor to develop and deliver course content, monitor student progress, and manage student interactions (Laparra et al., 2023).

The core problem we aim to address revolves around the limitations of traditional Campus Management Systems (CMSs) in providing personalized guidance to students and effectively identifying and mitigating potential academic challenges. Conventional CMSs often struggle to adapt to the dynamic landscape of online and mixed-learning environments, hindering their ability to meet the diverse needs of students across different academic departments and demographics. Our study seeks to bridge this gap by introducing an AI Student Success Predictor empowered by advanced machine learning algorithms. By leveraging student data from various relational databases, our approach aims to automate grading processes, predict student risks, and forecast retention or dropout outcomes. Through the integration of AI-based systems into existing CMS infrastructure, we endeavor to enhance the efficiency and efficacy of academic decision-making, ultimately revolutionizing the educational landscape and meeting the growing demands of modern education.

An artificial intelligence (AI) integrated Campus Management System (CMS) serves as a dynamic educational platform that elevates the realms of teaching and learning. Powered by a spectrum of AI technologies, encompassing machine learning, natural language processing, and predictive analytics, the CMS orchestrates the automation and enhancement of diverse aspects within the educational landscape (Salah et al., 2023). This AI-enhanced CMS embarks on the creation of tailored learning paths for each student, leveraging their strengths, weaknesses, and preferences, thus optimizing their educational journey (Dawson). Harnessing the prowess of predictive analytics, the AI-infused CMS effortlessly adapts test complexities based on a student's performance. This intelligent system encompasses automated grading, offering educators respite from routine assessments and enabling them to focus on more strategic educational facets. One of the significant considerations in adopting an AI model for educational data mining is the alignment of the model with the unique characteristics of the data (Shoaib et al., 2023a). Structured data such as grades and test scores may harmonize well with linear models, while unstructured data like text or audio may necessitate more sophisticated models like neural networks. Tailoring the model to desired outcomes is imperative, with decision tree models excelling in forecasting and clustering models being well-suited for pattern identification (Przegalinska & Jemielniak, 2023).

Developing a machine learning model for predicting student performance involves a sequence of meticulous stages. It commences with the collection of relevant student performance and attribute data, followed by data cleansing and preprocessing to prepare it for machine learning utilization (Bhatt, 2023). The division of data into training and test sets further refines the model's efficacy, enhancing its performance evaluation. This sophisticated AI-enhanced CMS addresses a multitude of educational challenges. The AI-driven individualized learning paths empower each student to progress at an optimal pace while ensuring necessary support. Automated test difficulty adjustments foster a balanced challenge, preventing student disengagement. Furthermore, advanced CMS capabilities encompass automated grading and real-time tailored feedback, freeing educators from mundane tasks and bolstering the teaching and learning process.

The paper introduces a novel fuzzy rule-based path planning algorithm for mobile robots in complex terrain (Wang et al., 2022). By integrating spatial point-taking methods with Dijkstra's and fuzzy logic algorithms, it addresses issues in existing methods related to motion laws, particularly angular and linear acceleration. The research paper addresses the reachable set problem for impulse switched singular (ISS) systems with mixed time-varying delays using Lyapunov theory.
It introduces a real-bounded lemma to analyze impulse switching points and establish a condition (Zhang et al., 2023), presented as linear matrix inequalities (LMIs), ensuring the system's reachable states remain within a closed bounded region. Simulation results confirm the effectiveness of the proposed approach. When considering the ideal model, CNNs present a potent solution for image and video data analysis (Shoaib et al., 2022a,b,c, pp. 1–15). Their applications span from image classification to object recognition and facial detection. Within the educational sphere, CNNs can gauge performance metrics, like grades and test scores, to predict student success or identify individuals in need of additional support (Talebi et al., 2023). The AI-infused CMS harnesses the prowess of machine learning algorithms (Singh et al., 2023), fostering an educational experience attuned to each student's unique learning styles and preferences. By predicting student performance and identifying potential pitfalls, educators and administrators can proactively provide timely interventions, promoting student success. While AI-based learning management systems promise a host of benefits, including enhanced accessibility through natural language processing and speech recognition, it is crucial to align these solutions with specific educational needs (Saadati et al., 2023). A thorough analysis of the application's requirements should steer the selection of the most fitting AI-based CMS, ensuring a harmonious educational ecosystem.

Below are the major contributions of this research study.

• The study's foundational contribution lies in the meticulous curation and integration of diverse student data from various relational databases. This comprehensive dataset is harmonized to ensure accuracy and consistency, forming the bedrock for subsequent analyses.
• An innovative feature learning block, based on CNN, is introduced. This intricate process unveils latent patterns within student data, enabling the extraction of meaningful insights that significantly bolster the performance of predictive models.
• A distinctive contribution is the development of an ensemble classification model, combining Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbors (KNN) classifiers. Augmented by Bayesian averaging, this model showcases a novel approach to enhancing predictive accuracy.
• The study introduces a holistic prediction framework encompassing student grade prediction, risk assessment, and progression outcomes (retention or dropout). This multifaceted approach demonstrates the capability to address complex educational challenges in a unified manner.
• Rigorous evaluation of the models using unseen test data results in noteworthy accuracy rates. Specifically, the models achieve 93% accuracy for grade and risk prediction, and 92% accuracy for retention/dropout prediction, underscoring the reliability of the AI-based system.

This article starts by introducing the proposed model and its unique contributions. Section 2, the literature review, examines what other research has found and points out the gaps this study aims to fill. Section 3 explains in detail how the study was done, including how the data was collected and processed, and how the machine learning models were built using an ensemble classification approach. Section 4 discusses the results of the experiments, giving insights into how well the model performs; it also closes with a discussion of the proposed model and results and highlights the significance of the AI-based LMS.

2. Literature review

Learning analytics is the measurement, collection, analysis, and reporting of data about learners and their contexts, for the purposes of understanding and optimizing learning and the environments in which it occurs. In the context of smart education, learning analytics can be used to collect data about a student's strengths, weaknesses, and learning preferences, and use this information to personalize the learning experience. For example, a learning management system might recommend specific resources or activities based on a student's progress or interests. It can also be used to track student engagement with course content and use this information to identify factors that influence engagement. For example, an instructor might use analytics to identify when students are most likely to drop out of a course and intervene to provide support. Tracking students' progress over time and providing feedback on their learning is also a goal of learning analytics. This can help instructors to identify areas where a student is struggling and provide targeted support.

Learning analytics can be used to collect data about student learning outcomes and use this information to evaluate the effectiveness of different teaching strategies. This can help instructors to improve their teaching and create more effective learning experiences. Learning analytics in education can significantly improve the effectiveness of smart education by providing insights into student learning and engagement and helping educators to tailor instruction to the needs of individual learners.

There are many different theoretical frameworks that have been proposed to guide the development and implementation of learning analytics. Learning analytics should be aligned with theories of learning and consider how data can be used to support and enhance learning processes. Learning analytics should be based on robust and reliable data sources, and consider issues of data quality, privacy, and ethics. Learning analytics should provide timely and actionable feedback to learners, instructors, and other stakeholders, and consider how this feedback can be used to support learning and improve educational outcomes. Personalization in learning analytics refers to the use of data and technology to tailor the learning experience to the needs and preferences of individual learners. This can involve adapting the content, pacing, or delivery of instruction to meet the unique needs of each learner and may be based on data such as a learner's past performance, interests, or learning style. Personalized learning can improve student engagement by providing learners with content and activities that are relevant and meaningful to them. Personalized learning can enhance motivation by providing learners with a sense of ownership and control over their own learning, and by making the learning experience more relevant and enjoyable. Personalized learning has been shown to improve learning outcomes, particularly for students who may struggle with traditional approaches to instruction. Personalization in learning analytics also presents some challenges, including the need for robust and reliable data sources, and the potential for biased algorithms or personalized recommendations to perpetuate existing inequalities. Personalization in learning analytics has the potential to significantly improve the effectiveness of education, but it is important to carefully consider the potential benefits and challenges and to ensure that personalized approaches are fair and equitable for all learners.

Educational data mining is the application of data mining techniques to the field of education, with the goal of discovering patterns and relationships in educational data that can be used to improve teaching and learning. Educational data mining typically involves the analysis of data from educational technologies, such as learning management systems, student response systems, and intelligent tutoring systems. Educational data mining as student modeling can be used to create models of student knowledge, skills, and learning strategies, which can be used to personalize instruction and support learning.

Educational data mining can be used to predict student performance, identify at-risk students, and forecast the impact of interventions. This can help educators intervene early and provide targeted support to students who may be struggling. Educational data mining can be used to evaluate the effectiveness of educational interventions, such as new teaching methods or technologies. This can help educators to identify what works and what doesn't and make informed decisions about how to improve the learning environment. Educational data mining can be used to identify patterns and trends in student data, which can inform the design of new learning resources or the improvement of existing ones.
This can help educators to create more effective and engaging learning materials. Educational data mining has the potential to significantly improve the performance of a learning environment by providing insights into student learning and helping educators tailor instruction to the needs of individual learners.

2.1. Students at risk

Student retention and performance are the two primary areas of focus for EDM and LA. Some journals connected to learning analytics, educational data mining, and online learning focus on student retention and identifying predictors of student success, where success is defined as passing or finishing a course. In the fields of LA and EDM, these are considered typical areas of study. Tinto (1975) is a highly cited paper in the literature on student retention that presents a theoretical framework for comprehending student behavior. Tinto's findings are robust and similar to the social constructivist mechanism on which Moodle (Doolittle & Camp, 1999) is founded. According to Tinto (1975), after a student is accepted into a degree or program, social characteristics become more significant than their background. Therefore, social integration is one of the most important factors to consider when attempting to predict which students will drop out of a degree program or field of study. In the LA and EDM literature, using various data sources to forecast at-risk students is commonplace. The authors of (Romero et al., 2008), researching early student performance using Moodle data to predict students' final grades, employed multiple classification algorithms to predict student success. An early warning predictive model for students at risk of dropping out was assessed using a large number of variables as predictors, including demographics and prior accomplishment, and achieved high reporting precision (Márquez-Vera et al., 2016). Weekly predictive models have been established for many courses (He et al., 2015); that study also reported the efficacy of early intervention as opposed to intervention in the latter weeks of the course. Coldwell et al. (2008) added demographic information to student-generated learning management system activities. The study discovered a relationship between demographic information, such as gender and nationality, and performance. OU Analyse is another example of merging demographic and activity data (Item, 2015). The technologies built in that project automatically generate teacher dashboards. Chaplot et al. (2015) investigated the phenomenon of MOOC dropout using neural networks and sentiment analysis. It was determined to what extent the content of various MOOCs and the interaction between instructors and students caused dropout (Hone & El Said, 2016). Student performance was also investigated (Xing et al., 2015); the objective of that work is to enhance teachers' analysis and comprehension of performance data. In a recent study (Burgos et al., 2017), for instance, the authors utilized various classification methods to predict academic failure. Other research-based predicting approaches include homework submission (Dragulescu et al., 2015) and categorization of student learning styles (Abdullah et al., 2015; II & Bower, 2011). Although demographic input variables have been included in certain studies (Jayaprakash et al., 2014a,b; Márquez-Vera et al., 2016; Palmer, 2013), variables based on student-generated activities in the LMS appear to be the subject of the majority of research. In 2005 (Morris et al., 2005), variables such as the number of content pages read, the number of original articles, and course length were studied. Other studies have expanded the variable lists to include specific LMS activities such as assignments, quizzes, forums, and wikis (Zacharis, 2015). Other LMS elements, such as message systems, have also been evaluated as predictors (Macfadyen & Dawson, 2010). There is considerable variation amongst studies, making it difficult to draw solid findings. The predictive effectiveness of LMS activity-based variables varies by course and by activity type.

2.2. Learning management system

The ubiquitous educational setting entails the utilization of a learning management system (LMS), with Moodle serving as an exemplar of an open-source platform within this domain. Martin Dougiamas disseminated the prototype of Moodle (Palmer, 2013) in 1999 as an integral component of his doctoral dissertation at Curtin University (Dougiamas & Taylor, 2000). The second paper, with Peter Charles Taylor (Dougiamas & Taylor, 2002), employs Moodle for constructivism graduate courses. Social constructivist pedagogy informs Moodle's design and growth; this is documented at https://docs.moodle.org/35/en/Philosophy. The premise of constructivist learning theory (Staver, 1998) is that humans generate knowledge via experience. Social constructivism (Doolittle & Camp, 1999) promotes peer interaction. According to constructivist learning theory, humans gain knowledge more efficiently when they make things. Therefore, the majority of Moodle Type 2 activities adhere to these learning theories. Assessments in Moodle vary significantly based on the Moodle activity being evaluated. It is possible to automatically grade quizzes and provide students with comments depending on their responses. The instructor often grades assignments manually. Consequently, instructors possess the capability to furnish personalized feedback on student submissions and authorize select students to revise their work. Within Moodle courses, adaptive exercises incorporate both content and question sections, with students' responses to queries undergoing automatic grading. Moodle also enables other sorts of assessments that demand greater student participation, such as students evaluating the information supplied by other students in a workshop or evaluating the contributions of other students to grading activities.

In the second case, contributions can be things like forum posts, glossary entries, or database activity entries. All grades are kept in the Moodle grade book, which gives teachers different reports and summaries. Moodle's backend systems can be accessed via the internet or mobile apps. When Moodle runs online courses, student learning actions are logged and saved in a database with granular activity records. Moodle provides a relatively rudimentary reporting capability by default. Its reporting features enable users, typically instructors, to access course activity logs and student or activity group statistics, and to build aggregated data graphs. Moodle LMS does not include analysis tools to extract data from activity logs and visualizations; nevertheless, a number of Moodle-compatible plugins can be added independently and are simple to install.

Moodle possesses two notable educational data mining tools: CVLA (Dragulescu et al., 2015), which includes a predictive job submission model, and MDM (Luna et al., 2017), an open-source educational data mining application that facilitates the discovery of new knowledge. Some of these tools are open source, while others are equipped with predictive analytics. To effectively oversee the entire process from hypothesis testing to obtaining actionable findings, it is necessary to establish a comprehensive framework. At present, the popularity of Moodle has surged, boasting a user base of 144,413,576 users and offering 15 million courses across over 100,000 Moodle sites worldwide. There are, however, other LMSs with significant market share. Blackboard Learn provides a variety of exams, assignments, and learning modules. Canvas is a contemporary LMS developed by Instructure that shares features with Moodle and Blackboard Learn. It delivers the majority of functionality through external LTI programs that are also compatible with other LMSs. Google Classroom is yet another well-known LMS, created by Google and published in 2014. It offers assignments, quizzes, and a variety of grading techniques.

2.3. Supervised learning

Machine learning falls under the umbrella of computer science and encompasses various techniques for modeling data. Within the domain of supervised learning, algorithms are employed to discern a function
that delineates the correlation between a dependent variable and a set of independent variables. Notably, regression analysis and least squares represent early instances of supervised learning algorithms (Gauss et al., 1857; Legendre, 1805). The linear discriminant (Kemp, 2003) is a preliminary classifier. Another prominent classifier based on Bayes' theorem is Naive Bayes (Rish, 2001), which assumes that the features are entirely independent and independently linked with the labels. Unfortunately, this assumption also represents the greatest shortcoming of these models (Rish, 2001). Logistic regression (Cox, 1958) was an additional primary classifier; it is characterized as a linearly separable model employing a sigmoid activation function. As a classification technique, it attempts to identify functions that apply to the overall training data instances in order to more precisely estimate the dependent variable. The decision tree (Quinlan, 1986) is another categorization system that has been popular for decades. Every terminal node within the tree symbolizes a composite of the edges (also known as input features) encompassing all branches of the tree leading up to the root node. Decision trees are closely connected to random forests (Ho, 1995), which attempt to circumvent the overfitting issue associated with decision trees by generating many trees and pooling their results. SVMs (Cortes & Vapnik, 1995) are extensively used supervised models for classification and regression analysis. An SVM fits a separating line (hyperplane) and categorizes samples accordingly. Artificial neural networks (ANNs) (Rosenblatt, 1958) are well-known models for supervised and unsupervised learning that resemble the human nervous system in passing and forwarding messages for task completion and learning. ANNs enhance their performance during the model training phase by tuning the parameters and back-propagating the discrepancies between the labels computed by the ANN (Rumelhart et al., 1986) and the actual values to the preceding layers of the network, where the weights of the ANN are modified.

CNNs are frequently employed in computer vision (Shoaib et al., 2023b); the visual cortex of animals inspires CNNs, which analyze nearby characteristics of each feature to discover patterns. For time series problems, recurrent neural networks (RNNs) (Shoaib, Hussain, et al., 2022, pp. 1–18) and long short-term memory units (LSTMs) (Ullah et al., 2021) are utilized. This form of ANN is employed in systems for natural language processing and speech recognition. Instead of a single feature vector, such a network receives a sequence of vectors as input. The output is computed by merging each input vector with the internal state generated by the preceding input vector. This network type has also been employed to predict school dropout (Fei & Yeung, 2015).

In machine learning, data preprocessing is an essential task. Data preparation approaches include filling in missing values, which is essential for sparse datasets, or ensuring that there are no outliers that could impair the prediction model's performance (Shoaib, Sayed, et al., 2022a,b,c). Data cleaning involves the thorough examination of a dataset to rectify any inaccuracies present in its samples and features, thereby ensuring their alignment with the topic under investigation. Furthermore, normalization and data standardization are additional techniques commonly employed in machine learning to ensure that all features fall within a specified range of values, thereby enhancing the efficacy of machine learning algorithms (Starbuck, 2023).

2.4. Predictive model's portability

Portability stands as a crucial component of any predictive model, ensuring its capability to furnish dependable predictions utilizing data from diverse origins. Both the data utilized for generating predictions and the data employed for training the supervised learning algorithm can stem from the same dataset. However, portability requires leveraging data collected from various sources. In this particular case, different courses are offered through different LMSs. Examining issues related to the expansion of learning analytics techniques and solutions in higher education (Jayaprakash et al., 2014a,b) is the goal of the Open Academic Analytics Initiative (OAAI). The program includes a component that addresses the transferability of predictive models across institutions. The results of this project have been published in several articles (Lauria et al., 2012; Jayaprakash, 2014a,b; Jayaprakash et al., 2014a,b, 2016). According to these investigations, the predictive performance of datasets from different institutions is high. Subsequent research (Gašević et al., 2016) has contended that the transferability of predictive models lacking specific teaching conditions can yield highly variable outcomes when applied to different courses, posing potential threats to result validity such as the overestimation or underestimation of certain predictors. Recent studies (Conijn et al., 2017) corroborate these findings. Scholars in the realms of learning analytics and educational data mining are increasingly mindful of the transferability of predictive models across diverse courses. Applying a model trained on data from one course to data from another course poses significant considerations (Gitinabard et al., 2019); the authors studied two different courses and report doing so with at least 60% accuracy. Another recently published example of evaluating the portability of a predictive model is (Moreno-Marcos et al., 2019). In this publication, the authors report high accuracy and conclude that predictive models can be successfully transferred to different educational contexts if certain context-relevant conditions are met, such as the course having the same or similar users (Moreno-Marcos et al., 2019). An earlier example of evaluating the portability of a predictive model is (Morris et al., 2005).

3. Methodology

In this phase, software frameworks are developed for educational data mining and learning analytics prediction models by providing a set of standardized tools and libraries that are used to build and deploy these models. The schematic representation of the proposed framework for student analysis and future prediction, encompassing Final Grade Prediction, Risk of Dropout Prediction, and Progression Prediction Model Assessment, is depicted in Fig. 1. One advantage of using a software framework is that it can save time and effort by providing pre-built components that can be easily reused and customized for specific applications. For example, a software framework might include tools for data preprocessing, feature extraction, model training and evaluation, and visualization, which can be used to build and test prediction models more efficiently (Aldoseri et al., 2023). Another advantage of using a software framework is that it can help to ensure the quality and reliability of prediction models by providing a consistent and well-documented development process. A software framework can also facilitate collaboration and sharing of prediction models by providing a common platform that can be used by multiple researchers and developers. A software framework provides a solid foundation for the development of educational data mining and learning analytics prediction models by saving time and effort, ensuring quality and reliability, and facilitating collaboration and sharing.

This section outlines the research design and procedures employed in the development and evaluation of the AI-based Learning Management System (LMS) for the specific context of Pakistan's university environment. The research aimed to address the unique challenges faced by students in engineering, computer science, management science, biotechnology, and pharmacy departments by utilizing a mixed-method research approach, combining elements of experimental and quasi-experimental designs. The methodology encompasses multiple phases, starting with data collection from various universities, followed by rigorous data preprocessing and the development of a CNN feature learning block with an ensemble classification model. The models were fine-tuned and evaluated to predict student success and identify those at risk of academic hazards. Subsequently, the AI-based LMS was integrated into the university environment, offering individualized learning experiences, predictive analytics, and automated feedback to support student retention and academic achievement. User feedback and iterative
improvement played a pivotal role in refining the system to meet the diverse needs of different departments and provide valuable insights for future research and implementation of best practices in AI-driven educational technologies.

3.1. Dataset

The dataset used in this research study was meticulously curated to comprehensively address the diverse academic challenges faced by students in the departments of Engineering, Computer Science, Management Science, Biotechnology, and Pharmacy within esteemed universities across Pakistan. The dataset was sourced from multiple educational institutions, representing a diverse student population, and encompassed comprehensive student records spanning multiple academic years. These records comprised a rich set of academic indicators, including course grades, test scores, course enrollments, student interactions within the Learning Management System (LMS), and essential demographic details. The dataset included a wide array of student-specific details, such as age, gender, ethnicity, and prior educational background. This information offered valuable insights into the diverse composition of the student population across the selected departments. By understanding these demographic aspects, the study aimed to address any potential disparities in academic performance and tailor the AI-based LMS to cater to the specific needs of individual students.

Detailed information regarding the subjects or courses undertaken by students within their respective departments was also incorporated into the dataset, as can be seen in Table 1. This included course codes, course titles, credit hours, and the academic semesters during which the courses were completed. Understanding the subject specifics allowed the AI-based LMS to offer tailored learning experiences, identifying areas of strength and weakness for each student and providing relevant learning resources accordingly.

Table 2 provides an insightful overview of various student statistics across different academic departments. The table includes information on the number of students enrolled, the average age of students, the count of male and female students, and the average CGPA (Cumulative Grade Point Average) achieved by students in each department. This table enables a quick comparison of key metrics related to student demographics and academic performance among various departments. The information presented can be valuable in identifying trends, assessing the distribution of students, and understanding the academic achievements of students in each discipline.

In Table 3, we have three predicted attributes: "Final Grade," "Student Risk," and "Progression Status." For each attribute, the corresponding classes or categories are listed. For "Final Grade," the classes represent the different possible grades a student can achieve, ranging from "A" to "F." For "Student Risk," the classes represent the different levels of risk a student may have, which are categorized as "Low," "High," and "Moderate." For "Progression Status," the classes indicate whether a student is "Retained" in the educational program or at risk of "Dropout."

These predicted attributes and their classes are essential for machine learning models to make accurate predictions and classifications based on the input data. The models can use these classes as target labels to

Table 1
Attributes of the student dataset.

Attribute No. | Attribute Name | Type/Value
1 | Student ID | Numeric
2 | Department | Char/String
3 | Age | Numeric
4 | Gender | Char
5 | Prior Education | String
6 | Campus Distance | Numeric (Kilometers)
7 | Subject Code | Char
8 | Subject Name | Char
9 | Credit Hours | Numeric
10 | Semester | Numeric
11 | Subject Percentage | Numeric
12 | GPA | Numeric
13 | CGPA | Numeric
14 | Prerequisite Subject Name | Char
15 | Prerequisite Subject GPA | Numeric
16 | Final Grade | Char
17 | Risk of Dropout | Char (No Risk, Low, Moderate, High)
18 | Progression | Char (Retention, Dropout)

Table 2
Overview of student statistics by department.

Department | Number of Students | Average Age | Male Students | Female Students | Average CGPA
Engineering | 350 | 21.5 | 240 | 110 | 2.7
Computer Science | 250 | 20.8 | 180 | 70 | 2.6
Management Science | 300 | 21.3 | 150 | 150 | 2.8
Biotechnology | 200 | 21.9 | 100 | 100 | 2.9
Pharmacy | 180 | 23.1 | 90 | 90 | 2.7
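To make the mapping from dataset attributes to model inputs concrete, the short sketch below shows one plausible way to encode the three predicted attributes described above (final grade, risk of dropout, and progression) as integer target labels for training. The column names and the flat CSV file are illustrative assumptions rather than the study's actual schema, and the paper does not prescribe this particular encoding.

```python
# Hypothetical sketch: encoding the three prediction targets as integer labels.
# Column names ("final_grade", "risk_of_dropout", "progression") are assumptions,
# not the paper's actual schema.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("student_flat_dataset.csv")  # flat table produced by the preprocessing step

target_columns = ["final_grade", "risk_of_dropout", "progression"]
encoders = {}
for col in target_columns:
    enc = LabelEncoder()
    # e.g. final_grade: A..F; risk: No Risk/Low/Moderate/High; progression: Retention/Dropout
    df[col + "_label"] = enc.fit_transform(df[col].astype(str))
    encoders[col] = enc

# Feature matrix: every remaining attribute except identifiers and the raw targets.
label_cols = [t + "_label" for t in target_columns]
feature_cols = [c for c in df.columns if c not in target_columns + label_cols + ["student_id"]]
X = pd.get_dummies(df[feature_cols])          # one-hot encode categorical attributes
y_grade = df["final_grade_label"]             # targets for the three prediction tasks
y_risk = df["risk_of_dropout_label"]
y_progression = df["progression_label"]
```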
Fig. 2. Relational Databases to Flat Database Conversion using the Talend Application.
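In our pipeline the flattening step illustrated in Fig. 2 is carried out with the Talend application; the pandas sketch below shows the same idea in code form, joining hypothetical student, enrollment, and result tables from the relational source into one flat record per student-subject pair. The table and column names are illustrative assumptions, not the actual university schema.

```python
# Minimal sketch of the relational-to-flat conversion shown in Fig. 2.
# The Talend job performs this step in practice; table/column names below are assumed.
import pandas as pd
import sqlite3  # stand-in for the institutional relational database

conn = sqlite3.connect("campus_management.db")

students = pd.read_sql(
    "SELECT student_id, department, age, gender, prior_education FROM students", conn)
enrollments = pd.read_sql(
    "SELECT student_id, subject_code, semester, credit_hours FROM enrollments", conn)
results = pd.read_sql(
    "SELECT student_id, subject_code, subject_percentage, gpa, final_grade FROM results", conn)

# One flat row per student-subject pair, mirroring the attribute list in Table 1.
flat = (students
        .merge(enrollments, on="student_id", how="inner")
        .merge(results, on=["student_id", "subject_code"], how="left"))

flat.to_csv("student_flat_dataset.csv", index=False)
print(flat.shape)
```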
3.5.1.2. K-nearest neighbors (KNN). KNN is an instance-based classifier (Sharma & Guleria, 2022). It classifies an instance by considering the class labels of its k-nearest neighbors in the feature space. KNN is intuitive, easy to implement, and well-suited for multi-class classification tasks. Equation (2) shows the Euclidean distance between n-dimensional data points, where a smaller distance indicates greater similarity.

Distance = √((x2 − x1)² + (y2 − y1)² + … + (n2 − n1)²)   (2)

3.5.1.3. Random forest. Random Forest is an ensemble learning method based on decision trees (Shoaib et al., 2023c). It constructs multiple decision trees during training and combines their predictions through voting to arrive at the final classification. Random Forest is robust, handles large datasets, and effectively deals with noisy data. Equations (3) and (4) are used by the decision trees within the random forest to select the node sequence and classify new instances based on the training data.

Entropy(S) = − Σ_{i=1}^{C} p_i log2(p_i)   (3)

where p_i is the proportion of instances in S that belong to class i, and C is the number of classes.

The performance of the proposed model is evaluated using four key metrics: accuracy, precision, recall, and F1-score. These metrics provide a comprehensive understanding of the model's effectiveness in handling various aspects of classification tasks. However, it is important to note that the dataset exhibits class imbalance, meaning that not all classes have an equal number of samples for model training. In such scenarios, accuracy alone may not be the most informative metric, as it can be biased towards the majority class. Instead, the F1-score, which balances precision and recall, becomes crucial. The F1-score considers both false positives and false negatives, making it a robust measure for evaluating model performance, particularly in situations with imbalanced data distributions. Therefore, in this study, while accuracy is still considered, the F1-score takes precedence as a more reliable indicator of the model's overall performance.

Accuracy = (TP + TN) / (TP + FP + FN + TN)   (6)
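To make the ensemble and its evaluation concrete, the sketch below trains the three base classifiers named in this study (SVM, Random Forest, KNN), combines their predicted class probabilities with a simple weighted average as a stand-in for the Bayesian averaging step (the weighting scheme here is an assumption, since the paper's implementation is not published), and reports accuracy (Eq. 6) together with the macro F1-score emphasized above.

```python
# Hedged sketch of the SVM + Random Forest + KNN ensemble and its evaluation.
# The probability-averaging weights are placeholders for the paper's Bayesian averaging model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

def train_ensemble(X, y, weights=(0.4, 0.4, 0.2)):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    base_models = [
        SVC(kernel="rbf", probability=True),        # probability=True enables soft averaging
        RandomForestClassifier(n_estimators=200),
        KNeighborsClassifier(n_neighbors=5),
    ]
    for model in base_models:
        model.fit(X_train, y_train)

    # Weighted average of class probabilities (stand-in for Bayesian model averaging).
    probs = sum(w * m.predict_proba(X_test) for w, m in zip(weights, base_models))
    classes = base_models[0].classes_             # identical ordering across the fitted models
    y_pred = classes[np.argmax(probs, axis=1)]

    # Accuracy (Eq. 6) plus macro F1, which is more informative under class imbalance.
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "macro_f1": f1_score(y_test, y_pred, average="macro"),
    }
```

The same routine can be reused for each of the three prediction tasks by passing the grade, risk, or progression labels as y.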
In the confusion matrix for grade prediction, the value in the "A" row and "B+" column (1) implies that 1 instance from class "A" was misclassified as "B+." These misclassification counts provide insights into the types of errors made by the model and help assess its performance in distinguishing between different classes.

The performance analysis of the model demonstrates its effectiveness in predicting final grades across diverse classes, as illustrated in Table 7. Notably, the model demonstrates strong accuracy for most classes, with scores ranging from 0.88 to 0.97. Precision and recall scores indicate the model's ability to make accurate positive predictions and capture actual positive instances, respectively. Overall, the model maintains a balanced F1-score, emphasizing its consistent predictive capabilities across different academic categories. These results offer valuable insights into the model's performance and its potential to enhance educational decision-making and outcomes.

Fig. 6 illustrates the performance analysis of the Grade Prediction Model through a box plot, providing a visual representation of key metrics and insights into the model's predictive capabilities.

The Receiver Operating Characteristic (ROC) curve is a significant visual tool employed to evaluate and assess the performance of classification models. In our study, we utilize the ROC curve to analyze the predictive accuracy of the grade prediction model for different classes. The Area Under the Curve (AUC) values associated with each class provide an insightful measure of the model's ability to discriminate between positive and negative instances. As revealed by the AUC scores, the grade prediction model demonstrates varying levels of discriminative power for different classes. Notably, classes B, C, and C+ exhibit notably high AUC values of 0.96, 0.98, and 0.94, respectively, indicating the model's strong ability to distinguish between the corresponding class labels. Classes A, B+, D+, and F also demonstrate respectable AUC values of 0.88, 0.89, 0.91, and 0.88, respectively, signifying their reliable predictive capabilities. The ROC curve visualization depicted in Fig. 7, coupled with the AUC scores, provides a comprehensive assessment of the model's performance across different grade categories, facilitating valuable insights for educators and stakeholders to make informed decisions and interventions to optimize student outcomes.

Fig. 7. Proposed grade prediction model ROC-AUC curve.

Table 8
Class-wise performance analysis for Risk Prediction using the proposed model.

Class | Accuracy | Precision | Recall | F-Measure
No Risk | 0.97 | 0.81 | 0.84 | 0.83
Low | 0.93 | 0.88 | 0.88 | 0.88
Moderate | 0.89 | 0.96 | 0.91 | 0.94
High | 0.94 | 0.91 | 0.92 | 0.92
Average | 0.93 | 0.89 | 0.88 | 0.89

The evaluation of the risk prediction model provides insights into its ability to identify students who are more susceptible to dropout, thereby enabling timely interventions to mitigate potential attrition. The detailed performance analysis of the dropout prediction model can be seen in Table 8. In Fig. 8, the confusion matrix for the proposed Student at Risk Prediction ensemble model is presented. This matrix provides values for True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN), crucial metrics used to calculate the model's performance, including accuracy, precision, recall, and F1-score.

In the evaluation of our Risk Prediction ensemble model, as depicted in Fig. 9 through a box plot, and further validated by the ROC-AUC curve presented in Fig. 10, we observe a comprehensive performance analysis, providing valuable insights into the model's robustness and predictive accuracy.
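The per-class AUC values discussed above come from treating each grade (or risk level) as a one-vs-rest binary problem. A minimal sketch of how such class-wise ROC curves and AUC scores can be produced from the ensemble's predicted probabilities is shown below; here `y_test` and `probs` are assumed to be the held-out true labels and the ensemble's class-probability matrix (with columns in the same order as `classes`), and the class list is illustrative.

```python
# Sketch: one-vs-rest ROC curves and AUC scores per class, as in Figs. 7 and 10.
# `probs` is the ensemble's predicted probability matrix; names and classes are assumed.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

classes = ["A", "B+", "B", "C+", "C", "D+", "F"]        # illustrative grade classes
y_true_bin = label_binarize(y_test, classes=classes)     # shape: (n_samples, n_classes)

plt.figure()
for idx, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_true_bin[:, idx], probs[:, idx])
    plt.plot(fpr, tpr, label=f"{cls} (AUC = {auc(fpr, tpr):.2f})")

plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```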
Fig. 9. Performance Analysis using the Box Plot for Risk Prediction ensemble model.
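Fig. 8 reports the confusion matrix from which the TP, FP, FN, and TN counts are taken for each risk class. The short sketch below shows how those four counts can be derived per class from a multi-class confusion matrix and turned into the precision, recall, and F-measure values of Table 8; it is an illustrative computation with assumed variable names, not the study's released code.

```python
# Sketch: per-class TP/FP/FN/TN and metrics from a multi-class confusion matrix (cf. Fig. 8, Table 8).
import numpy as np
from sklearn.metrics import confusion_matrix

risk_classes = ["No Risk", "Low", "Moderate", "High"]
cm = confusion_matrix(y_test_risk, y_pred_risk, labels=risk_classes)  # assumed prediction arrays

for i, cls in enumerate(risk_classes):
    tp = cm[i, i]
    fp = cm[:, i].sum() - tp          # predicted as this class but actually another class
    fn = cm[i, :].sum() - tp          # actually this class but predicted as another class
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{cls}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```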
Fig. 12. Performance analysis using the box plot for the Progression Prediction Model.

Fig. 13. Proposed progression prediction model ROC-AUC curve.

The deployment of the proposed AI Student Success Predictor marks a significant advancement in enhancing the efficiency and effectiveness of the university's campus management system. This AI-powered system is seamlessly integrated with the university's relational database, forming a cohesive ecosystem for data-driven decision-making. By accessing and analyzing relevant student subject data, the AI module employs a sophisticated ensemble model trained on comprehensive learning datasets. At its core, the AI module harnesses the power of machine learning algorithms to predict not only student grades but also two crucial aspects: risk prediction and progression prediction. Leveraging the ensemble model's learning capabilities, the system provides insights into students' academic performance, enabling timely interventions to mitigate potential risks and enhance overall retention rates.

One of the notable features of this deployment is its platform independence. The AI Student Success Predictor is designed to seamlessly integrate with various relational databases, making it adaptable to the diverse technological landscapes of educational institutions. This adaptability ensures that universities utilizing different database systems can harness the benefits of predictive analytics and proactive student support offered by the AI module. In practice, as students engage with the campus management system, the AI module orchestrates a seamless exchange of information. Relevant student data, including subject information and academic history, is retrieved from the database. By processing this data through the ensemble model, the AI system generates accurate predictions for student grades, risk of dropout, and progression status. These predictions equip educators and administrators with actionable insights to provide personalized guidance and support to students, fostering an environment conducive to academic success.

The deployment of the AI Student Success Predictor represents a pivotal step towards data-driven and student-centric education management. By seamlessly connecting with the university's relational database and harnessing the capabilities of ensemble model training, this AI module empowers educational institutions to proactively address academic challenges, optimize student outcomes, and enrich the overall learning experience.

4.4.1. Early semester grade, risk, and progression prediction

In this scenario, the AI Student Success Predictor is deployed at the start or middle of the semester, prior to the final exams. The system is seamlessly integrated with the university's campus management system, leveraging its relational database to provide holistic insights into student performance. This encompasses early grade prediction, risk assessment, and progression outlook.

Fig. 14 depicts Early Semester Grade, Risk, and Progression Prediction using the integrated proposed model in the university's Learning Management System (LMS). Panel (a) features a green background indicating students at no risk and retention, (b) showcases an orange background signifying students at risk but with retention in the current semester, and (c) exhibits a red background denoting students at high risk of dropout in the current semester. The AI system accesses the campus management database to extract the student's enrolled courses, forming the basis for prediction. For each enrolled course, the system identifies prerequisite courses and retrieves the student's grades in these subjects. Utilizing an ensemble of historical data, past grades, and interaction patterns, the AI model predicts the student's upcoming grades, risk level (no risk, low, moderate, high), and progression forecast (retention or dropout). Predicted grades, risk assessment, and progression outlook are presented to both the student and the university, enabling tailored interventions and academic support.

4.4.2. Validation of predicted performance and early warning

In this scenario, the AI system's predictions are rigorously tested against the student's actual outcomes, serving as a vital checkpoint to assess its accuracy and reliability in real-world conditions.

As the semester concludes and final grades are available, the AI system retrieves the student's actual course grades, which provide a baseline for comparison. The predicted performance, encompassing grades, risk categorization, and progression forecast, is matched against the actual outcomes to gauge the AI model's efficacy. The comparison not only validates the predictive accuracy but also identifies instances where early risk and progression alerts could have prompted timely interventions. The university gains actionable insights into the AI model's effectiveness, fostering continuous enhancement of student success initiatives and data-driven strategies. In Fig. 15, the Validation of Predicted Performance and Early Warning using the integrated model in the university's LMS is demonstrated. Panel (a) represents instances of no risk, (b) signifies risks with retention, and (c) denotes high-risk scenarios of dropout. This visualization provides a valuable reference in our discussion on the model's performance assessment.
Fig. 14. Early Semester Grade, Risk, and Progression Prediction using the integrated model in the university's LMS. (a) Green: No risk, (b) Orange: Risk with retention, (c) Red: High risk of dropout.
Fig. 15. Validation of Predicted Performance and Early Warning using the integrated model in the university's LMS. (a) Green: No risk, (b) Orange: Risk with retention, (c) Red: High risk of dropout.
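As a rough illustration of the early-semester workflow described in Section 4.4.1, the sketch below pulls a student's enrolled courses and prerequisite grades from the campus relational database, assembles feature rows, and asks the trained ensemble for grade, risk, and progression predictions. The schema, feature layout, saved model file, and use of SQLite as a stand-in database are assumptions made for illustration; the production system integrates with the university's own CMS tables.

```python
# Hedged sketch of the early-semester prediction workflow (Section 4.4.1).
# Table names, columns, and the saved model file are illustrative assumptions.
import sqlite3
import pandas as pd
import joblib

def predict_for_student(student_id, db_path="campus_management.db"):
    conn = sqlite3.connect(db_path)
    enrolled = pd.read_sql(
        "SELECT subject_code, credit_hours, semester FROM enrollments WHERE student_id = ?",
        conn, params=(student_id,))
    prereq = pd.read_sql(
        """SELECT p.subject_code, r.gpa AS prereq_gpa
           FROM prerequisites p JOIN results r
             ON r.subject_code = p.prerequisite_code AND r.student_id = ?""",
        conn, params=(student_id,))

    # One feature row per enrolled course, merging prerequisite performance.
    features = enrolled.merge(prereq, on="subject_code", how="left").fillna({"prereq_gpa": 0.0})

    # Trained grade/risk/progression models saved after the training phase.
    models = joblib.load("ensemble_models.pkl")
    X = pd.get_dummies(features)  # column alignment with the training features assumed handled elsewhere
    return {
        "grade": models["grade"].predict(X),
        "risk": models["risk"].predict(X),
        "progression": models["progression"].predict(X),
    }
```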
These scenarios illuminate the multifaceted capabilities of the AI Student Success Predictor. By harnessing the power of data and predictive analytics, the system not only forecasts academic performance but also enables early risk identification and progression anticipation. Seamlessly integrated with the university's database application, this AI-driven solution empowers educators to proactively enhance student outcomes, ensure timely support, and foster a culture of continuous improvement.
4.5.6. Driving continuous improvement

The integration of AI-driven insights with the existing CMS opens new avenues for continuous improvement. The university gains access to data-driven decision-making, facilitating the refinement of academic programs, support services, and teaching methodologies. The iterative feedback loop formed by this integration ensures that the system evolves alongside the needs of the student body and educational objectives.

The integration of Python-based AI capabilities with the Oracle relational database presents a transformative leap in advancing student-

CRediT authorship contribution statement

Muhammad Shoaib: Writing – original draft, Software, Methodology, Investigation, Data curation, Conceptualization. Nasir Sayed: Writing – original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Jaiteg Singh: Writing – review & editing, Writing – original draft, Methodology, Investigation. Jana Shafi: Writing – review & editing, Visualization, Validation, Resources, Methodology, Conceptualization. Shakir Khan: Writing – review & editing, Writing – original draft, Methodology, Investigation, Data curation, Conceptualization. Farman Ali: Writing – original draft, Supervision, Methodology, Investigation, Conceptualization.
CRediT authorship contribution statement
Muhammad Shoaib: Writing – original draft, Software, Methodology, Investigation, Data curation, Conceptualization. Nasir Sayed: Writing – original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Jaiteg Singh: Writing – review & editing, Writing – original draft, Methodology, Investigation. Jana Shafi: Writing – review & editing, Visualization, Validation, Resources, Methodology, Conceptualization. Shakir Khan: Writing – review & editing, Writing – original draft, Methodology, Investigation, Data curation, Conceptualization. Farman Ali: Writing – original draft, Supervision, Methodology, Investigation, Conceptualization.
Declaration of competing interest
The authors declare that they have no conflict of interest.
Statement
The authors used ChatGPT, an AI language model, to improve the readability and language quality of the manuscript. After using this service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Data availability
Data will be made available on request.