
International Journal of Advanced Computer Research, Vol 9(42)
ISSN (Print): 2249-7277 ISSN (Online): 2277-7970
Research Article
http://dx.doi.org/10.19101/IJACR.2018.839045

Machine learning approach for reducing students dropout rates


Neema Mduma1*, Khamisi Kalegele2 and Dina Machuve3
Research Scholar, The Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania1
Lecturer and Researcher, The Tanzania Commission of Science and Technology, Arusha, Tanzania2
Lecturer and Researcher, The Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania3

Received: 29-November-2018; Revised: 31-January-2019; Accepted: 05-February-2019


©2019 Neema Mduma et al. This is an open access article distributed under the Creative Commons Attribution (CC BY) License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
School dropout is a widely recognized serious issue in developing countries, and machine learning techniques have gained much attention in addressing this problem. This paper presents a thorough analysis of four supervised learning classifiers that represent linear, ensemble, instance and neural network models on the Uwezo Annual Learning Assessment datasets for Tanzania as a case study. The goal of the study is to provide data-driven algorithm recommendations to current researchers on the topic. Using three metrics: geometric mean, F-measure and adjusted geometric mean, we assessed and quantified the effect of different sampling techniques on the imbalanced dataset for model selection. We further indicate the significance of hyper-parameter tuning in improving predictive performance. The results indicate that two classifiers, logistic regression and multilayer perceptron, achieve the highest performance when the over-sampling technique is employed. Furthermore, hyper-parameter tuning improves each algorithm's performance compared to its baseline settings, and stacking these classifiers improves the overall predictive performance.

Keywords
Machine learning (ML), Imbalanced learning classification, Secondary education, Evaluation metrics.

1. Introduction
Reducing student dropout rates is one of the challenges faced by many school districts globally. A growing body of literature indicates high rates of students dropping out of school, especially in the developing world [1]. In addressing this problem, machine learning techniques have gained much attention in recent years [2−4]. This is attributed to the fact that machine learning provides a convenient way to solve student dropout problems and delivers good guarantees for the solutions [5, 6]. To this end, a substantial amount of literature focusing on predicting student dropout has been presented [7−9]. Different machine learning techniques, such as decision trees [2], artificial neural networks, matrix factorization [3, 5, 10−12], deep neural networks [13], probabilistic graphical models [14, 15] and survival analysis [7], have been applied to develop predictive algorithms. Other approaches, such as time series clustering [16, 17], which is extensively used in recommender systems [3], have also been presented.

Despite several efforts by previous researchers, there are still challenges that need to be addressed. Most of the widely used datasets are generated in developed countries, while developing countries face several challenges in generating public datasets for addressing this problem. The study conducted by Mgala used primary data collected from primary schools in Kenya, but the dataset is not publicly available [1]. The scarcity of public datasets from developing countries creates the need to develop more datasets from different geographical locations; however, considerable cost and time are required for the data collection process. Besides, to the knowledge of the researchers, only a few studies have been conducted in developing countries. Thus, further research is needed to explore the value of machine learning algorithms in curbing dropout in the context of developing countries.

Machine learning techniques have been applied on various platforms, such as massive open on-line courses (MOOCs). MOOC platforms such as Coursera and edX are among the platforms popularly used for student dropout prediction [9], as are learning management systems (LMS) such as Moodle [16]. In addressing the problem of student dropout, several existing works have focused on supervised learning algorithms such as naive Bayes, association rule mining, artificial neural network-based algorithms, logistic regression, CART, C4.5, J48, BayesNet, SimpleLogistic, JRip, RandomForest and ICRM2 [6]. Among the classification techniques, decision trees are widely used by researchers due to their simplicity and comprehensibility in uncovering small or large data structures and predicting values [2].

Other techniques, such as survival analysis, provide mechanisms to handle the censored data problems that arise in modeling longitudinal data, which occur ubiquitously in real-world application domains [13]. Ameri et al. developed a survival analysis framework for early prediction using the Cox proportional hazards model (Cox) and a time-dependent Cox model (TD-Cox), which captures time-varying factors and can leverage this information to provide more accurate predictions of student dropout, using a dataset of students enrolled at Wayne State University (WSU) from 2002 until 2009 [7]. Besides, other researchers proposed a new data transformation model built upon the summarized data matrix of link-based cluster ensembles (LCE), using an educational dataset obtained from the operational database system at Mae Fah Luang University, Chiang Rai, Thailand. Like several existing dimension reduction techniques, such as principal component analysis (PCA) and kernel principal component analysis (KPCA), this method aims to achieve high classification accuracy by transforming the original data into a new form. However, the common limitation of these techniques is their demanding time complexity, which may not scale up well to a very large dataset. While its worst-case traversal time (WCT-T) is not suitable for highly time-critical applications, it can be an attractive candidate for quality-led work such as identifying students at risk of underachievement [5]. Furthermore, matrix factorization was presented as a clustering machine learning method whose framework can accommodate some variations [10]. In that study, two classes of methods for building prediction models were described: the first builds models using linear regression approaches, and the second using matrix factorization approaches. The regression-based methods comprise course-specific regression (CSpR) and personalized linear multi-regression (PLMR), while the matrix factorization-based methods build on the standard matrix factorization (MF) approach. These approaches were applied to George Mason University (GMU) transcript data, University of Minnesota (UMN) transcript data, UMN LMS data and Stanford University MOOC data [11]. However, one limitation of the standard MF method is that it ignores the sequence in which students have taken their courses, and as such the latent representation of a course can potentially be influenced by the performance of students in courses taken afterward.

In this paper, we present a thorough analysis of four commonly used machine learning algorithms on the Uwezo data on learning (http://www.twaweza.org/go/uwezo-datasets) in Tanzania, with the aim of providing data-driven algorithm recommendations to current researchers on the topic. This is a publicly available nationwide dataset in Tanzania which was generated in a developing country and therefore reflects the local context. Using this new source of student-level data from Tanzania as a case study, we employ a comprehensive validation and enhancement of existing algorithms and apply additional machine learning approaches to improve their predictive power. Specifically, we present a detailed analysis of selected popular algorithms and analyse their performance on the dataset, first applying data pre-processing and feature engineering techniques, which are critical stages in building a high-performance dropout prediction algorithm. This was followed by a rigorous comparison of the selected machine learning algorithms using the evaluation methods proposed by [18], from which the best performing algorithms were selected. Further, we empirically quantified the effect of hyper-parameter (i.e. algorithm parameter) tuning and ensemble techniques on the selected algorithms, with the aim of further improving their performance.

In summary, the main objective of this study was to apply machine learning techniques to predict student dropout. In order to attain this objective, the following three tasks were performed:
 Building the models and analyzing their performance.
 Tuning the models that performed well and employing an ensemble approach to improve the predictive performance.
 Evaluating model performance using three metrics: geometric mean (Gm), F-measure (Fm) and adjusted geometric mean (AGm).

2. Materials and methods
2.1 Dataset descriptions and pre-processing
Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, among other steps, and its product is the final training set. During selection, relevant target data are selected from the retained (typically very noisy) data and subsequently pre-processed. This goes hand in hand with integration from multiple sources, filtering of irrelevant content and structuring of the data according to a target tool [19]. In developing a generalized algorithm, data pre-processing can often have a significant impact. Given the nature of datasets in many domains, it is well known that data preparation and filtering steps take a considerable amount of processing time in ML problems.

In this paper, Uwezo data on learning at the country level in Tanzania, collected in 2015, was used. This dataset was collected by the Twaweza organization with the aim of assessing children's learning levels across hundreds of thousands of households in East Africa. The dataset was cleaned by removing information that could allow end-users to locate individuals or specific villages. The village id column was removed, since it was not required in the experimental stage. Various approaches have been identified for handling missing values, outliers and numeric values [20]. In this study, we converted data samples to numerical values and performed PCA for handling outliers. Missing values were replaced using medians and zeros.

In this dataset, we identified the following columns with missing values, as described in Table 1: Pupil Teacher Ratio (PTR), Pupil Classroom Ratio (PCR), Girl's Pupil Latrines Ratio (GPLR), Boy's Pupil Latrines Ratio (BPLR), Parent Teacher Meeting Ratio (PTMR), Main source of household income (Income), Enumeration Area type (EAarea), Parent who checks his/her child's exercise book once a week (PCCB), Parent who discussed his/her child's progress with the teacher last term (PTD), Student who read any book with his/her parent in the last week (SPB), School has girl's privacy room (SGR), and Household meals per day (MLPD). In handling missing values, PTR, PCR, GPLR and BPLR were imputed with medians, while PTMR, Income, EAarea, PCCB, PTD, SPB, SGR and MLPD were imputed with zeros. We encoded the nominal features to conform with Scikit-learn and coded the dropout variable as 1 for not-dropout and 0 for dropout.

The dataset consists of 18 features, as described in Table 2, and approximately 61340 samples. Since our target variable is dropout, we checked its distribution in the dataset and observed a class imbalance, with only 1.6% dropout, as shown in Figure 1. Data imbalance is a serious problem which should be addressed during the pre-processing stage [21]. It occurs when one class is under-represented relative to another [22, 23]. Classification of imbalanced datasets is common in the field of student retention, mainly because the number of registered students is always larger than the number of dropout students. Several re-sampling techniques, such as under-sampling, over-sampling and hybrid methods, can be applied to handle this problem [24].

Table 1 Features with missing values

No. Feature description Type of data
1. Main source of household income (Income) Multinomial
2. Boy's pupil latrines ratio (BPLR) Numerical
3. School has girl's privacy room (SGR) Binomial
4. Parent who checks his/her child's exercise book once a week (PCCB) Binomial
5. Household meals per day (MLPD) Multinomial
6. Student who read any book with his/her parent in the last week (SPB) Binomial
7. Parent who discussed his/her child's progress with the teacher last term (PTD) Binomial
8. Enumeration area type (EAarea) Multinomial
9. Girl's pupil latrines ratio (GPLR) Numerical
10. Parent teacher meeting ratio (PTMR) Numerical
11. Pupil classroom ratio (PCR) Numerical
12. Pupil teacher ratio (PTR) Numerical
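As a rough sketch (not the authors' published code), the imputation and encoding steps described above could look as follows, assuming the data is loaded into a pandas DataFrame whose columns use the Table 1 abbreviations (the real Uwezo column names may differ):

```python
import pandas as pd

# Assumed column groups, following the imputation rules stated in Section 2.1.
MEDIAN_COLS = ["PTR", "PCR", "GPLR", "BPLR"]
ZERO_COLS = ["PTMR", "Income", "EAarea", "PCCB", "PTD", "SPB", "SGR", "MLPD"]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values and encode nominal features for Scikit-learn."""
    df = df.drop(columns=["village_id"], errors="ignore")  # identifying column removed
    for col in MEDIAN_COLS:                  # ratio features: impute with the median
        df[col] = df[col].fillna(df[col].median())
    df[ZERO_COLS] = df[ZERO_COLS].fillna(0)  # categorical/binary features: impute with zero
    # Encode remaining nominal features (Region, District, Village, ...) as integer codes.
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].astype("category").cat.codes
    return df  # target coding: 1 = not-dropout, 0 = dropout
```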


Figure 1 Dropout distribution of the dataset (bar chart of the number of students per class; 1 = not-dropout, 0 = dropout)

Figure 2 Dropout distributions (under-sampling): bar chart of the number of students per class (1 = not-dropout, 0 = dropout), with both classes balanced after random under-sampling

Under-sampling is a non-heuristic method that aims at creating a subset of the original dataset by eliminating instances until the remaining number of majority-class examples is roughly the same as that of the minority class [25, 26]. Over-sampling methods create a superset of the original dataset by replicating some instances or creating new instances from existing ones until the number of selected examples, plus the original examples of the minority class, is roughly equal to that of the majority class [27−29]. Hybrid methods, such as those based on the synthetic minority over-sampling technique (SMOTE), combine both under-sampling and over-sampling approaches [30, 31]. In this work, no sampling, over-sampling (SMOTE-ENN) and random under-sampling, as implemented in Imbalanced-Learn, were applied, as demonstrated in Figures 2 and 3. SMOTE-ENN combines over- and under-sampling, using SMOTE and the edited nearest neighbour (ENN) rule, to generate more minority-class samples in order to reinforce their signal [32], while the random under-sampler is a fast and easy way to balance the classes by randomly selecting a subset of data from the targeted (majority) classes [33].
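A minimal sketch of this resampling step, assuming the Imbalanced-Learn package and a training matrix X_train with binary labels y_train (the exact sampler settings used in the study are not reported):

```python
from collections import Counter

from imblearn.combine import SMOTEENN
from imblearn.under_sampling import RandomUnderSampler

# Over-sampling plus cleaning: SMOTE followed by edited nearest neighbours.
X_over, y_over = SMOTEENN(random_state=42).fit_resample(X_train, y_train)

# Fast under-sampling: randomly drop majority-class samples until the classes balance.
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)

print(Counter(y_train), Counter(y_over), Counter(y_under))  # compare class counts
```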


Figure 3 Dropout distributions (SMOTE-ENN): bar chart of the number of students per class (1 = not-dropout, 0 = dropout), with both classes roughly balanced at about 34000-36000 students after resampling

Table 2 Summary of all features

No. Feature description Type of data
1. Main source of household income (Income) Multinomial
2. Boy's pupil latrines ratio (BPLR) Numerical
3. School has girl's privacy room (SGR) Binomial
4. Region Nominal
5. District Nominal
6. Village Nominal
7. Student gender (Gender) Binomial
8. Parent who checks his/her child's exercise book once a week (PCCB) Binomial
9. Household meals per day (MLPD) Multinomial
10. Student who read any book with his/her parent in the last week (SPB) Binomial
11. Parent who discussed his/her child's progress with the teacher last term (PTD) Binomial
12. Student age (Age) Numerical
13. Enumeration area type (EAarea) Multinomial
14. Household size (HHsize) Numerical
15. Girl's pupil latrines ratio (GPLR) Numerical
16. Parent teacher meeting ratio (PTMR) Numerical
17. Pupil classroom ratio (PCR) Numerical
18. Pupil teacher ratio (PTR) Numerical

2.2 Feature selection
Feature selection is one of the useful approaches in data pre-processing for finding accurate data models [34]. The experiments aim at identifying the contribution of each feature to the prediction performance by automatically selecting the features that are most relevant to dropout predictive modeling. This was accomplished by measuring the permutation feature importance score (pfi) as defined in Equation 1:

pfi = s − s_perm (1)

Where:
 s is the base performance metric score
 s_perm is the performance metric score after shuffling the values of the feature

The score is intended to measure the impact of each feature on model performance by permuting the values of each feature and measuring how much the permutation decreases the model performance. In this experiment, importance measures were computed from random permutations of feature values, reflecting the contribution of each feature to the predictive performance of the model. This was done by measuring the deviation after permuting the values of a feature, using Gm as the evaluation metric. The overall experiment was conducted by creating a random permutation of a feature through shuffling of its values and evaluating model performance; this was repeated iteratively for each feature, one at a time, to obtain the list of feature variables and their corresponding importance scores. The importance score is defined as the reduction in performance after shuffling the feature values; when the evaluation metric measures the accuracy of the prediction, a higher value implies the feature is more important.

The results presented in Figure 4 show that student gender (Gender), PCCB, MLPD, SPB, PTD and student age (Age) have a strong contribution to the dropout prediction performance. Thereafter, the same experiment was repeated using the six well-performing features obtained in the previous experiment. The results clearly show that a student's gender has a strong contribution to the dropout prediction performance, as presented in Figure 5. These experimental results support other researchers' findings on the association between dropout rate and gender [35].
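A minimal sketch of the permutation importance measurement in Equation 1, scored with Gm; the helper below is illustrative rather than the authors' implementation, and assumes a fitted classifier plus a validation DataFrame:

```python
import numpy as np
from imblearn.metrics import geometric_mean_score

def permutation_importance(model, X_val, y_val, n_repeats=5, seed=0):
    """pfi = s - s_perm: the drop in Gm after shuffling one feature at a time."""
    rng = np.random.default_rng(seed)
    s = geometric_mean_score(y_val, model.predict(X_val))  # base score s
    scores = {}
    for name in X_val.columns:
        drops = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            X_perm[name] = rng.permutation(X_perm[name].values)  # shuffle this feature only
            s_perm = geometric_mean_score(y_val, model.predict(X_perm))
            drops.append(s - s_perm)  # Equation 1
        scores[name] = float(np.mean(drops))
    return scores  # higher value = more important feature
```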

Figure 4 Feature selection with all features (bar chart of the feature selection score, in %, for each of the 18 features)

Figure 5 Feature selection with best features (bar chart of the feature selection score, in %, for the best features, including Gender, PCCB and MLPD)

2.3 Experimental procedures
In this study, the dataset was separated into a training set (60%), a validation set (20%) and a test set (20%). We applied the sampling techniques to the training set and conducted the first experiment of building the models: each model was built using the 60% training set, and the 20% validation set was then used to validate model performance. This was followed by a second experiment of tuning the well-performing models and employing ensemble techniques in order to improve the predictive performance. We then combined the training and validation sets to form a larger training set and applied the sampling technique to it. Thereafter, we evaluated the models using the unseen 20% test set in order to observe how they would behave in a real environment, which is imbalanced. The overall experimental procedure is summarized in Figure 6; in each experiment stratified k-fold cross-validation was used. With k=5, the entire process involved executing all selected classification algorithms, with every execution repeated 5 times using different train/test/validation partitions of the data set. This cross-validation procedure divides the data set into 5 roughly equal parts; for each part, it trains the model using the four remaining parts and computes the test error by classifying the held-out part. Finally, the results for the five test partitions were averaged.

Figure 6 Experiment procedure
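A minimal sketch of the partitioning and stratified 5-fold procedure summarized in Figure 6, assuming scikit-learn and pandas inputs; make_model is a hypothetical factory returning a fresh classifier, and the seeds are illustrative:

```python
from sklearn.model_selection import StratifiedKFold, train_test_split
from imblearn.metrics import geometric_mean_score

# 60% train / 20% validation / 20% test, stratified on the dropout label.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=42)  # 0.25 * 0.8 = 0.2

# Stratified 5-fold cross-validation on the training portion.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []
for fit_idx, eval_idx in skf.split(X_train, y_train):
    model = make_model()  # hypothetical: returns an unfitted classifier
    model.fit(X_train.iloc[fit_idx], y_train.iloc[fit_idx])
    preds = model.predict(X_train.iloc[eval_idx])
    fold_scores.append(geometric_mean_score(y_train.iloc[eval_idx], preds))
print(sum(fold_scores) / len(fold_scores))  # averaged over the five partitions
```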

2.4 Evaluation metrics
In measuring the performance of student dropout algorithms, several researchers use various evaluation metrics [1, 7, 8]. With respect to evaluation measures, we used Gm, Fm and AGm as the evaluation criteria. The Gm is a measure of the ability of a classifier to balance TPrate (sensitivity) and TNrate (specificity) [36], as defined in Equation 2; this measure is maximized when the true positive rate (TPrate) and the true negative rate (TNrate) are equal. Furthermore, in order to be more sensitive to changes in the positive predictive value (precision) than in TPrate, Fm was used as defined in Equation 3; this is the weighted harmonic mean of the TPrate and precision [7, 37, 38]. Besides, AGm, as defined in Equation 4 following [18], was used to obtain the highest TPrate without decreasing the TNrate too much.

Gm = √(TPrate × TNrate) (2)

Fm = (2 × PPV × TPrate) / (PPV + TPrate) (3)

AGm = (Gm + TNrate × Nn) / (1 + Nn) if TPrate > 0; AGm = 0 if TPrate = 0 (4)

Where:
 TN is the number of true negatives, TP true positives, FN false negatives and FP false positives.
 TPrate = TP / (TP + FN), the percentage of positive instances correctly classified.
 TNrate = TN / (TN + FP), the percentage of negative instances correctly classified.
 Positive predictive value (PPV, precision) = TP / (TP + FP).
 Nn is the proportion of negative (majority class) examples in the dataset.
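All three metrics can be computed from the confusion matrix; the sketch below assumes scikit-learn, and the AGm branch follows the standard definition from the imbalanced-learning literature, since the printed form of Equation 4 is not fully legible in this copy:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def gm_fm_agm(y_true, y_pred):
    """Return Gm (Eq. 2), Fm (Eq. 3) and AGm (Eq. 4) for binary predictions."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)   # TPrate (sensitivity)
    tnr = tn / (tn + fp)   # TNrate (specificity)
    ppv = tp / (tp + fp)   # precision
    gm = np.sqrt(tpr * tnr)                       # Eq. 2
    fm = 2 * ppv * tpr / (ppv + tpr)              # Eq. 3
    nn = (tn + fp) / (tn + fp + tp + fn)          # proportion of negative examples
    agm = (gm + tnr * nn) / (1 + nn) if tpr > 0 else 0.0  # Eq. 4 (assumed standard form)
    return gm, fm, agm
```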

3. Results
3.1 Experiment 1: model selection
The aim of this experiment was to identify the classifier with the best performance for this problem. In this phase, classifiers were selected across model families, including linear, ensemble, instance-based and neural network classifiers, with consideration of the classification task and the nature of the dataset. Linear models were represented by a logistic regression classifier (LR), ensemble models by a random forest (RF), instance-based models by K-nearest neighbours (KNN) and neural network models by a multilayer perceptron (MLP). The experiment was repeated for three different cases: over-sampling (SMOTE-ENN), under-sampling and no sampling, and results for all three cases are presented. Results are shown on separate graphs based on the scale of the evaluation metrics: for the Gm and Fm metrics, with a scale between 0 and 1, the results were combined in the same graphs (Figures 7-9), while the AGm metric, which has a larger scale range, is presented in separate graphs (Figures 10-12). To select the best classifiers, the validation results were considered, because they give an estimate of how the classifier will perform on the actual dataset, which is imbalanced. From the results presented in Figures 7 and 10, two classifiers, LR and MLP, show better generalization results: they show better validation results for all three metrics used. When under-sampling is used, as observed in Figures 8 and 11, all classifiers have considerably the same generalization results. The experiment conducted without sampling, as presented in Figures 9 and 12, reveals that only the LR classifier shows better performance than the others; however, its AGm scores are lower than when LR is used with over-sampling. Therefore, for the next experiment the two classifiers LR and MLP were considered with the over-sampling case.
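A minimal sketch of this model-selection loop, pairing each classifier (at baseline settings) with the over-sampling step inside an Imbalanced-Learn pipeline so that resampling touches only the training data; it reuses the gm_fm_agm helper sketched in Section 2.4:

```python
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

classifiers = {
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=42),
    "MLP": MLPClassifier(max_iter=500, random_state=42),
}

for name, clf in classifiers.items():
    pipe = Pipeline([("resample", SMOTEENN(random_state=42)), ("clf", clf)])
    pipe.fit(X_train, y_train)                    # resampling applied during fit only
    gm, fm, agm = gm_fm_agm(y_val, pipe.predict(X_val))
    print(f"{name}: Gm={gm:.3f} Fm={fm:.3f} AGm={agm:.3f}")
```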

Figure 7 Validation results (over-sampling): Gm and Fm scores for the KNN, LR, RF and MLP classifiers

Figure 8 Validation results (under-sampling): Gm and Fm scores for the KNN, LR, RF and MLP classifiers


Figure 9 Validation results (no sampling): Gm and Fm scores for the KNN, LR, RF and MLP classifiers

Figure 10 Validation results (over-sampling): AGm scores for the KNN, LR, RF and MLP classifiers

Figure 11 Validation results (under-sampling): AGm scores for the KNN, LR, RF and MLP classifiers



Figure 12 Validation results (no sampling): AGm scores for the KNN, LR, RF and MLP classifiers

3.2 Experiment 2: hyper-parameter optimization
This experiment aims to show the significance of hyper-parameter tuning in improving predictive performance. It involves finding the combination of hyper-parameter values for a machine learning algorithm that performs best, as measured on a validation dataset. Most ML algorithms have several hyper-parameters that can affect performance significantly (for example, the number of hidden layers in an MLP classifier) [39, 40]. In this experiment, the two selected classifiers, LR and MLP, were tuned to further improve their performance. A grid search approach was employed: a grid of hyper-parameter values was defined and, for each combination, a model was trained and scored on the validation data. Hyper-parameter tuning was implemented using 5-fold cross-validation, which identified the best parameters for each classifier, as presented in Table 3.

Table 3 Model parameters

Classifier Parameters
LR fit_intercept: True, tol: 1, C: 0.001, penalty: 'l1'
MLP solver: 'adam', learning_rate_init: 0.001, shuffle: True, hidden_layer_sizes: 10, alpha: 1, early_stopping: True

The experimental results allow us to measure the extent to which hyper-parameter tuning improves each algorithm's performance compared to its baseline settings. An ensemble technique was then employed in order to improve the overall predictive performance of the models; this is one of the popular approaches for improving machine learning algorithms.

The ensemble approach creates multiple models and then combines them to produce improved results. Several ensemble techniques, such as bagging, boosting and voting, have been used extensively in the literature [41, 42]. For this problem, the voting ensemble technique was appropriate: voting (stacking) was employed by soft-combining the two tuned classifiers, LR2 and MLP2. The tuned classifiers were then trained on the new training set obtained by combining the validation and training sets used in the previous experiments. To evaluate generalization performance, the models were tested on unseen test data, and the models were compared using both validation and test results.
Table 4 Experiment 2: results

Scores            LR     LR2    MLP    MLP2   ENB
Validation  Gm    0.724  0.726  0.613  0.711  0.735
            AGm   1.261  1.372  1.211  1.324  1.370
            Fm    0.841  0.894  0.723  0.827  0.891
Test        Gm    0.721  0.783  0.621  0.706  0.779
            AGm   1.320  1.332  1.278  1.281  1.335
            Fm    0.823  0.831  0.726  0.732  0.847
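A minimal sketch of the tuning and stacking steps behind Table 4, assuming scikit-learn; the grids, the F-measure scoring choice and the combined training set X_big_train/y_big_train are illustrative, with the Table 3 values as the selected parameters:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Grid search for LR; the best values found are those reported in Table 3.
lr_grid = GridSearchCV(
    LogisticRegression(solver="liblinear"),  # liblinear supports the l1 penalty
    param_grid={"C": [0.001, 0.01, 0.1, 1],
                "penalty": ["l1", "l2"],
                "tol": [1, 1e-2, 1e-4]},
    scoring="f1", cv=5)
lr_grid.fit(X_big_train, y_big_train)        # combined train + validation set
lr2 = lr_grid.best_estimator_

# MLP set to the tuned values from Table 3.
mlp2 = MLPClassifier(solver="adam", learning_rate_init=0.001, shuffle=True,
                     hidden_layer_sizes=(10,), alpha=1, early_stopping=True,
                     random_state=42).fit(X_big_train, y_big_train)

# Soft-voting ensemble (ENB) of the two tuned classifiers.
enb = VotingClassifier([("lr2", lr2), ("mlp2", mlp2)], voting="soft")
enb.fit(X_big_train, y_big_train)
```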


Results presented in Table 4 reveal that the performance of the tuned algorithms (LR2 and MLP2) improved compared to the untuned algorithms (LR and MLP). Furthermore, the stacking classifier (ENB) shows considerably better validation and test results, followed by the tuned logistic regression model (LR2).

4. Discussion
Although a number of studies have shown the feasibility of explaining student dropout, few works have actually attempted to predict it. In this study, we used machine learning techniques that are able to automatically identify the features that are relevant. With the right model, it was possible to predict students' dropout as well as explain the variables that are likely to be useful in the prediction. We achieved this by employing an ensemble classifier, which tends to do better than a single individual classifier: the classifier produced by soft-combining the tuned LR2 and MLP2 achieved the best results, followed by the tuned LR2. The machine learning approach of combining multiple classifiers has been proposed for improving predictive performance [43] and generates better results [44]. Furthermore, we observed student gender to be the leading feature, contributing strongly to the student dropout problem, and hyper-parameter tuning improves algorithm performance.

Compared to the results presented by [2], as described in Table 5, J48 showed better results in a proposed student advising model for enhancing students' academic performance and decreasing dropout. Three decision tree classification algorithms, namely J48, random tree and reduced error pruning (REP) tree, were applied to a real dataset representing students' records at a managerial higher institute in Giza, Egypt. The approach used in our presented study focused on analyzing four supervised learning classifiers that represent linear, ensemble, instance and neural network models, rather than focusing only on decision tree classification algorithms.

Furthermore, in a study investigating prediction algorithms for academic performance in tackling the problem of student dropout [1], LR achieved the highest performance in a comparison of six classifiers (LR, MLP, sequential minimal optimization (SMO), naive Bayes (NB), J48 and RF) using six metrics on a dataset collected from rural and peri-urban primary schools in Kenya, as shown in Table 6. LR also achieved better results in our presented study.

Table 5 Classification results for three algorithms [comparison from [2]]

Algorithm    Time (sec)  Correctly classified (# / %)  Incorrectly classified (# / %)
J48          0.05        7081 / 87.64                  999 / 12.36
Random tree  0.02        7065 / 87.43                  1015 / 12.56
REP tree     0.03        7065 / 87.44                  1015 / 12.56

Table 6 A comparison of the classifiers' performance using the six selected metrics [Comparison from [1]]
Model Recall Specificity ROC F-Measure Kappa RMSE
LR 0.924 0.686 0.887 0.897 0.6345 0.3375
MLP 0.873 0.660 0.851 0.865 0.5407 0.4124
SMO 0.911 0.703 0.807 0.894 0.6309 0.3893
NB 0.701 0.801 0.846 0.784 0.4403 0.4264
J48 0.905 0.670 0.822 0.884 0.5941 0.3720
RF 0.907 0.684 0.870 0.888 0.6082 0.3471

5. Conclusions
In this paper, a case study has been presented that shows the application of a machine learning approach to addressing the problem of student dropout. Four supervised classification algorithms were empirically assessed on a supervised classification dataset of approximately 61340 samples, in order to provide a contemporary set of recommendations to researchers who wish to apply machine learning algorithms to their data while taking the data imbalance problem into account. The two classifiers LR and MLP proved superior to all the other classifiers, achieving the highest performance metrics when the over-sampling technique was employed. Furthermore, the results show that hyper-parameter tuning improves each algorithm's performance compared to its baseline settings, and that stacking these classifiers improves the overall predictive performance. The contribution of each feature to the prediction performance was also shown, with student gender being the leading feature. For future work, we plan to explore different datasets so as to compare results across different train, test and validation partitions, and to evaluate several imbalance techniques for student dropout prediction using more measures for comparing results. This will include extending the experiment by applying an under-sampling approach with penalized models to resolve the imbalance issue. Besides, we will generalize the study and add more features so as to evaluate feature subsets for a better understanding of the underlying process.

Acknowledgment
The authors would like to thank the African Development Bank (AfDB), Data for Local Impact (DLi), Eagle Analytics Company, the late Dr. Yaw-Nkansah Gyekye and Anthony Faustine for supporting this study.

Conflicts of interest
The authors have no conflicts of interest to declare.

References
[1] Mgala M. Investigating prediction modelling of academic performance for students in rural schools in Kenya (Doctoral dissertation, University of Cape Town). 2016.
[2] Mohamed MH, Waguih HM. A proposed academic advisor model based on data mining classification techniques. International Journal of Advanced Computer Research. 2018; 8(36):129-36.
[3] Xu J, Moon KH, Van Der Schaar M. A machine learning approach for tracking and predicting student performance in degree programs. IEEE Journal of Selected Topics in Signal Processing. 2017; 11(5):742-53.
[4] Feng W, Tang J, Liu TX. Understanding dropouts in MOOCs. Association for the Advancement of Artificial Intelligence. 2019.
[5] Iam-On N, Boongoen T. Generating descriptive model for student dropout: a review of clustering approach. Human-Centric Computing and Information Sciences. 2017; 7(1).
[6] Kumar M, Singh AJ, Handa D. Literature survey on educational dropout prediction. International Journal of Education and Management Engineering. 2017; 7(2):8-19.
[7] Ameri S, Fard MJ, Chinnam RB, Reddy CK. Survival analysis based framework for early prediction of student dropouts. In proceedings of the ACM international conference on information and knowledge management 2016 (pp. 903-12). ACM.
[8] Aulck L, Velagapudi N, Blumenstock J, West J. Predicting student dropout in higher education. arXiv preprint arXiv:1606.06364. 2016.
[9] Chen Y, Chen Q, Zhao M, Boyer S, Veeramachaneni K, Qu H. DropoutSeer: visualizing learning patterns in massive open online courses for dropout reasoning and prediction. In conference on visual analytics science and technology 2016 (pp. 111-20). IEEE.
[10] Hu Q, Polyzou A, Karypis G, Rangwala H. Enriching course-specific regression models with content features for grade prediction. In international conference on data science and advanced analytics 2017 (pp. 504-13). IEEE.
[11] Elbadrawy A, Polyzou A, Ren Z, Sweeney M, Karypis G, Rangwala H. Predicting student performance using personalized analytics. Computer. 2016; 49(4):61-9.
[12] Iqbal Z, Qadir J, Mian AN, Kamiran F. Machine learning based student grade prediction: a case study. arXiv preprint arXiv:1708.08744. 2017.
[13] Wang W, Yu H, Miao C. Deep model for dropout prediction in MOOCs. In proceedings of the international conference on crowd science and engineering 2017 (pp. 26-32). ACM.
[14] Hamedi A, Dirin A. A Bayesian approach in students' performance analysis. In international conference on education and new learning technologies 2018.
[15] https://icsh.es/2017/11/12/i-congreso-internacional-multidisciplinario-de-educacion-superior/. Accessed 26 October 2018.
[16] Hung JL, Wang MC, Wang S, Abdelrasoul M, Li Y, He W. Identifying at-risk students for early interventions-a time-series clustering approach. IEEE Transactions on Emerging Topics in Computing. 2017; 5(1):45-55.
[17] Młynarska E, Greene D, Cunningham P. Time series clustering of Moodle activity data. In Irish conference on artificial intelligence and cognitive science, University College Dublin, Dublin, Ireland, 2016.
[18] Yan J, Han S. Classifying imbalanced data sets by a novel re-sample and cost-sensitive stacked generalization method. Mathematical Problems in Engineering. 2018.
[19] Alasadi SA, Bhaya WS. Review of data preprocessing techniques in data mining. Journal of Engineering and Applied Sciences. 2017; 12(16):4102-7.
[20] Shahul S, Suneel S, Rahaman MA, Swathi JN. A study of data pre-processing techniques for machine learning algorithm to predict software effort estimation. Imperial Journal of Interdisciplinary Research. 2016; 2(6):546-50.
[21] Krawczyk B. Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence. 2016; 5(4):221-32.
[22] Galar M, Fernández A, Barrenechea E, Bustince H, Herrera F. New ordering-based pruning metrics for ensembles of classifiers in imbalanced datasets. In proceedings of the international conference on computer recognition systems 2016 (pp. 3-15). Springer, Cham.
[23] Borowska K, Topczewska M. New data level approach for imbalanced data classification improvement. In proceedings of the international conference on computer recognition systems 2015 (pp. 283-94). Springer, Cham.
[24] Rout N, Mishra D, Mallick MK. Handling imbalanced data: a survey. In international proceedings on advances in soft computing, intelligent systems and applications 2018 (pp. 431-43). Springer, Singapore.
[25] Saini AK, Nayak AK, Vyas RK. ICT based innovations. Proceedings of CSI. 2015.
[26] Dattagupta SJ. A performance comparison of oversampling methods for data generation in imbalanced learning tasks (Doctoral dissertation). 2017.
[27] Stefanowski J. On properties of undersampling bagging and its extensions for imbalanced data. In proceedings of the international conference on computer recognition systems 2016 (pp. 407-17). Springer, Cham.
[28] Moreno MF. Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection (Doctoral dissertation). 2017.
[29] Santoso B, Wijayanto H, Notodiputro KA, Sartono B. Synthetic over sampling methods for handling class imbalanced problems: a review. In IOP conference series: earth and environmental science 2017 (p. 012031). IOP Publishing.
[30] Skryjomski P, Krawczyk B. Influence of minority class instance types on SMOTE imbalanced data oversampling. In first international workshop on learning with imbalanced domains: theory and applications 2017 (pp. 7-21).
[31] Ahmed S, Mahbub A, Rayhan F, Jani R, Shatabda S, Farid DM. Hybrid methods for class imbalance learning employing bagging with sampling techniques. In international conference on computational systems and information technology for sustainable solution 2017 (pp. 1-5). IEEE.
[32] Douzas G, Bacao F. Geometric SMOTE: effective oversampling for imbalanced learning through a geometric extension of SMOTE. arXiv preprint arXiv:1709.07377. 2017.
[33] Elhassan T, Aljurf M. Classification of imbalance data using tomek link (T-Link) combined with random under-sampling (RUS) as a data reduction method. Global Journal of Technology and Optimization. 2016; S1:111.
[34] Khaldy MA, Kambhampati C. Resampling imbalanced class and the effectiveness of feature selection methods for heart failure dataset. International Robotics & Automation Journal. 2018; 4(1):1-10.
[35] Kim D, Kim S. Sustainable education: analyzing the determinants of university student dropout by nonlinear panel data models. Sustainability. 2018; 10(4):1-18.
[36] Márquez-Vera C, Cano A, Romero C, Noaman AY, Mousa Fardoun H, Ventura S. Early dropout prediction using data mining: a case study with high school students. Expert Systems. 2016; 33(1):107-24.
[37] Rovira S, Puertas E, Igual L. Data-driven system to predict academic grades and dropout. PLoS One. 2017; 12(2):e0171207.
[38] Aulck L, Aras R, Li L, L'Heureux C, Lu P, West J. STEM-ming the tide: predicting STEM attrition using student transcript data. arXiv preprint arXiv:1708.09344. 2017.
[39] Rojas-Domínguez A, Padierna LC, Valadez JM, Puga-Soberanes HJ, Fraire HJ. Optimal hyper-parameter tuning of SVM classifiers with application to medical diagnosis. IEEE Access. 2018; 6:7164-76.
[40] Probst P, Wright MN, Boulesteix AL. Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2019; 9(3):e1301.
[41] Dalvi PT, Vernekar N. Anemia detection using ensemble learning techniques and statistical models. In international conference on recent trends in electronics, information & communication technology 2016 (pp. 1747-51). IEEE.
[42] Feng W, Huang W, Ren J. Class imbalance ensemble learning based on the margin theory. Applied Sciences. 2018; 8(5):815.
[43] Abuassba AO, Zhang D, Luo X, Shaheryar A, Ali H. Improving classification performance through an advanced ensemble based heterogeneous extreme learning machines. Computational Intelligence and Neuroscience. 2017.
[44] Afolabi LT, Saeed F, Hashim H, Petinrin OO. Ensemble learning method for the prediction of new bioactive molecules. PLoS One. 2018; 13(1):e0189538.

Neema Mduma is a PhD fellow in the department of Information and Communication Science and Engineering (ICSE) at the Nelson Mandela African Institution of Science and Technology (NM-AIST). Her focus is on supporting education, and she is currently conducting a study on developing a machine learning approach for predicting student dropout.
Email: [email protected]

Khamisi Kalegele is a Lecturer and Researcher at the Tanzania Commission of Science and Technology (COSTECH). He graduated with a PhD in Information Sciences from Tohoku University, Japan, in 2013, an MEng in Computer Science from Ehime University, Japan, and a BSc in Computer Engineering and IT from the University of Dar es Salaam. His research areas are Data Science, E-health and Machine Learning in Education.
Email: [email protected]

Dina Machuve is a Lecturer and Researcher at the Nelson Mandela African Institution of Science and Technology (NM-AIST) in Tanzania. She graduated with a PhD in Information and Communication Science and Engineering from NM-AIST in 2016, an MS in Electrical Engineering from Tennessee Technological University, USA, in 2008, and a BSc in Electrical Engineering from the University of Dar es Salaam in 2001. She serves on the organizing committee of Data Science Africa, an organization that runs an annual data science and machine learning summer school and workshop in Africa. Her research interests are Data Science, Bioinformatics, Agriculture Informatics on Food Value Chains and STEM Education in Schools.
Email: [email protected]
