Comparing Machine Learning Models for Graduate Admission Predictions

Dodda Venkata Sridhara Reddy
Department of CSE, VIT-AP University, India

Ravagondu Nithesh
Department of CSE, VIT-AP University, India
Abstract—Graduate admissions play a crucial role in shaping students’ academic
futures, especially in competitive fields of study. The ability to accurately
predict admission outcomes based on academic performance and standardized test
scores is increasingly valuable for both applicants and institutions. This study
focuses on improving the prediction of graduate admissions by utilizing machine
learning techniques on web-scraped datasets. We use key features such as GRE,
TOEFL, IELTS scores, undergraduate GPA (CGPA), and other relevant factors to
forecast admission decisions. Several machine learning algorithms, including
Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine
(SVM), K-Nearest Neighbors (KNN), XGBoost, and LightGBM, were employed to build
predictive models. These models were evaluated using common performance metrics
such as accuracy, precision, recall, and F1-score to determine their
effectiveness. Our findings reveal that ensemble models, particularly Random
Forest and XGBoost, demonstrate superior performance in predicting admissions
compared to traditional models. This research provides valuable insights for
applicants and institutions, contributing to the development of data-driven
tools that can improve decision-making in the graduate admissions process. It
also paves the way for further research into more advanced machine learning
techniques for educational data analysis.

Index Terms—Machine learning, Graduate Admissions Prediction, Academic
Performance, Model Evaluation, Ensemble Methods.

I. INTRODUCTION

Graduate admissions play a pivotal role in shaping academic and professional
careers. Predicting the likelihood of admission based on student profiles is a
complex task involving multiple factors such as test scores, undergraduate CGPA,
and university preferences. Given the competitive nature of graduate programs,
universities often face the challenge of evaluating numerous applications, each
presenting diverse qualifications and preferences. The ability to accurately
predict admissions outcomes using data-driven approaches can assist both
applicants and institutions in making informed decisions.

In recent years, advancements in machine learning (ML) have shown potential in
improving the efficiency and accuracy of graduate admission predictions. Machine
learning techniques can analyze large datasets of student profiles, identifying
key patterns and relationships that are not immediately obvious through
traditional methods. This can help streamline the admission process by
identifying strong candidates more efficiently and providing a data-backed basis
for decision-making [1]. In this paper, we compare several popular
classification algorithms for predicting graduate admissions outcomes based on
student profiles. We explore models such as k-Nearest Neighbors (k-NN), Support
Vector Machine (SVM), Logistic Regression, and Random Forest, which are widely
used in predictive analytics [2]. While various machine learning techniques are
available for this purpose, this study focuses on these specific algorithms due
to their established use in classification tasks and their varied approaches to
handling data.

We will evaluate and contrast the performance of these classifiers based on
several metrics, including accuracy, precision, recall, F1 score, and support.
This comparative analysis will provide insights into which machine learning
techniques offer the best predictive accuracy for graduate admissions. The data
used in this study consists of student profiles including GRE/GMAT scores,
undergraduate CGPA, relevant work experience, and preferences for MS programs
and universities [3]. The results of this study will contribute to a better
understanding of how machine
learning can be applied to predict admissions decisions, offering a more
reliable, efficient, and transparent approach to graduate application
evaluations.

Several research efforts have previously focused on automating admissions
predictions. In their study, Sharma et al. applied decision tree models to
predict admissions with a high degree of accuracy, highlighting the importance
of test scores and GPA in the decision process [4]. Similarly, Gupta et al.
explored the use of deep learning techniques for predicting graduate admissions,
concluding that neural networks can provide more nuanced insights into student
profiles compared to traditional statistical methods [5]. Bose et al.
demonstrated the effectiveness of random forest classifiers in handling
large-scale admissions datasets, where data imbalance is a common challenge due
to the limited number of admitted students compared to the total number of
applicants [6].

In our research, we aim to further investigate the efficacy of machine learning
models in the domain of graduate admissions. By conducting a comparative study
of multiple classifiers, we can evaluate their strengths and limitations,
ultimately guiding future work in this domain. In particular, we will explore
how these algorithms perform in the context of diverse student profiles,
balancing multiple features such as test scores, academic performance, and
university preferences [7].

The significance of machine learning in the academic admissions process is
growing, with institutions increasingly relying on data analytics to enhance
their selection procedures [8]. Machine learning models not only assist in
predicting admissions outcomes but also help identify trends in applicant pools,
offering universities valuable insights into the changing dynamics of higher
education. As the volume of graduate applications continues to rise globally,
these models will play a critical role in ensuring that admissions processes
remain fair, efficient, and objective [9].

In this research, various ML techniques will be employed to assess accuracy,
precision, recall, and the F1 score. A comparative study will be conducted to
evaluate the effectiveness of various supervised learning methods in Graduate
Admission prediction. The formulas below are written in terms of True Positives
(TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

Accuracy Score: A key metric in assessing machine learning model performance,
accuracy measures the ratio of correct predictions to total predictions. It
indicates how reliably the model predicts outcomes, with a higher score
suggesting greater accuracy in achieving the desired output.

$$AS = \frac{TP + TN}{TP + FN + TN + FP}$$

Precision Score: This performance metric measures how many of the instances a
model labels as positive are actually positive. It is obtained by dividing the
number of true positives (TP) by the sum of true positives (TP) and false
positives (FP). A greater precision score, which is typically expressed as a
percentage, denotes superior model performance.

$$PS = \frac{TP}{TP + FP}$$

Recall Score: In machine learning, this statistic assesses how well a model can
locate pertinent instances within a dataset. Recall is calculated by dividing
the number of correctly identified relevant instances by the total number of
relevant instances. Superior model performance in correctly identifying
pertinent instances is indicated by a higher recall score.

$$RS = \frac{TP}{TP + FN}$$

F1-Score: A critical measure in machine learning for assessing classification
model performance, the F1-score is the harmonic mean of recall and precision.
This metric considers both false positives and false negatives, making it
valuable when the class distribution is imbalanced or when precision and recall
should be weighted equally. It is commonly used alongside accuracy, precision,
and recall.

$$F1 = \frac{2 \cdot PS \cdot RS}{PS + RS}$$

II. METHODOLOGY

We employed various machine learning (ML) models, including Logistic Regression,
KNN (K-Nearest Neighbors), SVM (Support Vector Machine), Random Forest
Classifier, XGBoost, LightGBM and Gradient Boosting Machine.

A. Logistic Regression (LR):
Logistic Regression is a model used for binary classification. It predicts the
probability of a binary outcome based on the relationship between the input
features and the target variable. The probability that an observation belongs to
a particular class is given by the following equation:

$$P(y = 1 \mid x) = \frac{e^{a + bx}}{1 + e^{a + bx}}$$

B. K-Nearest Neighbors (KNN):
It is a supervised learning method that predicts values by selecting the nearest
neighbors to the input based on distance metrics. It assigns the output value
found among the closest neighbors, and its effectiveness can be enhanced through
systematic training and testing processes.

$$d(y, z) = \sqrt{\sum_{i=1}^{n} (y_i - z_i)^2}$$

C. XGBoost:
The objective function in XGBoost can be represented as:

$$\mathrm{Obj}(\theta) = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
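As a rough numerical check of the expressions given so far, the sketch below evaluates the four scores (AS, PS, RS, F1), the logistic probability, and the Euclidean distance in plain Python. The function names and the example counts (TP = 40, TN = 45, FP = 5, FN = 10) are illustrative assumptions and are not taken from the study.

```python
import math


def accuracy_score(tp, tn, fp, fn):
    """AS = (TP + TN) / (TP + FN + TN + FP)."""
    return (tp + tn) / (tp + fn + tn + fp)


def precision_score(tp, fp):
    """PS = TP / (TP + FP)."""
    return tp / (tp + fp)


def recall_score(tp, fn):
    """RS = TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(ps, rs):
    """F1 = 2 * PS * RS / (PS + RS), the harmonic mean of PS and RS."""
    return 2 * ps * rs / (ps + rs)


def logistic_probability(x, a, b):
    """P(y = 1 | x) = e^(a + bx) / (1 + e^(a + bx))."""
    z = math.exp(a + b * x)
    return z / (1 + z)


def euclidean_distance(y, z):
    """d(y, z) = sqrt(sum_i (y_i - z_i)^2), the distance used by KNN."""
    return math.sqrt(sum((yi - zi) ** 2 for yi, zi in zip(y, z)))


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, tn, fp, fn = 40, 45, 5, 10
ps = precision_score(tp, fp)
rs = recall_score(tp, fn)
print(f"AS={accuracy_score(tp, tn, fp, fn):.3f} PS={ps:.3f} "
      f"RS={rs:.3f} F1={f1_score(ps, rs):.3f}")
print(logistic_probability(0.0, a=0.0, b=1.0))
print(euclidean_distance((0, 0), (3, 4)))
```

These helpers do not guard against empty denominators (for example, a model that predicts no positives), which a full evaluation pipeline would need to handle.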
D. Support Vector Machine (SVM):
SVM handles both classification and regression tasks. For classification, it
constructs a hyperplane that separates the distinct data classes, providing a
robust method for pattern recognition and decision-making in various domains.

$$f(x) = w^T x + b$$

E. LightGBM:
The objective function in LightGBM is similar to XGBoost:

$$\mathrm{Obj}(\theta) = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{j=1}^{J} \gamma_j$$

A. Features of the Merged Dataset
The merged dataset comprises the following columns:
• Name: The name of the student (anonymized for analysis).
• Intake Year: The year the student plans to commence their studies (e.g., 2022,
  2023).
• Intake Semester: The semester in which the student intends to enroll, either
  Fall or Spring.
• University Applied: The university to which the student has applied (e.g.,
  University of Toronto, RWTH Aachen).
• Course Applied: The specific program or course applied to (e.g., MSc Computer
  Science, MSc Management).
• GRE: The student’s Graduate Record Examination (GRE) score. Missing values are
  indicated as −1.
• IELTS: The student’s International English Language Testing System (IELTS)
  score. Missing values are indicated as −1.
• TOEFL: The student’s Test of English as a Foreign Language (TOEFL) score.
  Missing values are indicated as −1.
• Status: A binary indicator of admission status, where 1 denotes admission and
  0 denotes rejection.
• CGPA: The student’s undergraduate Cumulative Grade Point Average.
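Because missing GRE, IELTS, and TOEFL scores are encoded as −1, the sentinel has to be excluded before any statistic is computed, otherwise it would drag averages down. The following is a minimal sketch of one way to do that in plain Python; the helper names and the sample GRE values are hypothetical, and the study does not state how it handled these entries.

```python
from statistics import mean

MISSING = -1  # sentinel the merged dataset uses for an absent test score


def clean_score(value):
    """Map the -1 sentinel to None; return any other score as a float."""
    return None if value == MISSING else float(value)


def mean_of_present(values):
    """Average only the scores that were actually reported.

    Returns None when every entry is missing.
    """
    present = [s for s in (clean_score(v) for v in values) if s is not None]
    return mean(present) if present else None


# Hypothetical GRE column with two missing entries.
gre_scores = [320, -1, 310, 300, -1]
print(mean_of_present(gre_scores))
```

Whether missing scores are better dropped, imputed, or flagged with a separate indicator feature depends on the model, so this is only one possible choice.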
In supervised machine learning, the goal attribute serves as the dependent
variable while other attributes act as independent variables. We are using
different train-test split ratios in our datasets to compare performance metrics
including recall, precision, accuracy, and F1-score. This will allow us to
evaluate each model’s accuracy using seven machine learning techniques.

IV. RESULTS

Fig. 5. Performance metrics of Graduate Admission Prediction Vs Models

During a comparative analysis of machine learning algorithms and a Neural
Network for predicting breast cancer severity, both Logistic Regression and
Neural Network proved to be the most effective performers. They demonstrated
remarkable precision, recall, and accuracy scores, as shown in Fig. ??, Fig. 2,
and Fig. 4 respectively. Logistic Regression displayed a remarkable precision
score of 99%, as shown in Fig. 1, by effectively classifying all positive
samples. Additionally, it maintained a high recall score of 95%, as shown in
Fig. 2, and an F1-score of 98%, as shown
in Fig. 3. This exceptional performance highlights the efficacy of Logistic
Regression in detecting positive samples with high precision. As a
straightforward linear model, Logistic Regression is easy to implement and
interpret, making it suitable for various datasets, including real-world
scenarios with noisy data. Its output provides probabilities and interpretable
coefficients, facilitating easy understanding of the relationship between input
features and the target variable. Moreover, Logistic Regression is
computationally efficient and can handle large datasets effectively. With
built-in mechanisms like L1 and L2 regularization, it can limit overfitting and
improve generalization, enhancing its predictive performance.

On the other hand, the Neural Network showed impressive precision and recall
scores of 98%, as demonstrated in Fig. 1 and Fig. 2, respectively. Its accuracy
of 98.6% also confirms its ability to accurately classify positive and negative
samples. Neural Networks are capable of capturing complex nonlinear
relationships between input features and target variables, allowing for
sophisticated pattern recognition. They can automatically learn hierarchical
data representations, which enable them to extract relevant features for
prediction. Moreover, Neural Networks can scale effectively with large amounts
of data and complex problems, making them suitable for diverse applications
across various domains.

The results presented in Fig. 5 indicate that Support Vector Machine and Random
Forest are effective algorithms for predicting breast cancer severity, with high
precision, recall, and accuracy scores. Although slightly lower than Logistic
Regression and Neural Network, these algorithms offer strong performance metrics
across multiple evaluation criteria.

Although Decision Tree, K-Nearest Neighbors, and Naive Bayes also achieved
reasonably good results, they showed slightly lower accuracy than the
best-performing models. However, their respectable performance highlights their
usefulness in predicting the severity of breast cancer, although they have some
limitations compared to Logistic Regression and Neural Network.

Overall, the study reveals that Logistic Regression and Neural Network models
have a better performance in accurately predicting breast cancer severity. These
models have demonstrated impressive precision, recall, and accuracy scores,
indicating their potential for clinical applications. This reaffirms their
status as reliable tools in breast cancer diagnosis and prognosis.

V. CONCLUSION AND FUTURE WORK

In the realm of breast cancer prediction research, several promising avenues for
future exploration exist. Firstly, investigating the integration of diverse data
sources, including environmental, lifestyle, and genetic factors, could enhance
prediction accuracy. Secondly, the development of advanced machine learning
techniques such as ensemble methods and deep learning architectures holds
potential for improving predictive models. Additionally, enhancing model
interpretability through techniques like feature importance analysis and
sensitivity analysis is crucial for clinical decision-making. Moreover,
exploring the utilization of natural language processing techniques to analyze
medical records could provide valuable insights. Investigating the impact of
reinforcement learning techniques, large-scale medical records, and unsupervised
learning methods such as clustering on prediction accuracy is also essential.
Finally, the incorporation of additional data sources such as medical images and
patient records offers a promising avenue for further enhancing prediction
accuracy in breast cancer research.

REFERENCES

[1] A. Smith. “The Role of Machine Learning in Predicting Graduate Admissions”.
    In: Journal of Educational Data Science (2020).
[2] B. Gupta. “Classification Algorithms in Predictive Analytics”. In:
    International Journal of Computer Science (2019).
[3] A. Bose et al. “Machine Learning in Higher Education Admissions”. In: IEEE
    Transactions on Learning Technologies (2021).
[4] D. Sharma. “A Decision Tree Model for University Admissions”. In:
    Proceedings of the International Conference on Data Science. 2020.
[5] A. Gupta et al. “Deep Learning for Graduate Admissions Predictions”. In:
    International Conference on AI and Education. 2021.
[6] F. Bose and G. Patel. “Random Forest in Predicting University Admissions”.
    In: IEEE Conference on Computational Intelligence. 2019.
[7] H. Johnson. “Predictive Models for Graduate Admissions”. In: Educational
    Data Mining Journal (2018).
[8] I. Roberts and J. Smith. “The Impact of Data Analytics on Graduate
    Admissions”. In: Higher Education Analytics Review (2020).
[9] J. Singh. “A Comparative Analysis of Machine Learning Algorithms in
    Education”. In: IEEE Conference on Machine Learning and Applications. 2021.