Comparing Machine Learning Models for Graduate Admission Predictions

Dodda Venkata Sridhara Reddy, Department of CSE, VIT-AP University, India, [email protected]
Ravagondu Nithesh, Department of CSE, VIT-AP University, India, [email protected]
Rachamsetty Rohith Basava Sai, Department of CSE, VIT-AP University, India, [email protected]
Bhimineni Akhil, Department of CSE, VIT-AP University, India, [email protected]
Koduru Hajarathaiah, Department of CSE, VIT-AP University, India, [email protected]
Chandan Kumar, Department of CSE, Amrita Vishwa, India, kc [email protected]

Abstract—Graduate admissions play a crucial role in shaping students' academic futures, especially in competitive fields of study. The ability to accurately predict admission outcomes based on academic performance and standardized test scores is increasingly valuable for both applicants and institutions. This study focuses on improving the prediction of graduate admissions by utilizing machine learning techniques on web-scraped datasets. We use key features such as GRE, TOEFL, and IELTS scores, undergraduate GPA (CGPA), and other relevant factors to forecast admission decisions. Several machine learning algorithms, including Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), XGBoost, and LightGBM, were employed to build predictive models. These models were evaluated using common performance metrics such as accuracy, precision, recall, and F1-score to determine their effectiveness. Our findings reveal that ensemble models, particularly Random Forest and XGBoost, demonstrate superior performance in predicting admissions compared to traditional models. This research provides valuable insights for applicants and institutions, contributing to the development of data-driven tools that can improve decision-making in the graduate admissions process. It also paves the way for further research into more advanced machine learning techniques for educational data analysis.

Index Terms—Machine learning, Graduate Admissions Prediction, Academic Performance, Model Evaluation, Ensemble Methods.
I. INTRODUCTION

Graduate admissions play a pivotal role in shaping academic and professional careers. Predicting the likelihood of admission based on student profiles is a complex task involving multiple factors such as test scores, undergraduate CGPA, and university preferences. Given the competitive nature of graduate programs, universities often face the challenge of evaluating numerous applications, each presenting diverse qualifications and preferences. The ability to accurately predict admission outcomes using data-driven approaches can assist both applicants and institutions in making informed decisions. In recent years, advances in machine learning (ML) have shown potential in improving the efficiency and accuracy of graduate admission predictions. Machine learning techniques can analyze large datasets of student profiles, identifying key patterns and relationships that are not immediately obvious through traditional methods. This can help streamline the admission process by identifying strong candidates more efficiently and providing a data-backed basis for decision-making [1]. In this paper, we compare several popular classification algorithms for predicting graduate admission outcomes based on student profiles. We explore models such as k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Logistic Regression, and Random Forest, which are widely used in predictive analytics [2]. While various machine learning techniques are available for this purpose, this study focuses on these specific algorithms due to their established use in classification tasks and their varied approaches to handling data.

We evaluate and contrast the performance of these classifiers based on several metrics, including accuracy, precision, recall, F1-score, and support. This comparative analysis provides insights into which machine learning techniques offer the best predictive accuracy for graduate admissions. The data used in this study consist of student profiles including GRE/GMAT scores, undergraduate CGPA, relevant work experience, and preferences for MS programs and universities [3]. The results of this study will contribute to a better understanding of how machine learning can be applied to predict admission decisions, offering a more reliable, efficient, and transparent approach to graduate application evaluations.

Several research efforts have previously focused on automating admission predictions. Sharma et al. applied decision tree models to predict admissions with a high degree of accuracy, highlighting the importance of test scores and GPA in the decision process [4]. Similarly, Gupta et al. explored the use of deep learning techniques for predicting graduate admissions, concluding that neural networks can provide more nuanced insights into student profiles than traditional statistical methods [5]. Bose et al. demonstrated the effectiveness of random forest classifiers in handling large-scale admissions datasets, where class imbalance is a common challenge because admitted students are far fewer than the total number of applicants [6]. In our research, we aim to further investigate the efficacy of machine learning models in the domain of graduate admissions. By conducting a comparative study of multiple classifiers, we can evaluate their strengths and limitations, ultimately guiding future work in this domain. In particular, we explore how these algorithms perform across diverse student profiles, balancing multiple features such as test scores, academic performance, and university preferences [7]. The significance of machine learning in the academic admissions process is growing, with institutions increasingly relying on data analytics to enhance their selection procedures [8]. Machine learning models not only assist in predicting admission outcomes but also help identify trends in applicant pools, offering universities valuable insights into the changing dynamics of higher education. As the volume of graduate applications continues to rise globally, these models will play a critical role in ensuring that admission processes remain fair, efficient, and objective [9].

In this research, various ML techniques are employed and assessed on accuracy, precision, recall, and F1-score. A comparative study is conducted to evaluate the effectiveness of these supervised learning methods for graduate admission prediction. The evaluation metrics are defined below.
Accuracy Score: A key metric for assessing machine learning model performance, accuracy measures the ratio of correct predictions to total predictions. It indicates how reliably the model predicts outcomes, with a higher score indicating greater accuracy. In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN):

AS = \frac{TP + TN}{TP + TN + FP + FN}

Precision Score: A performance metric used in machine learning to evaluate how many of the model's positive predictions are correct. Precision is obtained by dividing the number of true positives (TP) by the total number of true positives and false positives (TP + FP). A higher precision score, typically expressed as a percentage, denotes superior model performance on positive predictions.

PS = \frac{TP}{TP + FP}

Recall Score: This statistic assesses how well a model can locate the relevant instances within a dataset. Recall is calculated by dividing the number of correctly identified relevant instances (true positives) by the total number of relevant instances in the dataset. A higher recall score indicates superior model performance in correctly identifying relevant cases.

RS = \frac{TP}{TP + FN}

F1-Score: A critical measure for assessing classification model performance, the F1-score is the harmonic mean of precision and recall. Because it accounts for both false positives and false negatives, it is particularly valuable for imbalanced class distributions or when precision and recall should be weighted equally. It is commonly reported alongside accuracy, precision, and recall.

F1 = \frac{2 \cdot PS \cdot RS}{PS + RS}
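To make these definitions concrete, the short sketch below computes the four metrics for a hypothetical set of admission predictions, both directly from the confusion-matrix counts and with scikit-learn's built-in functions; the label arrays are illustrative placeholders rather than values from our dataset.

# Illustrative only: y_true / y_pred are placeholder admission labels (1 = admit, 0 = reject),
# not values from the web-scraped dataset used in this study.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Confusion-matrix counts in scikit-learn's ordering: TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Metrics computed directly from the definitions above.
AS = (tp + tn) / (tp + tn + fp + fn)          # accuracy
PS = tp / (tp + fp)                           # precision
RS = tp / (tp + fn)                           # recall
F1 = 2 * PS * RS / (PS + RS)                  # harmonic mean of PS and RS

# The same metrics via scikit-learn; the values should agree with the manual ones.
assert abs(AS - accuracy_score(y_true, y_pred)) < 1e-9
assert abs(PS - precision_score(y_true, y_pred)) < 1e-9
assert abs(RS - recall_score(y_true, y_pred)) < 1e-9
assert abs(F1 - f1_score(y_true, y_pred)) < 1e-9
print(f"accuracy={AS:.2f} precision={PS:.2f} recall={RS:.2f} f1={F1:.2f}")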
II. METHODOLOGY

We employed various machine learning (ML) models, including Logistic Regression, KNN (K-Nearest Neighbors), SVM (Support Vector Machine), Random Forest Classifier, XGBoost, LightGBM, and Gradient Boosting Machine.

A. Logistic Regression (LR)

Logistic Regression is a model used for binary classification. It predicts the probability of a binary outcome based on the relationship between the input features and the target variable. The probability that an observation belongs to a particular class is given by:

P(y = 1 \mid x) = \frac{e^{a + bx}}{1 + e^{a + bx}}
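As a minimal illustration of this model, the sketch below evaluates the probability above for a single scaled feature and fits scikit-learn's LogisticRegression on placeholder data; the coefficients a and b and the toy arrays are assumptions for demonstration, not parameters estimated from our dataset.

import numpy as np
from sklearn.linear_model import LogisticRegression

def admit_probability(x, a, b):
    """P(y = 1 | x) = e^(a + bx) / (1 + e^(a + bx)) for a single feature x."""
    z = a + b * x
    return np.exp(z) / (1.0 + np.exp(z))

# Illustrative intercept a and slope b, not fitted values.
print(admit_probability(x=0.8, a=-4.0, b=6.0))   # admission probability for a scaled score of 0.8

# Fitting the same kind of model with scikit-learn on placeholder (feature, label) pairs.
X = np.array([[0.2], [0.4], [0.5], [0.7], [0.8], [0.9]])   # e.g. min-max scaled CGPA
y = np.array([0, 0, 0, 1, 1, 1])                            # 1 = admit, 0 = reject
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[0.75]])[:, 1])                    # estimated admission probability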
B. K-Nearest Neighbors (KNN)

KNN is a supervised learning method that predicts values by selecting the nearest neighbors to the input based on a distance metric. It produces predictions by assigning the output value of the closest neighbors, and its effectiveness can be enhanced through systematic training and testing. The Euclidean distance between two points y and z is:

d(y, z) = \sqrt{\sum_{i=1}^{n} (y_i - z_i)^2}
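The following sketch, again on placeholder profiles, implements the Euclidean distance above and shows the equivalent scikit-learn KNeighborsClassifier call; the arrays and the choice of k = 3 are illustrative assumptions.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def euclidean_distance(y, z):
    """d(y, z) = sqrt(sum_i (y_i - z_i)^2), as in the equation above."""
    y, z = np.asarray(y, dtype=float), np.asarray(z, dtype=float)
    return np.sqrt(np.sum((y - z) ** 2))

# Placeholder profiles: [scaled GRE, scaled CGPA]; labels 1 = admit, 0 = reject.
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.3, 0.4], [0.2, 0.3], [0.7, 0.7], [0.4, 0.2]])
y = np.array([1, 1, 0, 0, 1, 0])

query = np.array([0.75, 0.8])
print(euclidean_distance(X[0], query))           # distance from the query to the first profile

# k-NN prediction with k = 3 neighbours under the same Euclidean metric.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([query]))                      # predicted admission status for the query profile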
C. XGBoost

XGBoost builds an ensemble of boosted decision trees by minimizing a regularized objective. The objective function in XGBoost can be represented as:

Obj(\theta) = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)

where L is the training loss over the n samples and \Omega penalizes the complexity of each of the K trees.
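As a hedged illustration of how this regularized objective appears in practice, the sketch below fits xgboost's scikit-learn wrapper with a logistic loss and explicit complexity penalties; the hyperparameter values and toy arrays are assumptions for demonstration, not the settings used in our experiments.

import numpy as np
from xgboost import XGBClassifier

# Toy admission data: [scaled GRE, scaled CGPA]; 1 = admit, 0 = reject.
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.3, 0.4], [0.2, 0.3], [0.7, 0.7], [0.4, 0.2]])
y = np.array([1, 1, 0, 0, 1, 0])

# objective sets the loss term L(y_i, y_hat_i) (binary logistic here);
# gamma, reg_lambda and max_depth all feed the complexity penalty Omega(f_k) on each tree.
model = XGBClassifier(
    n_estimators=50,            # K, the number of boosted trees
    max_depth=3,
    gamma=0.1,
    reg_lambda=1.0,
    objective="binary:logistic",
    eval_metric="logloss",
)
model.fit(X, y)
print(model.predict_proba([[0.75, 0.8]])[:, 1])   # predicted admission probability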

D. Support Vector Machine (SVM)

SVM handles both regression and classification tasks by constructing a hyperplane that effectively separates the distinct data classes, providing a robust method for pattern recognition and decision-making in various domains. Its decision function is:

f(x) = w^{T} x + b

where w is the weight vector normal to the hyperplane and b is the bias term.
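The sketch below, on the same kind of placeholder data, fits a linear SVM and evaluates f(x) = w^T x + b both manually from the learned coefficients and through scikit-learn's decision_function; all values are illustrative.

import numpy as np
from sklearn.svm import SVC

X = np.array([[0.9, 0.8], [0.8, 0.9], [0.3, 0.4], [0.2, 0.3], [0.7, 0.7], [0.4, 0.2]])
y = np.array([1, 1, 0, 0, 1, 0])

svm = SVC(kernel="linear", C=1.0).fit(X, y)

w, b = svm.coef_[0], svm.intercept_[0]            # w and b from f(x) = w^T x + b
query = np.array([0.75, 0.8])
manual = float(w @ query + b)                     # evaluate the hyperplane manually
print(manual, svm.decision_function([query])[0])  # the two values should match
print(svm.predict([query]))                       # the sign of f(x) gives the predicted class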
E. LightGBM

The objective function in LightGBM is similar to that of XGBoost:

Obj(\theta) = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{j=1}^{J} \gamma_j
F. Random Forest (RF)

The Random Forest algorithm leverages a collection of decision trees, known as an ensemble, to improve prediction accuracy by aggregating their outputs. By employing bootstrapped samples and feature subsets, it reduces overfitting and noise sensitivity, making it a versatile and robust choice for predictive modeling across diverse domains. The importance of feature i is obtained by normalizing its per-tree importances over all features and trees:

RFfi_i = \frac{\sum_{j \in \text{all trees}} normfi_{ij}}{\sum_{j \in \text{all features}} \sum_{k \in \text{all trees}} normfi_{jk}}
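A minimal sketch of this normalized importance using scikit-learn's RandomForestClassifier on synthetic admission features; the feature names, arrays, and hyperparameters are illustrative assumptions, and scikit-learn's feature_importances_ already returns importances normalized to sum to one, in the spirit of the formula above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["GRE", "TOEFL", "CGPA"]              # illustrative subset of features
rng = np.random.default_rng(0)
X = rng.random((200, 3))                               # placeholder scaled scores in [0, 1]
y = (0.5 * X[:, 0] + 0.5 * X[:, 2] > 0.55).astype(int) # synthetic admit/reject labels

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ aggregates per-tree importances and normalizes them across
# all features and trees, so the values sum to 1.
for name, importance in zip(feature_names, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")
print("sum of importances:", rf.feature_importances_.sum())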
G. Gradient Boosting Machine (GBM)

The key to GBM is minimizing the loss function by taking the gradient of the error. The general form of the model is:

Obj(\theta) = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{j=1}^{J} \gamma_j
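Because GBM and LightGBM minimize the same kind of additive loss-plus-penalty objective, the sketch below fits both on identical synthetic data for a side-by-side look; library defaults and the toy arrays are assumptions, not the configurations behind the results reported later.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from lightgbm import LGBMClassifier

rng = np.random.default_rng(1)
X = rng.random((200, 3))                                  # placeholder [GRE, TOEFL, CGPA] scaled to [0, 1]
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)                 # synthetic admit/reject labels

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
lgbm = LGBMClassifier(n_estimators=100, learning_rate=0.1, num_leaves=15, min_child_samples=5)

for name, model in [("GBM", gbm), ("LightGBM", lgbm)]:
    model.fit(X, y)
    acc = model.score(X, y)                               # training accuracy on the toy data
    print(f"{name}: train accuracy = {acc:.3f}")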
III. DATASETS

The dataset used in this research was collected through web scraping from various online platforms where students share their academic profiles and admission results. Specifically, we gathered data from students applying to international universities in countries such as the USA, UK, Canada, and Germany. This dataset provides comprehensive insights into student profiles, including their academic background, test scores, universities and courses applied to, and admission outcomes.

A. Features of the Merged Dataset

The merged dataset comprises the following columns (a small loading sketch follows the list):
• Name: The name of the student (anonymized for analysis).
• Intake Year: The year the student plans to commence their studies (e.g., 2022, 2023).
• Intake Semester: The semester in which the student intends to enroll, either Fall or Spring.
• University Applied: The university to which the student has applied (e.g., University of Toronto, RWTH Aachen).
• Course Applied: The specific program or course applied to (e.g., MSc Computer Science, MSc Management).
• GRE: The student's Graduate Record Examination (GRE) score. Missing values are indicated as −1.
• IELTS: The student's International English Language Testing System (IELTS) score. Missing values are indicated as −1.
• TOEFL: The student's Test of English as a Foreign Language (TOEFL) score. Missing values are indicated as −1.
• Status: A binary indicator of admission status, where 1 denotes admission and 0 denotes rejection.
• CGPA: The student's undergraduate Cumulative Grade Point Average (CGPA) on a 10-point scale.
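A minimal sketch of loading and inspecting such a merged dataset with pandas; the file name admissions_merged.csv is a hypothetical placeholder, and the checks simply mirror the column descriptions above.

import pandas as pd

# Hypothetical file name; the actual merged, web-scraped dataset is not distributed with this paper.
df = pd.read_csv("admissions_merged.csv")

expected_columns = ["Name", "Intake Year", "Intake Semester", "University Applied",
                    "Course Applied", "GRE", "IELTS", "TOEFL", "Status", "CGPA"]
print(df[expected_columns].dtypes)

# -1 marks a missing test score, so count the sentinels per score column.
for col in ["GRE", "IELTS", "TOEFL"]:
    print(col, "missing:", (df[col] == -1).sum())
print("admit rate:", df["Status"].mean())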
B. Data Preprocessing

Given that the dataset was web scraped, it contained missing values and inconsistencies in the format of certain features, particularly in the test scores. To ensure high-quality data for model training and evaluation, the following preprocessing steps were undertaken (a short sketch follows the list):
1) Handling Missing Values: For test scores such as GRE, IELTS, and TOEFL, missing values were represented by −1. These missing values were retained during the initial preprocessing to allow the models to handle them effectively. Where applicable, missing values were imputed based on the median or mean of the corresponding feature, or left as is for models capable of handling missing values natively.
2) Normalization: Features such as GRE, IELTS, TOEFL, and CGPA were normalized to ensure they were on a comparable scale. Min-max normalization was applied to rescale the test scores and CGPA between 0 and 1, improving convergence and model performance.
3) Feature Encoding: Categorical features such as Intake Semester (Fall/Spring) and Status (Admit/Reject) were encoded using binary encoding. The University Applied and Course Applied columns were transformed using one-hot encoding to retain the unique characteristics of each institution and program.
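A hedged pandas/scikit-learn sketch of these three steps (median imputation of the −1 sentinels, min-max scaling, and binary plus one-hot encoding); the column names follow the dataset description, while the input DataFrame is assumed to be loaded as in the earlier sketch.

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()

    # 1) Missing values: replace the -1 sentinels with each column's median.
    for col in ["GRE", "IELTS", "TOEFL"]:
        df[col] = df[col].replace(-1, np.nan)
        df[col] = df[col].fillna(df[col].median())

    # 2) Normalization: rescale scores and CGPA to [0, 1] with min-max scaling.
    score_cols = ["GRE", "IELTS", "TOEFL", "CGPA"]
    df[score_cols] = MinMaxScaler().fit_transform(df[score_cols])

    # 3) Encoding: binary-encode Intake Semester, one-hot encode university and course.
    df["Intake Semester"] = (df["Intake Semester"] == "Fall").astype(int)
    df = pd.get_dummies(df, columns=["University Applied", "Course Applied"])

    return df.drop(columns=["Name"])   # Name is not used for prediction

# Example usage (df as loaded in the previous sketch):
# processed = preprocess(df)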
C. Splitting the Dataset

To ensure robust evaluation of the machine learning models, the merged dataset was divided into two subsets: a training dataset and a testing dataset. The dataset was split as follows:
• Training Dataset: 80% of the records were allocated to the training dataset. This subset was used to train the various predictive models.
• Testing Dataset: The remaining 20% was reserved for testing and evaluating the performance of the trained models. This ensured that the models were evaluated on unseen data, providing a realistic estimate of their performance in practical scenarios.

The splitting was done using stratified sampling, ensuring that the proportion of students admitted or rejected was preserved in both the training and testing datasets. This technique helped prevent any bias in model performance, especially when handling imbalanced data.
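A minimal sketch of this 80/20 stratified split with scikit-learn; the small DataFrame stands in for the preprocessed dataset, and the random seed is an arbitrary choice for reproducibility.

import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the preprocessed dataset; in practice the output of preprocess() is used.
processed = pd.DataFrame({
    "GRE":  [0.9, 0.8, 0.2, 0.4, 0.7, 0.3, 0.6, 0.5, 0.85, 0.35],
    "CGPA": [0.8, 0.9, 0.3, 0.4, 0.7, 0.2, 0.6, 0.5, 0.90, 0.30],
    "Status": [1, 1, 0, 0, 1, 0, 1, 0, 1, 0],
})

X = processed.drop(columns=["Status"])
y = processed["Status"]

# 80/20 split, stratified on the admit/reject label so both subsets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print("train admit rate:", y_train.mean(), "test admit rate:", y_test.mean())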
D. Column Descriptions
• Name: Identifies the student; this feature was not used for prediction.
• Intake Year: Crucial for temporal trends and forecasting future admissions.
• Intake Semester: Important for capturing the seasonality of university intakes.
• University Applied: Key feature influencing admission outcomes, as different universities have varying acceptance criteria.
• Course Applied: Helps model the distinct requirements and competitiveness of different academic programs.
• GRE, IELTS, TOEFL: Standardized test scores, significant determinants of admission success.
• Status: The target variable for prediction, indicating whether the student was admitted or rejected.
• CGPA: A strong predictor of academic performance and admission decisions.

Fig. 1. Precision of Graduate Admission Prediction
Fig. 2. Recall of Graduate Admission Prediction
Fig. 3. F1-score of Graduate Admission Prediction
Fig. 4. Accuracy of Graduate Admission Prediction
Fig. 5. Performance metrics of Graduate Admission Prediction vs. Models

In supervised machine learning, the goal attribute serves as the dependent variable while the other attributes act as independent variables. We use different train-test split ratios on our dataset to compare performance metrics including recall, precision, accuracy, and F1-score. This allows us to evaluate each model's accuracy across the seven machine learning techniques, as sketched below.
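A hedged end-to-end sketch of this comparison: it trains the seven classifiers from Section II on a stratified split and tabulates the four metrics; the synthetic features stand in for the preprocessed dataset, and all hyperparameters are library defaults rather than the tuned settings behind the figures reported below.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Synthetic stand-in for the preprocessed features; in practice the split from Section III-C is used.
rng = np.random.default_rng(42)
X = rng.random((400, 4))                                  # e.g. scaled GRE, IELTS, TOEFL, CGPA
y = (0.6 * X[:, 0] + 0.4 * X[:, 3] > 0.5).astype(int)     # synthetic admit/reject labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "LightGBM": LGBMClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name:20s} acc={accuracy_score(y_test, y_pred):.3f} "
          f"prec={precision_score(y_test, y_pred):.3f} "
          f"rec={recall_score(y_test, y_pred):.3f} f1={f1_score(y_test, y_pred):.3f}")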
IV. RESULTS

In a comparative analysis of the machine learning algorithms and a Neural Network for predicting graduate admission outcomes, Logistic Regression and the Neural Network proved to be the most effective performers. They demonstrated remarkable precision, recall, and accuracy scores, as shown in Fig. 1, Fig. 2, and Fig. 4, respectively. Logistic Regression displayed a remarkable precision score of 99%, as shown in Fig. 1, by effectively classifying all positive samples. Additionally, it maintained a high recall score of 95%, as shown in Fig. 2, and an F1-score of 98%, as shown in Fig. 3. This exceptional performance highlights the efficacy of Logistic Regression in detecting positive samples with high precision. As a straightforward linear model, Logistic Regression is easy to implement and interpret, making it suitable for various datasets, including real-world scenarios with noisy data. Its output provides probabilities and interpretable coefficients, facilitating easy understanding of the relationship between the input features and the target variable. Moreover, Logistic Regression is computationally efficient and can handle large datasets effectively. With built-in mechanisms like L1 and L2 regularization, it prevents overfitting and improves generalization, enhancing its predictive performance.

On the other hand, the Neural Network showed impressive precision and recall scores of 98%, as demonstrated in Fig. 1 and Fig. 2, respectively. Its accuracy of 98.6% also confirms its ability to accurately classify positive and negative samples. Neural Networks are capable of capturing complex nonlinear relationships between input features and target variables, allowing for sophisticated pattern recognition. They can automatically learn hierarchical data representations, which enables them to extract relevant features for prediction. Moreover, Neural Networks can scale effectively with large amounts of data and complex problems, making them suitable for diverse applications across various domains.

The results presented in Fig. 5 indicate that Support Vector Machine and Random Forest are also effective algorithms for predicting admission outcomes, with high precision, recall, and accuracy scores. Although slightly lower than Logistic Regression and the Neural Network, these algorithms offer strong performance metrics across multiple evaluation criteria.

Although Decision Tree, K-Nearest Neighbors, and Naive Bayes also achieved reasonably good results, they showed slightly lower accuracy than the best-performing models. Their respectable performance nonetheless highlights their usefulness in predicting admission outcomes, although they have some limitations compared to Logistic Regression and the Neural Network.

Overall, the study reveals that the Logistic Regression and Neural Network models perform best in accurately predicting admission outcomes. These models have demonstrated impressive precision, recall, and accuracy scores, indicating their potential for practical use in admissions decision support. This reaffirms their status as reliable tools for graduate admission prediction.
V. CONCLUSION AND FUTURE WORK

In graduate admission prediction research, several promising avenues for future exploration exist. Firstly, integrating more diverse data sources beyond test scores and CGPA, such as relevant work experience, could enhance prediction accuracy. Secondly, the development of advanced machine learning techniques such as ensemble methods and deep learning architectures holds potential for improving the predictive models. Additionally, enhancing model interpretability through techniques like feature importance analysis and sensitivity analysis is crucial for transparent admission decision-making. Moreover, exploring the utilization of natural language processing techniques to analyze unstructured application records could provide valuable insights. Investigating the impact of reinforcement learning techniques, larger-scale application records, and unsupervised learning methods such as clustering on prediction accuracy is also essential. Finally, the incorporation of additional data sources offers a promising avenue for further enhancing prediction accuracy in graduate admission research.
REFERENCES

[1] A. Smith. "The Role of Machine Learning in Predicting Graduate Admissions". In: Journal of Educational Data Science (2020).
[2] B. Gupta. "Classification Algorithms in Predictive Analytics". In: International Journal of Computer Science (2019).
[3] A. Bose et al. "Machine Learning in Higher Education Admissions". In: IEEE Transactions on Learning Technologies (2021).
[4] D. Sharma. "A Decision Tree Model for University Admissions". In: Proceedings of the International Conference on Data Science. 2020.
[5] A. Gupta et al. "Deep Learning for Graduate Admissions Predictions". In: International Conference on AI and Education. 2021.
[6] F. Bose and G. Patel. "Random Forest in Predicting University Admissions". In: IEEE Conference on Computational Intelligence. 2019.
[7] H. Johnson. "Predictive Models for Graduate Admissions". In: Educational Data Mining Journal (2018).
[8] I. Roberts and J. Smith. "The Impact of Data Analytics on Graduate Admissions". In: Higher Education Analytics Review (2020).
[9] J. Singh. "A Comparative Analysis of Machine Learning Algorithms in Education". In: IEEE Conference on Machine Learning and Applications. 2021.