Final Internship Report GAP

Abstract

As the number of graduates who wish to continue their education grows, gaining admission to a
dream university has become more challenging. New graduates are often unfamiliar with the
requirements and procedures of postgraduate admission and may spend a considerable amount of
money on consultancy organisations to estimate their admission chances. Because a human
consultant can consider only a limited number of universities, however, this approach can be
biased and inaccurate. Studying abroad offers many options, such as Canada, the USA, the UK,
Germany, Italy and Australia, but this work focuses only on students who want to pursue a
master's degree in the United States. These students have to take the GRE (Graduate Record
Examinations) and the TOEFL (Test of English as a Foreign Language). After the exams they
have to prepare their SOP (statement of purpose) and LORs (letters of recommendation), which
are among the crucial factors to consider and play a vital role when the student is seeking a
scholarship. Prospective graduate students always face a dilemma in choosing universities when
applying to master's programs. Although a good number of predictors and consultancies guide
students, they are not always reliable because their decisions are based on a select set of
past admissions. Students cannot apply to every university, since that would incur a large
amount in application fees, so they must shortlist the universities they apply to. The problem
is that a student does not know which universities are likely to admit them. Some online blogs
help with this, but they are not very accurate and do not consider all the factors, and some
consultancy offices cost a lot of time and money and sometimes give false information. Our
goal is therefore to develop a model that tells students their chance of admission to a given
university. The model should consider all the crucial factors that play a vital role in the
admission process and should achieve high accuracy.

Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables

CHAPTER 1
INTRODUCTION
1.1 Project Description
1.2 Problem Statement
1.3 Objectives
1.4 Scope
1.2 Company Profile

CHAPTER 2
LITERATURE SURVEY
2.1 Existing System
2.2 Feasibility Study
2.3 Tools and Technologies

CHAPTER 3
SOFTWARE REQUIREMENTS SPECIFICATION
3.1 User specifications
3.2 Functional Requirements
3.3 Interface Requirements
3.4 Non-Functional Requirements

CHAPTER 4
DESIGN
4.1 Data Flow
4.2 Sequence Diagram

CHAPTER 5
IMPLEMENTATION
5.1 Code Snippet
5.2 Snapshots

CHAPTER 6
CONCLUSION AND FUTURE ENHANCEMENTS
7.1 Conclusion
7.2 Future Enhancements

BIBLIOGRAPHY

List of Figures
Fig 3.1: Predicting the chance of admission
Fig 4.1: Data Flow Diagram
Fig 4.2: Sequence Diagram
Fig 5.1: Initial Exploration of Dataset
Fig 5.2: Data Preparation and Model Training
Fig 5.3: Visualization of Feature Correlations
Fig 5.4: Distribution of Admission Factors
Fig 5.5: Frequency Distribution of Admission Probabilities
Fig 5.6: Scatter Plot of GRE Score and Chance of Admit

List of Tables
Table 3.1: First five Rows of the Dataset
Table 6.1: Performance Analysis

Graduate Admission Prediction using ML Techniques 1

CHAPTER 1
INTRODUCTION
Anyone planning postgraduate study finds it difficult to judge which colleges they could join
on the basis of their GPA and their Quantitative, Verbal, TOEFL and AWA scores. Applicants may
apply to many universities that expect a higher score profile instead of universities where
they have a realistic chance of admission, which harms their prospects. It is important that
candidates apply to colleges where they have a good chance of admission rather than to
colleges they are unlikely to get into, yet there are few efficient ways to identify such
colleges quickly.
The Education Based Prediction System helps a person decide which colleges to apply to with
their scores. The dataset used for processing consists of the following parameters:
university name, GRE Quantitative and Verbal scores, and TOEFL and AWA scores. The GRE
(Graduate Record Examinations) is a standardized test used by many universities and graduate
schools around the world as part of the graduate admissions process. Other factors are also
taken into consideration when applying to colleges, such as the Letter of Recommendation (a
formal document that validates someone's work, skills or academic performance), the Statement
of Purpose (a critical piece of a graduate school application that tells admissions
committees who you are, what your academic and professional interests are, and how you will
add value to the graduate program you are applying to), co-curricular activities and research
papers (research papers from journals that are not well known, or that have a high percentage
of plagiarism, are not considered here). When a person has completed an undergraduate degree
and wants to pursue a postgraduate degree in a field of their choice, it is often confusing
to figure out which colleges to apply to with their GRE and TOEFL scores and their GPA at
graduation. Many candidates apply to colleges whose score requirements they do not meet and
hence waste a lot of time. Applying to many colleges also increases the cost. Few efficient
methods are available to address this issue, and hence an Education Predictor System has been
developed.

KLE Institute of Technology, Hubballi Dept. of MCA 2024-2025


In the proposed system, a person enters their scores in the fields provided. The system then
processes the data and outputs a list of colleges that the person could get into with those
scores. This is relatively quick and helps save time and money. To achieve this we propose a
novel method based on machine learning algorithms. To maximize the accuracy of our model, we
consider not one but several machine learning algorithms: Linear Regression, Gradient
Boosting and Random Forest. More about these algorithms is covered in the Algorithms section
of this report. The algorithms are compared, and the one with the best key performance
indicators is used to develop the Prediction System. We also plan to incorporate clustering
of universities based on an applicant's profile and then classify them by likelihood of
acceptance (for example, less likely or highly likely).
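
The comparison described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and a synthetic stand-in dataset: the feature ranges, coefficients and noise level are invented for the example and are not the report's data or results. It fits the three algorithms named above and picks the one with the highest R2 score on a held-out split.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, NOT the report's dataset):
# columns are GRE, TOEFL and CGPA; the target mimics a chance of admission.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.uniform(290, 340, n),   # GRE score
    rng.uniform(90, 120, n),    # TOEFL score
    rng.uniform(6.5, 10.0, n),  # CGPA
])
y = 0.004 * (X[:, 0] - 290) + 0.005 * (X[:, 1] - 90) + 0.12 * (X[:, 2] - 6.5)
y = np.clip(y + rng.normal(0, 0.03, n), 0, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three candidate algorithms named in the text.
models = {
    "Linear Regression": LinearRegression(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model and record its R2 score on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_model = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
print("Best model on this synthetic split:", best_model)
```

Which model wins depends on the data; on a real admissions dataset the ranking may differ from what this synthetic example shows.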

1.1 Project Description


This study aims to contribute a valuable tool for both prospective students and educational
institutions, offering insights into the intricate dynamics of the admission process. Through
the lens of regression modeling, we delve into the significance of CGPA and GRE scores as
predictors for admission chances. The response variable, termed "Chance of Admit," serves
as the focal point for our predictive model.

1.2 Problem Statement


Educational organizations have always played a vital role in society in the development and
growth of individuals. Various college prediction apps and websites are currently maintained,
but using them is tedious to some extent because of the lack of clear information about
colleges and the time consumed in searching for the most suitable college.
The problem being tackled, hence, is to design a college prediction system that provides
probabilistic insight into college admission based on overall rating, cut-offs of the
colleges, admission intake and preferences of students. It also helps students avoid spending
time and money on counsellors and on stressful research to find a suitable college.

Finding the right university and course for further studies has always been a troublesome
process for students. At times they know which stream they want to get into, but it is not
easy for them to find colleges that match their academic marks and other achievements. We aim
to develop a platform that gives a probabilistic output of how likely a student is to get
into a university, given their details.

1.3 Objectives
The objective of graduate admission prediction is to bring students closer to their
university of choice through a robust evaluation of their profiles. This report considers
parameters that are all relevant to graduate admissions. Barring a few exceptional cases in
which a student unexpectedly secures an admit from a top school, most of the results are as
expected and give a fair idea of the selection criteria.
i. To contribute a valuable tool for both prospective students and educational institutions,
offering insights into the intricate dynamics of the admission process.
ii. To demonstrate how data-driven methodologies can improve decision-making for
applicants and institutions in competitive admission scenarios.
iii. To aid universities in efficiently filtering and evaluating applications, thereby optimizing
the admissions process.
iv. To compare various machine learning algorithms, evaluate their accuracy, and select the
most effective model for graduate admission prediction.

1.4 Scope
i. Applicability to Prospective Students: The project is designed to assist graduate school
applicants in assessing their chances of admission to specific programs based on their
academic and non-academic profiles. This insight can help them make informed decisions
regarding applications and improve their preparation strategies.
ii. Utility for Educational Institutions: The predictive system can be used by universities
and colleges to streamline the application evaluation process. It can act as a supportive
tool to pre-screen applications, thereby saving time and resources.


iii. Data-Driven Decision Making: The project emphasizes the use of historical data and
machine learning algorithms to derive meaningful patterns and predictions, ensuring
objective and consistent evaluation.
iv. Versatility Across Fields: The model can be adapted to various graduate programs,
including technical, management, and research-oriented courses, provided the relevant data
is available.

1.2 Company Profile


STARVIC EDU TECH LLP, established in 2020, operates from its office on the first floor of
the Incubation and Placement Cell at KLE Institute of Technology, located opposite the airport
on Gokul Road, Hubli. The company employs a team of 11-20 dedicated professionals.
1.2.1 Organisation of Company
STARVIC EDU TECH LLP operates through various departments to ensure smooth operations and
quality deliverables:
i. Research & Development (R&D): Focuses on creating innovative solutions and enhancing
existing programs.
ii. Training & Education: Designs and delivers high-quality educational programs,
internships, and industry-relevant courses.
iii. Sales & Marketing: Drives business growth by reaching out to colleges, students, and
professionals for partnerships and enrollments.
iv. Technology: Responsible for the development, deployment, and maintenance of tech
solutions used in educational programs.
v. Operations & Support: Ensures seamless execution of programs, manages the LMS and
addresses learner queries.

1.2.2 Working Domains of Company


1. Technologies Used:
STARVIC EDU TECH LLP utilizes cutting-edge technologies to provide high-quality training
and projects:
i. Python (Core, Libraries: NumPy, Pandas, Matplotlib, SciPy, etc.)
ii. Machine Learning (Supervised & Unsupervised Learning)
iii. Data Science Tools (Jupyter Notebooks, Anaconda, etc.)
iv. No-Code Tools (Bubble, Glide, etc.)
v. GenAI & AI Tools (LLMs, APIs for integration, ChatGPT integration)

2. Application Domains Handled by the Company:


i. Education Technology (EdTech): Delivering online and hybrid training programs in Data
Science, GenAI, and No-Code Development.
ii. Project Development: Helping startups build and deploy real-world applications using no-
code tools and AI technologies.
iii. Career Mentorship: Providing career-oriented mentorship programs for students and
professionals.
1.2.3 Recent projects handled by company
i. Data Science Internship Training Program.
ii. No-Code Development Projects.
iii. GenAI Integration for EdTech Tools.
iv. Custom LMS Development.
v. Mentorship Platform for Career Guidance.

Organization of report
This chapter introduced the project, its basic functionality and the terminology used,
followed by the company profile, the working of the company and its collaborations. Chapter 1
introduces the purpose of the project. Chapter 2 presents the literature survey of existing
systems and how the proposed system overcomes their limitations. Chapter 3 details the system
requirements, i.e., the user, functional and non-functional requirements. Chapter 4 describes
the design of the system and its Data Flow Diagram. Chapter 5 presents the code implemented
to build the interface, along with some screenshots. The final chapter gives the conclusion
and describes how the proposed system can be enhanced in future work.

CHAPTER 2
LITERATURE SURVEY
Many students who are completing their undergraduate studies prepare for the next stage,
which is a master's degree. Many of them wonder about the basic requirements for admission
and about the universities where they could be admitted given their profile [1].
The literature contains several studies that perform statistical analyses of admissions
decisions. For example, the authors in [2] present an expert system, called PASS, in which
logistic regression is used to predict the potential of high school students in Greece to
pass the national exam for entering higher education institutes. The authors in [3] used
predictive modeling to assess admission policies and standards based on features such as GPA,
ACT score, residency and race. A limitation of this research is that it does not consider
other important factors such as past work experience and students' technical papers.
The authors in [4] used data mining and ML techniques to analyze the current admission
scenario by predicting the enrolment behavior of students. They used the Apriori technique to
analyze the behavior of students seeking admission to a particular college. They also used
the Naïve Bayes algorithm to help students choose a course and to assist them in the
admission procedure. In their project, students seeking admission took a test, and based on
their performance the system suggested a course branch using the Naïve Bayes algorithm.
However, human intervention was required to make the final decision on the status.
Acharya et al. [1] proposed a comparative approach, developing four machine learning
regression models for predicting graduate admission chances: linear regression, support
vector machine, decision tree and random forest. They computed error functions for the
developed models and compared their performance; linear regression was the best-performing
model, with an R2 score of 0.72. Janani P et al. [2] developed a system that uses a machine
learning technique, specifically a decision tree algorithm, based on attributes such as GRE,
TOEFL, CGPA and research papers. The chance of admission is calculated from these scores.
The developed model has 93% accuracy. Navoneel Chakrabarty et al. [3] compared different
regression models, namely a gradient boosting regressor and a linear regression model. The
gradient boosting regressor achieved a score of 0.84, surpassing the performance of the
linear regression model. They also computed other error metrics, such as mean absolute error,
mean squared error and root mean squared error. Chithra Apoorva et al. [4] proposed different
machine learning algorithms for predicting the chances of admission: Linear Regression, Ridge
Regression and Random Forest. These were trained on the features that have a high impact on
the probability of admission. Of the generated models, the linear regression model had 79%
accuracy.
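
The error metrics mentioned in these studies (mean absolute error, mean squared error, root mean squared error and the R2 score) can be computed directly from their standard definitions. The sketch below is illustrative only: the function name `regression_metrics` and the sample values are our own assumptions and do not come from any of the cited papers.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    mse = sum(e * e for e in errors) / n        # mean squared error
    rmse = math.sqrt(mse)                       # root mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)         # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                    # coefficient of determination
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical true and predicted admission chances (made-up values).
y_true = [0.9, 0.7, 0.5, 0.3]
y_pred = [0.8, 0.7, 0.6, 0.3]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Lower MAE, MSE and RMSE indicate smaller prediction errors, while an R2 score closer to 1 indicates that the model explains more of the variance in the target.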

2.1 Existing System


Machine learning has been used in prior work to predict admissions decisions for graduate
programs [2]; undergraduate programs [3]; MBA programs [4]; and computer science
undergraduate [5] and graduate [6, 7] programs. One of these studies of graduate computer
science admissions was quite extensive and considered 150,000 computer science graduate
admissions applications spanning 3000 institutions [7]. However, the diversity of the
applications prevented a direct analysis of the application components and instead the study
relied on self-reported outcomes for a few components such as test scores and undergraduate
grades, while textual data sources like LORs or resumes were not incorporated.
A distinctive aspect of our study is the use of textual admissions components, especially the
LORs. There is some prior work that also considers the LORs in the context of making
admissions decisions. We begin by describing those studies that utilize manual rating systems
to characterize the LORs. One study of graduate students in various programs at the University
of California at Los Angeles assigned a manually determined numerical score to each LOR and
found that the LOR was the least significant factor of the seven features that were employed in
the predictive model [2]. A study of thoracic surgery residency programs had program directors
manually rate the importance of various admissions criteria and the study found that applicant
interview performance, letters of recommendation, and professionalism were found to be very
important [8]. A small study of twenty-four orthopaedic residency graduate programs had
reviewers manually rate each LOR as “strong” or “exceptional” based on guidelines that they
developed and found that applicants with three or more strong letters of recommendation had
slightly higher admission scores [9].
One large-scale study of LORs performed a meta-analysis of previously published research
spanning undergraduate and graduate education [10]. Unlike our study, the goal was not to use
LORs as predictors of past admission decisions, but rather as predictors of future performance.
The study found that LORs have low, but positive, correlations with standardized test scores
and moderate (i.e., 0.26) correlations with prior grades. LORs also had low but positive
correlations with future performance, including a correlation of 0.10 with research
productivity, 0.28 with undergraduate GPA, 0.13 with graduate GPA, and 0.19 with completion
of a doctoral degree. An analysis of the incremental validity of LORs over the other predictors
like test scores and past GPA showed that the LORs do not substantially help with predicting
future graduate GPA but do help with predicting graduate degree completion. As degree
completion is one of our key goals when making admissions decisions, this is quite notable.
None of the prior studies described thus far utilized NLP techniques to characterize the LORs,
indicating a significant gap in research in this field. Waters and Miikkulainen presented an
admission-decision study, where they applied NLP techniques to LORs using a statistical
machine learning approach to facilitate large-scale Ph.D. admissions [6]. Their system
incorporated numerical, categorical, and textual data, with the LOR text transformed into a 50-
dimensional feature vector using a bag-of-words representation (i.e., word order is not
considered) and Latent Semantic Analysis [11] techniques. The study found that LORs
containing words such as “best,” “award,” “research,” “PhD,” etc., were predictive of
admission, while letters containing words like “good,” “class,” “programming,” “technology,”
etc., were indicative of rejection. According to the authors, this pattern reflects the faculty’s
preference for candidates with strong research potential. The use of NLP in this study was
relatively straightforward as the focus was on specific words.
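
The bag-of-words plus Latent Semantic Analysis representation described above can be sketched with scikit-learn. This is not the cited authors' implementation: the four sample letters are invented, `TruncatedSVD` is used here as a common way to perform LSA, and the vector is reduced to 2 dimensions rather than 50 because the toy corpus is tiny.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Made-up letter excerpts, used only to illustrate the pipeline.
letters = [
    "She is the best research student and won a department award",
    "He did good work in my programming class",
    "Her research record is the best I have seen in a PhD applicant",
    "A good student in my technology class",
]

# Bag-of-words: each letter becomes a vector of word counts (word order is ignored).
vectorizer = CountVectorizer(lowercase=True)
counts = vectorizer.fit_transform(letters)

# LSA: project the sparse count vectors onto a small number of latent dimensions.
lsa = TruncatedSVD(n_components=2, random_state=0)
features = lsa.fit_transform(counts)

print(features.shape)  # one low-dimensional feature vector per letter
```

The resulting dense vectors can then be combined with numerical and categorical features as input to a classifier.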
Several studies utilized more advanced NLP techniques on LORs, but these studies were
specifically in the context of investigating gender and racial bias in LORs. These bias-related
studies used NLP software to assess the linguistic characteristics of the LORs, including those
related to emotional content (e.g., sadness, excitement). The majority of these studies focused
on graduate medical programs [12–15], while a few studies considered graduate STEM
disciplines [16, 17] and one focused on undergraduate admissions [18]. The study on
undergraduate admissions [18] for the University of California at Berkeley showed that LORs
written for students in underrepresented racial groups were weaker than those for other
students. However, this study also assessed the impact of LORs on admission decisions and
showed that even though the LORs were weaker for these underrepresented groups, the
inclusion of LORs nonetheless improved the admission outcomes for these students.

2.2 Feasibility Study


A feasibility study is a detailed analysis that considers all of the critical aspects of a
proposed project in order to assess its practicality and determine how likely it is to
succeed. The study is also designed to identify potential issues and problems that could
arise while pursuing the project.

2.2.1 Technical Feasibility


Publicly available datasets like the Graduate Admission dataset from Kaggle provide sufficient
data for model training and testing. Features such as GRE/TOEFL scores, CGPA, and research
experience are well-documented in these datasets. Machine learning libraries such as Scikit-
learn, TensorFlow, or PyTorch can be used for model development.

2.2.2 Economic Feasibility


The project primarily uses open-source tools and publicly available datasets, minimizing
financial investment. Hardware requirements are minimal and can be fulfilled using a standard
computer with sufficient processing power. The project offers valuable insights to students and
universities, potentially saving time and resources during the admission process. The predictive
system can be further commercialized as a software tool for educational institutions.

2.2.3 Operational Feasibility


Developing the machine learning model and running predictions is straightforward with
existing tools and frameworks. The project's deliverables (e.g., predictive model, analysis
report) can be used immediately by applicants or universities with minimal adaptation, and a
new user can learn to use the system within an hour. The system can therefore be accepted
easily by any kind of end user. Hence the proposed system is operationally feasible.

2.3 Tools and Technologies


This section describes the tools and technologies used in the project, along with the
algorithms and the software and hardware requirements.

2.3.1 Google Colab


Google Colab is a cloud-based platform that provides free access to GPU resources for running
and experimenting with machine learning models. It's built on Jupyter Notebooks and allows
seamless collaboration as users can share and edit their work in real-time. Colab eliminates
the need for local hardware with its online interface, making it convenient for individuals or
teams working on data science and AI projects. Google Colab is a powerful tool for
researchers, students, and professionals looking for a cost-effective and collaborative
environment for their coding and experimentation needs.

2.3.2 Python
Python is a multi-paradigm programming language. Object-oriented programming and
structured programming are fully supported, and many of its features support functional
programming and aspect-oriented programming (including metaprogramming and metaobjects).
Many other paradigms are supported via extensions, including design by contract and logic
programming. Python is also known as a glue language, able to work well with many other
languages.

2.3.3 Pandas
Pandas is an open-source Python library providing high-performance data manipulation and
analysis tools built on its powerful data structures. The name Pandas is derived from
"panel data", an econometrics term for multidimensional data.
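
As a brief illustration of the kind of data manipulation Pandas supports, the sketch below builds a small DataFrame and runs a filter, a group-wise mean and a correlation. The column names follow the admission factors discussed in this report, but the four rows are hypothetical values made up for the example, not records from the project dataset.

```python
import pandas as pd

# Hypothetical applicant records; values are illustrative, not real data.
df = pd.DataFrame({
    "GRE Score": [330, 320, 310, 300],
    "TOEFL Score": [115, 108, 102, 95],
    "CGPA": [9.4, 8.8, 8.1, 7.5],
    "Research": [1, 1, 0, 0],
    "Chance of Admit": [0.90, 0.78, 0.62, 0.45],
})

# Filter: applicants with research experience.
with_research = df[df["Research"] == 1]

# Group-wise mean of the target for applicants with and without research.
mean_by_research = df.groupby("Research")["Chance of Admit"].mean()

# Correlation of every column with the target.
corr_with_target = df.corr()["Chance of Admit"]

print(with_research)
print(mean_by_research)
print(corr_with_target)
```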

2.3.4 NumPy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays. It is the fundamental
package for scientific computing with Python. Besides its obvious scientific uses, NumPy can
also be used as an efficient multi-dimensional container of generic data.

2.3.5 Matplotlib
Matplotlib is a Python library for creating static, interactive, and animated visualizations. It
provides a flexible interface for plotting a wide range of charts, such as line plots, bar charts,
histograms, and scatter plots. With extensive customization options, it is widely used for data
visualization in scientific computing and analytics.

2.3.6 Seaborn
Seaborn is a Python library for creating visually appealing statistical graphics, built on
Matplotlib. It simplifies complex plots like scatter plots, heatmaps, and box plots with concise
code. Seamlessly integrating with Pandas, it supports direct data visualization and offers
customizable themes and color palettes, making it ideal for data analysis and presentation.

2.3.7 Scikit-learn
Scikit-learn is an open-source Python machine learning library that implements a range of
machine learning, pre-processing, cross-validation, and visualization algorithms behind a
unified interface. It provides a plethora of tools for machine-learning tasks such as
classification, regression and clustering.

2.3.8 Flask
Flask is a lightweight and flexible Python web framework used to build web applications. It
is known for its simplicity, scalability, and ease of use, making it ideal for beginners and small
to medium-sized projects. Flask allows developers to create routes, handle requests, and
integrate templates with minimal overhead.

• Summary

This chapter covers the literature survey and the references by different authors, the
existing system, and the tools and technologies used to run the project.


CHAPTER 3
SOFTWARE REQUIREMENTS SPECIFICATION

The Software Requirement Specification (SRS) for "Predicting Graduate Admissions Using
Machine Learning" outlines the functional and non-functional requirements necessary for its
development. The system must allow users to input data such as GRE scores, TOEFL scores,
CGPA, and other related parameters to predict the probability of admission. It requires seamless
integration with pre-trained machine learning models, a user-friendly web interface for input
and visualization, and the capability to display prediction results and relevant graphs. The
software must ensure compatibility with browsers, responsiveness across devices, and efficient
data handling. It should also meet technical requirements like support for Flask for backend
operations, joblib for model integration, and libraries like Matplotlib and Seaborn for graph
rendering.

3.1 User specifications


The User Specifications section describes the specific needs and expectations of the users who
will interact with the system. These specifications outline the features and functionalities
required to meet user needs effectively.

3.1.1 Applicant specifications


i. Data Input: Applicants should be able to input personal and academic details (e.g., GRE
score, TOEFL score, GPA, work experience).
ii. Validation Feedback: The system should validate the entered data and notify users of
missing or incorrect information.
iii. Prediction Output: The system should display the predicted likelihood of admission in
a user-friendly format (e.g., percentage or categorized as High/Medium/Low).
iv. User Interface: The input and output interfaces should be simple and intuitive, accessible
via web browsers or mobile devices.
v. Confidentiality: Applicant data must be kept secure and confidential.
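
The validation and output requirements above can be illustrated with a minimal sketch. The value ranges, the High/Medium/Low cut-offs, and the function names are assumptions made for illustration; the report does not specify them.

```python
# Hypothetical valid ranges; the report does not state them.
VALID_RANGES = {
    "GRE Score": (260, 340),
    "TOEFL Score": (0, 120),
    "CGPA": (0.0, 10.0),
}


def validate_profile(profile):
    """Return a list of human-readable error messages (empty if valid)."""
    errors = []
    for field, (low, high) in VALID_RANGES.items():
        value = profile.get(field)
        if value is None:
            errors.append(f"{field} is missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{field} must be a number")
        elif not low <= value <= high:
            errors.append(f"{field} must be between {low} and {high}")
    return errors


def categorize(chance):
    """Map a predicted chance in [0, 1] to a label (assumed cut-offs)."""
    if chance >= 0.8:
        return "High"
    if chance >= 0.6:
        return "Medium"
    return "Low"


# An out-of-range GRE score and a missing TOEFL score are both reported.
errors = validate_profile({"GRE Score": 350, "CGPA": 8.7})
print(errors)
print(categorize(0.72))
```

Returning all errors at once, rather than stopping at the first, lets the interface show every problem with the form in a single round trip.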


3.2 Functional Requirements


The functional requirements define the specific capabilities the system must provide. The
application should enable users to input parameters such as GRE scores, TOEFL scores,
CGPA, and other relevant features for admission prediction. It should load a pre-trained
machine learning model to process the input data and generate a prediction of the likelihood
of admission. The system must provide a clear and accurate display of the prediction results.
Additionally, the interface should include an option to visualize graphs, such as pair plots,
heatmaps, boxplots, and scatter plots, which give insights into the data and model
performance. Error handling, validation for input data, and compatibility with various devices
and browsers are also critical functionalities.

3.2.1 Linear Regression Model


Linear Regression is a supervised machine learning algorithm and one of the most
well-known algorithms in machine learning and statistics. It is an attractive
model for researchers because its representation is simple, and it works well for many
problems. Learning algorithms are used to estimate the coefficients of the LR model [5]. The
objective of the Linear Regression model is to find the relationship between variables
by fitting a linear model to the training data. It is a predictive algorithm that provides a linear
relationship between the prediction (call it ‘Y’) and the input (call it ‘X’). The simplest form of the
regression model, with one feature and one target variable, is defined by the formula:

$$y = bx + c$$

Where y is the target variable value, c is the y-intercept, b is the slope, and x is the value of the
feature variable [5]. To train a Linear Regression model, you need to find the value of θ that
minimizes the RMSE. In practice it is simpler to minimize the MSE, which leads to the same θ.
The MSE cost function for a Linear Regression model is:

$$\mathrm{MSE}(X, h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^{T} x^{(i)} - y^{(i)} \right)^{2}$$


Where hθ is the hypothesis function using the model parameters θ, m is the number of samples
in the dataset, θT is the transpose of θ, x(i) is the feature vector of the i-th instance, θT.x(i) is
the dot product of θ and x(i), and y(i) is the expected value for that instance [6].
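
As a worked illustration of the cost function, the sketch below computes the MSE for a one-feature model y = bx + c and fits b and c with the closed-form least-squares solution. The data points are made up for the example and the function names are hypothetical.

```python
def mse(c, b, xs, ys):
    """Mean squared error of the line y = b*x + c on the given points."""
    m = len(xs)
    return sum((b * x + c - y) ** 2 for x, y in zip(xs, ys)) / m


def fit_line(xs, ys):
    """Closed-form least-squares estimates of intercept c and slope b."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    c = mean_y - b * mean_x
    return c, b


# Made-up points that lie exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

c, b = fit_line(xs, ys)
print(c, b)               # fitted intercept and slope
print(mse(c, b, xs, ys))  # cost at the fitted parameters
print(mse(0, 0, xs, ys))  # cost of a poor guess, for comparison
```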

3.2.2 Random Forest Regressor

Random Forest is a powerful and versatile supervised machine learning algorithm that grows
and combines multiple decision trees to create a “forest.” It can be used for both classification
and regression problems in R and Python. A decision tree is itself an algorithm used to
classify data or, in the regression case, to predict a numeric value. In very simple terms, you
can think of it like a flowchart that draws a clear pathway to a decision or outcome; it starts at
a single point and then branches off into two or more directions, with each branch of the
decision tree offering different possible outcomes. For regression, the prediction of the forest
is the average of the predictions of its individual trees.

3.2.3 Gradient Boosting Regressor

Gradient Boosting is a powerful boosting algorithm that combines several weak learners into
a strong learner. Each new model is trained to reduce the loss (such as mean squared error or
cross-entropy) of the ensemble built so far, following the idea of gradient descent. In each
iteration, the algorithm computes the gradient of the loss function with respect to the
predictions of the current ensemble and then trains a new weak model to fit the negative of
this gradient. The predictions of the new model are then added to the ensemble, and the
process is repeated until a stopping criterion is met.
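
The iteration described above can be sketched in plain Python for the squared-error loss, where the negative gradient is simply the residual. This is a simplified illustration with one feature and one-split trees ("stumps"), not the scikit-learn implementation; the data values, function names, and hyperparameters are assumptions.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature, by squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return a predict(x) function built from boosted stumps."""
    base = sum(ys) / len(ys)  # initial constant prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


# Made-up training points (feature value, chance of admit)
xs = [1, 2, 3, 4, 5, 6]
ys = [0.40, 0.45, 0.55, 0.70, 0.80, 0.90]

baseline = sum(ys) / len(ys)
mse_before = sum((y - baseline) ** 2 for y in ys) / len(ys)
model = gradient_boost(xs, ys)
mse_after = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_before, mse_after)
```

The learning rate shrinks each stump's contribution, so many small corrections are accumulated instead of a few large ones.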

3.2.4 Dataset

The dataset is available at [1]. At the time of writing, the dataset has over 400
downloads and more than 2000 views. This dataset contains parameters that are considered
carefully by the admissions committee. The first group of parameters contains scores,
including GRE, TOEFL and undergraduate GPA. Statement of Purpose and Letter of
Recommendation are two other important entities. Research Experience is recorded in binary
form. All the parameters are normalized before training to ensure that values lie within the
specified range. A few profiles in the dataset contain values that have been previously
obtained by students. A unique feature of this dataset is that it contains an equal number of
categorical and numerical features. The data has been collected and prepared typically from
an Indian student’s perspective. However, it can also be used with other grading systems with
minor modifications. A second version of the dataset will be released which will have an
additional two hundred entries.

Table 3.1: First five rows of the dataset

3.3 Interface Requirements

Fig 3.1: Predicting the chance of admission


Fig 3.1 shows the user interface after the student profile is submitted. Based on the
information provided, the machine learning model calculates the likelihood of admission to a
master's program. This prediction is intended to give the applicant an idea of how their
academic and personal metrics align with the requirements of competitive universities.

Input: The dataset interface accepts structured data in CSV format. This file contains the
features such as GRE Score, TOEFL Score, University Rating, SOP (Statement of Purpose
Strength), LOR (Letter of Recommendation Strength), CGPA, Research Experience (Binary:
0 for No, 1 for Yes). The data is loaded using a data processing module, which ensures proper
handling of missing values, outliers, and normalization where required.
Output: The processed data is passed to the machine learning module for predicting the
chances of graduate admission. The output format includes:
i. A feature matrix (X) containing input attributes such as GRE Score, TOEFL Score,
University Rating, SOP Strength, LOR Strength, CGPA, and Research Experience.
ii. A target vector (y) representing the actual chance of admission.
iii. Predicted probabilities indicating the likelihood of admission.
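
The input and output formats described above can be sketched with the standard library alone. The two rows below are made-up values in the described column layout, not records from the dataset, and the exact column headers are an assumption.

```python
import csv
import io

# Two made-up rows in the column layout described above (not real data).
sample_csv = io.StringIO(
    "GRE Score,TOEFL Score,University Rating,SOP,LOR,CGPA,Research,Chance of Admit\n"
    "320,110,4,4.0,4.5,9.0,1,0.85\n"
    "300,100,2,3.0,3.0,8.0,0,0.60\n"
)

TARGET = "Chance of Admit"

X, y = [], []
for row in csv.DictReader(sample_csv):
    # Remove the target from the row; what remains are the features.
    y.append(float(row.pop(TARGET)))
    X.append([float(value) for value in row.values()])

print(X)  # feature matrix: one list of seven numbers per applicant
print(y)  # target vector: one chance of admission per applicant
```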

3.4 Non-Functional Requirements


The non-functional requirements focus on the system's performance, usability, and reliability.
The application must ensure a fast response time for predictions and visualizations, offering
seamless interaction for users. It should be secure, ensuring data integrity and safeguarding
user inputs from breaches. Scalability is crucial to handle varying user loads, while cross-
platform compatibility ensures the system operates smoothly on different devices and
browsers. The user interface should be intuitive and visually appealing, enhancing user
experience. Additionally, the system should be maintainable, with clear documentation and
modular code for future updates and enhancements.

3.4.1 Operational Requirements


The system must allow users to input details like GRE scores, TOEFL scores, and CGPA
through simple fields. It should provide clear instructions and error messages for invalid
inputs. Once the data is submitted, the system must process it and generate predictions along
with visual graphs. The interface should work seamlessly on both local and web environments.
It must be compatible with common browsers, ensuring a smooth user experience.

3.4.2 Economical Requirements


The project should use cost-effective, open-source technologies like Python, Flask, Streamlit,
and scikit-learn to minimize costs. Cloud hosting options such as AWS Free Tier or Google
Cloud can be used for deployment. The project can leverage freely available datasets to avoid
purchasing proprietary data. Standard hardware is sufficient for development and running the
application. The system should be designed to remain within a reasonable budget while
providing all necessary features.

3.4.3 Technical Requirements


The project uses Python 3.x, Flask for web application development, and joblib for managing
the machine learning model. Scikit-learn is used for training and prediction, while Matplotlib
and Seaborn handle data visualization. The model should be stored in a specific directory, and
graphs saved in a separate folder for display. The system can run on a computer with at least
4GB of RAM and a standard processor. Dependencies must be installed in the Python
environment, ensuring smooth integration.

• Summary
This chapter covers the software requirements specification, including the user specifications,
the functional and non-functional requirements, the software and hardware tools required for
the project, and the feasibility considerations and other constraints that apply to these
requirements.


CHAPTER 4
DESIGN
This chapter provides a comprehensive overview of the system's structural and functional
architecture, emphasizing the flow of data and interactions between various components. This
chapter plays a crucial role in illustrating how the system is organized and how it operates to
meet the project requirements.

4.1 Data Flow

Fig 4.1: Data Flow Diagram

The above diagram illustrates the workflow of a typical machine learning project, as applied
to the graduate admissions prediction task. It outlines the key stages involved, starting
from data acquisition to model evaluation and prediction.
The process begins with Finding Data, where you gather relevant information about applicants,
such as their academic records, test scores, and other relevant factors. This data is then subjected
to Data Cleaning to handle missing values, inconsistencies, and outliers. Subsequently, Data
Analysis is performed to understand the characteristics of the data, identify patterns, and gain
insights into the relationships between different variables. Data Visualization techniques are
employed to visually represent the data and its distributions, aiding in further understanding
and identifying potential trends.


Next, the model selection and training phase begins. Several machine learning algorithms are
tried, namely Linear Regression, Random Forest, and Gradient Boosting Regressor. Each
algorithm has its own strengths and weaknesses, and the choice depends on the specific
characteristics of the data and the desired level of accuracy.
Once a model is trained, it is used to make Admission Predictions. The model takes the input
data of a new applicant and predicts their likelihood of admission based on the patterns learned
from the training data. Finally, the Model Decision using KPIs step involves evaluating the
model's performance using relevant metrics (Key Performance Indicators). Because the
models are regressors that output a continuous chance of admission, the metrics used are Mean
Absolute Error (MAE), Root Mean Squared Error (RMSE), and R² score, rather than
classification metrics such as accuracy or precision. This evaluation helps assess the model's
effectiveness and identify areas for improvement.
In essence, this diagram provides a roadmap for the graduate admissions prediction project,
covering the essential steps from data collection to model deployment and evaluation.
Following this workflow and iteratively refining the approach leads to a more robust and
accurate prediction system.

4.2 Sequence Diagram

Fig 4.2: Sequence Diagram

This diagram illustrates the workflow of a graduate admission prediction system, a
sophisticated solution that leverages machine learning to streamline the evaluation process for
both applicants and admissions committees. The process commences when an applicant inputs
their relevant details, such as their GPA and GRE scores, into the Frontend System. This could
be a user-friendly web interface or a mobile application. The Frontend System then acts as a
gateway, transmitting this raw input data to the Backend System.
Within the Backend System, the data undergoes a rigorous validation and preprocessing phase.
This crucial step ensures data integrity and prepares it for consumption by the Machine
Learning Model. Validation involves checking for errors, inconsistencies, and missing values,
ensuring the data adheres to defined standards and constraints. Preprocessing encompasses a
range of techniques, such as data cleaning to handle missing values or outliers, and data
transformation to normalize or scale the data into a format suitable for the model's analysis.
The preprocessed data is then transferred to the Machine Learning Model, the core
of this system. This model, trained on a historical dataset of applicant profiles and their
chances of admission, applies the patterns and relationships it has learned between various
factors, such as GPA, GRE scores, research experience, and other relevant criteria. Based on
this analysis, the Machine Learning Model generates a predicted likelihood of admission for
the current applicant.
The predicted likelihood of admission is then communicated back to the Backend System.
The Backend System fulfills two vital functions at this stage. Firstly, it stores the applicant's
data, including their input details and the predicted outcome, in a secure and organized
database. This data repository serves as a valuable resource for future analysis, trend
identification, and system improvements. Secondly, the Backend System transmits the
predicted likelihood percentage back to the Frontend System. Finally, the Frontend System
receives the predicted likelihood percentage and presents it to the applicant in a clear and
concise manner. This provides the applicant with an immediate assessment of their admission
prospects, empowering them to make informed decisions regarding their application strategy.
By automating this critical aspect of the admission process, the system offers several key
advantages. It streamlines the evaluation process, enabling admissions committees to process
applications more efficiently. It provides applicants with a data-driven and objective
assessment of their chances, fostering transparency and fairness. Furthermore, the system can
be continuously improved by refining the Machine Learning Model with new data and
incorporating feedback from stakeholders.


In essence, this graduate admission prediction system represents a powerful application of
machine learning, demonstrating how technology can be leveraged to enhance decision-making
processes in education and other domains.
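
The request flow described in this section (frontend input, backend validation, model prediction, storage, and response) can be sketched as a single backend function. Everything here is hypothetical: the StubModel class stands in for the trained model with made-up weights, the field names "gre" and "cgpa" are assumed, and a plain list replaces the database.

```python
class StubModel:
    """Stand-in for the trained model; the weights are made up."""

    def predict(self, features):
        gre, cgpa = features
        score = 0.5 * (gre / 340) + 0.5 * (cgpa / 10)
        return max(0.0, min(1.0, score))  # clamp to [0, 1]


def handle_request(raw, model, store):
    """Validate raw frontend input, predict, store, and build a response."""
    # 1. Validation and preprocessing: inputs arrive as strings.
    try:
        gre = float(raw["gre"])
        cgpa = float(raw["cgpa"])
    except (KeyError, TypeError, ValueError):
        return {"status": "error", "message": "gre and cgpa must be numeric"}

    # 2. Prediction by the machine learning model.
    chance = model.predict([gre, cgpa])

    # 3. Storage of the input and the predicted outcome.
    store.append({"input": {"gre": gre, "cgpa": cgpa}, "chance": chance})

    # 4. Response sent back to the frontend as a percentage.
    return {"status": "ok", "chance_percent": round(chance * 100, 1)}


database = []  # a list stands in for the backend database
ok = handle_request({"gre": "340", "cgpa": "10"}, StubModel(), database)
bad = handle_request({"gre": "abc"}, StubModel(), database)
print(ok)
print(bad)
```

Invalid requests return early, so nothing is stored unless a prediction was actually made.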

• Summary
The design chapter provides a detailed representation of the system’s architecture, showcasing
how data flows through various processes and how different components interact to achieve
the desired functionality. The inclusion of the Data Flow Diagram and Sequence Diagram
ensures a clear understanding of the system's operation and helps visualize the implementation
details effectively.


CHAPTER 5
IMPLEMENTATION
The implementation involves a structured approach that leverages machine learning techniques
to predict the probability of a student being admitted to a graduate program. This begins with
data collection, where a dataset containing various attributes related to graduate admissions is
obtained. Common datasets include features like GRE scores, TOEFL scores, undergraduate
GPA, ratings of the statement of purpose and letters of recommendation, work or research
experience, and other related attributes.
Next, the data undergoes preprocessing to ensure it is clean and ready for analysis. This
involves handling missing values, encoding categorical data, and normalizing numerical
features to ensure they are scaled appropriately. Feature engineering may also be conducted to
create new meaningful variables or remove redundant ones. An exploratory data analysis
(EDA) phase is performed to understand the relationships between variables and identify
patterns or trends in the dataset. Visualizations such as scatter plots, correlation matrices, and
histograms are used to gain insights into the data distribution.
After preprocessing, the project transitions to the model development phase, where various
machine learning algorithms are applied. Initial models may include Linear Regression for a
baseline, followed by more advanced techniques such as Decision Trees, Random Forest,
Gradient Boosting (e.g., XGBoost), or Neural Networks, depending on the complexity and size
of the dataset. The dataset is split into training and testing subsets to evaluate the models'
performance. Cross-validation is employed to ensure the model generalizes well to unseen
data.
The final phase involves building a user-friendly interface to make predictions accessible to
end-users. This could be implemented as a web-based application using frameworks like Flask
or Django, where users can input their details (e.g., GRE scores, GPA, etc.) and receive an
admission probability. Additionally, the model can be integrated with visualization dashboards
to provide detailed insights into the predictions and factors influencing the results. This project
not only demonstrates practical machine learning implementation but also serves as a valuable
tool for students to make informed decisions about their graduate applications.


5.1 Code Snippet


Importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

Training the model


# Load the dataset (file name as used in Fig 5.1)
data = pd.read_csv('GAdata.csv')

# Separate the input features (x) from the target (y)
x = data.drop(['Chance of Admit'], axis=1)
y = data['Chance of Admit']
print(x.shape, y.shape)

# Hold out a test set (by default 25% of the rows)
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(x, y, random_state=56)

# Linear Regression
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(train_x, train_y)
y_pred_lr = lr.predict(test_x)

# Random Forest Regressor
rf = RandomForestRegressor(random_state=42)
rf.fit(train_x, train_y)
y_pred_rf = rf.predict(test_x)

# Gradient Boosting Regressor
gbr = GradientBoostingRegressor(random_state=42)
gbr.fit(train_x, train_y)
y_pred_gbr = gbr.predict(test_x)


The provided code outlines the process of training multiple machine learning models to predict
the "Chance of Admit" for graduate admissions based on various features. It begins by
importing essential Python libraries, such as pandas and numpy for data manipulation,
matplotlib and seaborn for visualization, and machine learning tools from sklearn. The dataset
is prepared by splitting it into the input variables (x) and the target variable (y), where x contains
all features except the "Chance of Admit" column, which is set as the target (y). Three different
regression models are trained: Linear Regression, Random Forest Regressor, and Gradient
Boosting Regressor. Finally, evaluation metrics such as Mean Absolute Error (MAE), Mean
Squared Error (MSE), and R² Score are imported to assess the accuracy and performance of the
trained models.
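
To make the splitting step concrete, the following sketch shows what a random train/test split does, using only the standard library. It is an illustration of the idea and will not select the same rows as scikit-learn's train_test_split; the function name is hypothetical, and the 25% test fraction mirrors scikit-learn's default when no size is given.

```python
import math
import random


def train_test_split_indices(n_rows, test_size=0.25, seed=56):
    """Shuffle row indices and hold out a fraction for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = math.ceil(test_size * n_rows)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(8)
print(len(train_idx), len(test_idx))
```

Fixing the seed plays the same role as random_state in the code above: the same rows end up in the test set on every run, so model comparisons are repeatable.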

Testing the model


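# Evaluate each model on the held-out test set
lr_mae = mean_absolute_error(test_y, y_pred_lr)
mse = mean_squared_error(test_y, y_pred_lr)
lr_rmse = np.sqrt(mse)
lr_r2 = r2_score(test_y, y_pred_lr)

rf_mae = mean_absolute_error(test_y, y_pred_rf)
rf_rmse = np.sqrt(mean_squared_error(test_y, y_pred_rf))
rf_r2 = r2_score(test_y, y_pred_rf)

gbr_mae = mean_absolute_error(test_y, y_pred_gbr)
gbr_rmse = np.sqrt(mean_squared_error(test_y, y_pred_gbr))
gbr_r2 = r2_score(test_y, y_pred_gbr)

# Compile results
results = pd.DataFrame({
    'Model': ['Linear Regression', 'Random Forest', 'Gradient Boosting'],
    'MAE': [lr_mae, rf_mae, gbr_mae],
    'RMSE': [lr_rmse, rf_rmse, gbr_rmse],
    'R2 Score': [lr_r2, rf_r2, gbr_r2]
})
print(results)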

The above code is used to evaluate the performance of different regression models. First, it
calculates the mean squared error (MSE) between the actual test labels (test_y) and the
predicted values from the Linear Regression model (y_pred_lr). Then, it computes the root
mean squared error (RMSE) by taking the square root of the MSE. RMSE is a commonly
used metric to assess the accuracy of a regression model, with lower values indicating better
performance. The R² score is also calculated for the Linear Regression model using the
r2_score() function, which measures how well the model fits the data (with 1 being a perfect
fit and values closer to 0 indicating a worse fit). After that, a Pandas DataFrame is created to
compile the performance results of all three models.
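
For reference, the three metrics can be written out directly from their definitions. This is a plain-Python sketch with made-up values, intended only to show what the scikit-learn functions compute; the function names are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r2(actual, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up values for illustration
actual = [0.5, 0.7, 0.9]
predicted = [0.6, 0.7, 0.8]

print(mae(actual, predicted))
print(rmse(actual, predicted))
print(r2(actual, predicted))
```

Note that R² can also become negative when a model predicts worse than simply using the mean of the actual values.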

Building the model


import joblib
from sklearn.linear_model import LinearRegression as LR
from sklearn.preprocessing import StandardScaler

# Standardize the features; the scaler is fitted on the training data only
scaler = StandardScaler()
train_x = scaler.fit_transform(train_x)
test_x = scaler.transform(test_x)

# Train the Linear Regression model on the standardized features
lr = LR()
lr.fit(train_x, train_y)

# Save the trained model to disk
joblib.dump(lr, 'linear_regression_model.pkl')

This code demonstrates the process of building and saving a Linear Regression model using
scikit-learn. First, it imports necessary libraries. The code then creates an instance of the
StandardScaler class and uses it to standardize the training and test features (train_x and
test_x). Standardization scales the data so that each feature has a mean of 0 and a standard
deviation of 1. Next, the code initializes the Linear Regression model (lr = LR()) and fits it to
the standardized training data (train_x and train_y). Finally, the code uses joblib.dump() to
save the trained Linear Regression model to a file (linear_regression_model.pkl).
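
The standardization step can be illustrated without scikit-learn. In this sketch the mean and standard deviation are computed from the training column only and then reused for the test column, which is the same fit/transform separation the code above follows. The values and function names are made up for the example; the population standard deviation is used, as StandardScaler does.

```python
import statistics


def fit_scaler(column):
    """Learn the mean and (population) standard deviation of a column."""
    return statistics.fmean(column), statistics.pstdev(column)


def transform(column, mean, std):
    """Rescale values to zero mean and unit standard deviation."""
    return [(value - mean) / std for value in column]


# Made-up GRE scores for illustration
train = [300, 310, 320, 330]
test = [340]

mean, std = fit_scaler(train)           # statistics come from training data
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)  # reuse them on the test data

print(train_scaled)
print(test_scaled)
```

Fitting the scaler on the training data alone avoids leaking information about the test set into the model.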

5.2 Snapshots
A snapshot is a new instance of an existing project. Some key points to remember about
snapshots are as follows: A snapshot is a separate project. Making a change to one snapshot
in a snapshot set does not affect the other snapshots in the set. A snapshot is an executable
project. Snapshots are generally created for data protection, but they can also be used for
testing application software and data mining. A storage snapshot can be used for disaster
recovery (DR) when information is lost due to human error.


This section includes sample screenshots of the project pages. Each description explains
when the screen appears, what actions it performs, and what its outcome is.

Fig 5.1: Initial Exploration of Dataset

The above code snippet demonstrates the initial steps in a "Graduate Admission Prediction"
project using the Python programming language and the Pandas library.
Firstly, data = pd.read_csv('GAdata.csv') reads data from a CSV file named "GAdata.csv" and
stores it in a Pandas DataFrame called data. This DataFrame is a 2-dimensional, labeled data
structure that efficiently holds and manipulates tabular data.
Next, data.columns displays the names of the columns in the DataFrame. These column names
likely represent features relevant to graduate admissions, such as GRE Score, TOEFL Score,
University Rating, Statement of Purpose (SOP), Letter of Recommendation (LOR), CGPA,
Research experience, and the target variable "Chance of Admit."
Finally, data.head() displays the first five rows of the DataFrame. This provides a quick
overview of the data, allowing for an initial inspection of the values and data types.
This code snippet represents the initial data loading and exploration phase of the project. The
subsequent steps would involve data cleaning, feature engineering, model selection, training,
evaluation, and deployment.


Fig 5.2: Data Preparation and Model Training

The above code outlines the process of building and evaluating a linear regression model to
predict the "Chance of Admit" based on various features. Initially, the data is prepared by
separating the features (independent variables) from the target variable ("Chance of Admit").
Then, the data is divided into training and testing sets to evaluate the model's performance on
unseen data. Next, a linear regression model is created and trained on the training data. The
trained model is then used to make predictions on the testing set. Finally, the model's
performance is evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared
Error (MSE), Root Mean Squared Error (RMSE), and R-squared score. These metrics provide
insights into the accuracy and reliability of the model's predictions.


Fig 5.3: Visualization of Feature Correlations

The heatmap visualizes the correlation matrix among the various features and the target
variable ("Chance of Admit") in the graduate admissions dataset. It reveals that CGPA, GRE
Score, and TOEFL Score exhibit strong positive correlations with the "Chance of Admit,"
suggesting these are highly influential factors. University Rating, SOP, and LOR also show
moderate positive correlations, indicating their significance in the admission process.
Research experience appears to have a weaker correlation, implying it might not be as
influential as the other factors. This heatmap provides valuable insights into the relative
importance of each feature, guiding feature selection and model development for more
accurate predictions in the graduate admission prediction project.
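
Each cell of such a heatmap is a Pearson correlation coefficient between two columns. The sketch below computes one coefficient from its definition; the CGPA and chance values are made up for illustration and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration
cgpa = [7.5, 8.0, 8.5, 9.0, 9.5]
chance = [0.55, 0.62, 0.71, 0.80, 0.91]

r = pearson(cgpa, chance)
print(r)  # close to +1: a strong positive linear relationship
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship between the two columns.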


Fig 5.4: Distribution of Admission Factors

The boxplot provides a visual summary of the distribution of five key features: University
Rating, Statement of Purpose (SOP), Letter of Recommendation (LOR), Cumulative Grade
Point Average (CGPA), and Research Experience. The boxplots reveal that CGPA is skewed
towards higher values, suggesting a majority of applicants have a high CGPA. University
Rating, SOP, and LOR distributions are relatively similar with medians around 3.5. The
Research Experience feature is binary, with a majority of applicants lacking research
experience. This boxplot information can be used to guide data preprocessing steps such as
scaling and outlier handling, as well as feature engineering for this graduate admission
prediction project.
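
The numbers a boxplot draws (quartiles, median, and outlier fences) can be computed directly. The sketch below uses the common 1.5 × IQR rule for outliers; the CGPA values are made up for illustration and the function name is hypothetical.

```python
import statistics


def box_summary(values):
    """Quartiles, fences, and outliers as drawn by a typical boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "low_fence": low_fence,
        "high_fence": high_fence,
        "outliers": outliers,
    }


# Made-up CGPA values for illustration
cgpa = [6.8, 7.4, 8.0, 8.2, 8.5, 8.7, 9.0, 9.3, 9.6]

summary = box_summary(cgpa)
print(summary)

# Adding an unusually low value produces an outlier.
print(box_summary(cgpa + [2.0])["outliers"])
```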


Fig 5.5: Frequency Distribution of Admission Probabilities

The histogram visualizes the distribution of the "Chance of Admit" variable in the graduate
admissions dataset. The distribution appears to be roughly bell-shaped, indicating a normal or
near-normal distribution. The "Chance of Admit" values range from approximately 0.4 to 1.0,
with the majority of applicants having a "Chance of Admit" between 0.6 and 0.8. This
histogram provides valuable insights into the distribution of the target variable, which can
guide the selection of appropriate machine learning models and evaluation metrics. It can also
help identify potential outliers and aid in interpreting the model's predictions in the context of
the graduate admission prediction project.

Fig 5.6: Scatter Plot of GRE Score and Chance of Admit


The scatter plot visualizes the relationship between GRE Score and Chance of Admit in the
graduate admissions dataset. The plot reveals a positive correlation between the two variables,
indicating that higher GRE scores are generally associated with higher chances of admission.
However, the points are not perfectly aligned, suggesting that GRE Score is not the sole
determinant of admission and that other factors also significantly influence the admission
decision. The scatter plot confirms the importance of GRE Score as a predictor and provides
valuable insights for model selection and feature engineering in the graduate admission
prediction project.
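The strength of the linear relationship seen in the scatter plot can be quantified with the Pearson correlation coefficient. The sketch below is illustrative only: the helper name `pearson_r` is hypothetical, and the input is limited to the three example profiles listed under Future Enhancements in this report (GRE score paired with the reported chance of admission), so the result says nothing reliable about the full dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# GRE scores and reported chances of admission for the three example
# profiles given later in the report.
gre_scores = [335, 324, 296]
chance_of_admit = [0.93, 0.82, 0.43]
r = pearson_r(gre_scores, chance_of_admit)
```

A value of `r` close to 1 indicates a strong positive linear association; with only three points it should be read as a demonstration of the calculation rather than as evidence.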

 Summary
This chapter presented the implementation and screenshots of the project. It covered the
packages used to build the model and code snippets of the model, and explained the
screenshots of each page along with their functionality.


CHAPTER 6
CONCLUSION AND FUTURE ENHANCEMENTS
6.1 Conclusion
After evaluating the models on the dataset, we compare their performance to find out which
model predicts best. The MAE, RMSE and R2 scores are tabulated for the models.

Regression Model               MAE        RMSE       R2 Score

Linear Regression              0.046265   0.064036   0.794558
Random Forest Regression       0.047624   0.064971   0.788510
Gradient Boosting Regressor    0.048946   0.065782   0.783198

Table 6.1: Performance Analysis
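For reference, the three metrics reported in the table can be computed as in the sketch below. This is a plain-Python illustration of the standard definitions, not the project's evaluation code; the function name `regression_metrics` is hypothetical, and the sample values are made-up toy numbers, not model outputs from this report.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # Mean absolute error: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n
    # Root mean squared error: penalises large errors more heavily.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # R2: share of the variance in y_true explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Toy values for illustration only (NOT results from the report).
toy_true = [0.9, 0.8, 0.5, 0.6]
toy_pred = [0.85, 0.8, 0.55, 0.65]
mae, rmse, r2 = regression_metrics(toy_true, toy_pred)
```

Lower MAE and RMSE and a higher R2 score indicate a better fit, which is the basis for ranking the models in the table.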

Linear Regression performs best on our dataset, with the lowest MAE (0.046265) and RMSE
(0.064036) and the highest R2 score (0.794558), closely followed by the Random Forest
Regressor. This can be attributed to the linear dependencies of features in the dataset: higher
test scores, GPA and other factors generally result in greater chances of admission. The
inclusion of a few outliers has influenced the linear model to some extent. The overall
objective of the research was achieved, as the system allows students to save the time and
money they would otherwise spend on education consultants and on application fees for
universities where they have little chance of securing admission. It will also help students
make better and faster decisions about which universities to apply to.

6.2 Future Enhancements


Additional profiles (unseen data) were evaluated using Linear Regression, the best-performing
model, to assess its performance. Each applicant profile lists, in order, the GRE score, TOEFL
score, College Ranking, Statement of Purpose rating, Letter of Recommendation rating, GPA,
and Research Experience, followed by the predicted chance of admission:


[335, 117, 5, 5, 5, 9.7, 1] -> 0.93
[324, 110, 4, 4, 5, 9.04, 1] -> 0.82
[296, 95, 2, 1.5, 2, 7.1, 0] -> 0.43

The results obtained closely resemble the actual chances of admission, which indicates that
the model works well on unseen data. A plausible solution was developed for the problem by
considering the various factors that affect the chances of admission. Although the admission
process is subjective, the solution provides satisfactory results for the dataset used.

In future work, the dataset will be expanded and the number of profiles increased with some
variations. The number of outliers (profiles that do not seem impressive but had a high chance
of admission) will be significantly increased to reduce the linear dependency of features.
Deep Neural Networks will also be explored as another plausible model to capture the
subjective nature of admission.
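The profile vectors listed above follow a fixed feature order, and a fitted linear regression model turns such a vector into a prediction by adding an intercept to a weighted sum of the features. The sketch below illustrates that structure only. The names `ApplicantProfile` and `linear_predict` are hypothetical, the encoding of Research Experience (1 = yes, 0 = no) is assumed, and the coefficients are placeholders, since the report does not list the fitted values.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ApplicantProfile:
    """One applicant, with fields in the order used by the report's vectors."""
    gre: float
    toefl: float
    college_ranking: float
    sop: float
    lor: float
    gpa: float
    research: int  # assumed encoding: 1 = research experience, 0 = none


def linear_predict(profile, coefficients, intercept):
    """Linear model output: intercept + sum of coefficient * feature."""
    features = astuple(profile)
    if len(coefficients) != len(features):
        raise ValueError("expected one coefficient per feature")
    return intercept + sum(w * x for w, x in zip(coefficients, features))


# First example profile from the report.
profile = ApplicantProfile(335, 117, 5, 5, 5, 9.7, 1)

# Placeholder parameters (NOT the fitted model): all weights zero, so the
# output equals the intercept. Real values must come from the trained model.
placeholder_coefficients = [0.0] * 7
placeholder_intercept = 0.5
prediction = linear_predict(profile, placeholder_coefficients, placeholder_intercept)
```

Naming the fields makes the positional vectors easier to check, for example confirming that 9.7 is the GPA and not an SOP rating.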


BIBLIOGRAPHY

[1] Mohan S Acharya, “Graduate Admissions”, Predicting admission from important
parameters, Kaggle, April 2021. https://www.kaggle.com/mohansacharya/datasets.
[2] Al-Debagy, O., & Ullah, A. (2021). "Machine Learning Approaches for Predicting
Graduate Admissions." International Journal of Advanced Computer Science and
Applications, 12(5), 14-22.
[3] Patel, R., & Mehta, A. (2020). "A Study on Graduate Admission Prediction Using
Regression Models." Journal of Educational Data Mining, 8(3), 24-36.
[4] Zhang, H., & Zhang, X. (2022). "Predictive Modeling in Higher Education: Graduate
Admission as a Case Study." Educational Data Mining Conference Proceedings, 112-120.
[5] Abhishek, K., & Singh, N. (2021). "Evaluation of Machine Learning Algorithms for
Graduate Admission Prediction." International Journal of Artificial Intelligence and
Applications, 10(2), 1-11.
