Education Loan Prediction Analysis


Volume 7, Issue 4, April – 2022 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165


Sanskruti Naik
Department of Information Technology and Engineering
Goa College of Engineering
Goa, India

Ganesh Manerkar
Department of Information Technology and Engineering
Goa College of Engineering
Goa, India

Abstract:- Education loans help students cover the cost of tuition, books and supplies, and living expenses while pursuing a degree. Education loans are granted by private banks and by government organizations. This paper is an analysis of student loan data for the interest-free education loans granted to students as per the standards and rules of the Goa Education Development Corporation (GEDC). The dataset is prepared in compliance with the criteria mentioned by the organization. The prediction accuracy is compared across models such as Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Decision Tree classifier and XG-Boost.

Keywords:- Loan, Prediction, Support Vector Machine, Random Forest, Logistic Regression, Decision Tree Classifier, XG-Boost.

I. INTRODUCTION

It is education that uplifts society at the macro level and individuals at the micro level from their all-round backwardness, whether social, economic, cultural or political. Education loans help promote the pursuit of higher and technical education by the younger population and ensure that economic and financial difficulties do not come in the way of such pursuit. The Government of Goa launched the Interest Free Education Loan Scheme, under which eligible candidates can undertake approved degree and diploma courses at undergraduate and postgraduate levels in India or abroad. The rules and standards a candidate must meet to apply for such a loan are maintained by the Goa Education Development Corporation (GEDC). Various parameters, such as residence, family income, percentage of marks obtained (10th, 12th, Diploma) and whether any sibling of the applicant has taken an education loan earlier, are involved in the processing of a loan under GEDC. The following are the primary criteria followed by GEDC in granting a student loan.

• The applicant must have been a resident of Goa for not less than 15 years.
• A maximum of 5 years of study period/course duration is covered in India and a maximum of 2 years of study is covered abroad under this scheme.
• Any person below the age of 30 years is entitled to apply for and receive loans under this scheme.
• An applicant who wishes to pursue higher education in India must have obtained 55% or more marks in the qualifying examination (for ST/SC/OBC, marks will be relaxed by 10 percent).
• An applicant who wishes to pursue higher education outside India must have obtained 60% or more marks in the qualifying examination (for ST/SC/OBC, marks will be relaxed by 10 percent).
• Family income should not exceed 7 LPA for applicants taking courses within India. In the event that a brother or sister of the applicant is also pursuing studies at the higher/technical education level (whether or not such sibling has applied for, or availed, a loan under this scheme), the eligibility limit for the family will be raised to 8 LPA.
• Family income should not exceed 12 LPA for applicants taking courses outside India. In the event that a brother or sister of the applicant is also pursuing studies at the higher/technical education level (whether or not such sibling has applied for, or availed, a loan under this scheme), the eligibility limit for the family will be raised to 14 LPA.

The aim of this paper is to support granting loans to deserving applicants who meet all of the above criteria. The loan approval history of past application forms is used for training the model, and an efficient, unbiased system is formulated to reduce the time the institution spends checking every application when granting loans on a priority basis. Parameters such as residence, category and education, which are linked to each other, are analyzed and visualized in this paper. Section II presents a literature survey of systems and approaches for granting loans in various domains. We discuss the framework of the proposed system in Section III. The results obtained and a comparison of the models are presented in Section IV. Finally, we conclude in Section V.

II. RELATED WORKS

Predictive analytics is a branch of advanced analytics that uses many techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to analyze current data and make predictions. Adyan Nur Alfiyatin, Hilman Taufiq and their co-authors [2] worked on house price prediction, using regression analysis and Particle Swarm Optimization (PSO) to predict house prices. Mohamed El Mohadab, Belaid Bouikhalene and Said Safi [3] predicted the rank of scientific research papers using supervised learning. Kumar Arun, Garg Ishan and Kaur Sanmeet [1] worked on bank loan prediction, addressing how to approve a loan, and proposed a model based on machine learning algorithms such as SVM and neural networks. Anshika Gupta, Vinay Pant, Sudhanshu Kumar and Pravesh Kumar Bansal [4] worked on bank loan prediction and implemented algorithms such as random forest (RF) and logistic regression (LR) to make predictions. These literature reviews helped us to carry out this work and
propose a reliable education loan prediction model.

III. PROPOSED SYSTEM

A. System Model – Using Flow Graph

Fig. 1. System architecture

The data is initially prepared in compliance with the policy document available on the GEDC portal. Student loans are approved only when the applicants meet the criteria required by GEDC. The data is then preprocessed: irregularities such as missing values are handled and unnecessary attributes are dropped. Label encoding is done to convert categorical data to numeric form wherever needed. The feature extraction step then selects only the attributes that are essential for predicting the grant of a loan to a student, dropping unnecessary columns. The data is then split into test and train sets. The system is trained using various machine learning models such as Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Decision Tree classifier and XG-Boost, and the accuracy of all of these models is compared. Data visualization using the Python library Seaborn is also carried out, which helps to analyze the various parameters involved in granting a loan.

1) Dataset Used: The dataset is prepared in compliance with the policy document for granting an education loan to a student, available on the GEDC portal. It follows the criteria mentioned in the GEDC brochure. These rules are also listed in the Introduction section of this paper. The dataset comprises 290 rows and 11 columns. The columns are Loan id, Gender, Highest Degree, Board, Sibling loan, Annual income, Caste, Loan amount, Residence, Status and Age.

2) Preprocessing: The data obtained from the dataset preparation step (step 1) is then preprocessed by removing unwanted data. Preprocessing operations transform the dataset into a structure suitable for machine learning. This step cleans the dataset by removing irrelevant or corrupted data that can affect prediction accuracy, which makes the dataset more efficient to work with. In this process, handling of missing data, removal of duplicate entries and normalization are carried out. Missing data is handled by replacing null values, such as missing and NaN values, with 0. Label encoding is done to convert categorical data to a form that the machine understands, i.e., numerical data. Columns with Yes/No answers are converted to 1/0 respectively. Similarly, the Gender values Male/Female are converted to 0/1 respectively, and so on.

3) Feature Extraction: Feature extraction reduces the number of features in a dataset by creating new features from the existing ones (and then discarding the original features). The new, reduced set of features should still carry most of the information contained in the original set. In this case, columns whose removal does not affect the loan procedure are dropped, e.g. Loan id.

4) Data Splitting: Data splitting divides the data into train, validation and test sets. The train set contains the data that is fed into the model; in other words, the model learns from this data. The validation set is used to validate the trained model. The test set contains the data on which we test the trained and validated model. It tells us the efficiency/performance of the model using evaluation metrics such as precision, recall and accuracy.

5) Model Comparison: The models evaluated on accuracy score are Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Decision Tree classifier and XG-Boost. Comparing the models helps us choose the algorithm that yields the best output for the system under consideration.

B. Visualization
Visualization is carried out in order to graphically represent various attributes and their link to the loan-granting procedure. It also helps in the survey process. For example, the count of candidates in categories such as General, OBC, SC and ST who have received loan approval can be observed graphically.
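The preprocessing, column-dropping and data-splitting steps of Section III-A (steps 2 to 4) can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the sample rows and their values are invented, only a few of the listed columns are used, and the 80/20 split ratio and the fixed seed are assumptions, since the paper does not state them. The helper names `preprocess` and `train_test_split` are hypothetical, and the validation set is left out for brevity.

```python
import random

# Hypothetical sample rows; column names follow Section III-A, values are invented.
ROWS = [
    {"Loan id": "L001", "Gender": "Male", "Sibling loan": "No", "Annual income": 450000, "Status": "Yes"},
    {"Loan id": "L002", "Gender": "Female", "Sibling loan": "Yes", "Annual income": None, "Status": "No"},
    {"Loan id": "L003", "Gender": "Female", "Sibling loan": "No", "Annual income": 300000, "Status": "Yes"},
    {"Loan id": "L004", "Gender": "Male", "Sibling loan": None, "Annual income": 900000, "Status": "No"},
    {"Loan id": "L005", "Gender": "Male", "Sibling loan": "Yes", "Annual income": 650000, "Status": "Yes"},
]

# Label encoding as described in the text: Yes/No -> 1/0, Male/Female -> 0/1.
ENCODINGS = {"Yes": 1, "No": 0, "Male": 0, "Female": 1}


def preprocess(rows, drop_columns=("Loan id",)):
    """Drop unneeded columns, replace missing values with 0, label-encode categories."""
    cleaned = []
    for row in rows:
        out = {}
        for column, value in row.items():
            if column in drop_columns:
                continue  # column is irrelevant to the loan decision
            if value is None:
                value = 0  # missing values are replaced by 0
            if isinstance(value, str):
                value = ENCODINGS.get(value, value)  # encode known categories
            out[column] = value
        cleaned.append(out)
    return cleaned


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test). The ratio is an assumption."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = preprocess(ROWS)
train, test = train_test_split(data, test_fraction=0.2, seed=0)
print(len(train), "training rows,", len(test), "test row(s)")
```

Note that replacing a missing Yes/No answer with 0 makes it indistinguishable from an explicit "No"; whether that is acceptable depends on the column.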

C. Algorithms Implemented
The supervised machine learning algorithms used to predict whether education loans are granted to students are as follows:

1) Support Vector Machine: The Support Vector Machine (SVM) is used primarily for classification problems in machine learning. SVM creates the best line or decision boundary that segregates n-dimensional space into classes, so that a new data point can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
2) Logistic Regression: Logistic regression is used for predicting a categorical dependent variable from a given set of independent variables. The outcome must therefore be a categorical or discrete value, such as Yes or No, 0 or 1, True or False. However, instead of giving the exact value 0 or 1, it gives probabilistic values that lie between 0 and 1. In this analysis, it specifies whether the education loan is granted to the applicant or not.
3) Decision Tree: It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome. To predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record and, based on the comparison, follows the branch and jumps to the next node. At the next node, the algorithm again compares the attribute value with the sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below.
4) Random Forest: Random Forest (RF) is a popular supervised machine learning algorithm. It is a classifier that builds a number of decision trees on various subsets of the given dataset and combines their predictions to improve predictive accuracy. A greater number of trees in the forest generally leads to higher accuracy.
5) XG-Boost: XG-Boost is an algorithm that is widely known for producing predictions quickly. In this model the trees are built sequentially, each one only after the previous one, while the construction of each tree uses all available cores. This makes XG-Boost a very fast algorithm.

IV. RESULTS

1) Model Comparison: The highest accuracy score on test data is obtained by the XG-Boost model, while the lowest accuracy is obtained by the Support Vector Machine. The Decision Tree also performs well on the dataset under consideration. Comparing the models helps us employ the best algorithm for prediction on the system under consideration.

Fig. 2. Model Comparison

2) Confusion matrix: The confusion matrix is a summary of prediction results on a given classification problem. Here the heatmap shows that the ‘0’ (Loan Not Granted) samples classified correctly were 26 and incorrectly classified samples were
3) Similarly, the ‘1’ (Loan Granted) samples classified correctly were 29 and incorrectly classified were 0.

Fig. 3. Confusion matrix

4) Data Analysis based on Visualization: Data visualization using the Python library Seaborn is also carried out, which helps to analyze the various parameters involved in granting the loan. Data visualization may help the organization keep track of parameters such as family income, category and highest education that help students access the loan.
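As a rough illustration of how the confusion-matrix counts and the accuracy score discussed in this section relate to each other, the sketch below builds a 2x2 confusion matrix from actual and predicted labels and derives the accuracy from it. The labels are invented for the example and are not the paper's results; the function names `confusion_matrix` and `accuracy` are hypothetical helpers written in plain Python.

```python
def confusion_matrix(actual, predicted):
    """Return a 2x2 table m[a][p]: samples with actual class a predicted as class p."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. classified correctly."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Invented labels: 0 = Loan Not Granted, 1 = Loan Granted.
actual = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 0]

matrix = confusion_matrix(actual, predicted)
print(matrix)            # [[3, 1], [0, 4]]
print(accuracy(matrix))  # 0.875
```

Row 0 of the matrix holds the Loan Not Granted samples (3 classified correctly, 1 incorrectly) and row 1 the Loan Granted samples, which is the same layout a heatmap of the matrix would display.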

The graph below depicts the number of male and female candidates who have been granted the loan (orange) and not been granted the loan (blue). The 0 on the x-axis stands for male candidates and the 1 for female candidates.
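The numbers behind such a grouped bar chart are simply counts of each (gender, status) pair. The sketch below tabulates them with the standard library; the records are invented, `grouped_counts` is a hypothetical helper, and the plotting call itself is not shown (a library such as Seaborn would typically draw the same counts as a count plot with gender on the x-axis and loan status as the colour).

```python
from collections import Counter

# Invented records: Gender 0 = male, 1 = female; Status 1 = granted, 0 = not granted.
records = [
    {"Gender": 0, "Status": 1},
    {"Gender": 0, "Status": 0},
    {"Gender": 1, "Status": 1},
    {"Gender": 1, "Status": 1},
    {"Gender": 0, "Status": 1},
    {"Gender": 1, "Status": 0},
]


def grouped_counts(rows, group_key, hue_key):
    """Count rows for every (group, hue) pair, one bar per pair in a grouped chart."""
    return Counter((row[group_key], row[hue_key]) for row in rows)


counts = grouped_counts(records, "Gender", "Status")
for (gender, status), n in sorted(counts.items()):
    print(f"gender={gender} status={status} count={n}")
```

The same helper works for the residence and category charts by passing a different grouping column.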


Fig. 4. Number of male and female candidates who have been granted the loan (orange) and not granted the loan (blue)

The graph below clearly shows that only candidates with above 15 years of residence have been granted the loan.

Fig. 5. Only candidates with above 15 years of residence have been granted the loan.

The graph underneath gives a visual representation of the candidates of each category who have been granted and not granted the loan. The 0 on the x-axis (caste ST) shows candidates of that category who have availed the loan facility (orange) and not availed the loan facility (blue). The same goes for the 1 (SC), 2 (OBC) and 3 (General) categories.

Fig. 6. Candidates of each category granted and not granted the loan

V. CONCLUSION

The Education Loan Prediction System is trained using various ML models such as SVM, Logistic Regression, RF, Decision Tree classifier and XG-Boost. The highest accuracy score on test data is obtained by the XG-Boost model, while the lowest accuracy is obtained by the Support Vector Machine (SVM). This suggests that the boosting algorithm can be used in most prediction-based environments, as it yields the best output. The confusion matrix was visualized using a heatmap. Data visualization using the Python library Seaborn is also carried out, which helps to analyze the various parameters involved in granting a loan. The dataset can be extended with more rows to obtain even better accuracy scores and to train the model more effectively.

ACKNOWLEDGMENT

I thank my Project Guide, Mr. Ganesh Manerkar, for motivating and guiding me to carry out this research seminar. I express my gratitude and earnest thanks to Dr. Nilesh Fal Dessai, Head of the Information Technology Department, Goa College of Engineering, for providing me with all the facilities throughout the research seminar work. My sincere and kind thanks to the Principal of our college, Dr. Rajesh Basant Lohani, for providing all the facilities and resources to me. I also heartily thank the personnel of GEDC Goa for providing me with the necessary inputs. I am indebted to my parents and my husband for motivating me in partial fulfillment of this research seminar work.

REFERENCES

[1]. K. Arun, G. Ishan, and K. Sanmeet, “Loan Approval Prediction based on Machine Learning Approach”, IOSR Journal of Computer Engineering, pp. 18-21, 2009.
[2]. Adyan Nur Alfiyatin, Hilman Taufiq, Ruth Ema Febrita, Wayan Firdaus Mahmudy, “Modeling House Price Prediction using Regression Analysis and Particle Swarm Optimization”, International Journal of Advanced Computer Science and Applications, vol. 8, no. 10, 2017.
[3]. Mohamed El Mohadab, Belaid Bouikhalene, Said Safi, “Predicting rank for scientific research papers using supervised learning”, Applied Computing and Informatics, vol. 15, pp. 182–190, 2019.
[4]. Anshika Gupta, Vinay Pant, Sudhanshu Kumar, Pravesh Kumar Bansal, “Bank Loan Prediction System using Machine Learning”, 9th International Conference on System Modeling and Advancement in Research Trends, 4th–5th December 2020.
[5]. Vishal Singh, Ayushman Yadav, Rajat Awasthi, N. Partheeban, “Prediction of Modernized Loan Approval System based on Machine Learning Approach”, 2021 International Conference on Intelligent Technologies (CONIT).
[6]. Mohamed Alaradi, Sawsan Hilal, “Tree-Based Methods for Loan Approval”, 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI).
[7]. L. Al-Blooshi and H. Nobanee, “Applications of Artificial Intelligence in Financial Management Decisions: A Mini-Review”, SSRN Electronic Journal, 2020.
[8]. R. Kumar, V. Jain, P. S. Sharma, S. Awasthi, and G. Jha, “Prediction of Loan Approval using Machine Learning”, International Journal of Advanced Science and Technology, vol. 28, pp. 455-460, 2019.
[9]. Rising Odegua, “Predicting Bank Loan Default with Extreme Gradient Boosting”.
[10]. “Goa Education Development Corporation”, https://gedc-goa.org/.

IJISRT22APR259 www.ijisrt.com 262
