FINAL REPORT
ON
Loan Prediction

ABSTRACT
Loan prediction is very helpful to bank employees as well as to applicants. A loan prediction system can automatically calculate the weight of each feature taking part in loan processing, and on new test data the same features are processed with respect to their associated weights. A time limit can be set for the applicant to check whether his or her loan will be sanctioned or not, and the system allows jumping to a specific application so that it can be checked on a priority basis. In this project we gain an idea of how real business problems are solved using exploratory data analysis (EDA). In this study, we also develop a basic understanding of risk analytics in banking and financial services, and see how data is used to minimise the risk of losing money while lending to customers. In India, the number of people applying for loans has increased in recent years for various reasons. Bank employees are not able to analyse or predict whether a customer can pay back the amount or not (good customer or bad customer) at the given interest rate. The aim of this paper is to find the nature of the customer applying for a personal loan. An exploratory data analysis approach is used to address this problem. The dataset used for EDA undergoes normalisation, missing-value treatment, selection of important columns using filtering, derivation of new columns, identification of the target variable, and visualisation of the data in graphical form. Python is used for simple and efficient processing of the data. We used the pandas library available in Python to process and extract data from the given dataset; in particular, it provides data structures and operations for manipulating numerical tables and time series. The processed data is converted into appropriate graphs for better visualisation of the results and better understanding. Matplotlib is used for obtaining the graphs; it is a plotting library for Python and its numerical mathematics extension NumPy, and it provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits such as Tkinter, wxPython and Qt.
The company considered here is the largest online loan marketplace, facilitating personal loans, business loans, and financing of medical procedures. Borrowers can easily access lower interest rate loans through a fast online interface. As with most other lending companies, lending to 'risky' applicants is the largest source of financial loss (called credit loss). Credit loss is the amount of money lost by the lender when the borrower refuses to pay or runs away with the money owed. In other words, borrowers who default cause the largest losses to lenders. In this case, the customers labelled 'charged-off' are the 'defaulters'. If one is able to identify these risky loan applicants, then such loans can be reduced, thereby cutting down the amount of credit loss. Identification of such applicants using EDA is the aim of this project. In other words, the company wants to understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default. The company can utilise this knowledge for its portfolio and risk assessment.
CHAPTER 1
INTRODUCTION
A loan is a sum of money borrowed from a bank to help fund certain planned or unplanned events. The borrower is required to pay back the loan, including the interest charged, over a stipulated period. There are several types of loans for various financial requirements. A bank can grant a loan in the form of a secured or unsecured loan. A secured loan is often a large sum of money that is needed to buy a house or a car, and is the appropriate choice for a home loan or car loan. An unsecured loan is preferable for student loans or personal loans, which typically involve smaller amounts of money.
TYPES OF LOANS
Banks provide various types of loans to meet different needs.
• Home Loan – The bank lends you money, and the house remains the property of the bank until the final instalment is made. Customers are required to pay back the loan on a monthly basis, at the given interest rate and over a stipulated period, usually 20 years.
• Student Loan – Students who want to further their studies at a higher education institution and require financial help apply for student loans. The bank provides the money for the duration of their studies, and after the completion of their studies the student needs to pay back the money. The interest rates are usually low and there are flexible repayment options.
• Car Loan – Most banks provide car loans for both used and new cars. Customers pay back the instalments on a monthly basis, and the car belongs to the bank until the final payment is made.
• Business Loan – A business loan provides you with the capital to start your business venture. The bank provides you with the money and you are required to make the repayments after an agreed period of time. The requirements vary from bank to bank, and whether you are a new business or have been in operation plays a major part in your loan application.
An unsecured loan is a short-term loan with no collateral attached to it. It is usually given on the basis of your credit record and financial position. Unsecured loans include credit cards, personal loans and student loans. Because of the high risk of this type of loan, the interest rate is also higher. It is advisable to consult your preferred financial institution about the various options regarding both their secured and unsecured loans.
When a person applies for a loan, there are two types of decisions that could be taken by
the company:
Loan accepted: If the company approves the loan, there are 3 possible scenarios described
below:
Fully paid: Applicant has fully paid the loan (the principal and the interest)
Current: Applicant is in the process of paying the instalments, i.e. the tenure of the loan is not
yet completed. These candidates are not labelled as 'defaulted'.
Charged-off: Applicant has not paid the instalments in due time for a long period of time, i.e.
he/she has defaulted on the loan
Loan rejected: The company has rejected the loan (because the candidate does not meet its requirements, etc.). Since the loan was rejected, there is no transactional history of those applicants with the company, and so this data is not available with the company (and thus not in this dataset).
Nowadays, banks play a vital role in the market economy. The success or failure of an organisation largely depends on its ability to evaluate credit risk. Before granting a loan to a borrower, the bank decides whether the borrower is bad (a defaulter) or good (a non-defaulter). Predicting the borrower's future status, i.e. whether the borrower will be a defaulter or a non-defaulter, is a challenging task for any organisation or bank. Essentially, loan defaulter prediction is a binary classification problem: the loan amount and the customer's history govern his or her creditworthiness for receiving a loan, and the task is to classify the borrower as a defaulter or non-defaulter. However, developing such a model is very challenging due to the increasing demand for loans.
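Since the status labels map directly onto a binary target, the labelling step can be sketched as follows; the column names and status values are illustrative, mirroring the 'Fully Paid' / 'Charged Off' / 'Current' categories described in this report:

```python
import pandas as pd

# Illustrative records; column names and status values mirror those
# discussed in the report ('Fully Paid', 'Charged Off', 'Current').
df = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Current",
                    "Fully Paid", "Charged Off"],
    "loan_amnt": [5000, 12000, 8000, 3000, 15000],
})

# 'Current' loans are still running, so they cannot yet be labelled
# good or bad and are excluded.
df = df[df["loan_status"] != "Current"].copy()

# Binary target: 1 = defaulter ('Charged Off'), 0 = non-defaulter.
df["default"] = (df["loan_status"] == "Charged Off").astype(int)
print(df["default"].tolist())  # [0, 1, 0, 1]
```

The resulting 0/1 column is the target variable that the rest of the analysis tries to explain.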
Exploratory data analysis (EDA) is an important step in any research analysis. The primary aim of exploratory analysis is to examine the data for its distribution, outliers and anomalies in order to direct specific testing of your hypothesis. It also provides tools for hypothesis generation by visualising and understanding the data, usually through graphical representation. EDA aims to support the analyst's natural pattern recognition; consequently, feature selection techniques often fall under EDA. Since the seminal work of Tukey in 1977, EDA has gained a large following as the gold-standard approach to investigating a data set. According to Howard Seltman (Carnegie Mellon University), "loosely speaking, any method of looking at data that does not include formal statistical modeling and inference falls under the term exploratory data analysis". EDA is a fundamental early step after data collection and pre-processing, where the data is simply visualised, plotted and manipulated, without any assumptions, in order to help assess the quality of the data and to build models. "Most EDA techniques are graphical in nature with a few quantitative techniques. The reason for the heavy reliance on graphics is that by its very nature the main role of EDA is to explore, and graphics gives the analysts unparalleled power to do so, while being ready to gain insight into the data. There are many ways to categorize the many EDA techniques".
Exploratory data analysis is a very important step which takes place after feature engineering and acquiring the data, and it should be performed before any modeling. This is because it is very important for a data scientist to be able to understand the nature of the data without making assumptions.
The purpose of EDA is to use summary statistics and visualisations to better understand the data, to find clues about its tendencies and quality, and to formulate the assumptions and hypotheses of our analysis. EDA is not about making fancy or even aesthetically pleasing visualisations; the goal is to try to answer questions with data. Your aim should be to create a figure that someone can look at for a couple of seconds and understand what is going on. If not, the visualisation is too complicated (or fancy) and something simpler should be used.
EDA is also very iterative, because we first make assumptions based on our first exploratory visualisations, then build some models. We then make visualisations of the model results and tune our models.
EDA is an approach within data analysis used for gaining a better understanding of data aspects like:
➢ Descriptive Statistics, which is a way of giving a brief overview of the dataset we are
dealing with, including some measures and features of the sample
➢ Grouping data
Descriptive Statistics
Descriptive statistics is a useful way to understand the characteristics of your data and to get a quick summary of it. Pandas in Python provides a handy method, describe(). The describe() function applies basic statistical computations to the dataset, such as extreme values, count of data points, standard deviation and so on. Any missing or NaN value is automatically skipped. The describe() function gives a good picture of the distribution of the data.
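As a minimal sketch (column names and values assumed for illustration), describe() summarises each numeric column at once:

```python
import pandas as pd

# Small illustrative dataset; column names and values are assumed.
df = pd.DataFrame({
    "loan_amnt": [5000, 12000, 8000, 3000, 15000],
    "int_rate": [10.5, 15.2, 12.0, 9.8, 18.4],
})

# count, mean, std, min, quartiles and max for every numeric column;
# NaN values would be skipped automatically.
summary = df.describe()
print(summary.loc["mean", "loan_amnt"])  # 8600.0
```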
Grouping data
groupby is an interesting operation available in pandas which can help us figure out the effect of different categorical attributes on other data variables.
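For example, grouping a binary default flag by a categorical column gives the default rate per category; the column names and values below are assumed for illustration:

```python
import pandas as pd

# Hypothetical data: a binary default flag per loan purpose.
df = pd.DataFrame({
    "purpose": ["small_business", "medical", "small_business",
                "car", "medical", "car"],
    "default": [1, 0, 1, 0, 1, 0],
})

# The mean of the binary flag within each group is the default
# rate for that purpose.
rates = df.groupby("purpose")["default"].mean()
print(rates)
```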
ANOVA
ANOVA stands for Analysis of Variance. It is performed to figure out the relation between different groups of categorical data.
Under ANOVA we have two measures as results:
F-test score: shows the variation of the group means relative to the variation within the groups
p-value: shows the statistical significance of the result
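A one-way ANOVA can be run with scipy's f_oneway; the interest-rate samples per loan grade below are invented for illustration:

```python
from scipy.stats import f_oneway

# Hypothetical interest-rate samples for three loan grades.
grade_a = [9.1, 9.5, 10.2, 9.8]
grade_b = [12.0, 12.4, 11.8, 12.9]
grade_c = [15.1, 14.8, 15.6, 15.9]

# A large F-score with a small p-value suggests the group means differ.
f_stat, p_value = f_oneway(grade_a, grade_b, grade_c)
print(f_stat, p_value)
```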
Correlation is a simple relationship between two variables in a context such that one variable affects the other. Correlation is different from causation. One way to calculate correlation among variables is to find the Pearson correlation coefficient.
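A minimal sketch of Pearson correlation using scipy; the paired values below are illustrative and constructed to be exactly linear, so the coefficient comes out as 1.0:

```python
from scipy.stats import pearsonr

# Illustrative paired observations, constructed to be exactly linear
# (installment = 0.03 * loan_amnt).
loan_amnt = [5000, 8000, 10000, 12000, 15000]
installment = [150, 240, 300, 360, 450]

r, p = pearsonr(loan_amnt, installment)
print(round(r, 3))  # 1.0
```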
The purpose of EDA is to take a general view of some given data without making any assumptions about it. We are trying to get a feel for the data and what it might mean, rather than rejecting or accepting some kind of premise about it before we start its exploration.
In other words, with EDA we let the data speak for itself, as opposed to trying to force the data into some kind of pre-determined model.
Nonetheless, some techniques are used to help us get a feel for the data. For example, we can categorise the data, quantify some of its basic components, or visualise it.
For instance, raw data may be plotted using histograms or other visualisation techniques. Sometimes, the data is juxtaposed in a way that helps us spot important patterns within or between data sets.
And perhaps, most importantly, EDA is used to help figure out our next steps with respect to the data. For instance, we might have new questions we need answered or new research we need to conduct.
PYTHON INTRODUCTION
• Python is Interactive − You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
History of Python
Python was developed by Guido van Rossum in the late eighties and early nineties at the
National Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, and Unix shell and other scripting languages.
Python is copyrighted. Its source code is available under an open-source licence (the Python Software Foundation License, which is GPL-compatible).
Python is now maintained by a core development team, although Guido van Rossum still holds a vital role in directing its progress.
Python Features
Python's features include −
• Easy-to-learn − Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.
• Easy-to-read − Python code is clearly structured and easy to read.
• A broad standard library − Python's bulk of the library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.
• Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
• Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
• Extendable − You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more efficient.
• GUI Programming − Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
• Scalable − Python provides a better structure and support for large programs than
shell scripting.
CHAPTER 2
LITERATURE SURVEY
Amira Kamil Ibrahim Hassan and Ajith Abraham (2008) use a prediction model built using three different training algorithms to train a supervised two-layer feed-forward network. The results show that the training algorithm improves the design of the loan default prediction model.
Angelini (2008) used a neural network with a standard topology and a feed-forward neural network with ad hoc connections, showing that neural networks can be used for prediction models. This paper suggests that the above models provide optimal results with less error.
Ngai (2009) uses a classification model for predicting the future behaviour of customers in CRM; in the CRM area, the most commonly used model is the neural network. He identified 87 articles related to data mining applications and techniques between 2000 and 2006.
Dr. A. Chitra and S. Uma (2010) introduced an ensemble learning technique for the prediction of time series based on Radial Basis Function networks (RBF), k-Nearest Neighbors (KNN) and Self-Organizing Maps (SOM). They proposed a model named PAPEM which performs better than the individual models.
Akkoç (2012) used a hybrid Adaptive Neuro-Fuzzy Inference model combining grouping of data and a neuro-fuzzy network. A 10-fold cross-validation is used for better results and for a comparison with other models.
Sarwesh Site and Dr. Sadhna K. Mishra (2013) proposed a method in which two or more classifiers are combined together to produce an ensemble model for better prediction. They used the bagging and boosting techniques and then the random forest method.
Maher Alaraj, Maysam Abbod and Ziad Hunaiti (2014) proposed a new ensemble method for the classification of customer loans. This ensemble method is based on neural networks. They state that the proposed method provides better results and accuracy compared to a single classifier or any other model.
Alaraj M. and Abbod M. (2015) introduced models that are based on homogeneous and heterogeneous classifiers: an ensemble model based on three classifiers, namely an artificial neural network, logistic regression and a support vector machine.
CHAPTER 3
PROBLEM STATEMENT
Take a consumer finance company which specialises in lending various types of loans to
urban customers. When the company receives a loan application, the company has to make a
decision for loan approval based on the applicant’s profile. Two types of risks are associated
with the bank’s decision:
If the applicant is likely to repay the loan, then not approving the loan results in a loss of
business to the company
If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving
the loan may lead to a financial loss for the company
The dataset used here contains information about past loan applicants and whether they 'defaulted' or not. The aim is to identify patterns which indicate if a person is likely to default, which may be used for taking actions such as denying the loan, reducing the amount of the loan, lending (to risky applicants) at a higher interest rate, etc.
OBJECTIVES
(1) Identification of variables that have an impact on loan status, i.e. the probability of loan default
(2) Data exploration, cleanup, and univariate, segmented univariate and bivariate analysis to understand the drivers of default
(3) Probability of Charged Off or default = Number of applicants who defaulted / Total number of applicants
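The probability formula above can be computed directly from the loan-status column; the status values below are illustrative:

```python
import pandas as pd

# Hypothetical applicant records; status values as used in the report.
df = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid",
                    "Charged Off", "Fully Paid"],
})

# Number of applicants who defaulted / total number of applicants.
defaulted = (df["loan_status"] == "Charged Off").sum()
p_default = defaulted / len(df)
print(p_default)  # 0.4
```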
RESEARCH METHODOLOGY
[Flow diagram: collection of data set → feature selection → train model → test model → result analysis]
1. Collection of data set: the data set contains the information on loans given by the organisation. During data collection, the system loads the Excel sheet containing the loan records.
2. Feature selection: select the right features from the data set to get the right results. Feature selection is one of the core concepts in machine learning, which hugely influences the performance of your model. The data features that you use to train your machine learning models have a large influence on the performance you can achieve.
3. Train model: train your machine learning model on the basis of the selected features. The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process. You can use the ML model to get predictions on new data for which you do not know the target.
4. Test model: evaluate the trained model on held-out test data.
5. Result analysis: compare the predicted value with the real value to get the accuracy of the model.
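The result-analysis step above, comparing predicted values with real values, reduces to a simple accuracy computation; the labels below are invented for illustration:

```python
# Compare predicted labels with actual labels to measure accuracy
# (1 = defaulter, 0 = non-defaulter).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Fraction of positions where prediction and reality agree.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(accuracy)  # 0.75
```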
CHAPTER 4
RESULTS
For continuous variables, we need to understand the central tendency and spread, hence the following plots are beneficial:
• Box plot
• Violin plots
Findings
➢ There are many outliers, which need to be removed before any further analysis
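One common way to remove such outliers is the 1.5 × IQR rule, sketched below on invented income values:

```python
import pandas as pd

# Hypothetical annual incomes with one extreme outlier (values assumed).
incomes = pd.Series([40000, 45000, 50000, 55000, 60000, 1000000])

# Keep only values within 1.5 * IQR of the quartiles.
q1, q3 = incomes.quantile(0.25), incomes.quantile(0.75)
iqr = q3 - q1
filtered = incomes[(incomes >= q1 - 1.5 * iqr) & (incomes <= q3 + 1.5 * iqr)]
print(filtered.tolist())  # [40000, 45000, 50000, 55000, 60000]
```

The extreme value falls outside the fences and is dropped, while the plausible incomes survive.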
For categorical variables, frequency tables can be used to understand the distribution of each category. Possible metrics are count and count%. The following are beneficial:
a. Count plot
b. Bar chart
Findings
1. Loans are majorly from 10+ years, < 1 year, 2 years, 3 years and 4 years of employment length
2. Loan grade distribution: B – 30%, A – 25%, C – 20%
Findings
1. Purpose of loans
2. Loan Status
3. Home Ownership
Findings
1. Address States
2. Issue Dates
1. From the heat map, it is evident that loan amount, funded amount, funded amount (investors) and instalment are highly correlated
2. Small business shows the largest share of charged-off loans / defaults, followed by medical and moving
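The heat-map finding can be reproduced numerically with DataFrame.corr(); the columns below are stand-ins with values constructed to be nearly collinear, like the loan amount / funded amount / instalment columns described:

```python
import pandas as pd

# Stand-in columns, constructed to move together almost one-to-one.
df = pd.DataFrame({
    "loan_amnt":   [5000, 8000, 10000, 12000, 15000],
    "funded_amnt": [5000, 7900, 10000, 11800, 15000],
    "installment": [150, 240, 300, 360, 450],
})

# Pairwise Pearson correlations; values near 1 flag redundant features.
corr = df.corr()
print(corr.round(2))
```

The matrix can then be rendered as a heat map, e.g. with matplotlib's imshow or seaborn's heatmap.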
1. Probability of Charged Off or default = Number of applicants who defaulted / Total number of applicants
2. The probability of charged off varies with respect to address state; however, the variation is not significant
3. The probability of charged off varies with respect to loan purpose, and the variation is significant for small business, medical and moving
4. The probability of charged off varies with respect to grade, and it increases moving from grade A to G
5. Sub-grade is a sub-parameter and follows a similar trend; however, we are not taking it as a predictor variable, since grade serves the purpose.
1. The probability of default decreases with an increase in annual income: the higher the income, the lower the probability of default
2. Interest rates of 10% and above have a probability of default of more than 12%
3. At interest rates of 15% and above, the probability of default increases to 23%
4. Interest rate can be considered as a predictor variable.
1. The probability of default increases with the loan amount slab, particularly beyond a loan amount of 10K
2. Employees with 1 year of employment, or who are self-employed (0 years), have the maximum default rate
3. When the loan principal amount received is small, the probability of default is high, at 60%
Analysis (loan payment slabs / interest amount received vs probability of default):
4. When the loan payment amount received is small, the probability of default is high, at 60%.
The following can be considered as major predictor variables for the target variable loan status, predicting a higher probability of default:
1. Purpose of Loan – Small business has the maximum probability of default, 26%; medical, moving and debt consolidation are next on the list
2. Employment Length – Self-employed applicants and those with 1 year of employment have more chance of default
3. Interest Rate – Interest rates of 10% and above have a higher probability of default
4. Loan Payment Amount – The lower the amount paid back, the higher the probability of default
Time series analysis can be performed using loan data from several years to predict the approximate time when a customer may default. Future analysis may be carried out on predicting the approximate interest rate that a loan applicant is expected to get according to his or her profile if the loan is approved. This would be useful for loan applicants, since some banks approve loans but charge the customer very high interest rates. It would give customers a rough insight into the interest rates they should be getting for their profile, and it would ensure they do not end up paying a much greater amount in interest to the bank. An application can be built that takes various inputs from the customer such as employment length, salary, age, marital status, SSN, address, loan amount, loan duration and so on, and provides a prediction of whether their loan application will be approved by the banks or not based on these inputs, along with an approximate interest rate.
CHAPTER 5
SAMPLE CODE
import pandas as pd

df = pd.read_csv("stock_data.csv")
print(df)

# Skip the first row of the file
df = pd.read_csv("stock_data.csv", skiprows=1)
print(df)

# Equivalent: use the second row as the header
df = pd.read_csv("stock_data.csv", header=1)
print(df)

# Read only the first two data rows
df = pd.read_csv("stock_data.csv", nrows=2)
print(df)

# Treat -1 in the 'revenue' column as a missing value
df = pd.read_csv("stock_data.csv", na_values={
    'revenue': [-1],
})
print(df)

df.to_csv("new.csv", index=False)
df.columns
df.to_csv("new.csv", header=False)

df = pd.read_excel("stock_data.xlsx", "sheet1")
print(df)

# Converters clean up cells while reading: "n.a." in 'people' becomes
# missing, and "n.a." in 'price' is replaced with a default of 50.
def convert_people_cell(cell):
    if cell == "n.a.":
        return None
    return cell

def convert_price_cell(cell):
    if cell == "n.a.":
        return 50
    return cell

df = pd.read_excel("stock_data.xlsx", "sheet1", converters={
    'people': convert_people_cell,
    'price': convert_price_cell
})
print(df)

# Write two DataFrames to separate sheets of one workbook
# (the column data for df_stocks was not given in the original).
df_stocks = pd.DataFrame({
})
df_weather = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017'],
    'temperature': [32, 35, 28],
})
with pd.ExcelWriter("stocks_weather.xlsx") as writer:
    df_stocks.to_excel(writer, sheet_name="stocks")
    df_weather.to_excel(writer, sheet_name="weather")
import pandas as pd
import numpy as np

df = pd.read_csv("weather_data.csv")
print(df)

# Replace sentinel values with NaN, per column
new_df = df.replace({
    'temperature': -99999,
    'windspeed': -99999,
    'event': '0'
}, np.nan)
print(new_df)

# (the column data for this DataFrame was not given in the original)
df = pd.DataFrame({
})
df
import pandas as pd

india_weather = pd.DataFrame({
    "city": ["mumbai", "delhi", "banglore"],
    "temperature": [32, 45, 30],
})
india_weather

us_weather = pd.DataFrame({
    # city values were not given in the original; these are placeholders
    "city": ["new york", "chicago", "orlando"],
    "temperature": [21, 14, 35],
})
us_weather

# Stack the two DataFrames; keys label each block so it can be
# retrieved later with df.loc["india"] / df.loc["us"]
df = pd.concat([india_weather, us_weather], keys=["india", "us"])
df
df.loc["us"]
df.loc["india"]

temperature_df = pd.DataFrame({
    "city": ["mumbai", "delhi", "banglore"],
    "temperature": [32, 45, 30],
}, index=[0, 1, 2])
temperature_df

windspeed_df = pd.DataFrame({
    "city": ["delhi", "mumbai"],
    "windspeed": [7, 12],
}, index=[1, 0])
windspeed_df

# axis=1 joins side by side, aligning rows by index label
df = pd.concat([temperature_df, windspeed_df], axis=1)
df

s = pd.Series(["humid", "dry", "rain"], name="event")
df = pd.concat([temperature_df, s], axis=1)
df
import pandas as pd

# weather_data was not defined in the original; a minimal version is
# assumed here so the snippet runs.
weather_data = {
    'day': ['1/1/2017', '1/2/2017', '1/3/2017'],
    'temperature': [32, 35, 28],
    'event': ['rain', 'sunny', 'snow'],
}
df = pd.DataFrame(weather_data)
print(df)

rows, columns = df.shape
print("rows:", rows)
print("columns:", columns)

print(df[2:5])  # row slicing
print(df.columns)
print(df['temperature'].max())
print(df['temperature'].min())
print(df[df.temperature == df.temperature.max()])

# Use the 'event' column as the index, then look up a row by label
df = df.set_index('event')
print(df)
print(df.loc['snow'])  # snow