Ganesh
Chapter 1
INTRODUCTION
Loan prediction is one of the most important and prominent research areas in the banking
and insurance sectors. Identifying and analyzing patterns in sample data sets plays a vital
role in modern lending. Loan prediction involves the application of various
machine-learning algorithms. Some prediction systems on the market use deep learning and
similar techniques. However, they are limited to certain features and cannot assist users
beyond those limits. The loan prediction project is developed using machine-learning
algorithms such as logistic regression. The Python programming language is used to
implement the code.
The HTML pages for the website are developed using Visual Studio Code.
The proposed system delivers high accuracy and moderate loss on the training and
validation data. Finally, the results show that the implemented model achieves high
accuracy. Further, this work can be extended to improve the accuracy that can be obtained.
The data set consists of records of previous customers of various banks whose loans were
approved on the basis of a set of parameters. The machine-learning model is trained on
those records to get accurate results. The main objective of this research is to predict the
safety of a loan. To predict loan safety, the logistic regression algorithm is used. First, the
data is cleaned to handle missing values in the data set.
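As a minimal sketch of the cleaning step, assuming the records are held in a pandas
DataFrame, missing numeric values can be filled with the column median and missing
categorical values with the most frequent value. The column names and sample rows below
are hypothetical and are not taken from the project data set; the project may have used a
different strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical sample records; the real data set and its column names may differ.
loans = pd.DataFrame({
    "applicant_income": [5000.0, np.nan, 3200.0, 4100.0],
    "credit_history": [1.0, 0.0, np.nan, 1.0],
    "qualification": ["Graduate", None, "Graduate", "Not Graduate"],
})


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric gaps set to the median and other gaps to the mode."""
    cleaned = df.copy()
    for column in cleaned.columns:
        if pd.api.types.is_numeric_dtype(cleaned[column]):
            cleaned[column] = cleaned[column].fillna(cleaned[column].median())
        else:
            # mode() ignores missing entries; take the most frequent value.
            cleaned[column] = cleaned[column].fillna(cleaned[column].mode().iloc[0])
    return cleaned


cleaned_loans = fill_missing(loans)
print(cleaned_loans)
```

Filling with the median rather than the mean keeps a few very large incomes from
distorting the imputed values.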
To train the model, a data set of 1500 cases with 10 numerical and eight categorical
attributes has been taken. To grant a loan to a customer, various parameters such as CIBIL
(credit history), business value, and assets of the customer have been considered. The
parameters are listed below:
• Qualification: Categorical
• In Service / Business Owner: Categorical
• Individual income of Applicant: Qualitative
• Individual income of Co-Applicant (if any): Qualitative
• Amount of Loan required: Qualitative
• Term for which loan is required: Qualitative
• Credit History of Applicant: Qualitative
• Area of Property
1.1 Python
Python is an interpreted high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python has a
design philosophy that emphasizes code readability, notably using significant whitespace.
Dept. Of ECE, VIT 1 2022-23
Machine Learning with Python
It provides constructs that enable clear programming on both small and large scales. In July
2018, van Rossum stepped down as the leader of the language community after 30 years.
Python interpreters are available for many operating systems. CPython, the reference
implementation of Python, is open-source software and has a community-based
development model, as do nearly all of Python's other implementations.
The non-profit Python Software Foundation manages Python and CPython. Python
has a simple, easy-to-learn syntax that emphasizes readability; hence, it reduces the cost of
program maintenance. In addition, Python supports modules and packages, which
encourages program modularity and code reuse.
a set of rules to facilitate the formatting of code. Additionally, the wide base of users and
active developers has resulted in a rich internet resource bank to encourage development
and the continued adoption of the language.
2. Data visualization:
A self-explanatory name: taking data and turning it into something colorful. Included here:
Matplotlib; Seaborn; Datashader; others.
4. Deep learning:
This is a subset of machine learning that is seeing a renaissance and is commonly
implemented with Keras, among other libraries. It has seen monumental improvements
over the past decade, such as AlexNet in 2012, which was the first design to incorporate
consecutive convolution layers. Included here: Keras, TensorFlow, and a whole host of
others.
1. Data loading:
Load the dataset “prisoners.csv” using Pandas and display the first and last five rows of the
dataset. Then find the number of columns using the describe method in Pandas.
2. Data Manipulation:
Create a new column, “total benefitted”, which is the sum of inmates benefitted through
all modes.
3. Data Visualization:
Create a bar plot with each state name on the x-axis and their total benefitted inmates as
their bar heights.
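The three tasks above can be sketched as follows. This is an illustration under stated
assumptions, not the solution used in the project: the column names and the three sample
rows are made up to stand in for “prisoners.csv”, and the plot is rendered off-screen.
Note that describe summarizes only the numeric columns, so the sketch reads the column
count from the shape attribute.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for "prisoners.csv"; the real file has other rows and column names.
SAMPLE_CSV = """state,elementary_education,adult_education,higher_education,computer_course
State A,100,50,10,5
State B,20,30,0,1
State C,7,8,9,10
"""

# 1. Data loading (with the real file: pd.read_csv("prisoners.csv")).
prisoners = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(prisoners.head())         # first five rows (fewer if the frame is shorter)
print(prisoners.tail())         # last five rows
summary = prisoners.describe()  # statistics for the numeric columns only
column_count = prisoners.shape[1]

# 2. Data manipulation: sum the benefitted inmates across all modes.
mode_columns = [
    "elementary_education",
    "adult_education",
    "higher_education",
    "computer_course",
]
prisoners["total_benefitted"] = prisoners[mode_columns].sum(axis=1)

# 3. Data visualization: one bar per state, bar height = total benefitted.
fig, ax = plt.subplots()
ax.bar(prisoners["state"], prisoners["total_benefitted"])
ax.set_xlabel("State")
ax.set_ylabel("Total benefitted inmates")
fig.tight_layout()
```

In a Jupyter notebook the figure is displayed inline; in a script, fig.savefig can be
called with a file name to write the chart to disk.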
1.4.1 NumPy
NumPy is the fundamental package for scientific computing with Python. It contains,
among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities.
Besides its obvious scientific uses, NumPy can also be used as an efficient
multi-dimensional container of generic data. Arbitrary data types can be defined. This allows
NumPy to seamlessly and speedily integrate with a wide variety of databases. NumPy is
licensed under the BSD license, enabling reuse with few restrictions. The core functionality
of NumPy is its "ndarray" (n-dimensional array) data structure. These arrays are strided
views on memory.
In contrast to Python's built-in list data structure (which, despite the name, is a
dynamic array), these arrays are homogeneously typed: all elements of a single array must
be of the same type. NumPy has built-in support for memory-mapped arrays.
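A short sketch of the array properties described above: a single element type,
broadcasting, and slices that are strided views on the same memory. The variable names are
illustrative only.

```python
import numpy as np

# A homogeneously typed 3x4 array: every element is a 64-bit integer.
grid = np.arange(12, dtype=np.int64).reshape(3, 4)

# Broadcasting: the (3, 1) column of row means is stretched across all 4 columns.
row_means = grid.mean(axis=1, keepdims=True)
centered = grid - row_means

# Slicing returns a strided view on the same memory, not a copy.
every_other_column = grid[:, ::2]
every_other_column[0, 0] = 100  # also changes grid[0, 0]

# Mixing integers and floats upcasts every element to one common type.
mixed = np.array([1, 2.5, 3])

print(grid.strides, every_other_column.strides)
print(mixed.dtype)
```

The view skips every second column simply by doubling the step, in bytes, between
consecutive elements of a row, which is why no data needs to be copied.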
1.4.2 Pandas
Pandas is an open-source, BSD-licensed Python library providing high-performance,
easy-to-use data structures and data analysis tools for the Python programming language.
Python with Pandas is used in a wide range of academic and commercial domains,
including finance, economics, statistics, and analytics. This section outlines the main
features of Pandas. The name Pandas is derived from “panel data”, an econometrics term
for multidimensional data. In 2008, developer Wes McKinney started developing Pandas
when he needed a high-performance, flexible tool for data analysis. Prior to Pandas, Python
was mostly used for data munging and preparation.
It contributed very little to data analysis. Pandas solved this problem.
Using Pandas, we can accomplish five typical steps in the processing and analysis of data,
regardless of the origin of the data: load, prepare, manipulate, model, and analyze.
1.4.3 Matplotlib
Matplotlib is a Python 2D plotting library that produces publication-quality figures in a
variety of hardcopy formats and interactive environments across platforms. Matplotlib can
be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web
application servers, and four graphical user interface toolkits. Matplotlib tries to make easy
things easy and hard things possible. You can generate plots, histograms, power spectra,
bar charts, error charts, scatterplots, etc., with just a few lines of code.
Examples are available in the sample plots and thumbnail gallery. For simple plotting, the
pyplot module provides a MATLAB-like interface, particularly when combined with IPython.
For the power user, you have full control of line styles, font properties, axes properties, etc. via
an object-oriented interface or via a set of functions familiar to MATLAB users.
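A minimal sketch of the object-oriented interface mentioned above, showing control over
line style, font properties, and axes properties. The sine curve is an arbitrary example,
and the off-screen "Agg" backend is chosen here only so that the sketch runs without a
display.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # off-screen backend; no GUI toolkit required
import matplotlib.pyplot as plt

xs = [x / 10 for x in range(0, 63)]
ys = [math.sin(x) for x in xs]

# Object-oriented interface: work with explicit Figure and Axes objects.
fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(xs, ys, linestyle="--", linewidth=2, color="tab:blue", label="sin(x)")
ax.set_xlabel("x", fontsize=10)
ax.set_ylabel("sin(x)", fontsize=10)
ax.set_title("Sine wave", fontweight="bold")
ax.set_ylim(-1.2, 1.2)
ax.legend()

# Save to an in-memory PNG; pass a file name instead to write to disk.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
```

The same figure could be produced with the MATLAB-like functions in pyplot (plt.plot,
plt.xlabel, and so on); the explicit Axes object becomes more useful once a figure holds
several subplots.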
Chapter 2
ORGANIZATION PROFILE
The teams have a unique blend of functional and operational knowledge, along with
technical expertise and result-oriented management experience ranging from application
development to end-to-end IT implementation projects. Our organisation derives its
strength from its strong leadership team, which is focused on inspiring an environment of
entrepreneurial culture steeped in delivering exceptional value to the customers. The
company network portfolio consists of two companies, “TechCiti Technologies Private
Limited” and “TechCiti Software Consulting Private Limited”. TechCiti Technologies
Private Limited is the parent company, and TechCiti Software Consulting Private
Limited is the deemed subsidiary of TechCiti Technologies Private Limited.
Vision of TechCiti
Our vision is to enable people and organizations to realize their potential by reinventing
their engagement in defining the future using technology.
Mission of TechCiti
Our mission is to achieve the leading position as a distinguished & absolute end-to-end
information technology infrastructure & service provider. We want to develop with
profitable growth through superior Customer service, Innovation, Quality and
Commitment.
2016 - Created PAN India presence by opening branch offices in Chennai, Pune and Delhi
2017 - Developed its own ERP based software application related to HRMS (Human
Subscribers
2018 - Developed a web-based ERP application related to Visitor Management System
named “e-Visitor”. At present, we have more than 1500 corporate subscribers.
2018 - TechCiti Technologies Private Limited established its deemed subsidiary TechCiti
2018 - TechCiti signed a partnership with Riverbed and has become their sole partners
2019 - We developed ERP-based accounting software.
Stage 2: Analysis
Stage 3: Design
Stage 4: Implementation
Stage 6: Maintenance
Chapter 3
SYSTEM ANALYSIS
System analysis is “the process of studying a procedure or business to identify its goals and
purposes and create systems and procedures that will efficiently achieve them”.
Now that Python and Anaconda have been installed successfully on the system, let us set
up Jupyter Notebook.
1. To launch Jupyter Notebook from the command line, open the Anaconda
Command Prompt on Windows. There, type and run jupyter notebook as shown in
Fig: 3.4.
2. A Jupyter Notebook dashboard will open in the default browser as shown in Fig:
3.5.
3. Here, click on New → then select Python 3.
4. A new Python kernel will be opened, and you are ready to write a new program.
Chapter 4
recognition, speech-to-text and natural language generation. These neural networks work
by combing through millions of examples of training data and automatically identifying
often-subtle correlations between many variables. Once trained, the algorithm can use its
bank of associations to interpret new data. These algorithms have only become feasible in
the age of big data, as they require massive amounts of training data.
This near immediate response is critical in a niche where bots, viruses, worms,
hackers and other cyber threats can affect thousands or even millions of people in
minutes.
4 Automation:
Machine Learning is a key component in technologies such as predictive analytics and
artificial intelligence. The automated nature of Data Science means it can save time and
money, as developers and analysts are freed up to perform high-level tasks that a
computer simply cannot handle. On the other side, you have a computer running the
show and that is something that is certain to make any developer squirm with
discomfort. For now, technology is imperfect. Still, there are workarounds. For
instance, if you are employing Data Science technology in order to develop an
algorithm, you might program the Data Science interface so it just suggests
improvements or changes that must be implemented by a human.
This workaround adds a human gatekeeper to the equation, thereby eliminating the
potential for problems that can arise when a computer is in charge. After all, an
algorithm update that looks good on paper may not work effectively when it is put
into practice.
HARDWARE REQUIREMENTS
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15-inch VGA Colour.
• Mouse : Logitech.
• RAM : 512 MB.
HTML: HTML stands for Hyper Text Markup Language, which is the most widely used
language on the Web for developing web pages. Hypertext refers to the way in which Web pages
(HTML documents) are linked together. HTML is a markup language, which means you
use HTML to simply "mark up" a text document with tags that tell a Web browser how to
structure it for display.
CSS: CSS is the acronym for "Cascading Style Sheets". CSS handles the look and feel
of a web page. Using CSS, you can control the color of the text, the style of fonts, the
spacing between paragraphs, how columns are sized and laid out, what background images
or colors are used, layout designs, and variations in display for different devices and screen
sizes, as well as a variety of other effects.
BOOTSTRAP: Bootstrap is a popular HTML, CSS and JavaScript framework for
developing responsive and mobile-friendly websites.
Django
Django is an open-source web framework written in Python, which is developed and
maintained by the DSF (Django Software Foundation). Nowadays Django is widely used
because of its many built-in functionalities. Several well-known companies and apps use
Django for the development of their websites, including Google, Instagram, Disqus,
Spotify, YouTube, and Pinterest. It is used for web development in Python.
It supports templates and static files, which means you can easily render HTML pages
by putting all the HTML files in a directory called ‘templates’; similarly, all files related
to styles, such as CSS and JS, are placed inside a directory called ‘static’. In this project,
Django is used for the front-end development. Further, Django provides more features
than other frameworks, and those features are given below.
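As an illustration of the ‘templates’ and ‘static’ directories described above, the relevant
part of a Django settings file might look like the sketch below. The project's actual settings
file is not shown in this report, so the directory layout here is an assumption;
TEMPLATES, STATIC_URL and STATICFILES_DIRS are standard Django setting names.

```python
from pathlib import Path

# Stand-in project root; a generated settings.py usually derives this from __file__.
BASE_DIR = Path.cwd()

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [BASE_DIR / "templates"],  # project-level HTML files
        "APP_DIRS": True,                  # also search each app's templates/ folder
        "OPTIONS": {
            "context_processors": [
                "django.template.context_processors.request",
            ],
        },
    },
]

STATIC_URL = "static/"                    # URL prefix for CSS, JS and images
STATICFILES_DIRS = [BASE_DIR / "static"]  # where those files live on disk
```

With settings of this kind, a view can render a page from the ‘templates’ directory by
name, and a template can refer to style sheets under the ‘static’ URL prefix.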
The tool has been designed using Jupyter (a Python integrated development environment),
big data, and Hadoop. The user interacts with the tool using a GUI.
• The GUI operates in two parts: backend and frontend.
• The frontend shows the interface and Tkinter frames, windows, canvas, etc.
• The backend is used to execute several queries to extract useful insights.
or No, 0 or 1, True or False, etc. However, instead of giving an exact value of 0 or 1, it
gives probabilistic values that lie between 0 and 1, as shown in Fig: 4.1.
• Logistic Regression is very similar to Linear Regression except in how they
are used. Linear Regression is used for solving regression problems,
whereas Logistic Regression is used for solving classification problems.
• The curve of the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its
weight. Logistic Regression is a significant machine-learning algorithm because it
can provide probabilities and classify new data using continuous and
discrete datasets.
• Logistic Regression can be used to classify observations using different types
of data and can easily determine the most effective variables for the
classification. The image below shows the logistic function:
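The logistic function referred to above maps any real number to a value between 0 and 1,
which is then compared with a threshold to obtain a class. The sketch below illustrates the
idea in plain Python. The two features, the weights, and the bias are hypothetical,
hand-picked values and are not the coefficients learned in this project.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Probability of the positive class for one record."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(probability, threshold=0.5):
    """Turn a probability into a 0/1 decision."""
    return 1 if probability >= threshold else 0


# Hypothetical coefficients for two features:
# credit history (0 or 1) and income in units of 10,000.
WEIGHTS = [2.0, 0.5]
BIAS = -1.5

good_history = predict_probability([1, 0.4], WEIGHTS, BIAS)
no_history = predict_probability([0, 0.4], WEIGHTS, BIAS)
print(round(good_history, 3), classify(good_history))
print(round(no_history, 3), classify(no_history))
```

In practice the weights and bias are estimated from the training records rather than
chosen by hand; the threshold of 0.5 is a common default that can be moved to trade
approvals against risk.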
• A Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules, and each
leaf node represents the outcome.
• In a Decision Tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make a decision and have multiple branches,
whereas Leaf nodes are the output of those decisions and do not contain any further
branches, as shown in Fig: 4.2.
• The decisions or tests are performed based on features of the given dataset. A
decision tree can contain categorical data (YES/NO) as well as numeric data
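The structure described above, with decision nodes that test a feature and leaf nodes that
hold the outcome, can be sketched with nested dictionaries. The features, the threshold, and
the outcomes below are hypothetical and chosen only for illustration; a real tree is learned
from the training data.

```python
# Decision nodes test one feature; leaf nodes hold the outcome.
TREE = {
    "feature": "credit_history",            # categorical test (YES/NO)
    "branches": {
        "NO": {"leaf": "Rejected"},
        "YES": {
            "feature": "applicant_income",  # numeric test against a threshold
            "threshold": 3000,
            "below": {"leaf": "Rejected"},
            "at_or_above": {"leaf": "Approved"},
        },
    },
}


def predict(node, record):
    """Walk from the root to a leaf, following the branch each test selects."""
    while "leaf" not in node:
        value = record[node["feature"]]
        if "threshold" in node:
            node = node["at_or_above"] if value >= node["threshold"] else node["below"]
        else:
            node = node["branches"][value]
    return node["leaf"]


print(predict(TREE, {"credit_history": "YES", "applicant_income": 4500}))
print(predict(TREE, {"credit_history": "NO", "applicant_income": 9000}))
```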
Chapter 5
RESULTS AND DISCUSSION
In a design report, the results and discussions may involve an evaluation of the design
or method used. In a feasibility or case study, the results and discussions section would
involve measuring the feasibility or evaluating the success of one or more solutions.
The Anaconda prompt is used to create a project with the Django tool using a few
commands, as shown in Fig: 5.1.
The index page is used to check eligibility for a loan and to enter the user’s login or
registration details, as shown in Fig: 5.2.
The register page collects the personal details of a user; once the details are filled in,
the user clicks on register, as shown in Fig: 5.3.
The login page is used to log in to the site with a username and password by clicking on
login, as shown in Fig: 5.4.
The user’s login page requires personal details to check eligibility for a loan; the user
then clicks on check status, as shown in Fig: 5.5.
The prediction page shows the user’s status, that is, whether the profile matches or
does not match the eligibility criteria, as shown in Fig: 5.6.
CONCLUSION
Loan prediction is one of the most important and prominent research areas in the banking
and insurance sectors. Identifying and analyzing patterns in sample datasets plays a vital
role in modern lending. Loan prediction involves the application of various
machine-learning algorithms. Some prediction systems on the market use deep learning and
similar techniques. However, they are limited to certain features and cannot assist users
beyond those limits. The loan prediction project is developed using machine-learning
algorithms such as logistic regression.
The Python programming language is used to implement the code, which has been
developed using the Anaconda prompt, and the HTML pages for the website are developed
using Visual Studio Code. In this project, the logistic regression and decision tree
machine-learning algorithms are used for accurate loan prediction. The proposed system
delivers high accuracy and moderate loss on the training and validation data. Finally, the
results show that the implemented model achieves high accuracy. Further, this work can be
extended to improve the accuracy that can be obtained.