Project Report

On
“Credit card default prediction”

Submitted by student
of
“BACHELOR OF COMPUTER APPLICATIONS”
From
INSTITUTE OF TECHNOLOGY AND SCIENCE


SUBMITTED TO :-
Prateek Gupta
(Data Scientist)
Trainer

SUBMITTED BY :-
Name :- Nitin Kanojiya
Roll number :- R210934106201
Certificate
This is to certify that “Nitin Kanojiya” has carried out the project work presented in this report, entitled “Credit card default prediction”, for the award of Bachelor of Computer Applications from the Institute of Technology & Science, Mohan Nagar, Ghaziabad, under my supervision.
The report embodies the results of original work and studies carried out by the student himself, and the contents of the report do not form the basis for the award of any other degree to the candidate or to anybody else.

Date:
Prateek Gupta
(Data Scientist)
Acknowledgement
In completing my project on “Credit Card Default Prediction”, I would like to convey my special gratitude to Mr. Prateek Gupta (Data Scientist) as well as Shri Sunil Kumar Pandey (Director) of the I.T.S Education Group.

Your valuable guidance and suggestions helped me in the various phases of completing this project. I will always be thankful to you in this regard.
I confirm that this project was completed by me and has not been copied.

Nitin Kanojiya
(Student of BCA)
Abstract
The credit card default prediction data of a financial institution is imbalanced, which leads to unsatisfactory prediction results. To address this problem, this project proposes prediction models based on various supervised classification algorithms. It focuses mainly on improving classifier performance for credit card default prediction, and several machine learning models are employed to obtain efficient results.

We formulated hypotheses on whether the models developed using different machine learning techniques perform significantly the same or differently, and whether resampling techniques significantly improve the performance of the proposed models.
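The resampling techniques mentioned above are not specified further. The sketch below shows one common choice, random oversampling of the minority (default) class. It first splits the data into training and test sets. It then duplicates randomly chosen minority rows in the training set only, so that no duplicated row can leak into the test set. It assumes each row is a `(features, label)` pair with label 1 meaning default. The function names, the 70/30 split, and the toy data are hypothetical, and only the Python standard library is used.

```python
import random
from collections import Counter

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def random_oversample(rows, seed=42):
    """Duplicate random rows of the smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in rows)
    target = max(counts.values())
    balanced = rows[:]
    for label, count in counts.items():
        members = [row for row in rows if row[1] == label]
        balanced.extend(rng.choice(members) for _ in range(target - count))
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy data: 90 non-defaulters (label 0) and 10 defaulters (label 1).
rows = [([float(i)], 0) for i in range(90)] + [([float(i)], 1) for i in range(10)]
train, test = train_test_split(rows)
balanced_train = random_oversample(train)  # resample the training set only
print(Counter(label for _, label in train))
print(Counter(label for _, label in balanced_train))
```

Random oversampling is only one option. Undersampling the majority class and synthetic methods such as SMOTE are alternatives, and the report does not say which one was used.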

One-way Analysis of Variance (ANOVA), a hypothesis-testing technique, is used to test the significance of the results. A split method is used to validate the results: the data are split into training and test sets. The proposed methods significantly improve prediction accuracy on the Taiwan clients' credit data set.
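One-way ANOVA compares the mean scores of several groups, here several models. It does this through the ratio of between-group variance to within-group variance. The sketch below computes this F statistic from scratch. The accuracy values are made-up placeholders, not results from this project. In practice, `scipy.stats.f_oneway` returns the same statistic together with its p-value.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its (between, within) degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of each group mean around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Placeholder accuracies from three repeated train/test splits per model (illustrative only).
model_a = [0.80, 0.81, 0.79]
model_b = [0.82, 0.83, 0.84]
model_c = [0.78, 0.77, 0.79]
f_stat, df_b, df_w = one_way_anova_f([model_a, model_b, model_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With 2 and 6 degrees of freedom, the 5% critical value of the F distribution is about 5.14. An F statistic above that value would indicate that at least one model's mean accuracy differs significantly from the others.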
CONTENTS
S.no Title

1. Introduction
1.1 Python
1.2 Brief History of Python
1.3 Python’s Benevolent Dictator For Life
1.4 Data Types of Python
1.5 Advantages & Disadvantages of Python
2. Machine Learning
2.1 Definition
2.2 How Machine Learning Works
2.3 A Few Examples of Machine Learning
2.4 Challenges of Machine Learning
3. Logistic Regression
3.1 Definition
3.2 Understanding Logistic Regression
4. Credit Card Default Prediction Code
5. Description of the Project
6. Conclusion
7. Future of the Project
8. Bibliography
Introduction

Python:-

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.
Brief History of Python

- Invented in the Netherlands in the early 1990s by Guido van Rossum
- Named after Monty Python
- Open sourced from the beginning
- Considered a scripting language, but is much more
- Scalable, object-oriented and functional from the beginning
- Used by Google from the beginning
- Increasingly popular
Python’s Benevolent Dictator For Life

“Python is an experiment in how much freedom programmers need. Too much freedom and nobody can read another's code; too little and expressiveness is endangered.”
- Guido van Rossum
Advantages of Python

1. Easy to Read, Learn and Write


Python is a high-level programming language with an English-like syntax, which makes code easier to read and understand.

Python is easy to pick up and learn, which is why many people recommend it to beginners. The same task usually needs fewer lines of code than in other major languages such as C/C++ and Java.

2. Improved Productivity
Python is a very productive language. Because Python is simple, developers can focus on solving the problem rather than spending time on understanding the syntax or behavior of the language. You write less code and get more done.

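3. Interpreted Language
Python is an interpreted language. A program is not compiled ahead of time into machine code; it is run by the interpreter. CPython, the standard interpreter, first compiles the source to bytecode and then executes it statement by statement. If an unhandled error occurs, execution stops and the interpreter reports the error that occurred.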

4. Dynamically Typed
In Python, types belong to values rather than to variables, and they are determined only when the code runs. The interpreter assigns the data type automatically during execution, so the programmer does not need to declare variables or their data types.
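The example below is a short illustration of dynamic typing and is not specific to this project. The same name can be rebound to values of different types. Type errors appear only when the offending line actually runs.

```python
# The type is looked up on the value at runtime; the variable itself has no declared type.
value = 42
print(type(value).__name__)   # int
value = "forty-two"
print(type(value).__name__)   # str

def add(a, b):
    # No type declarations: "+" is resolved at runtime from the argument types.
    return a + b

print(add(2, 3))              # 5
print(add("credit", "card"))  # creditcard

try:
    add(2, "3")
except TypeError as exc:
    # The mismatch is only detected when this call executes.
    print("TypeError:", exc)
```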
Disadvantages of Python

1. Slow Speed
As discussed above, Python is an interpreted and dynamically typed language. Executing code statement by statement through the interpreter is often slower than running compiled machine code.

Python's dynamic nature also contributes to its slow speed, because the interpreter has to do extra work, such as checking types, while executing code. For this reason, Python is less suited to projects where execution speed is critical.

2. Not Memory Efficient
To keep things simple for the developer, Python makes a trade-off. Every value, even a small integer, is a full object with its own overhead, so Python programs can use a large amount of memory. This can be a disadvantage when building applications where memory optimization matters.
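The sketch below makes the memory overhead concrete. It compares a list of 1,000 Python integers with the standard library's `array` type, which packs the same numbers as raw C integers. Exact byte counts depend on the Python version and platform, so none are claimed here. The list total is only an estimate, because CPython shares cached small integers between lists.

```python
import sys
from array import array

n = 1000
numbers_list = list(range(n))          # list of pointers to separate int objects
numbers_array = array("i", range(n))   # packed C ints, typically 4 bytes each

# For the list, count the container plus every int object it points to (an estimate).
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(x) for x in numbers_list)
array_bytes = sys.getsizeof(numbers_array)

print(f"list of int objects: ~{list_bytes} bytes")
print(f"array('i'):          ~{array_bytes} bytes")
```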

3. Weak in Mobile Computing
Python is generally used for server-side programming and is rarely seen in client-side or mobile applications. The main reasons are that Python is not memory efficient and is slower than many other languages.

4. Database Access
Programming in Python is easy and stress-free, but it lags behind when interacting with databases.

Python's database access layer (the DB-API specification, PEP 249) is often considered less developed than popular technologies such as JDBC and ODBC.
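For context, Python's standard library does ship a DB-API 2.0 (PEP 249) driver for SQLite, the `sqlite3` module. The sketch below stores and queries a few credit card records in an in-memory database. The table, column names, and values are hypothetical. Drivers for other databases follow the same connect, execute, and fetch pattern.

```python
import sqlite3

# In-memory database; the schema is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (id INTEGER PRIMARY KEY, credit_limit REAL, defaulted INTEGER)"
)

records = [(1, 20000.0, 0), (2, 120000.0, 0), (3, 90000.0, 1)]
# "?" placeholders let the driver pass values safely instead of formatting SQL strings.
conn.executemany("INSERT INTO clients VALUES (?, ?, ?)", records)
conn.commit()

rows = conn.execute(
    "SELECT id, credit_limit FROM clients WHERE defaulted = ? ORDER BY id", (1,)
).fetchall()
print(rows)  # [(3, 90000.0)]
conn.close()
```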
Introduction
Machine learning :-

Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to imitate the way humans learn, gradually improving its accuracy.
Machine learning is an important component of the growing field of data science. Using statistical methods, algorithms are trained to make classifications or predictions and to uncover key insights in data mining projects. These insights then drive decision making within applications and businesses, ideally improving key growth metrics. As big data continues to grow, the market demand for data scientists will increase. They will be needed to help identify the most relevant business questions and the data to answer them.

Machine learning algorithms are typically created using frameworks that accelerate solution development, such as TensorFlow and PyTorch.
How machine learning works

1. A Decision Process: In general, machine learning algorithms are used to make a prediction or classification. Based on some input data, which can be labeled or unlabeled, the algorithm produces an estimate about a pattern in the data.

2. An Error Function: An error function evaluates the prediction of the model. If there are known examples, the error function compares them with the predictions to assess the accuracy of the model.

3. A Model Optimization Process: If the model can fit the data points in the training set better, its weights are adjusted to reduce the discrepancy between the known examples and the model's estimates.

4. The algorithm repeats this “evaluate and optimize” process, updating the weights autonomously until a threshold of accuracy has been met.
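The four steps above can be sketched with the simplest possible model, a line y = w*x + b fitted by gradient descent on the mean squared error. The data, learning rate, and tolerance are illustrative assumptions. The stopping rule used here, an error below a tolerance, is one common way to implement "until a threshold of accuracy has been met".

```python
def fit_line(xs, ys, learning_rate=0.05, tolerance=1e-6, max_steps=10_000):
    """Fit y = w*x + b by gradient descent, stopping once the mean squared error is below tolerance."""
    w, b = 0.0, 0.0
    n = len(xs)
    for step in range(max_steps):
        # 1. Decision process: predict with the current weights.
        preds = [w * x + b for x in xs]
        # 2. Error function: mean squared error against the known examples.
        errors = [p - y for p, y in zip(preds, ys)]
        mse = sum(e * e for e in errors) / n
        if mse < tolerance:
            break
        # 3. Optimization: move each weight against its gradient.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    # 4. The loop above repeats "evaluate and optimize" until the threshold is met.
    return w, b, mse, step

# Points generated from y = 2x + 1, so w and b should approach 2 and 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b, mse, steps = fit_line(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, mse={mse:.2e}, steps={steps}")
```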
Here are just a few examples of machine
learning you might encounter every day:

Speech recognition:
Also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text, it is a capability that uses natural language processing (NLP) to translate human speech into a written format. Many mobile devices build speech recognition into their systems to conduct voice search (e.g. Siri) or to improve accessibility for texting.

Customer service:
Online chatbots are replacing human agents along the customer journey, changing the way we think about customer engagement across websites and social media platforms. Chatbots answer frequently asked questions (FAQs) about topics such as shipping, or provide personalized advice, cross-selling products or suggesting sizes for users. Examples include virtual agents on e-commerce sites, messaging bots on Slack and Facebook Messenger, and tasks usually done by virtual assistants and voice assistants.

Computer vision:
This AI technology enables computers to derive meaningful
information from digital images, videos, and other visual inputs,
and then take the appropriate action. Powered by convolutional
neural networks, computer vision has applications in photo
tagging on social media, radiology imaging in healthcare, and
self-driving cars in the automotive industry.

Recommendation engines:
Using past consumption behavior data, AI algorithms can help
to discover data trends that can be used to develop more
effective cross-selling strategies. This approach is used by online
retailers to make relevant product recommendations to
customers during the checkout process.

Challenges of machine learning

As machine learning technology has developed, it has certainly made our lives easier. However, implementing machine learning in businesses has also raised a number of ethical concerns about AI technologies. Some of these include:

Technological singularity :-
While this topic garners a lot of public attention, many researchers are not concerned with the idea of AI surpassing human intelligence in the near future. Technological singularity is also referred to as strong AI or superintelligence. Philosopher Nick Bostrom defines superintelligence as “any intellect that vastly outperforms the best human brains in practically every field, including scientific creativity, general wisdom, and social skills.” Although superintelligence is not imminent, the idea raises some interesting questions as we consider the use of autonomous systems, like self-driving cars. It is unrealistic to think that a driverless car would never have an accident, but who is responsible and liable under those circumstances? Should we still develop autonomous vehicles, or should we limit this technology to semi-autonomous vehicles that help people drive safely? The jury is still out on this, but these are the types of ethical debates that are occurring as new, innovative AI technology develops.

AI impact on jobs :-
While much of the public perception of artificial intelligence centers on job losses, this concern should probably be reframed. With every disruptive new technology, the market demand for specific job roles shifts. For example, in the automotive industry, many manufacturers, like GM, are shifting their focus to electric vehicle production to align with green initiatives. The energy industry isn’t going away, but the source of energy is shifting from a fuel economy to an electric one.

In a similar way, artificial intelligence will shift the demand for jobs to other areas. People will be needed to help manage AI systems. People will also still be needed to address more complex problems within the industries most likely to be affected by shifts in job demand, such as customer service. The biggest challenge of artificial intelligence for the job market will be helping people transition to new roles that are in demand.
Logistic Regression
Logistic regression is a supervised machine learning algorithm used mainly for binary classification. It applies a logistic function, also known as the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1. For example, with two classes, Class 0 and Class 1, if the value of the logistic function for an input is greater than 0.5 (the threshold value), the input belongs to Class 1; otherwise it belongs to Class 0. It is called regression because it extends linear regression: it models the log-odds of the outcome as a linear function of the inputs. However, it is mainly used for classification problems. The difference between the two is that linear regression outputs a continuous value that can be anything, while logistic regression predicts the probability that an instance belongs to a given class.
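The logistic function is p = 1 / (1 + e^(-z)), where z is the linear combination of the inputs, b0 + b1*x1 + ... + bn*xn. The sketch below implements it together with the 0.5 threshold rule described above. It is written in pure Python for illustration, and the sample z values are arbitrary. The function uses two branches so that very large positive or negative z does not overflow `math.exp`.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) to avoid overflow.
    e = math.exp(z)
    return e / (1.0 + e)

def predict_class(z, threshold=0.5):
    """Return Class 1 if the predicted probability is greater than the threshold, else Class 0."""
    return 1 if sigmoid(z) > threshold else 0

for z in (-4.0, 0.0, 4.0):
    print(f"z={z:+.1f}  p={sigmoid(z):.3f}  class={predict_class(z)}")
```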
Understanding Logistic Regression

It is used for predicting a categorical dependent variable using a given set of independent variables.
- Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value.
- The outcome can be Yes or No, 0 or 1, True or False, etc. Instead of giving the exact value 0 or 1, however, the model gives probabilistic values that lie between 0 and 1.
- Logistic regression is very similar to linear regression except in how it is used: linear regression solves regression problems, whereas logistic regression solves classification problems.
- In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function whose output approaches two limiting values (0 and 1).
- The curve of the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
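The sketch below shows how the “S”-shaped curve can be fitted. It estimates P(obese | weight) = sigmoid(b0 + b1*weight) by batch gradient descent on the log-loss (the negative log-likelihood). The weights and labels are hypothetical toy data, and the learning rate and step count are arbitrary choices. In practice, a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead of hand-written gradient descent.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log-loss: (predicted probability - label), times the input for b1.
        residuals = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= learning_rate * sum(residuals) / n
        b1 -= learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

# Hypothetical toy data: standardized mouse weight and whether the mouse is obese (1) or not (0).
weights = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
obese = [0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(weights, obese)
for w in (-1.5, 0.0, 1.5):
    print(f"weight={w:+.1f}  P(obese)={sigmoid(b0 + b1 * w):.2f}")
```

The classes in the toy data overlap: one lighter mouse is obese and one heavier mouse is not. Because of this overlap, the maximum-likelihood coefficients are finite and gradient descent settles instead of growing without bound.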
Description of the project

The Credit Card Default Prediction project using logistic regression aims to assess the likelihood of a credit card user defaulting on payments. Logistic regression is employed as a statistical model to predict a binary outcome, in this case whether a customer will default or not. The project typically involves data preprocessing, feature selection, model training, and evaluation.

A prediction project using logistic regression is a comprehensive endeavor involving several key steps and considerations:

Data Collection:
Gather information about credit card users, such as their credit scores, payment history, and financial details.

Data Cleaning:
Make sure the gathered data is clean and ready for analysis by handling missing values and fixing any errors.

Understanding the Data:
Explore the data to understand patterns and relationships between different factors.

Choosing Important Factors:
Decide which factors, such as credit score or payment history, are most important for predicting whether someone might fail to pay their credit card bill.

Training a Model:
Use a mathematical model (logistic regression) to learn from the data and make predictions based on the chosen factors.

Adjusting the Model:
Fine-tune the model's settings to improve its predictions.

Checking Model Accuracy:
Test the model to measure how well its predictions match reality, using metrics such as accuracy and precision.

Making Practical Decisions:
Choose a threshold on the predicted probability that determines when the model flags someone as likely not to pay their credit card bill.

Understanding the Results:
Interpret what the model is telling us by looking at the factors that most strongly influence its predictions.

Using the Model:
Put the model to work in real situations, and keep monitoring how well it performs over time.
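The sketch below illustrates the Data Cleaning step and related preparation. It fills missing values with the median of the training data and then standardizes the column to zero mean and unit variance. The column and its values are hypothetical, and the report does not state which imputation or scaling method it used. The key design choice is that both statistics are learned from the training set and reused unchanged on the test set, so the test data does not influence preprocessing.

```python
from statistics import mean, median, pstdev

def fit_preprocessor(column):
    """Learn the median (for imputation) and mean/std (for scaling) from training values; None marks a missing value."""
    present = [v for v in column if v is not None]
    fill = median(present)
    filled = [fill if v is None else v for v in column]
    return fill, mean(filled), pstdev(filled)

def transform(column, fill, mu, sigma):
    """Impute missing values with the training median, then standardize with the training mean and std."""
    filled = [fill if v is None else v for v in column]
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in filled]

# Hypothetical credit-limit column with missing entries (None).
train_limit = [20000.0, None, 90000.0, 50000.0, 120000.0]
test_limit = [None, 60000.0]

params = fit_preprocessor(train_limit)
train_scaled = transform(train_limit, *params)
test_scaled = transform(test_limit, *params)  # reuse training statistics on the test set
print(params)
print([round(v, 3) for v in test_scaled])
```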
Conclusion
In conclusion, the Credit Card Default Prediction project,
utilizing logistic regression, represents a sophisticated and
systematic approach to addressing the critical challenge of
forecasting whether credit card users are at risk of defaulting on
their payments. This multifaceted endeavor encompasses
various stages, commencing with the careful collection and
preprocessing of pertinent data. The subsequent steps involve
delving into exploratory data analysis to extract meaningful
insights and selecting key features that significantly influence
the prediction of default.

The heart of the project lies in the application of logistic regression, a statistical technique well suited to binary classification tasks. During model training, the algorithm learns from historical data patterns and estimates a coefficient for each feature that captures its impact on the likelihood of default. Subsequent fine-tuning of the model's settings improves its predictive performance, with the aim of producing a robust and reliable tool for credit risk assessment.
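Logistic regression models the log-odds as a linear function of the features. Each estimated coefficient b can therefore be read as an odds ratio exp(b): the multiplicative change in the odds of default for a one-unit increase in that feature, holding the others fixed. The feature names and coefficient values below are hypothetical placeholders, not estimates from this project.

```python
import math

# Hypothetical fitted coefficients for standardized features; not results from this project.
coefficients = {
    "recent_payment_delay": 0.65,
    "credit_limit": -0.30,
    "bill_amount": 0.05,
}

# exp(coefficient) is the odds ratio for a one-unit (here: one standard deviation) increase.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

# Rank features by the size of their effect on the log-odds, ignoring direction.
for name in sorted(coefficients, key=lambda k: abs(coefficients[k]), reverse=True):
    print(f"{name:22s} coef={coefficients[name]:+.2f}  odds ratio={odds_ratios[name]:.2f}")
```

For example, an odds ratio of about 1.92 (exp(0.65)) would mean that each one-standard-deviation increase in that feature roughly doubles the odds of default. An odds ratio below 1, such as exp(-0.30) ≈ 0.74, means the odds decrease.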

The project's success is contingent on rigorous evaluation, using metrics such as accuracy, precision, recall, and F1 score to gauge the model's effectiveness. Careful consideration is given to selecting a threshold that strikes a balance between identifying potential defaulters and avoiding unnecessary false alarms.
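The evaluation metrics and the threshold trade-off can be sketched as follows. The code counts true and false positives and negatives at a given threshold, then derives accuracy, precision, recall, and F1 from those counts. The labels and probabilities are hypothetical, chosen only to show how moving the threshold changes the balance.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, FN, TN when probabilities above the threshold are labeled as default (1)."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, and F1 at the given threshold (0.0 when a ratio is undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_prob, threshold)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels (1 = default) and model probabilities for eight clients.
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.10, 0.20, 0.35, 0.40, 0.60, 0.45, 0.70, 0.90]

for threshold in (0.3, 0.5):
    print(threshold, metrics(y_true, y_prob, threshold))
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises recall from about 0.67 to 1.0, so every defaulter is caught. At the same time, precision falls from about 0.67 to 0.5. This is the trade-off between identifying defaulters and raising false alarms.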
Future of the project
Looking ahead, the Credit Card Default Prediction project using
logistic regression could see improvements in accuracy by
adopting more advanced techniques. We might include new
types of data, like how people behave financially, and focus on
making the model easier to understand. The future could also
bring models that learn and adapt in real-time, ensuring they
stay effective as economic conditions change. Ethical use of AI,
following rules, and collaborating with others in the industry
will continue to be important. Ultimately, the goal is to create a
reliable and fair system for predicting credit card defaults while
keeping up with technology and ethical standards.

Advanced Techniques:
Explore more sophisticated modeling methods beyond logistic regression for improved accuracy.

Diverse Data Sources:
Integrate additional data types, like behavioral information, to enhance the model's predictive capabilities.

Interpretability:
Focus on making the model easier to understand for stakeholders, including customers and regulators.

Dynamic Adaptation:
Develop models that can adapt in real time to changing economic conditions so that they stay relevant.

Integration with Decision Systems:
Align the model more closely with decision support systems to enable quicker and better-informed decisions.

Ethical Considerations:
Emphasize ethical AI practices, ensuring fair and unbiased outcomes in credit predictions.

Regulatory Compliance:
Stay updated on evolving regulations and compliance standards for handling financial data.

Collaboration:
Work with industry partners, data scientists, and regulators to establish best practices.

Global Economic Trends:
Consider how well the model adapts to global economic trends and geopolitical factors.

Continuous Monitoring:
Implement robust systems for ongoing monitoring and proactive model maintenance to ensure accuracy.
Bibliography

- WEBSITES
WIKIPEDIA.ORG
KAGGLE.COM
OPENAI.COM
GEEKSFORGEEKS.ORG

- SOFTWARE
ANACONDA
MICROSOFT WORD
MICROSOFT POWERPOINT

- DATA SETS
WWW.KAGGLE.COM

- BOOKS
V. G. Drugov, “Default Payments of Credit Card Clients in Taiwan from 2005,” n.d. Retrieved April 23, 2019.

M. Zan, G. Yanrong, and F. Guanlong, “Credit Card Default Prediction,” Journal of Computer Applications, vol. 39, no. 2, pp. 314–318, 2019.
