Credit Card Scoring Using ML PDF
ON
CREDIT SCORING SYSTEM USING ML
A report submitted in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING (AI & ML)
By
Under the Supervision of
Ms. NIKITA
Department of CSE (AI & ML)
DECLARATION
We hereby declare that the work presented in this report, entitled
“CREDIT SCORING SYSTEM USING ML”, submitted in partial fulfillment
of the requirements for the B.Tech degree in the Department of CSE (AI & ML), Greater Noida
Institute of Technology, Greater Noida, is an authentic record of our
work under the supervision of Ms. NIKITA. The results of this
investigation reported in this project have not been reported
for any other degree, diploma, or technical forum. The
assistance and help received during the course of the investigation
have been duly acknowledged.
ABHISHEK KUMAR
(2001321530003)
ABHIJEET KUMAR
(20013201530002)
CERTIFICATE OF ACCEPTANCE
BONAFIDE CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
CHAPTER 1
INTRODUCTION
CHAPTER 2
OBJECTIVES
Our objective with this project is to introduce a more robust and reliable credit score
using artificial intelligence, and to integrate a customer lifetime value (CLV) factor into
the score itself to make it more industry-friendly.
Risk Assessment:
Objective Evaluation: The primary aim is to objectively assess the credit risk
associated with an individual or entity. By analyzing credit reports and various financial
behaviors, the scoring system provides a numerical representation (credit score)
that reflects the likelihood of timely repayment.
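One common way to turn a model's estimated probability of default into such a numerical score is points-to-double-the-odds (PDO) scaling: the score rises by a fixed number of points every time the good:bad odds double. The sketch below is illustrative only; the anchor values (600 points at odds of 50:1, 20 points to double the odds) are assumptions, not parameters taken from this project.

```python
import math


def pd_to_score(p_default: float, base_score: float = 600.0,
                base_odds: float = 50.0, pdo: float = 20.0) -> float:
    """Map a probability of default to a score using PDO scaling.

    Assumed (illustrative) anchors: `base_score` points at good:bad odds of
    `base_odds`, and +`pdo` points every time the odds double.
    """
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must be strictly between 0 and 1")
    odds = (1.0 - p_default) / p_default          # good:bad odds
    factor = pdo / math.log(2.0)                  # points per unit of ln(odds)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds)


for p in (0.5, 1 / 51, 1 / 101):
    print(f"PD={p:.4f} -> score={pd_to_score(p):.1f}")
```

A lower probability of default gives higher odds and therefore a higher score; with these assumed anchors, odds of 50:1 map to 600 and odds of 100:1 to 620.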
Standardization:
Uniform Evaluation: Credit scoring systems standardize the evaluation process,
ensuring that all applicants are assessed using a consistent set of criteria. This
promotes fairness and prevents subjective bias in the lending process.
Cost Reduction:
Resource Optimization: Lenders can use credit scores to allocate resources more
effectively. By focusing attention on applicants with higher creditworthiness,
lenders can reduce the resources spent on manual underwriting for riskier
applicants.
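This resource allocation is often implemented as a simple triage on the score: applications above an upper cutoff are approved automatically, those below a lower cutoff are declined automatically, and only the band in between goes to a human underwriter. A minimal sketch, assuming hypothetical cutoffs of 650 and 550; real cutoffs would be set from the lender's own data and risk appetite.

```python
from collections import Counter

AUTO_APPROVE_AT = 650     # hypothetical upper cutoff
AUTO_DECLINE_BELOW = 550  # hypothetical lower cutoff


def route_application(score: float) -> str:
    """Return the processing route for an application with the given score."""
    if score >= AUTO_APPROVE_AT:
        return "auto_approve"
    if score < AUTO_DECLINE_BELOW:
        return "auto_decline"
    return "manual_review"  # only borderline cases use underwriter time


scores = [720, 610, 540, 655, 590, 500]
print(Counter(route_application(s) for s in scores))
```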
CHAPTER 3
METHODOLOGY
The methodology of a credit scoring system involves using statistical models to analyze various
input variables about applicants. We first choose input variables for the ML model based on common sense.
A machine learning model then uses the same common-sense input variables to predict who will default
and who will not:
Using common sense, we would predict that applicant XYZ is more likely to default and that ABC will
most probably not default. The ML model makes the same prediction by learning from past data.
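Select all columns of object type (object type means non-numeric) and one-hot encode them with pd.get_dummies:
import pandas as pd

Col_for_dummies = Training_data.select_dtypes(include=['O']).columns
Training_data = pd.get_dummies(Training_data, columns=Col_for_dummies)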
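Putting these steps together, the sketch below one-hot encodes the non-numeric columns with pd.get_dummies and fits a logistic regression on past outcomes, so that the model, like the common-sense reasoning above, assigns a higher default probability to the riskier applicant. The column names, the tiny training table and the two applicants are hypothetical and exist only for illustration; logistic regression is one of several algorithms a credit scoring model can use. Here non-numeric columns are selected with exclude="number", which matches include=['O'] for this table and also picks up pandas' dedicated string dtype.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical historical data: one row per past borrower, default = 1 if they defaulted
Training_data = pd.DataFrame({
    "income": [25, 30, 80, 95, 40, 120, 22, 70],          # thousands per year
    "late_payments": [4, 3, 0, 0, 2, 0, 5, 1],
    "employment": ["none", "part_time", "full_time", "full_time",
                   "part_time", "full_time", "none", "full_time"],
    "default": [1, 1, 0, 0, 1, 0, 1, 0],
})

# One-hot encode every non-numeric column
Col_for_dummies = Training_data.select_dtypes(exclude="number").columns
encoded = pd.get_dummies(Training_data, columns=Col_for_dummies, dtype=int)

X = encoded.drop(columns="default")
y = encoded["default"]
model = LogisticRegression(max_iter=1000).fit(X, y)

# Two new (hypothetical) applicants, encoded with exactly the training columns
applicants = pd.DataFrame({
    "income": [28, 90],
    "late_payments": [3, 0],
    "employment": ["none", "full_time"],
}, index=["XYZ", "ABC"])
applicants_encoded = pd.get_dummies(applicants, columns=["employment"], dtype=int)
applicants_encoded = applicants_encoded.reindex(columns=X.columns, fill_value=0)

p_default = model.predict_proba(applicants_encoded)[:, 1]  # probability of class 1
print(dict(zip(applicants.index, p_default.round(3))))
```

The reindex step matters in practice: a new applicant rarely has every category seen in training, so missing dummy columns are filled with 0 to keep the feature layout identical.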
CHAPTER 4
SYSTEM REQUIREMENT
The system requirements for a credit scoring system vary depending on the model's
complexity, the size of the dataset, and the specific needs of the organization.
However, the following general requirements and considerations are crucial to
developing and deploying an effective credit scoring system:
Hardware Requirements:
Processing Power: The system must have enough processing power to handle the
computation-intensive tasks associated with building and deploying credit scoring
models. Multi-core processors and scalable computing resources are
recommended.
Memory (RAM): Sufficient RAM is necessary for efficiently handling large datasets
during model development and scoring.
Storage: Adequate storage space is essential for storing historical credit data, model
parameters, and other relevant information.
Software Requirements:
Statistical Software: To build and validate credit scoring models, statistical software
such as R, Python (with libraries like scikit-learn, pandas, and NumPy), or specialized
tools like SAS or SPSS may be used.
Database Management System (DBMS): A robust DBMS is critical for storing and
managing large volumes of credit-related data. Popular choices include MySQL,
PostgreSQL, or NoSQL databases depending on the specific needs.
Data Integration Tools: To preprocess and prepare raw data for analysis, ETL tools
for extracting, transforming, and loading data are necessary.
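A minimal extract-transform-load step can be sketched with pandas and Python's built-in sqlite3 module. The in-memory CSV, its column names and the table name credit_applications are hypothetical, and median imputation is just one possible cleaning rule; a production system would load into the organization's chosen DBMS instead of an in-memory SQLite database.

```python
import io
import sqlite3

import pandas as pd

# Extract: read raw records (an in-memory CSV stands in for a real source file)
raw_csv = io.StringIO(
    "customer_id,income,late_payments\n"
    "1,52000,0\n"
    "2,,3\n"
    "3,87000,1\n"
    "3,87000,1\n"
)
raw = pd.read_csv(raw_csv)

# Transform: drop exact duplicate rows and fill missing income with the median
clean = raw.drop_duplicates().copy()
clean["income"] = clean["income"].fillna(clean["income"].median())

# Load: write the cleaned table into a database (in-memory SQLite here)
conn = sqlite3.connect(":memory:")
clean.to_sql("credit_applications", conn, index=False)
print(pd.read_sql("SELECT * FROM credit_applications", conn))
```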
Model Development Environment:
Development Frameworks: Machine learning frameworks like TensorFlow or
PyTorch may be used for complex scoring models. For traditional statistical
modeling, tools like RStudio or Jupyter notebooks are commonly employed.
Version Control: Version control systems (e.g., Git) are crucial for tracking changes
in the code and models over time.
Security Measures:
Data Encryption: Implement encryption mechanisms to secure sensitive credit-
related data, especially during data transmission and storage.
Access Controls: Define and enforce access controls to ensure that only
authorized personnel can access and modify the credit scoring system.
User Interface:
If the credit scoring system includes a user interface for manual intervention or
review, ensure it is user-friendly and integrates seamlessly with the underlying
systems.
It is important to note that the specific requirements can vary based on the
organization's goals, regulatory environment, and chosen technology stack.
Regular updates and maintenance are essential to keep the credit scoring system
effective and aligned with evolving business needs.
CHAPTER 5
Architecture
To calculate credit scores, scoring models such as FICO and VantageScore use
predetermined algorithms that assign weights to different credit report elements.
Industry-specific scores may also be used; for example, some scores are tailored to
predict the likelihood of timely payments on auto loans.
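The idea of weighting credit report elements can be illustrated with a toy scorecard. The category weights below are the importance percentages FICO publishes for its five score categories; FICO's actual algorithm is proprietary and is not a simple weighted sum, so the per-category sub-scores in [0, 1] and the linear mapping onto the 300 to 850 range are assumptions made purely for illustration.

```python
# Category weights: the importance percentages FICO publishes for its categories.
# The weighted sum itself is a toy; FICO's real algorithm is proprietary.
WEIGHTS = {
    "payment_history": 0.35,
    "amounts_owed": 0.30,
    "length_of_history": 0.15,
    "new_credit": 0.10,
    "credit_mix": 0.10,
}


def toy_score(subscores: dict[str, float], low: int = 300, high: int = 850) -> float:
    """Combine hypothetical per-category sub-scores in [0, 1] into [low, high]."""
    if set(subscores) != set(WEIGHTS):
        raise ValueError("sub-scores must cover exactly the weighted categories")
    weighted = sum(WEIGHTS[name] * value for name, value in subscores.items())
    return low + (high - low) * weighted


applicant = {"payment_history": 0.9, "amounts_owed": 0.6, "length_of_history": 0.5,
             "new_credit": 0.8, "credit_mix": 0.7}
print(round(toy_score(applicant)))
```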
The credit scoring model undergoes a training phase and leverages diverse
algorithms such as logistic regression, decision trees, support vector machines, and
neural networks to learn patterns and relationships within the data, optimizing its
ability to predict credit risk accurately.
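The training phase can be sketched with scikit-learn by fitting the four algorithm families named above and comparing them with cross-validated ROC AUC, a common metric for how well a model ranks borrowers by risk. The data is synthetic (make_classification with roughly 20% positives standing in for defaults), and the hyperparameters are arbitrary illustrative choices rather than tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a credit dataset: about 20% of rows in class 1 ("default")
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "neural_network": make_pipeline(StandardScaler(),
                                    MLPClassifier(hidden_layer_sizes=(16,),
                                                  max_iter=2000, random_state=0)),
}

# Mean 5-fold cross-validated ROC AUC for each candidate algorithm
results = {name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
           for name, model in candidates.items()}
for name, auc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name:20s} AUC = {auc:.3f}")
```

Scaling sits inside each pipeline so that, during cross-validation, the scaler is fitted only on the training folds.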
Machine Learning-based Credit Scoring: Key Components
CHAPTER 6
Operational Principles
Predictive Modeling
The first step in predictive modeling is to collect and aggregate an extensive
amount of data that pertains to an individual's financial history, transactional
behavior, and other pertinent alternative data sources. The collected data
undergoes a rigorous process of cleaning, normalization, and preprocessing to
ensure its accuracy and consistency.
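The cleaning, normalization and preprocessing step can be sketched with a scikit-learn ColumnTransformer: numeric gaps are filled with the median and the columns standardized, while categorical gaps are filled with the most frequent value and the columns one-hot encoded. The columns and values are hypothetical, and median or most-frequent imputation is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw applicant data with gaps and mixed types
raw = pd.DataFrame({
    "income": [52000, np.nan, 87000, 31000],
    "utilization": [0.35, 0.90, np.nan, 0.60],
    "housing": pd.Series(["rent", "own", np.nan, "rent"], dtype=object),
})

preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardize to zero mean, unit variance
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
     ["income", "utilization"]),
    # Categorical: fill gaps with the most frequent value, then one-hot encode
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")),
     ["housing"]),
])

features = preprocess.fit_transform(raw)
print(features.shape)
```

Fitting the transformer once and reusing it at scoring time ensures new applicants are cleaned with the same medians, scales and categories as the training data.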
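The number of recency clusters is chosen with the elbow method: K-means is fitted for k = 1 to 9 and the within-cluster sum of squared errors (SSE) is recorded for each k. Here tx_user is assumed to be a per-customer table with a Recency column.
from sklearn.cluster import KMeans

sse = {}  # k -> sum of squared errors (inertia)
tx_recency = tx_user[['Recency']].copy()
for k in range(1, 10):
    # Fit on Recency only, so the label column added below is never used as a feature
    kmeans = KMeans(n_clusters=k, max_iter=1000).fit(tx_recency[['Recency']])
    tx_recency["clusters"] = kmeans.labels_  # cluster label for each recency value
    sse[k] = kmeans.inertia_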
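The clustering code assumes a per-customer Recency column. In RFM-style customer analysis, which is commonly used when estimating customer lifetime value (CLV), recency is usually the number of days since each customer's most recent transaction. A minimal sketch, assuming a hypothetical transaction table with CustomerID and InvoiceDate columns and measuring recency against the latest date in the data:

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase
tx_data = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceDate": pd.to_datetime(["2023-01-05", "2023-03-01", "2023-02-10",
                                   "2023-01-20", "2023-02-25", "2023-03-10"]),
})

# Last purchase date per customer
last_purchase = tx_data.groupby("CustomerID")["InvoiceDate"].max().reset_index()
last_purchase.columns = ["CustomerID", "MaxPurchaseDate"]

# Recency = days between the customer's last purchase and the latest date in the data
last_purchase["Recency"] = (last_purchase["MaxPurchaseDate"].max()
                            - last_purchase["MaxPurchaseDate"]).dt.days
tx_user = last_purchase[["CustomerID", "Recency"]]
print(tx_user)
```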
CHAPTER 7
Challenges Faced
Credit scoring systems are essential to modern financial decision-making, but they
face various challenges related to technology, ethics, and regulation. Addressing
these challenges is crucial to ensuring the reliability, fairness, and security of credit
scoring operations.
Transparent Audits
To conduct algorithmic fairness audits, it is important to maintain
transparency in model decision-making. Complex machine learning models
present challenges in providing clear and interpretable insights into the
factors influencing credit scoring decisions.
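One simple check used in fairness audits is the disparate impact ratio: each group's approval rate divided by the approval rate of the most-approved group. Ratios below 0.8 (the "four-fifths rule" from US employee-selection guidelines) are often treated as a warning sign. The sketch below uses hypothetical group labels and decisions; a real audit would combine several metrics rather than rely on this one alone.

```python
import pandas as pd


def disparate_impact(decisions: pd.DataFrame, group_col: str,
                     approved_col: str) -> pd.Series:
    """Approval rate of each group divided by the highest group approval rate."""
    rates = decisions.groupby(group_col)[approved_col].mean()
    return rates / rates.max()


# Hypothetical audit sample: 1 = approved, 0 = declined
audit = pd.DataFrame({
    "group": ["A"] * 10 + ["B"] * 10,
    "approved": [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5,
})
ratios = disparate_impact(audit, "group", "approved")
print(ratios)
print("Groups below the four-fifths threshold:", list(ratios[ratios < 0.8].index))
```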
Continuous Monitoring
Achieving fairness is an ongoing process. Continuous monitoring is
essential to detect and address biases that may emerge over time as
economic, social, or regulatory landscapes evolve.
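Continuous monitoring is often implemented by comparing the score distribution of recent applicants with the distribution at model development time. The Population Stability Index (PSI), the sum over score bins of (actual share minus expected share) times ln(actual share / expected share), is a widely used measure for this. The sketch below bins scores by the deciles of the baseline sample; the score samples are simulated, and the 0.25 alert level in the comment is a common rule of thumb, not a regulatory standard.

```python
import numpy as np


def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a baseline score sample and a recent one.

    Bins are the quantiles of the baseline; eps avoids log(0) for empty bins.
    """
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    # Clip recent scores into the baseline range so outliers land in the end bins
    actual = np.clip(actual, edges[0], edges[-1])
    e_share = np.histogram(expected, bins=edges)[0] / len(expected)
    a_share = np.histogram(actual, bins=edges)[0] / len(actual)
    e_share = np.clip(e_share, eps, None)
    a_share = np.clip(a_share, eps, None)
    return float(np.sum((a_share - e_share) * np.log(a_share / e_share)))


rng = np.random.default_rng(0)
baseline = rng.normal(650, 50, 10_000)   # simulated scores at model development
stable = rng.normal(650, 50, 10_000)     # recent scores, same population
shifted = rng.normal(620, 60, 10_000)    # recent scores after a population shift
print(f"stable:  {population_stability_index(baseline, stable):.3f}")
print(f"shifted: {population_stability_index(baseline, shifted):.3f}")  # > 0.25 often read as a major shift
```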
Diverse Training Data
Bias in credit scoring systems often arises from historical disparities in
training data. Ensuring diversity in training data is crucial to avoid
discriminatory outcomes in credit decisions.
Representative Sampling
Obtaining a representative sample of the population in the training dataset
is challenging. The dataset must encompass diverse demographic groups
to avoid underrepresentation or overrepresentation of certain segments.
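When data is split for training and validation, stratifying on the attributes that must stay represented keeps each segment's share the same in both parts. A minimal sketch with scikit-learn's train_test_split, assuming a hypothetical region column and stratifying on region and default outcome together:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical applicant table with an under-represented "rural" segment (20%)
data = pd.DataFrame({
    "region": ["urban"] * 80 + ["rural"] * 20,
    "default": ([0] * 70 + [1] * 10) + ([0] * 15 + [1] * 5),
})

# Stratify on region and outcome together so both keep their proportions
strata = data["region"] + "_" + data["default"].astype(str)
train, test = train_test_split(data, test_size=0.25, stratify=strata, random_state=0)

print(train["region"].value_counts(normalize=True))
print(test["region"].value_counts(normalize=True))
```

Stratification only preserves the proportions already present in the data; it cannot fix a sample that under-represents a group in the first place.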
Dynamic Nature of Bias
Bias can evolve over time, and new patterns may emerge. Adapting the
training data to reflect these changes requires ongoing efforts to identify
and rectify biases as they arise.
Balancing the need for model explainability with the inherent complexity of
advanced machine learning models is a major challenge in credit scoring.
Ensuring that users can understand and trust the decisions made by complex
models is an ongoing ethical consideration. Educating users about how credit
score models work and the factors influencing their credit assessments is
essential. Financial institutions must develop effective communication strategies
to bridge the gap between technical intricacies and user comprehension.
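For a linear model, one simple local explanation is to break an applicant's log-odds of default into per-feature contributions (each coefficient times the applicant's deviation from the training average) and report the largest positive ones as the main reasons for a higher risk estimate. The sketch below assumes a logistic regression trained on simulated data with hypothetical feature names; complex models would need a dedicated XAI method such as SHAP instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["late_payments", "utilization", "income_k"]  # hypothetical features
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.poisson(1.0, 500),          # number of late payments
    rng.uniform(0.0, 1.0, 500),     # credit utilization ratio
    rng.normal(50.0, 15.0, 500),    # income in thousands
])
# Simulated outcome: risk rises with late payments and utilization, falls with income
logit = -2.0 + 0.9 * X[:, 0] + 2.0 * X[:, 1] - 0.03 * X[:, 2]
y = rng.uniform(size=500) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, y)


def explain(applicant: np.ndarray, top_n: int = 2) -> list[tuple[str, float]]:
    """Return the features that push this applicant's log-odds of default up the most."""
    contributions = model.coef_[0] * (applicant - X.mean(axis=0))
    order = np.argsort(contributions)[::-1][:top_n]
    return [(feature_names[i], round(float(contributions[i]), 3)) for i in order]


print(explain(np.array([4, 0.95, 30.0])))
```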
CHAPTER 8
Recent Advancements
Explainable AI (XAI)
Interactive Dashboards
Innovative Data Integration Techniques for Credit Scoring
Holistic Assessment
By analyzing communication sentiment, financial narratives, and overall
communication patterns, NLP enables a more holistic assessment of
creditworthiness. The integration of NLP broadens the scope of data sources,
providing credit scoring models with additional context. This allows for a more
nuanced understanding of an individual's financial habits and reliability,
particularly for those with limited traditional credit histories.
Behavioral Biometrics
The incorporation of behavioral biometrics, such as keystroke dynamics and
mouse movement patterns, into credit scoring models provides unique markers to
assess an individual's authenticity and potential credit risk.
The Advantages of Dynamic Risk Scoring and Future Innovations in Credit Scoring
The introduction of dynamic risk scoring has revolutionized credit scoring systems.
The shift from static, point-in-time assessments to continuous monitoring of
financial behavior allows for real-time adjustments to credit scores based on
changing circumstances. The key advantages and related innovations are outlined below:
Adaptability:
Dynamic risk-scoring models are highly adaptable, reflecting the dynamic nature of
financial behavior.
This adaptability ensures that credit scores remain relevant and reflective of an
individual's current financial standing.
Real-time adjustments enable immediate responses to changes in an individual's
financial situation. This is particularly valuable for both lenders and borrowers,
providing a more accurate representation of credit risk at any given moment.
Reduced Latency:
Edge computing can significantly reduce the latency associated with data
processing.
This is crucial in credit scoring, where quick decisions are essential for providing
timely financial services and responding to changing circumstances.
Decentralization:
The move towards decentralized processing aligns with broader trends in
distributed computing.
Edge computing offers the potential to distribute computing tasks across various
devices and locations, contributing to a more resilient and scalable credit scoring
infrastructure.
Exponential Growth of Data:
The increasing volume and variety of data, including the proliferation of the
Internet of Things (IoT), contribute to more nuanced credit scoring models.
Embracing this exponential growth in data allows credit scoring systems to
leverage a broader range of information for more accurate assessments.
The Impact of Technology on Creditworthiness Assessment
In recent years, technology has revolutionized the assessment of creditworthiness.
With data coming from various sources such as IoT devices, social media, and other
emerging technologies, a more comprehensive evaluation of creditworthiness can
be made. This includes considering aspects such as lifestyle, spending patterns,
and even health-related data. As data volumes continue to increase, machine
learning models are becoming more sophisticated in mining data and uncovering
subtle patterns. As the availability of diverse data sources increases, model
training techniques and algorithms are likely to advance further.
CHAPTER 9
Practical Implications
Expanded Access: Individuals who might have been excluded from traditional
lending due to a lack of credit history now have the opportunity to access credit.
This promotes financial inclusion by catering to a diverse demographic range.
Microfinance Opportunities: Credit scoring facilitates the extension of microloans
to individuals and small businesses, fostering entrepreneurship and economic
growth in underserved communities.
Risk Mitigation, Loan Approvals, and Interest Rates
Credit scoring systems play a crucial role in mitigating risk for financial institutions
by aiding in the decision-making process for loan approvals and determining
interest rates. This has several implications, including:
The Advantages of Credit Scoring: Improving Decision-making, Risk
Stratification, and Responsible Lending
Financial institutions can benefit from credit scoring by making more informed
decisions on loan approvals. This automated assessment process reduces the
time and resources required for manual evaluations and enables lenders to
categorize individuals into risk segments, allowing for more favorable interest
rates for lower-risk individuals.
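Risk stratification of this kind can be sketched as a lookup from score bands to interest rates, with better bands receiving lower rates. The band boundaries and rates below are hypothetical placeholders, not figures from any lender.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical score-band lower bounds and the annual rate offered in each band
BAND_FLOORS = [550, 620, 680, 740]          # scores below 550 are not priced (declined)
BAND_RATES = [0.189, 0.149, 0.119, 0.089]   # lower-risk bands get lower rates


def offered_rate(score: float) -> Optional[float]:
    """Return the annual interest rate for a score, or None below the lowest band."""
    band = bisect_right(BAND_FLOORS, score) - 1  # index of the highest floor <= score
    if band < 0:
        return None
    return BAND_RATES[band]


for s in (500, 600, 700, 800):
    print(s, offered_rate(s))
```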
Responsible Lending
Credit scoring systems support ethical and responsible lending practices within
the financial industry. Accurate credit scoring promotes fair lending practices by
objectively assessing an individual's creditworthiness, irrespective of
demographic factors, aligning with regulatory efforts to promote fairness in
lending.
Improving Financial Systems: The Benefits of Credit Scoring
Mitigating Overindebtedness:
Responsible lending involves ensuring that individuals do not become
overburdened with debt. Credit scoring is a tool that assesses an individual's
ability to manage financial obligations, thereby preventing overindebtedness.
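A common way to gauge whether a new loan would overburden a borrower is the debt-to-income (DTI) ratio: total monthly debt payments, including the proposed loan, divided by gross monthly income. A minimal sketch, assuming a hypothetical 40% limit; actual limits vary by lender, product and regulation.

```python
def debt_to_income(monthly_debt_payments: float, new_loan_payment: float,
                   gross_monthly_income: float) -> float:
    """DTI ratio including the payment on the proposed loan."""
    if gross_monthly_income <= 0:
        raise ValueError("gross_monthly_income must be positive")
    return (monthly_debt_payments + new_loan_payment) / gross_monthly_income


MAX_DTI = 0.40  # hypothetical affordability limit

dti = debt_to_income(monthly_debt_payments=900, new_loan_payment=350,
                     gross_monthly_income=4000)
print(f"DTI = {dti:.1%}, affordable: {dti <= MAX_DTI}")
```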
Technological Advancements:
Modern credit scoring systems incorporate advanced technologies that
improve efficiency and adaptability. Machine learning algorithms and predictive
analytics streamline credit assessments, and automated processes enable rapid
decision-making and faster credit approvals.
Adaptability to Change:
Credit scoring systems that integrate dynamic risk scoring and alternative data
analysis are better suited to respond to economic changes and shifting user
behaviors.
Empowering Users:
Credit scoring systems provide users with insights into their creditworthiness,
helping them understand financial concepts and make informed decisions.
Transparent credit scoring systems allow individuals to identify areas for
improvement, leading to increased financial opportunities over time.
CHAPTER 10
Future Outlook
Reduced Decision Time: Edge computing holds the promise of reducing the time
it takes to process credit-related data. This could lead to faster decision-making,
providing individuals with quicker access to financial services.
Enhanced Scalability: Decentralized processing can enhance the scalability of
credit scoring systems. Edge computing enables the distribution of computational
tasks across various devices, contributing to a more scalable infrastructure.
Exponential Growth of Data: Integration of a Wider Array of Data Sources
The future of credit scoring will likely be characterized by the integration of an
even wider array of data sources. The exponential growth of data, including data
from the Internet of Things (IoT), social media, and non-traditional financial
sources, will contribute to more comprehensive credit assessments. Here's what
we can expect:
Holistic Credit Profiles: The inclusion of diverse data sources will enable credit
scoring models to build more holistic profiles of individuals. This includes factors
such as lifestyle, spending patterns, and other non-traditional indicators of
creditworthiness.
Machine Learning Advances: Advancements in Handling Large Datasets
As the volume and variety of data increase, machine learning models will
continue to advance. Techniques for handling large datasets and uncovering
nuanced patterns will likely become more sophisticated, enhancing the accuracy
of credit assessments.
CHAPTER 11
Conclusion
Technological Advancements
With the integration of Explainable AI (XAI) and edge computing, credit scoring
systems are becoming more efficient and transparent. The use of these
technologies promotes localized explanations and interactive dashboards, allowing
users to understand the decision-making process and enhance transparency.
Ethical Considerations
The future of credit scoring depends on responsible AI use, fairness, and
transparency. The industry is taking steps towards addressing biases, ensuring
algorithmic fairness, and empowering users to contest complex systems' decisions.
The Future of Credit Scoring: Innovations, Regulations, and User Empowerment
As credit scoring technology rapidly advances, the regulatory landscape is expected
to evolve. This may include a focus on consumer rights, responsible AI usage, and
global harmonization. Financial institutions will need to navigate these regulations
while embracing innovation to stay competitive.
Credit scoring systems are shifting towards a more user-centric approach. This
includes localized explanations, interactive dashboards, and a focus on user
empowerment. Informed users are better equipped to manage their financial
profiles and navigate the credit landscape effectively.