DMMLM - Risk Score Prediction Model


RISK SCORE PREDICTION

Data Mining and Machine Learning for Managers

Submitted By:
Tanvi Brahmankar (p44228)
Ruchita Ghinaiya (p44218)
Goonjan Papola (p44262)
Himanshu Pranami (p44264)
Diptonil Deka (p44260)
LENDING SECTOR
Financial institutions (banks, credit unions,
NBFCs) provide capital to individuals and
businesses through loans.

Lending supports key economic activities: home ownership, business expansion, education, etc.

Importance of efficient loan management to balance profitability and risk.

[Figure: Types of Loans]
Lending in Financial Institutions
Lending process: Application → Credit Evaluation
→ Risk Assessment → Loan Approval.

Types of loans: personal loans, business loans, mortgages, student loans.

Lending generates significant revenue for financial institutions through interest rates and fees.

Creditworthiness assessments are crucial to minimize defaults and maintain financial stability.
Risk in Lending: A Key Concern
Loan risk is a primary concern for financial institutions.

Risks include loan defaults, leading to financial losses and reduced liquidity.

Common risk factors: borrower credit history, income stability, debt levels, economic conditions.

Accurate risk assessment helps lenders make informed decisions and minimize default rates.
Traditional Approaches to Loan Risk Assessment
Credit Scores: Widely used metric for assessing an individual's creditworthiness. Examples: FICO (USA), CIBIL (India).

Income Verification: Lenders assess annual or monthly income to determine the borrower's ability to repay the loan. Income is verified through payslips, tax returns, and bank statements.

Collateral: Secured loans require assets (like property, vehicles) as collateral. Reduces lender risk because the collateral can be repossessed if the borrower defaults.

Debt History: Past loan performance, including defaults, late payments, is used to assess current risk. Borrowers with a clean history of debt repayment are considered lower risk.

Limitations of Traditional Risk Scoring
Over-Reliance on Credit History
Bias Against Non-Traditional Incomes
Time-Intensive Manual Processes
Inability to Handle Complex Relationships
Need for Data-Driven Risk Assessment
The problem: Accurate loan risk prediction is
critical to minimize defaults and manage risk.

Traditional models fail to capture complex relationships between borrower features.

Need for a data-driven approach using AI/ML to analyze factors like income, credit score, loan amount, and more.

Goal: Use AI/ML to assign a Risk Score, which quantifies the likelihood of loan default, guiding lending decisions and interest rates.
CHALLENGES IN LOAN RISK ASSESSMENT
Manual Assessment: Traditionally, banks rely on
human judgment, which can lead to
inconsistencies in evaluating applicants'
creditworthiness.

High Default Rates: Incorrect risk assessment may result in granting loans to risky individuals, increasing default rates.

Lack of Historical Data: Some applicants, especially new ones, may lack sufficient credit history, making risk prediction difficult.

Image Source: https://www.unit21.ai/blog/risk-management-in-banking


DATA QUALITY ISSUES IN LOAN RISK PREDICTION

Missing Data: Important fields like income, credit score, or loan purpose might be missing or incomplete.

Outliers: Extreme values in features such as annual income or loan amount may distort model training.

Bias in Data: Historical data may reflect biases in previous approvals, leading to unfair risk predictions.

Inconsistent Data: Variations in data formats (e.g., dates, income) can affect analysis and predictions.
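
The data quality issues above can be illustrated with a minimal cleaning sketch. It is only one possible treatment, not the method used in this project: the records, field names, median imputation, and the 1.5 x IQR capping rule are all assumptions chosen for illustration.

```python
import statistics

# Hypothetical raw applicant records; None marks a missing value.
raw = [
    {"income": "52,000", "credit_score": 710},
    {"income": "48000", "credit_score": None},
    {"income": None, "credit_score": 655},
    {"income": "61000", "credit_score": 690},
    {"income": "5,000,000", "credit_score": 720},  # extreme income
    {"income": "45000", "credit_score": 640},
    {"income": "58,500", "credit_score": None},
    {"income": "50000", "credit_score": 700},
]

def parse_income(value):
    """Normalise inconsistent income formats ("52,000" vs "48000") to float."""
    if value is None:
        return None
    return float(str(value).replace(",", ""))

def impute_median(values):
    """Replace missing entries with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = float(statistics.median(observed))
    return [median if v is None else float(v) for v in values]

def cap_outliers(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

incomes = cap_outliers(impute_median([parse_income(r["income"]) for r in raw]))
scores = impute_median([r["credit_score"] for r in raw])
```

Capping keeps the extreme record in the dataset while limiting its influence on training; dropping the record or using a log transform are common alternatives. Bias in historical approvals cannot be fixed by cleaning of this kind and needs separate review.
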
CURRENT APPROACHES TO LOAN RISK PREDICTION
Traditional Credit Scoring
Banks use fixed criteria like credit score, annual income, and loan purpose to approve loans.
The most prominent techniques used to develop credit scorecards are statistical discrimination and classification
methods. These include linear regression models, discriminant analysis, logit and probit models, and expert
judgment-based models.

Expert Systems
Rule-Based Systems: Some institutions use structured rule-based approaches where decision-making is based on
predefined 'if-then' rules (e.g., if credit score is above 650 and income is greater than $50,000, then approve the loan).
Limited Flexibility: These systems struggle to adapt to complex interactions between variables and often don't learn
from new data, making them static and prone to outdated recommendations.
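
As a rough sketch of such a rule-based system, the function below encodes the example rule quoted above. The function name and the "refer" fallback are hypothetical, since the slide does not say what happens when the rule does not fire.

```python
def rule_based_decision(credit_score, annual_income):
    """Static if-then rule from the example: score above 650 and income above $50,000."""
    if credit_score > 650 and annual_income > 50_000:
        return "approve"
    # Hypothetical fallback: send everything else to manual review.
    return "refer"

decisions = [
    rule_based_decision(700, 60_000),   # both thresholds met
    rule_based_decision(649, 200_000),  # very high income, score one point short
]
```

The second applicant is not approved despite a very high income, which shows the limited flexibility described above: the thresholds are fixed and never updated from new data.
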

Human Underwriting
Personal Review: Loan officers personally review applications, relying on experience and intuition to evaluate
applicants. While this can provide flexibility, it can also be inconsistent.
Decision Fatigue: Human decision-makers can get fatigued when reviewing a large volume of loans, leading to
mistakes or rushed judgments.
LIMITATIONS OF TRADITIONAL METHODS

Inflexibility: Traditional methods rely on static criteria that don't account for dynamic changes in a borrower’s profile.

Subjectivity: Human judgment can introduce subjectivity, leading to potential bias in risk assessments.

Inability to Handle Big Data: As more loan applicants emerge, traditional methods struggle to process large amounts of diverse data.

Limited Predictive Power: Existing scoring methods often fail to capture the full risk spectrum and may lead to inaccurate predictions.

Image Source: https://images.app.goo.gl/fP9D17KbbYdkvTnHA
NEED FOR AI/ML IN LOAN
RISK ASSESSMENT
Data-Driven Decision Making: AI/ML algorithms can analyze large volumes of data, identifying patterns m...

Improved Accuracy: Machine learning models can learn from historical data to more accurately predict risk scores.

Efficiency: AI models can process applications quickly, reducing the time needed for loan approval.

Adaptability: AI/ML models can be updated with new data, making them more adaptable to changing market conditions.

Image source: https://images.app.goo.gl/DjyTuxCYuB5HReV47
AI/ML TECHNIQUES OVERVIEW
Supervised Learning: Most commonly used for predicting risk scores, as we have
labeled data (Risk Score).

Regression Algorithms: Models like Linear Regression, Decision Trees, and Random
Forests can estimate a continuous risk score.

Classification Algorithms: If the Risk Score is bucketed into categories (e.g., High,
Medium, Low), classification techniques like Logistic Regression, K-Nearest Neighbors
(KNN), and Gradient Boosting may be used.
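
To make the regression versus classification distinction concrete, here is a small sketch of bucketing a continuous risk score into categories. The 0-100 scale and the cut-offs at 40 and 70 are assumptions for illustration only; the slides do not specify them.

```python
import bisect

# Hypothetical cut-offs on an assumed 0-100 risk score scale.
THRESHOLDS = [40, 70]
LABELS = ["Low", "Medium", "High"]

def bucket_risk(score):
    """Map a continuous risk score to a category label."""
    # bisect_right counts how many thresholds are <= score.
    return LABELS[bisect.bisect_right(THRESHOLDS, score)]

categories = [bucket_risk(s) for s in [12.5, 40, 55.0, 88.1]]
```

Once scores are bucketed this way, the labels can serve as targets for a classifier, while the raw score remains the target for a regression model.
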
OVERVIEW OF KEY ML TECHNIQUES

MULTIPLE LINEAR REGRESSION: Predicts Risk Score using features (age, income, etc.).

LASSO & RIDGE: Improve model performance by preventing overfitting.

K-NEAREST NEIGHBORS (KNN): Classifies based on 'k' closest data points.
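
The idea of classifying by the 'k' closest data points can be sketched from scratch as follows. The feature names, the scaling, and the toy data are hypothetical; in practice a library implementation such as scikit-learn's would normally be used instead.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled features: (debt_to_income, credit_score / 850)
train_X = [(0.10, 0.90), (0.15, 0.85), (0.20, 0.80),
           (0.60, 0.55), (0.70, 0.50), (0.65, 0.60)]
train_y = ["Low", "Low", "Low", "High", "High", "High"]

prediction = knn_predict(train_X, train_y, query=(0.62, 0.58), k=3)
```

Because KNN relies on distances, features on very different scales (for example income in dollars and a ratio between 0 and 1) should be scaled first, otherwise the largest-valued feature dominates the vote.
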
BETTER RISK PREDICTION

DECISION TREES (DT): Simple model but prone to overfitting on training data.

RANDOM FOREST (RF): Improves accuracy through multiple decision trees.

BOOSTING (XGBOOST): Sequentially refines models for high accuracy.
USING SUPPORT VECTOR MACHINES FOR RISK
CLASSIFICATION

SVM: A powerful algorithm to find the optimal hyperplane for separating risk
categories, effective in both linear and non-linear problems (with kernel tricks).

Hyperparameters: Key tuning parameters include the regularization parameter C, kernel type (linear, polynomial, radial), and gamma.

Application: Particularly useful when feature spaces are large or not clearly
separable.
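
To show what the regularization parameter C controls, the sketch below evaluates the standard soft-margin objective of a linear SVM, 0.5 * ||w||^2 + C * (sum of hinge losses), for a fixed hyperplane. The data, the weights, and the label convention (+1 for high risk, -1 for low risk) are assumptions for illustration; no training is performed here.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def hinge_losses(w, b, X, y):
    """Per-sample hinge loss max(0, 1 - y * (w.x + b)) for labels in {-1, +1}."""
    return [max(0.0, 1.0 - label * (dot(w, x) + b)) for x, label in zip(X, y)]

def svm_objective(w, b, X, y, C=1.0):
    """Soft-margin objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    return 0.5 * dot(w, w) + C * sum(hinge_losses(w, b, X, y))

# Hypothetical 2-feature data; the second point sits inside the margin.
X = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0)]
y = [1, 1, -1]
w, b = (1.0, 0.0), 0.0

losses = hinge_losses(w, b, X, y)
objective_small_c = svm_objective(w, b, X, y, C=1.0)
objective_large_c = svm_objective(w, b, X, y, C=10.0)
```

A larger C makes the same margin violation more expensive, so the fitted model tolerates fewer misclassified or in-margin points at the cost of a narrower margin.
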
OPTIMIZING MODELS THROUGH HYPERPARAMETER
TUNING
Hyperparameters in Models: Examples include alpha for Ridge/Lasso, k for KNN,
max_depth for Decision Trees, and n_estimators for Random Forest.

Cost Function Minimization: Techniques like gradient descent minimize the cost function (e.g., Mean Squared Error in regression models or hinge loss in SVM).

Cross-Validation: Model evaluation method used to fine-tune hyperparameters and avoid overfitting.
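
As an illustration of cost function minimization, the following sketch fits a one-feature linear model by gradient descent on Mean Squared Error. The data, learning rate, and step count are hypothetical values picked so that the loop converges; they are not taken from the project.

```python
def gradient_descent_mse(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(errors^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 3*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 7.0, 10.0, 13.0]
w, b = gradient_descent_mse(xs, ys)
```

The learning rate is itself a hyperparameter: too large and the updates diverge, too small and many more steps are needed.
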
APPLYING AI/ML TO REAL-WORLD RISK
PREDICTION
Loan Risk Prediction: AI models such as KNN, Random Forest, and SVM are widely used for loan default prediction, credit scoring, and fraud detection.

ANN & CNN (Minor Focus): While CNNs are primarily used in image processing, ANNs can model complex non-linear relationships in financial risk prediction, though they are less interpretable.

Future Improvements: Incorporate deep learning (ANN) for complex feature interactions, though interpretability remains a challenge in financial domains.
MODEL EVALUATION METRICS

Random Forest again shows superior generalization with the lowest Cross-Validation MSE.

Ensemble and Ridge also show stable performance across Cross-Validation.
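
For reference, the two metrics used to compare the models can be computed as in the sketch below. The sample values are made up for illustration and are not results from this project.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average squared difference between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """R^2: share of the target's variance explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical risk scores and predictions.
y_true = [30.0, 50.0, 70.0, 90.0]
y_pred = [35.0, 45.0, 75.0, 85.0]

error = mse(y_true, y_pred)
explained = r2_score(y_true, y_pred)
```

Cross-Validation MSE is the same error measure averaged over several held-out folds, which is why it is a better indicator of generalization than MSE on the training data.
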
RANDOM FOREST FEATURE IMPORTANCE

Key features that influence the Random Forest model's predictions, ranked
by importance.

Top 5 Features:
Total Debt to Income Ratio (16.32%)
Bankruptcy History (14.48%)
Debt to Income Ratio (12.10%)
Net Worth (10.90%)
Monthly Income (8.93%)
RANDOM FOREST VS. ENSEMBLE MODEL

Random Forest significantly outperforms the Ensemble model in terms of both MSE
and R².
The small difference in Cross-Validation results shows that both models generalize
well, but Random Forest offers a more accurate prediction.
RANDOM FOREST

Random Forest is the best model, offering the lowest MSE and highest R² (88%
variance explained).
Key predictors identified: Debt-to-income ratio, bankruptcy history, and net worth.
Tune Random Forest hyperparameters to try to reduce MSE further.
Consider additional ensemble techniques (like stacking) to see if performance can be boosted.
Focus on refining data related to the most important features (e.g., more precise debt and income metrics).
AI/ML FOR LOAN RISK PREDICTION - KEY
BENEFITS
Improved Accuracy:
AI/ML models analyze vast amounts of historical loan data.
They identify complex relationships between multiple risk factors, leading to more
accurate predictions compared to traditional models.

Efficiency and Speed:
AI/ML can process large datasets and generate risk scores in real time. This reduces manual work and allows financial institutions to process applications faster, enhancing overall efficiency.

Data-Driven Fairness:
AI models can help minimize human bias by evaluating applicants based on data and consistent rules, leading to fairer lending decisions.
AI/ML FOR LOAN RISK PREDICTION - KEY
BENEFITS
Continued...
Scalability
AI/ML models are designed to handle large-scale datasets with ease, which makes them ideal for scaling with the growth of financial institutions, where manual assessments become impractical.

Early Detection of Risk
AI/ML can flag high-risk applicants early by analyzing factors like previous defaults or debt-to-income ratios. This enables lenders to make proactive decisions by adjusting loan terms or rejecting risky applications upfront.
LIMITATIONS OF AI/ML FOR LOAN RISK
PREDICTION
Data Quality and Availability
AI/ML models rely on high-quality, extensive datasets. If data is incomplete, biased, or
outdated, predictions can become inaccurate, leading to poor decision-making.

Interpretability
Advanced models like Random Forests or Gradient Boosting often act as "black boxes." This
lack of transparency makes it difficult for lenders to explain why certain applicants are
deemed risky, which can lead to regulatory and customer concerns.

Bias in AI Models
AI models may inherit historical biases from training data. Discriminatory practices in the past can
be replicated unless data is carefully managed and models are designed to ensure fairness.
LIMITATIONS OF AI/ML FOR LOAN RISK
PREDICTION
Continued...

Overfitting
Overfitting occurs when models perform well on training data but poorly on unseen data. Complex
models can be too tuned to past patterns, failing to generalize to new loan applicants.
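
Overfitting can be demonstrated with a deliberately extreme sketch: a 1-nearest-neighbour regressor that memorises its training data. The underlying relationship, noise level, and sample sizes are hypothetical, and the model is chosen only because it makes the train/test gap easy to see.

```python
import random

random.seed(0)

def true_risk(x):
    return 50.0 + 30.0 * x  # hypothetical underlying relationship

def sample(n):
    """Draw n noisy (feature, risk score) pairs."""
    xs = [random.random() for _ in range(n)]
    ys = [true_risk(x) + random.gauss(0.0, 10.0) for x in xs]
    return xs, ys

def one_nn_predict(train_xs, train_ys, x):
    """Memorising model: return the target of the single closest training point."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

train_xs, train_ys = sample(40)
test_xs, test_ys = sample(40)

train_mse = mse(train_ys, [one_nn_predict(train_xs, train_ys, x) for x in train_xs])
test_mse = mse(test_ys, [one_nn_predict(train_xs, train_ys, x) for x in test_xs])
```

The training error is zero because every training point is its own nearest neighbour, while the error on new applicants is clearly larger. Comparing the two numbers, ideally through cross-validation, is the usual way to spot this problem.
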

Regulatory Challenges
Financial institutions must ensure AI models comply with regulations like Fair Lending Laws.
Ensuring transparency, explainability, and bias mitigation can be a major challenge in heavily
regulated environments.
POTENTIAL FUTURE IMPROVEMENTS IN AI/ML
FOR LOAN RISK PREDICTION
Enhanced Data Quality and Integration
Improving data collection methods will ensure higher-quality, up-to-date, and comprehensive datasets.

Bias Mitigation Techniques
Implementing fairness algorithms and bias-detection tools helps ensure AI models do not replicate discriminatory practices. Models should also be trained on diverse datasets to reduce historical biases.
THANK YOU
