DMMLM - Risk Score Prediction Model
LIMITATIONS OF TRADITIONAL METHODS
Expert Systems
Rule-Based Systems: Some institutions use structured rule-based approaches where decision-making is based on
predefined 'if-then' rules (e.g., if credit score is above 650 and income is greater than $50,000, then approve the loan).
Limited Flexibility: These systems struggle to adapt to complex interactions between variables and often don't learn
from new data, making them static and prone to outdated recommendations.
Human Underwriting
Personal Review: Loan officers personally review applications, relying on experience and intuition to evaluate
applicants. While this can provide flexibility, it can also be inconsistent.
Decision Fatigue: Human decision-makers can get fatigued when reviewing a large volume of loans, leading to
mistakes or rushed judgments.
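The 'if-then' rule quoted under Rule-Based Systems can be written in a few lines of Python. This is only a sketch of that one example rule; the function name `approve_loan` is hypothetical, and real rule engines typically chain many such conditions.

```python
def approve_loan(credit_score: int, annual_income: float) -> bool:
    """Static rule from the example: both thresholds must be exceeded."""
    return credit_score > 650 and annual_income > 50_000


# The thresholds never change unless someone edits them by hand, and a very
# high income cannot compensate for a score that sits exactly on the cutoff.
print(approve_loan(700, 60_000))   # True
print(approve_loan(650, 120_000))  # False: score is not above 650
```

The second call shows the inflexibility described above: the rule ignores how the two variables interact.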
OVERVIEW OF KEY ML TECHNIQUES
Regression Algorithms: Models like Linear Regression, Decision Trees, and Random
Forests can estimate a continuous risk score.
Classification Algorithms: If the Risk Score is bucketed into categories (e.g., High,
Medium, Low), classification techniques like Logistic Regression, K-Nearest Neighbors
(KNN), and Gradient Boosting may be used.
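As a minimal sketch of the bucketing mentioned under Classification Algorithms, a continuous risk score can be mapped to Low, Medium, or High labels before a classifier is trained on them. The cutoffs (40 and 60) and the function name `risk_category` are assumptions chosen for illustration, not values from this project.

```python
def risk_category(score: float, low_cutoff: float = 40.0, high_cutoff: float = 60.0) -> str:
    """Map a continuous risk score to a category (cutoffs are illustrative assumptions)."""
    if score < low_cutoff:
        return "Low"
    if score < high_cutoff:
        return "Medium"
    return "High"


scores = [25.0, 47.5, 72.3]  # hypothetical risk scores
labels = [risk_category(s) for s in scores]
print(labels)  # ['Low', 'Medium', 'High']
```

Once scores are bucketed this way, the problem becomes classification rather than regression, and the category labels become the training target.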
SVM (Support Vector Machine): Finds the maximum-margin hyperplane for separating risk
categories; effective in both linear and non-linear problems (the latter via the kernel trick).
Application: Particularly useful when the feature space is high-dimensional or the
classes are not linearly separable.
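The kernel trick can be illustrated without any SVM library. The sketch below uses a degree-2 polynomial kernel on 2-D inputs (a choice made here for illustration, not one the slides specify) and checks that it equals an ordinary dot product in an explicit 3-D feature space, so the mapping never has to be computed during training.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel K(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


a, b = (1.0, 2.0), (3.0, 0.5)  # two hypothetical applicants with two features each
kernel_value = poly_kernel(a, b)
explicit_value = dot(feature_map(a), feature_map(b))
print(math.isclose(kernel_value, explicit_value))  # True
```

A hyperplane in the mapped space corresponds to a curved boundary in the original feature space, which is what lets an SVM handle non-linear problems.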
OPTIMIZING MODELS THROUGH HYPERPARAMETER
TUNING
Hyperparameters in Models: Examples include alpha for Ridge/Lasso, k for KNN,
max_depth for Decision Trees, and n_estimators for Random Forest.
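Hyperparameter tuning amounts to trying candidate values and keeping the one with the lowest validation error. The sketch below does this for k in a tiny hand-written KNN regressor; the data points, the candidate list, and the helper names are all hypothetical, and a real project would more likely use a library grid search with cross-validation.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(target for _, target in nearest) / k


def validation_mse(train, valid, k):
    """Mean squared error of the k-NN predictions on the held-out points."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in valid]
    return sum(errors) / len(errors)


# Hypothetical (feature, risk score) pairs, invented for illustration.
train = [(0.1, 20.0), (0.2, 24.0), (0.3, 31.0), (0.4, 38.0), (0.5, 47.0), (0.6, 52.0)]
valid = [(0.12, 21.0), (0.33, 33.0), (0.58, 51.0)]

candidate_ks = [1, 2, 3, 6]
scores = {k: validation_mse(train, valid, k) for k in candidate_ks}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

On this toy data k = 2 gives the lowest validation error, while k = 6 (averaging every training point) gives the highest.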
Cost Function Minimization: Techniques like gradient descent iteratively adjust model parameters
to minimize the cost function (e.g., Mean Squared Error in regression models or hinge loss in SVM).
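As a minimal sketch of cost function minimization, the code below fits a straight line by batch gradient descent on Mean Squared Error. The learning rate, step count, and data are assumptions picked so the example converges; they are not settings from this project.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

Each update moves the parameters a small step in the direction that lowers the cost, so the fitted slope and intercept approach the values used to generate the data.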
ANN & CNN (Minor Focus): CNNs are primarily used in image processing, while ANNs can model
complex non-linear relationships in financial risk prediction, though they are less interpretable.
Key features that influence the Random Forest model's predictions, ranked
by importance.
Top 5 Features:
Total Debt to Income Ratio (16.32%)
Bankruptcy History (14.48%)
Debt to Income Ratio (12.10%)
Net Worth (10.90%)
Monthly Income (8.93%)
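The ranking can be reproduced from the reported percentages with a simple sort. The sketch below uses only the five values listed above (entered in no particular order) and adds a running total; the variable names are illustrative.

```python
# Importance values as reported above (percent of total importance).
importances = {
    "Total Debt to Income Ratio": 16.32,
    "Bankruptcy History": 14.48,
    "Net Worth": 10.90,
    "Debt to Income Ratio": 12.10,
    "Monthly Income": 8.93,
}

# Sort from most to least important and accumulate the share covered so far.
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
cumulative = 0.0
for name, pct in ranked:
    cumulative += pct
    print(f"{name:<28} {pct:6.2f}%  cumulative {cumulative:6.2f}%")
```

Together these five features account for about 62.73% of the model's total importance, with the remainder spread across the other inputs.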
RANDOM FOREST VS. ENSEMBLE MODEL
Random Forest significantly outperforms the Ensemble model in terms of both MSE
and R².
The small difference in Cross-Validation results shows that both models generalize
well, but Random Forest offers a more accurate prediction.
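For reference, the two metrics used in this comparison can be computed as follows. This is a plain-Python sketch on made-up numbers, not the project's evaluation code or its results.

```python
def mse(y_true, y_pred):
    """Mean squared error: average squared gap between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]  # hypothetical risk scores
y_pred = [2.5, 5.5, 7.0, 9.5]  # hypothetical model output
print(mse(y_true, y_pred))                  # 0.1875
print(round(r_squared(y_true, y_pred), 4))  # 0.9625
```

A lower MSE and a higher R² both indicate predictions that sit closer to the true scores, which is the sense in which one model outperforms another here.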
RANDOM FOREST
Random Forest is the best model, offering the lowest MSE and highest R² (88%
variance explained).
Key predictors identified: Debt-to-income ratio, bankruptcy history, and net worth.
Refine Random Forest through hyperparameter tuning to potentially reduce MSE
further.
Consider additional ensemble techniques (like stacking) to test whether performance
improves.
Focus on refining data related to the most important features (e.g., more precise
debt and income metrics).
AI/ML FOR LOAN RISK PREDICTION - KEY
BENEFITS
Improved Accuracy:
AI/ML models analyze vast amounts of historical loan data.
They identify complex relationships between multiple risk factors, leading to more
accurate predictions compared to traditional models.
Data-Driven Fairness:
AI models can help reduce human bias by evaluating applicants on data and
consistent rules, which can lead to fairer lending decisions.
Scalability
AI/ML models handle large-scale datasets with ease, which makes them well suited to
growing financial institutions, where manual assessments become impractical.
LIMITATIONS OF AI/ML FOR LOAN RISK
PREDICTION
Interpretability
Advanced models like Random Forests or Gradient Boosting often act as "black boxes." This
lack of transparency makes it difficult for lenders to explain why certain applicants are
deemed risky, which can lead to regulatory and customer concerns.
Bias in AI Models
AI models may inherit historical biases from training data. Past discriminatory practices can
be replicated unless data is carefully managed and models are designed to ensure fairness.
Overfitting
Overfitting occurs when models perform well on training data but poorly on unseen data. Complex
models can be too tuned to past patterns, failing to generalize to new loan applicants.
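The gap between training and unseen-data error can be shown with a model that simply memorizes its training set. The sketch below uses a 1-nearest-neighbor predictor on invented, noisy data; the data and helper names are hypothetical.

```python
def one_nn_predict(train, x):
    """Return the target of the single closest training point (pure memorization)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def mse_on(data, train):
    """Mean squared error of the memorizing model on the given data."""
    return sum((one_nn_predict(train, x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical noisy (feature, risk score) pairs, invented for illustration.
train = [(0.1, 22.0), (0.3, 28.0), (0.5, 51.0), (0.7, 49.0), (0.9, 70.0)]
unseen = [(0.22, 27.0), (0.42, 36.0), (0.62, 48.0), (0.82, 58.0)]

train_mse = mse_on(train, train)    # each point is its own nearest neighbor
unseen_mse = mse_on(unseen, train)
print(train_mse, unseen_mse)
```

The training error is exactly zero while the error on unseen points is large, which is the signature of overfitting that a held-out set or cross-validation is meant to expose.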
Regulatory Challenges
Financial institutions must ensure AI models comply with regulations like Fair Lending Laws.
Ensuring transparency, explainability, and bias mitigation can be a major challenge in heavily
regulated environments.
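One simple screening check for inherited bias is to compare approval rates across applicant groups. The sketch below computes the ratio of the lower rate to the higher one; the group data are invented, and the 0.8 review threshold is an assumption loosely modeled on the "four-fifths" rule of thumb, not a statement of what any regulation requires.

```python
def approval_rate(decisions):
    """Fraction of applicants approved; decisions is a list of booleans."""
    return sum(decisions) / len(decisions)


def disparate_impact_ratio(group_a, group_b):
    """Ratio of the lower approval rate to the higher one (1.0 means parity)."""
    low, high = sorted((approval_rate(group_a), approval_rate(group_b)))
    return low / high if high > 0 else 1.0


# Hypothetical model decisions for two applicant groups.
group_a = [True, True, True, False, True, True, False, True, True, True]    # 8 of 10 approved
group_b = [True, False, True, False, False, True, False, True, False, True]  # 5 of 10 approved

ratio = disparate_impact_ratio(group_a, group_b)
REVIEW_THRESHOLD = 0.8  # assumed screening threshold, not a legal standard
needs_review = ratio < REVIEW_THRESHOLD
print(round(ratio, 3), needs_review)  # 0.625 True
```

A low ratio does not prove discrimination on its own, but it flags a model for the closer review that transparency and explainability requirements call for.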
POTENTIAL FUTURE IMPROVEMENTS IN AI/ML
FOR LOAN RISK PREDICTION
Enhanced Data Quality and Integration
Improving data collection methods can yield higher-quality, more up-to-date, and more
comprehensive datasets.