Python Code For Loan Default Prediction

The document details a machine learning project to classify loan applicants as defaulters or non-defaulters using Python. It covers loading and exploring the data, preparing it for modeling, building and evaluating a logistic regression model, and concludes with next steps to improve the model performance.

Uploaded by

sitaramr54

Python code for loan default prediction

A simple machine learning project to classify loan applicants as defaulters or non-defaulters

Importing libraries
• # We need pandas to manipulate data frames
• import pandas as pd
• # We need numpy to perform numerical operations
• import numpy as np
• # We need sklearn to use machine learning models and metrics
• from sklearn.linear_model import LogisticRegression
• from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
• # We need matplotlib and seaborn to visualize the data
• import matplotlib.pyplot as plt
• import seaborn as sns

Loading and exploring the data

• # We load the data from a csv file into a pandas data frame
• data = pd.read_csv('loan_data.csv')
• # We check the shape and the first five rows of the data
• print(data.shape)
• print(data.head())
• # We see that the data has 614 rows and 13 columns
• # The columns are: Loan_ID, Gender, Married, Dependents, Education, Self_Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term, Credit_History, Property_Area, Loan_Status
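• # The target variable is Loan_Status, which indicates whether the loan was approved (Y) or not (N)
• # Note that this column records approval rather than repayment, so the model below predicts approval status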
• # We check the summary statistics of the numerical columns
• print(data.describe())
• # We see that ApplicantIncome, CoapplicantIncome, LoanAmount, and Loan_Amount_Term are on very different scales: their means and standard deviations differ widely from column to column
• # We check the distribution of the categorical columns
• print(data['Gender'].value_counts())
• print(data['Married'].value_counts())
• print(data['Dependents'].value_counts())
• print(data['Education'].value_counts())
• print(data['Self_Employed'].value_counts())
• print(data['Credit_History'].value_counts())
• print(data['Property_Area'].value_counts())
• print(data['Loan_Status'].value_counts())
• # We see that the data is imbalanced: males, married applicants, graduates, non-self-employed applicants, applicants with a credit history, and approved loans all outnumber the opposite categories
• # We also see that some columns have missing values, such as Gender, Married, Dependents, Self_Employed, and Credit_History
• # value_counts() leaves out missing values by default, so they show up as counts that sum to fewer than 614
• # We visualize the relationship between the target variable and the other variables using bar plots
• plt.figure(figsize=(15,10))
• plt.subplot(2,4,1)
• sns.countplot(x='Gender', hue='Loan_Status', data=data)
• plt.subplot(2,4,2)
• sns.countplot(x='Married', hue='Loan_Status', data=data)
• plt.subplot(2,4,3)
• sns.countplot(x='Dependents', hue='Loan_Status', data=data)
• plt.subplot(2,4,4)
• sns.countplot(x='Education', hue='Loan_Status', data=data)
• plt.subplot(2,4,5)
• sns.countplot(x='Self_Employed', hue='Loan_Status', data=data)
• plt.subplot(2,4,6)
• sns.countplot(x='Credit_History', hue='Loan_Status', data=data)
• plt.subplot(2,4,7)
• sns.countplot(x='Property_Area', hue='Loan_Status', data=data)
• plt.tight_layout()
• plt.show()
• # We see that the loan approval rate is higher for applicants who are male, married, have 0 or 1 dependents, are graduates, are not self-employed, have a credit history, or live in semi-urban areas
• # We visualize the relationship between the target variable and the numerical variables using box plots
• plt.figure(figsize=(15,5))
• plt.subplot(1,4,1)
• sns.boxplot(x='Loan_Status', y='ApplicantIncome', data=data)
• plt.subplot(1,4,2)
• sns.boxplot(x='Loan_Status', y='CoapplicantIncome', data=data)
• plt.subplot(1,4,3)
• sns.boxplot(x='Loan_Status', y='LoanAmount', data=data)
• plt.subplot(1,4,4)
• sns.boxplot(x='Loan_Status', y='Loan_Amount_Term', data=data)
• plt.tight_layout()
• plt.show()
• # We see that there is no significant difference in the median income and loan amount between the approved and rejected loans
• # We also see that there are some outliers in the income and loan amount columns
• # We see that the loan amount term is mostly 360 months, with some exceptions
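
The observations above about missing values and approval rates can also be checked numerically instead of being read off the plots. The following is a minimal sketch on a tiny hand-made data frame (the name `toy` and all of its values are hypothetical stand-ins for loan_data.csv). It relies on the fact that `value_counts()` skips missing values unless `dropna=False` is passed.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample standing in for loan_data.csv (values are made up)
toy = pd.DataFrame({
    'Gender': ['Male', 'Female', np.nan, 'Male', 'Male', 'Female'],
    'Credit_History': [1.0, 0.0, 1.0, np.nan, 1.0, 1.0],
    'Loan_Status': ['Y', 'N', 'Y', 'N', 'Y', 'Y'],
})

# value_counts() skips NaN by default, so count missing values explicitly
missing_counts = toy.isnull().sum()
print(missing_counts)

# Show NaN as its own category when inspecting a single column
gender_counts = toy['Gender'].value_counts(dropna=False)
print(gender_counts)

# Approval rate per category: share of 'Y' within each Credit_History value
# (rows with a missing Credit_History are left out by groupby)
approval_rate = (
    toy.assign(approved=toy['Loan_Status'].eq('Y'))
       .groupby('Credit_History')['approved']
       .mean()
)
print(approval_rate)
```

Comparing rates rather than raw counts makes it easier to judge small categories, which are hard to read in a count plot.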

Preparing the data for modeling

• # We split the data into features (X) and target (y)
• X = data.drop(['Loan_ID', 'Loan_Status'], axis=1)
• y = data['Loan_Status']
• # We encode the target variable as 1 for Y and 0 for N
• y = y.map({'Y':1, 'N':0})
• # We check the percentage of missing values in each column
• print(X.isnull().sum()/len(X)*100)
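• # We see that fewer than 10% of the values are missing in each column, so we impute them with the mode (most frequent value)
• # We assign the result back to the column, because a chained fillna with inplace=True triggers warnings in newer pandas versions
• for col in X.columns:
•     X[col] = X[col].fillna(X[col].mode()[0])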
• # We check that there are no more missing values
• print(X.isnull().sum())
• # We encode the categorical variables as dummy variables (0 or 1)
• X = pd.get_dummies(X, drop_first=True)
• # We check the new shape and columns of X
• print(X.shape)
• print(X.columns)
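• # We see that X now has 14 columns, after creating dummy variables for Gender, Married, Dependents, Education, Self_Employed, and Property_Area
• # Credit_History is kept as a single column; if it is stored as a numeric 0/1 value, get_dummies leaves it unchanged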
• # We split the data into train and test sets, with 80% for train and 20% for test
• from sklearn.model_selection import train_test_split
• X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
• # We check the shape of the train and test sets
• print(X_train.shape, y_train.shape)
• print(X_test.shape, y_test.shape)
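
The steps above impute and encode the full data set before splitting it, so the test rows contribute to the imputed values. One common alternative, sketched below under stated assumptions, is to split first and let scikit-learn learn the imputation and encoding from the training rows only. The frame `toy`, its values, and the choice of median imputation for the numeric column are illustrative assumptions, not part of the original project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical miniature of the loan data (values are made up)
toy = pd.DataFrame({
    'Gender': ['Male', 'Female', np.nan, 'Male', 'Male', 'Female', 'Male', 'Female'],
    'LoanAmount': [120.0, 90.0, np.nan, 150.0, 110.0, 95.0, 200.0, np.nan],
    'Loan_Status': [1, 0, 1, 0, 1, 1, 1, 0],
})
X_toy = toy.drop(columns='Loan_Status')
y_toy = toy['Loan_Status']

# Split first so that imputation statistics come from the training rows only;
# stratify keeps the class proportions similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

categorical = ['Gender']
numerical = ['LoanAmount']

preprocess = ColumnTransformer([
    # Mode imputation, then one-hot encoding with the first level dropped
    ('cat', Pipeline([
        ('impute', SimpleImputer(strategy='most_frequent')),
        ('encode', OneHotEncoder(drop='first')),
    ]), categorical),
    # Median imputation is a common choice for skewed numeric columns
    ('num', SimpleImputer(strategy='median'), numerical),
])

X_tr_ready = preprocess.fit_transform(X_tr)  # learn the statistics on train
X_te_ready = preprocess.transform(X_te)      # reuse them unchanged on test
print(X_tr_ready.shape, X_te_ready.shape)
```

The same `preprocess` object can be placed in front of a model in a single Pipeline, which keeps the train-only fitting intact during cross-validation.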

Building and evaluating the model

• # We create a logistic regression model, which is a simple and widely used classification algorithm
• model = LogisticRegression()
• # We fit the model on the train data
• model.fit(X_train, y_train)
• # We make predictions on the test data
• y_pred = model.predict(X_test)
• # We evaluate the model performance using accuracy, confusion matrix, and classification report
• print('Accuracy:', accuracy_score(y_test, y_pred))
• print('Confusion matrix:\n', confusion_matrix(y_test, y_pred))
• print('Classification report:\n', classification_report(y_test, y_pred))
• # We see that the model has an accuracy of about 0.78, which means it correctly classified roughly 78% of the test data
• # The confusion matrix shows 79 true positives, 18 true negatives, 15 false positives, and 11 false negatives
• # The classification report shows a precision of 0.84, a recall of 0.87, and an f1-score of 0.85 for the positive class (loan approved)
• # It shows a precision of 0.55, a recall of 0.47, and an f1-score of 0.51 for the negative class (loan rejected)
• # Note that these negative-class figures do not match the counts above, which imply a precision of 18/29 ≈ 0.62 and a recall of 18/33 ≈ 0.55, so they should be checked against the printed report
• # The model is better at predicting the positive class than the negative class, which is expected given the imbalanced data
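
The metrics above can be re-derived by hand from the four confusion-matrix counts, which is a useful sanity check on any reported figure. The sketch below uses only the counts quoted in the comments above; the variable names are illustrative. For a binary problem with labels 0 and 1, scikit-learn's `confusion_matrix` returns the layout `[[TN, FP], [FN, TP]]`.

```python
import numpy as np

# Counts quoted in the evaluation comments above
tp, tn, fp, fn = 79, 18, 15, 11

# Same layout that sklearn.metrics.confusion_matrix uses for labels 0 and 1
cm = np.array([[tn, fp],
               [fn, tp]])
total = cm.sum()

# Share of all test rows that were classified correctly
accuracy = (tp + tn) / total

# Positive class (loan approved)
precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)
f1_pos = 2 * precision_pos * recall_pos / (precision_pos + recall_pos)

# Negative class (loan rejected): the roles of the counts are mirrored
precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)
f1_neg = 2 * precision_neg * recall_neg / (precision_neg + recall_neg)

print(f'test rows: {total}, accuracy: {accuracy:.3f}')
print(f'approved: precision={precision_pos:.3f} recall={recall_pos:.3f} f1={f1_pos:.3f}')
print(f'rejected: precision={precision_neg:.3f} recall={recall_neg:.3f} f1={f1_neg:.3f}')
```

With these counts the positive-class values come out near 0.84, 0.88, and 0.86, and the negative-class values near 0.62, 0.55, and 0.58. Where a derived value differs from a quoted one, the printed classification report is the figure to trust.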

Conclusion
• # We have completed a simple machine learning project to predict loan default using Python
• # We have imported the necessary libraries, loaded and explored the data, prepared the data for modeling, and built and evaluated a logistic regression model
• # We have learned some basic steps and concepts of data analysis and machine learning, such as data manipulation, data visualization, data imputation, data encoding, data splitting, model fitting, model prediction, and model evaluation
• # We have seen that the model has a decent accuracy, but it can be improved by using more complex algorithms, tuning the hyperparameters, or balancing the data
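
Two of the improvements named above, balancing the data and tuning the hyperparameters, can be sketched in a few lines. This is one possible approach rather than the project's own method: it uses synthetic data from `make_classification` as a stand-in for the prepared loan features, class weighting instead of resampling, and a small illustrative grid for the regularization strength `C`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the prepared loan features (about 30/70)
X_syn, y_syn = make_classification(
    n_samples=600, n_features=14, n_informative=5,
    weights=[0.3, 0.7], random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

# Scale the features, then fit a class-weighted logistic regression;
# class_weight='balanced' gives the rarer class more weight in the loss
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight='balanced', max_iter=1000),
)

# Tune the regularization strength C with 5-fold cross-validation,
# scoring on macro F1 so that both classes count equally
search = GridSearchCV(
    pipe,
    param_grid={'logisticregression__C': [0.01, 0.1, 1, 10]},
    scoring='f1_macro',
    cv=5,
)
search.fit(X_tr, y_tr)

best_c = search.best_params_['logisticregression__C']
test_f1 = search.score(X_te, y_te)
print('Best C:', best_c)
print('Macro F1 on held-out data:', test_f1)
```

Scoring on macro F1 instead of accuracy keeps the search from favoring a model that only does well on the majority class.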
