12/3/24, 1:51 PM 0Loan_Eligibility_prediction_Python.ipynb - Colab

Loan Eligibility prediction using Machine Learning Models in Python


Have you ever thought about the apps that can predict whether you will get your loan approved or not? In this article, we are going to develop
one such model that can predict whether a person will get his/her loan approved or not by using some of the background information of the
applicant like the applicant’s gender, marital status, income, etc.

Importing Libraries

In this step, we will be importing libraries like NumPy, Pandas, Matplotlib, etc.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn import metrics
from sklearn.svm import SVC
from imblearn.over_sampling import RandomOverSampler

import warnings
warnings.filterwarnings('ignore')

Loading Dataset

df = pd.read_csv('/content/loan_data.csv')
df.head()

   Loan_ID   Gender  Married  ApplicantIncome  LoanAmount  Loan_Status
0  LP001002  Male    No       5849             NaN         Y
1  LP001003  Male    Yes      4583             128.0       N
2  LP001005  Male    Yes      3000             66.0        Y
3  LP001006  Male    Yes      2583             120.0       Y
4  LP001008  Male    No       6000             141.0       Y


To see the shape of the dataset, we can use the shape attribute.

df.shape

(598, 6)

To print the information of the dataset, we can use the info() method.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 598 entries, 0 to 597
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Loan_ID 598 non-null object
1 Gender 598 non-null object
2 Married 598 non-null object
3 ApplicantIncome 598 non-null int64
4 LoanAmount 577 non-null float64
5 Loan_Status 598 non-null object
dtypes: float64(1), int64(1), object(4)
memory usage: 28.2+ KB
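
The info() output above lists 577 non-null LoanAmount values out of 598 rows, so 21 entries in that column are missing. A minimal sketch of how missing values can be counted per column is shown here; the `toy` frame and its values are illustrative assumptions, not rows from the actual dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the loan data (illustrative values only).
toy = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000, 2583],
    "LoanAmount": [np.nan, 128.0, 66.0, np.nan],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

On the real DataFrame the same call would be `df.isna().sum()`.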

To get summary statistics such as the count, mean and min of the numeric columns, we can use the describe() method.

df.describe()

https://colab.research.google.com/drive/1WcvuIKfQxZK-1e6O2I4GOHyISo4CRZFp#scrollTo=XMgDOJEhlvbZ&printMode=true

       ApplicantIncome  LoanAmount
count       598.000000  577.000000
mean       5292.252508  144.968804
std        5807.265364   82.704182
min         150.000000    9.000000
25%        2877.500000  100.000000
50%        3806.000000  127.000000
75%        5746.000000  167.000000
max       81000.000000  650.000000

Exploratory Data Analysis

EDA refers to the detailed analysis of the dataset using plots such as distplots, bar plots, etc.

Let’s start by plotting a pie chart for the Loan_Status column.

temp = df['Loan_Status'].value_counts()
plt.pie(temp.values,
        labels=temp.index,
        autopct='%1.1f%%')
plt.show()

Here we have an imbalanced dataset. We will have to balance it before training any model on this data.
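
The pie chart shows the class split visually; the same information can be read as numbers. The sketch below is a small illustration on made-up labels (the `status` series is an assumption, not the real Loan_Status column).

```python
import pandas as pd

# Toy labels standing in for df['Loan_Status'] (illustrative values only).
status = pd.Series(["Y", "Y", "Y", "N", "Y", "N", "Y", "Y"])

# normalize=True returns the fraction of rows in each class;
# a large gap between the fractions signals class imbalance.
class_share = status.value_counts(normalize=True)
print(class_share)
```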

We specify the DataFrame df as the data source for the sb.countplot() function. The x parameter is set to the column name from which the
count plot is to be created, and hue is set to ‘Loan_Status’ to create count bars based on the ‘Loan_Status’ categories.

plt.subplots(figsize=(15, 5))
for i, col in enumerate(['Gender', 'Married']):
    plt.subplot(1, 2, i+1)
    sb.countplot(data=df, x=col, hue='Loan_Status')
plt.tight_layout()
plt.show()


One of the main observations we can draw here is that the chances of getting a loan approved for married people are quite low compared to
those who are not married.

plt.subplots(figsize=(15, 5))
for i, col in enumerate(['ApplicantIncome', 'LoanAmount']):
    plt.subplot(1, 2, i+1)
    # distplot is deprecated since seaborn 0.11; sb.histplot(df[col], kde=True) is a close replacement
    sb.distplot(df[col])
plt.tight_layout()
plt.show()

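There are some extreme outliers in the data, so we need to remove them.

df = df[df['ApplicantIncome'] < 25000]
# LoanAmount never exceeds 650 (see describe() above), so this filter only
# drops the rows where LoanAmount is missing (NaN < 400000 is False).
df = df[df['LoanAmount'] < 400000]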
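
A comparison filter like the one on LoanAmount also removes missing values, because any comparison with NaN evaluates to False. The short sketch below illustrates this on a made-up column (the `toy` frame is an assumption for demonstration).

```python
import numpy as np
import pandas as pd

# One missing value and three real values, all far below the threshold.
toy = pd.DataFrame({"LoanAmount": [np.nan, 128.0, 66.0, 650.0]})

# NaN < 400000 is False, so the NaN row is filtered out
# even though every real value passes the threshold.
filtered = toy[toy["LoanAmount"] < 400000]
print(len(toy), len(filtered))
```

If the intent is only to drop missing rows, `dropna(subset=['LoanAmount'])` would state that more directly.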

Let’s see the mean loan amount for males as well as females. For that, we will use the groupby() method.

df.groupby('Gender').mean(numeric_only=True)['LoanAmount']

Gender  LoanAmount
Female  126.697248
Male    146.872294

On average, the loan amount requested by males is higher than what is requested by females.

df.groupby(['Married', 'Gender']).mean(numeric_only=True)['LoanAmount']

Married  Gender  LoanAmount
No       Female  116.115385
         Male    135.959677
Yes      Female  153.322581
         Male    150.875740

Here is one more interesting observation: the loan amount requested by married applicants is generally higher than that requested by
unmarried applicants. This may be one of the reasons why, as observed earlier, the chances of loan approval are lower for a married
person than for an unmarried person.
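
The relationship between marital status and approval is easier to verify with an explicit approval rate than by reading count plots. The following is a minimal sketch on made-up rows; the `toy` frame is an assumption, and its rates say nothing about the real dataset.

```python
import pandas as pd

# Toy rows standing in for the Married and Loan_Status columns.
toy = pd.DataFrame({
    "Married": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "Loan_Status": ["Y", "N", "N", "Y", "Y", "N"],
})

# The mean of a boolean series is the share of True values,
# here the share of approved loans within each group.
approval_rate = (toy["Loan_Status"] == "Y").groupby(toy["Married"]).mean()
print(approval_rate)
```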

# Function to apply label encoding
def encode_labels(data):
    for col in data.columns:
        if data[col].dtype == 'object':
            le = LabelEncoder()
            data[col] = le.fit_transform(data[col])

    return data

# Apply the function to every object-typed column of the DataFrame
df = encode_labels(df)

# Generating Heatmap
sb.heatmap(df.corr() > 0.8, annot=True, cbar=False)
plt.show()
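
Label encoding replaces each distinct string with an integer, assigned in sorted order of the labels. The sketch below shows how to recover that mapping from a fitted encoder; the input list is an illustrative assumption.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Male", "Female", "Male", "Female"])

# classes_ holds the sorted original labels; the position of each
# label in that array is the integer code it was assigned.
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}
print(codes, mapping)
```

Note that the function above encodes every object column, which includes the Loan_ID identifier as well as the Loan_Status target.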

Data Preprocessing


In this step, we will split the data for training and testing. After that, we will preprocess the training data.

features = df.drop('Loan_Status', axis=1)
target = df['Loan_Status'].values

X_train, X_val, Y_train, Y_val = train_test_split(features, target,
                                                  test_size=0.2,
                                                  random_state=10)

# As the data was highly imbalanced we will balance
# it by adding repeated rows of the minority class.
ros = RandomOverSampler(sampling_strategy='minority',
                        random_state=0)
X, Y = ros.fit_resample(X_train, Y_train)

X_train.shape, X.shape

((456, 5), (638, 5))
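
The resampled shape (638, 5) is twice 319, which suggests that the training split held 319 majority rows and 137 minority rows before balancing. The sketch below illustrates the idea behind random oversampling with plain NumPy; it is a simplified stand-in written for this explanation, not imbalanced-learn's implementation, and the toy arrays are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 6 majority rows (label 1) and 2 minority rows (label 0).
X_toy = np.arange(16).reshape(8, 2)
y_toy = np.array([1, 1, 1, 1, 1, 1, 0, 0])

minority_idx = np.flatnonzero(y_toy == 0)
majority_idx = np.flatnonzero(y_toy == 1)

# Draw minority rows with replacement until both classes have the same size.
n_extra = len(majority_idx) - len(minority_idx)
extra = rng.choice(minority_idx, size=n_extra, replace=True)
keep = np.concatenate([np.arange(len(y_toy)), extra])

X_bal, y_bal = X_toy[keep], y_toy[keep]
print(np.bincount(y_bal))
```

Only the training split is resampled, so the validation set keeps its original class proportions.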

We will now use standard scaling (StandardScaler) to normalize the data.

# Normalizing the features for stable and fast training.
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_val = scaler.transform(X_val)
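
Standard scaling subtracts each column's mean and divides by its standard deviation, with both statistics estimated on the training data and then reused for the validation data. A minimal NumPy sketch of that arithmetic follows; the `train` and `val` arrays are illustrative assumptions.

```python
import numpy as np

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
val = np.array([[2.0, 40.0]])

# Statistics are estimated on the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
val_scaled = (val - mean) / std  # reuse the training statistics
print(train_scaled.mean(axis=0), val_scaled)
```

Fitting on the training data alone avoids leaking validation statistics into the model, which is why the notebook calls fit_transform on X and only transform on X_val.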

Model Development


We will use Support Vector Classifier for training the model.

from sklearn.metrics import roc_auc_score

model = SVC(kernel='rbf')
model.fit(X, Y)

# Note: the values printed below are ROC AUC scores computed from
# hard class predictions, not accuracy, despite the labels.
print('Training Accuracy : ', metrics.roc_auc_score(Y, model.predict(X)))
print('Validation Accuracy : ', metrics.roc_auc_score(Y_val, model.predict(X_val)))
print()

Training Accuracy : 0.6300940438871474


Validation Accuracy : 0.48198198198198194
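
The scores above come from roc_auc_score applied to the hard 0/1 output of model.predict rather than to continuous scores. With binary predictions the ROC curve has a single operating point, so the area reduces to the mean of the two per-class recalls, which is balanced accuracy rather than plain accuracy. The sketch below computes that quantity in pure Python; the function name `balanced_accuracy` and the toy labels are assumptions made for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls for binary labels 0 and 1."""
    recalls = []
    for cls in (0, 1):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / 2


# Toy labels: class 0 recall is 1/3, class 1 recall is 4/5.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1, 0, 1]
score = balanced_accuracy(y_true, y_pred)
print(score)
```

Read this way, a validation value of about 0.48 is slightly below the 0.5 that random guessing would give.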

Model Evaluation

Model evaluation can be done using a confusion matrix.

We first train the SVC model using the training data X and Y. Then, we calculate the ROC AUC scores for both the training and validation
datasets. The confusion matrix is built for the validation data by using the confusion_matrix function from sklearn.metrics. Finally, we plot the
confusion matrix as a heatmap using seaborn's heatmap function.

from sklearn.metrics import confusion_matrix, roc_auc_score

# ROC AUC computed from the hard class predictions of the fitted model
training_roc_auc = roc_auc_score(Y, model.predict(X))
validation_roc_auc = roc_auc_score(Y_val, model.predict(X_val))
print('Training ROC AUC Score:', training_roc_auc)
print('Validation ROC AUC Score:', validation_roc_auc)
print()
cm = confusion_matrix(Y_val, model.predict(X_val))

Training ROC AUC Score: 0.6300940438871474


Validation ROC AUC Score: 0.48198198198198194

A confusion matrix is a table that compares predicted values to actual values for a dataset to evaluate the performance of a classification model.

plt.figure(figsize=(6, 6))
sb.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

from sklearn.metrics import classification_report

print(classification_report(Y_val, model.predict(X_val)))

              precision    recall  f1-score   support

           0       0.30      0.30      0.30        37
           1       0.67      0.67      0.67        78

    accuracy                           0.55       115
   macro avg       0.48      0.48      0.48       115
weighted avg       0.55      0.55      0.55       115
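
Each number in the report can be derived from the confusion matrix counts. The sketch below works through that arithmetic in pure Python. The counts in `cm` are an assumption: they are inferred from the supports and rounded scores in the report, since the notebook output does not print the matrix itself, and the helper `per_class_metrics` is a hypothetical name.

```python
# Rows are true classes, columns are predicted classes.
# These counts are inferred from the report, not printed by the notebook.
cm = [[11, 26],
      [26, 52]]


def per_class_metrics(cm, cls):
    tp = cm[cls][cls]
    predicted = sum(row[cls] for row in cm)  # column sum: predicted as cls
    actual = sum(cm[cls])                    # row sum: support of cls
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
for cls in (0, 1):
    p, r, f = per_class_metrics(cm, cls)
    print(cls, round(p, 2), round(r, 2), round(f, 2))
print("accuracy", round(accuracy, 2))
```

With these counts the model recovers fewer than a third of the class 0 cases, which the overall accuracy of 0.55 hides.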
