0Loan_Eligibility_prediction_Python.ipynb - Colab
Importing Libraries
In this step, we import libraries such as NumPy, Pandas, and Matplotlib.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn import metrics
from sklearn.svm import SVC
from imblearn.over_sampling import RandomOverSampler

import warnings
warnings.filterwarnings('ignore')
df.shape
(598, 6)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 598 entries, 0 to 597
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Loan_ID 598 non-null object
1 Gender 598 non-null object
2 Married 598 non-null object
3 ApplicantIncome 598 non-null int64
4 LoanAmount 577 non-null float64
5 Loan_Status 598 non-null object
dtypes: float64(1), int64(1), object(4)
memory usage: 28.2+ KB
To get summary statistics such as the count, mean, and min of the numeric columns, we can use the describe() method.
df.describe()
ApplicantIncome LoanAmount
Exploratory Data Analysis
EDA refers to the detailed analysis of the dataset using plots such as distplots and bar plots.
temp = df['Loan_Status'].value_counts()
plt.pie(temp.values,
        labels=temp.index,
        autopct='%1.1f%%')
plt.show()
Here we have an imbalanced dataset. We will have to balance it before training any model on this data.
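The notebook imports RandomOverSampler from imblearn, which suggests random oversampling is the intended balancing step. The idea can be sketched by hand with NumPy. The function name random_oversample and the toy arrays below are hypothetical, and this is an illustration of the concept rather than imblearn's implementation.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # every original row is kept
    for cls, count in zip(classes, counts):
        if count < target:
            cls_idx = np.flatnonzero(y == cls)
            # Sample with replacement to make up the shortfall
            keep.append(rng.choice(cls_idx, size=target - count, replace=True))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

# Toy data (made up): 6 approved ('Y') vs 2 rejected ('N') applications
X_toy = np.arange(16).reshape(8, 2)
y_toy = np.array(['Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'N', 'N'])

X_bal, y_bal = random_oversample(X_toy, y_toy)
print(np.unique(y_bal, return_counts=True))
```

Oversampling is usually applied to the training split only, so that duplicated rows do not leak into the validation data.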
We specify the DataFrame df as the data source for the sb.countplot() function. The x parameter is set to the column name from which the
count plot is to be created, and hue is set to ‘Loan_Status’ to create count bars based on the ‘Loan_Status’ categories.
plt.subplots(figsize=(15, 5))
for i, col in enumerate(['Gender', 'Married']):
    plt.subplot(1, 2, i+1)
    sb.countplot(data=df, x=col, hue='Loan_Status')
plt.tight_layout()
plt.show()
One of the main observations we can draw here is that married applicants appear less likely to have a loan approved than applicants who are not married.
plt.subplots(figsize=(15, 5))
for i, col in enumerate(['ApplicantIncome', 'LoanAmount']):
    plt.subplot(1, 2, i+1)
    # distplot is deprecated since seaborn 0.11; sb.histplot(df[col], kde=True) is the suggested replacement
    sb.distplot(df[col])
plt.tight_layout()
plt.show()
There are some extreme outliers in the data; we need to remove them.
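The cell that removes the outliers is not shown here, so the following is only a sketch of one common approach, the interquartile-range (IQR) rule. The helper name drop_iqr_outliers, the factor k=1.5, and the toy values are assumptions for illustration; the notebook may well use fixed thresholds instead.

```python
import pandas as pd

def drop_iqr_outliers(frame, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR]
    for every listed column (hypothetical helper)."""
    mask = pd.Series(True, index=frame.index)
    for col in columns:
        q1, q3 = frame[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Missing values are kept here; they can be imputed or dropped separately
        in_range = frame[col].between(q1 - k * iqr, q3 + k * iqr) | frame[col].isna()
        mask &= in_range
    return frame[mask]

# Toy data (made up): one extreme income value
toy = pd.DataFrame({
    'ApplicantIncome': [2500, 3000, 3200, 4100, 4500, 5000, 81000],
    'LoanAmount': [100.0, 110.0, None, 128.0, 135.0, 150.0, 160.0],
})
cleaned = drop_iqr_outliers(toy, ['ApplicantIncome', 'LoanAmount'])
print(cleaned.shape)
```

On this toy frame only the row with the extreme income is dropped; the row with the missing LoanAmount is kept.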
Let’s look at the mean loan amount for males and females. For that, we will use the groupby() method.
df.groupby('Gender').mean(numeric_only=True)['LoanAmount']
LoanAmount
Gender
Female 126.697248
Male 146.872294
On average, the loan amount requested by males is higher than that requested by females.
df.groupby(['Married', 'Gender']).mean(numeric_only=True)['LoanAmount']
LoanAmount
Married Gender
No Female 116.115385
Male 135.959677
Male 150.875740
Here is one more interesting observation in addition to the previous one: married applicants generally request higher loan amounts than unmarried applicants. This may be one of the reasons why, as we observed earlier, the chances of loan approval are lower for a married person than for an unmarried person.
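# Function header reconstructed from the call below (the first lines of this cell are missing)
def encode_labels(data):
    for col in data.columns:
        # Label-encode only the string (object) columns
        if data[col].dtype == 'object':
            le = LabelEncoder()
            data[col] = le.fit_transform(data[col])

    return data

# Apply the function to every column
df = encode_labels(df)

# Generating Heatmap
sb.heatmap(df.corr() > 0.8, annot=True, cbar=False)
plt.show()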
We will now use standard scaling to normalize the data. To learn more, see the scikit-learn documentation for StandardScaler.
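Standard scaling replaces each value x with (x - mean) / std, so every feature ends up with zero mean and unit variance on the data used to compute the statistics. The sketch below does this by hand with NumPy to show the arithmetic. The function name standardize and the toy arrays are hypothetical, and the notebook itself presumably uses the StandardScaler class imported at the top.

```python
import numpy as np

def standardize(train, other):
    """Scale both arrays with the mean and standard deviation of `train` only
    (hypothetical helper)."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # population standard deviation (ddof=0)
    std[std == 0] = 1.0      # avoid dividing by zero for constant columns
    return (train - mean) / std, (other - mean) / std

# Toy data (made up): columns stand in for ApplicantIncome and LoanAmount
X_train_toy = np.array([[1000.0, 100.0], [3000.0, 120.0], [5000.0, 140.0]])
X_val_toy = np.array([[3000.0, 160.0]])

X_train_scaled, X_val_scaled = standardize(X_train_toy, X_val_toy)
print(X_train_scaled.mean(axis=0))
print(X_train_scaled.std(axis=0))
print(X_val_scaled)
```

The statistics come from the training data only and are reused for the validation data, which keeps information about the validation set out of the preprocessing step.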
model.fit(X, Y)

# The scores below are ROC AUC values computed from hard class predictions
print('Training ROC AUC : ', metrics.roc_auc_score(Y, model.predict(X)))
print('Validation ROC AUC : ', metrics.roc_auc_score(Y_val, model.predict(X_val)))
print()
We first train the SVC model on the training data X and Y. Then, we calculate the ROC AUC scores for both the training and validation datasets. The confusion matrix for the validation data is built with the confusion_matrix function from sklearn.metrics. Finally, we plot the confusion matrix as a seaborn heatmap.
A confusion matrix is a table that compares predicted values to actual values for a dataset to evaluate the performance of a classification model.
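The cell that builds the cm variable used in the plot is not shown here. As a rough illustration of what such a matrix contains, the sketch below counts the four outcomes by hand for binary labels. The function name confusion_matrix_2x2 and the toy labels are hypothetical. The layout (rows for true labels, columns for predicted labels) matches that of sklearn.metrics.confusion_matrix.

```python
import numpy as np

def confusion_matrix_2x2(y_true, y_pred):
    """Count outcomes for binary labels 0/1.
    Rows are true labels, columns are predicted labels (hypothetical helper)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy labels (made up): 1 = loan approved, 0 = loan rejected
y_true_toy = np.array([1, 0, 1, 1, 0, 1])
y_pred_toy = np.array([1, 0, 0, 1, 1, 1])

cm_toy = confusion_matrix_2x2(y_true_toy, y_pred_toy)
print(cm_toy)

# Accuracy is the share of predictions on the diagonal
accuracy = np.trace(cm_toy) / cm_toy.sum()
print(accuracy)
```

The diagonal holds the correct predictions, while the off-diagonal cells hold the false positives (top right) and false negatives (bottom left).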
plt.figure(figsize=(6, 6))
sb.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()