Synopsis

The document outlines a mini project focused on analyzing a loans dataset to identify variables that indicate the likelihood of loan default. It includes sections on data description, cleaning, and problem-solving techniques using Python, along with a detailed data dictionary and various data manipulation tasks. The project aims to provide insights for identifying risky loan applicants to mitigate financial losses.

Uploaded by Arjun Reddy


GROUP MINI PROJECT
PYTHON (ITP+NPV)
LOAN DATASET

Ajith Kumar
Augustine Roy
Rithesh Kumar Singh
Rohan Kuldhar
S D Prem Kumar

TABLE OF CONTENTS

NO. Topic

Python Mini Project

01 Introduction to Problem Statement

02 Data Description

03 Data Cleaning

04 Problem Solving

05 What Could Have Been Done Better

06 Takeaways

07 Conclusion

Introduction To Problem Statement

The loan dataset analysis aims to identify the variables that indicate whether a person is
likely to default. These variables can then be used to identify risky loan
applicants and avoid financial loss to the company.

Dataset Description
It contains the complete loan data for all loans issued between 2007 and 2011.

Data Dictionary


1. annual_inc - The self-reported annual income provided by the borrower during registration.
2. dti - A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
3. emp_length - Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years.
4. funded_amnt - The total amount committed to that loan at that point in time.
5. funded_amnt_inv - The total amount committed by investors for that loan at that point in time.
6. grade - LC assigned loan grade.
7. id - A unique LC assigned ID for the loan listing.
8. installment - The monthly payment owed by the borrower if the loan originates.
9. int_rate - Interest rate on the loan.
10. last_pymnt_amnt - Last total payment amount received.
11. last_pymnt_d - Last month payment was received.
12. loan_amnt - The listed amount of the loan applied for by the borrower. If at some point in time the credit department reduces the loan amount, this will be reflected in this value.


13. loan_status - Current status of the loan.
14. member_id - A unique LC assigned ID for the borrower member.
15. purpose - A category provided by the borrower for the loan request.
16. term - The number of payments on the loan. Values are in months and can be either 36 or 60.
17. total_acc - The total number of credit lines currently in the borrower’s credit file.
18. total_pymnt - Payments received to date for the total amount funded.
19. total_pymnt_inv - Payments received to date for the portion of the total amount funded by investors.
20. total_rec_int - Interest received to date.

Questions & Solutions

Before importing the data into Jupyter, import all the libraries needed to
work with it: NumPy, Pandas, Matplotlib and Seaborn, with the mentioned
aliases.
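A minimal sketch of the import cell, assuming the conventional community aliases (np, pd, plt, sns); the report's own aliases appear only in a screenshot that is not reproduced here.

```python
# Conventional aliases (an assumption; the report's screenshot is not available)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Quick sanity check that the libraries loaded
print("pandas", pd.__version__, "| numpy", np.__version__)
```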

1. Import Dataset & Understand it.


The pd.read_csv() function imports the loan dataset into the Jupyter
notebook. On inspection, the data covers the loan applications, the
applicants, the loan amounts, the EMIs and the loan status, using the
20 attributes described in the data dictionary.
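A minimal sketch of the loading step. The report does not give the file name, so this sketch reads a few made-up rows from an in-memory string (io.StringIO) instead; the values are illustrative only and are not taken from the dataset. In the notebook, the path to the loan CSV file would be passed to pd.read_csv() instead.

```python
import io

import pandas as pd

# Made-up stand-in for the real CSV file (its name is not given in the report)
sample_csv = io.StringIO(
    "id,loan_amnt,funded_amnt,term,int_rate,loan_status\n"
    "1,5000,5000, 36 months, 10.65%,Fully Paid\n"
    "2,2500,2500, 60 months, 15.27%,Charged Off\n"
    "3,2400,2400, 36 months, 15.96%,Current\n"
)
Loan_dataset = pd.read_csv(sample_csv)

# Look at the first rows to understand what the data contains
print(Loan_dataset.head())
```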

2. List down the number of rows and columns.

Applying the len() function to the axes of the data frame gives the
number of rows and columns, which we display with print(). There are:
39,717 – loan applications for the period 2007-2011
22 – attributes describing each loan
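The row and column counts can be sketched as follows, using a tiny made-up frame in place of the loan data. Loan_dataset.shape returns the same two numbers in a single call.

```python
import pandas as pd

# Tiny illustrative frame; in the report this is the full loan dataset
Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "funded_amnt": [5000, 2500, 2400],
    "loan_status": ["Fully Paid", "Charged Off", "Current"],
})

# axes[0] is the row index and axes[1] is the column index
n_rows = len(Loan_dataset.axes[0])
n_cols = len(Loan_dataset.axes[1])
print("Number of rows:", n_rows)
print("Number of columns:", n_cols)

# .shape gives the same pair as a tuple
print(Loan_dataset.shape)
```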

Data Cleaning
Data cleaning is the process of fixing or removing
incorrect, corrupted, incorrectly formatted, duplicate, or
incomplete data within a dataset. When combining
multiple data sources, there are many opportunities for
data to be duplicated or mislabeled. If data is incorrect,
outcomes and algorithms are unreliable, even though
they may look correct.


3. The ‘int_rate’ column is character type. With the help of a
lambda function, convert it into float type.
To convert the datatype of the int_rate column, we first need to check
which datatype it currently has. Checking the column’s dtype
(Loan_dataset['int_rate'].dtype) shows that ‘int_rate’ is of object type;
calling type() on the column itself would only report that it is a pandas
Series. So we apply a lambda function:

Loan_dataset['int_rate'] = Loan_dataset['int_rate'].apply(lambda x: float(x.replace('%', '')))

The string method replace() removes the % sign, and float() casts the
resulting value. Finally, we print the dtype of the modified ‘int_rate’
column to verify the conversion.
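A minimal, runnable sketch of the same conversion on made-up values. The leading spaces are an assumption about how the raw text may be stored; float() ignores surrounding whitespace, so they do no harm. Note that the lambda would fail on missing values (a NaN has no replace() method), so any NaNs would need to be handled first.

```python
import pandas as pd

# Illustrative values; int_rate arrives as text such as " 10.65%"
Loan_dataset = pd.DataFrame({"int_rate": [" 10.65%", "15.27%", " 7.9%"]})
print("Before:", Loan_dataset["int_rate"].dtype)

# Strip the % sign, then cast the remaining text to float
Loan_dataset["int_rate"] = Loan_dataset["int_rate"].apply(lambda x: float(x.replace("%", "")))
print("After:", Loan_dataset["int_rate"].dtype)
```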

4. Check the datatype of each column.


The .info() method reports each column’s datatype and non-null value
count in a single call, as shown in the picture above.
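A small sketch on a made-up frame: .info() prints a per-column summary, while .dtypes returns the datatypes alone as a Series that can be inspected in code.

```python
import pandas as pd

# Illustrative frame with one missing value
Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000, 2500, None],
    "int_rate": [10.65, 15.27, 15.96],
    "grade": ["B", "C", "C"],
})

# Column names, non-null counts and dtypes in one report
Loan_dataset.info()

# Only the dtypes, indexed by column name
print(Loan_dataset.dtypes)
```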

5. Cleaning the dataset: remove the columns that have NaN values in
the entire dataset.

Removing null columns and rows filters the dataset down to the useful
data, and deleting them is an important cleaning practice. The drop()
function drops the columns mentioned above, and the .info() method
confirms whether they were deleted.
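A minimal sketch of finding and dropping the fully empty columns, on a made-up frame where the column "empty_col" is hypothetical. The boolean mask selects every column whose values are all NaN; dropna(axis=1, how="all") is an equivalent one-liner.

```python
import numpy as np
import pandas as pd

Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "empty_col": [np.nan, np.nan, np.nan],  # hypothetical all-NaN column
    "grade": ["B", "C", "C"],
})

# Columns where every single value is NaN
empty_cols = Loan_dataset.columns[Loan_dataset.isna().all()].tolist()
print("Dropping:", empty_cols)

Loan_dataset = Loan_dataset.drop(columns=empty_cols)

# Confirm the columns are gone
Loan_dataset.info()
```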


6. Write the code to find the value counts of the ‘loan_status’
category column and filter only the ‘fully paid’ and ‘charged off’
categories.

Grouping the data by ‘loan_status’ and applying the .count() method
gives the number of loans in each category, which we then print.
There are:
5,627 – Charged Off loans
1,140 – Current loans
32,950 – Fully Paid loans
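A sketch of two ways to get the per-category counts on made-up data. value_counts() is the direct route. On the groupby route, .count() counts the non-null values in each of the other columns, whereas size() counts rows per group, which is why this sketch uses size() on a one-column frame.

```python
import pandas as pd

Loan_dataset = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Current", "Fully Paid"],
})

# Direct way: count of each category, largest first
status_counts = Loan_dataset["loan_status"].value_counts()
print(status_counts)

# Groupby route: number of rows in each loan_status group
counts = Loan_dataset.groupby("loan_status").size()
print(counts)
```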


Next, passing a compound condition joined with the | (or) operator

Loan_dataset[(Loan_dataset['loan_status'] == 'Fully Paid') | (Loan_dataset['loan_status'] == 'Charged Off')]

filters the Fully Paid and Charged Off loans. There are 38,577 Fully Paid
and Charged Off loans among all the loan applications.
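A runnable sketch of the same filter on made-up rows. Each comparison yields a boolean Series, and | combines them element-wise (the parentheses are required because | binds more tightly than ==). Series.isin() expresses the same filter more compactly and scales to more categories.

```python
import pandas as pd

Loan_dataset = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Fully Paid"],
    "loan_amnt": [5000, 2500, 2400, 10000],
})

# Boolean mask built from two comparisons joined by |
mask = (Loan_dataset["loan_status"] == "Fully Paid") | (Loan_dataset["loan_status"] == "Charged Off")
filtered = Loan_dataset[mask]

# Same rows selected with isin()
filtered_isin = Loan_dataset[Loan_dataset["loan_status"].isin(["Fully Paid", "Charged Off"])]
print(len(filtered), "Fully Paid / Charged Off loans")
```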

7. Filter the ‘Emp_Len’ column to extract the numerical value from the
string. Hint: Emp_len values ‘< 1 year’, ‘2 years’, ‘3 years’ become 1, 2, 3
and so on.


To extract the numeric value from the ‘emp_length’ column, import the re
module and write a pattern-matching function that finds the numbers in
any text and returns them. A lambda function then calls find_number(x)
on each value to extract the number.

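import re

def find_number(text):
    # Find every run of digits; str() also turns missing values (NaN) into text
    num = re.findall(r'[0-9]+', str(text))
    # Join the matches into one string, e.g. '10+ years' -> '10' (result is text, not int)
    return " ".join(num)

# Apply the helper to every value and store the result in a new column
Loan_dataset['new_emp_length'] = Loan_dataset['emp_length'].apply(lambda x: find_number(x))
print("\nExtracting numbers from dataframe columns:")
print(Loan_dataset['new_emp_length'])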
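A vectorized alternative, sketched on made-up values: Series.str.extract() captures the first run of digits without a Python-level loop, and pd.to_numeric() turns the result into numbers, which the later comparisons in question 11 need. Missing values stay NaN rather than becoming empty strings. As in the hint, ‘< 1 year’ becomes 1 here, even though the data dictionary codes it as 0.

```python
import numpy as np
import pandas as pd

Loan_dataset = pd.DataFrame({"emp_length": ["< 1 year", "2 years", "10+ years", np.nan]})

# Capture the first run of digits in each string; NaN stays NaN
digits = Loan_dataset["emp_length"].str.extract(r"(\d+)", expand=False)

# Convert the captured text to numbers so it can be compared later
Loan_dataset["new_emp_length"] = pd.to_numeric(digits)
print(Loan_dataset)
```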

8. Using a lambda function, remove the month from the ‘term’ column such
that ‘36 months’ and ‘60 months’ appear as 36 and 60 respectively.


A lambda function that calls replace() removes the word ‘months’ from
every value in the term column, and int() converts the remaining number
to an integer:

Loan_dataset['term'] = Loan_dataset['term'].apply(lambda x: int(x.replace('months', '')))
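A runnable check of that lambda on made-up values. The leading space is an assumption about how the raw text may be stored; int() ignores surrounding whitespace, so only the word itself has to be removed.

```python
import pandas as pd

# Illustrative values for the term column
Loan_dataset = pd.DataFrame({"term": [" 36 months", " 60 months", " 36 months"]})

# Remove 'months', then parse the remaining digits as an integer
Loan_dataset["term"] = Loan_dataset["term"].apply(lambda x: int(x.replace("months", "")))
print(Loan_dataset["term"].tolist())  # [36, 60, 36]
```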

9. Create a new column, risky_loan_applicant, by comparing loan_amnt
and funded_amnt with the following criteria: if loan_amnt is less than
or equal to funded_amnt, set it to ‘0’; otherwise set it to ‘1’.
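The report's solution for this step is not reproduced in the text, so here is a minimal sketch using numpy.where on made-up amounts. It stores the flags as integers 0 and 1; if the quoted ‘0’ and ‘1’ in the question are meant as strings, pass "0" and "1" instead.

```python
import numpy as np
import pandas as pd

Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000, 12000, 2400],
    "funded_amnt": [5000, 10000, 2400],
})

# 0 when loan_amnt <= funded_amnt (fully funded), otherwise 1
Loan_dataset["risky_loan_applicant"] = np.where(
    Loan_dataset["loan_amnt"] <= Loan_dataset["funded_amnt"], 0, 1
)
print(Loan_dataset)
```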


10. Using a bar plot, visualize the loan_status column against the
categorical columns grade, term and verification_status. Write the
observation from each graph.
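A minimal sketch of the three bar plots on made-up rows (the report's charts and observations are not reproduced here). pd.crosstab counts loans for each category and status pair, and DataFrame.plot(kind="bar") draws one group of bars per category. The verification_status values shown are illustrative; that column is not described in the data dictionary. sns.countplot(data=..., x=col, hue="loan_status") would be a Seaborn alternative.

```python
import matplotlib.pyplot as plt
import pandas as pd

Loan_dataset = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Fully Paid", "Charged Off"],
    "grade": ["A", "C", "B", "A", "C"],
    "term": [36, 60, 36, 36, 60],
    "verification_status": ["Verified", "Not Verified", "Verified", "Source Verified", "Not Verified"],
})

tables = {}
for col in ["grade", "term", "verification_status"]:
    # Rows: categories of col; columns: loan_status; cells: number of loans
    tables[col] = pd.crosstab(Loan_dataset[col], Loan_dataset["loan_status"])
    ax = tables[col].plot(kind="bar")
    ax.set_title(f"loan_status by {col}")
    ax.set_ylabel("Number of loans")

plt.tight_layout()
```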


11. Using a user-defined function, convert the ‘emp_len’ column into a
categorical column as follows: if emp_len is less than or equal to 1,
recode it as ‘fresher’; if emp_len is greater than 1 and less than 3,
recode it as ‘junior’; if emp_len is greater than 3 and less than 7,
recode it as ‘senior’; if emp_len is greater than 7, recode it as
‘expert’.
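A minimal sketch of such a user-defined function, applied to a numeric employment-length column (for example, the digits extracted in question 7 after converting them with pd.to_numeric). The question leaves exactly 3 and exactly 7 years unassigned; this sketch assumes they fall into the lower band (‘junior’ and ‘senior’), which is a hypothetical choice. The output column name emp_category is also hypothetical.

```python
import numpy as np
import pandas as pd

def categorize_emp_length(years):
    """Recode employment length in years into the four bands of question 11.

    Assumption: exactly 3 and exactly 7 years, which the question leaves
    unassigned, go to the lower band ('junior' and 'senior').
    """
    if pd.isna(years):
        return np.nan      # keep missing values missing
    if years <= 1:
        return "fresher"
    if years <= 3:         # > 1 and <= 3 (3 included by assumption)
        return "junior"
    if years <= 7:         # > 3 and <= 7 (7 included by assumption)
        return "senior"
    return "expert"        # > 7

Loan_dataset = pd.DataFrame({"new_emp_length": [0, 1, 2, 3, 5, 7, 10, np.nan]})
Loan_dataset["emp_category"] = Loan_dataset["new_emp_length"].apply(categorize_emp_length)
print(Loan_dataset)
```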

12. Find the sum of ‘loan_amnt’ for each grade and display the
distribution of ‘loan_amnt’ using a pie plot.
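A minimal sketch on made-up rows (the report's own chart is not reproduced here): groupby().sum() totals loan_amnt per grade, and Series.plot(kind="pie") draws each grade's share, with autopct (passed through to Matplotlib's pie) printing the percentages on the wedges.

```python
import matplotlib.pyplot as plt
import pandas as pd

Loan_dataset = pd.DataFrame({
    "grade": ["A", "B", "A", "C", "B"],
    "loan_amnt": [5000, 10000, 7500, 2500, 12000],
})

# Total amount lent per grade
grade_totals = Loan_dataset.groupby("grade")["loan_amnt"].sum()
print(grade_totals)

# One wedge per grade, sized by its share of the total loan_amnt
ax = grade_totals.plot(kind="pie", autopct="%1.1f%%")
ax.set_ylabel("")  # hide the default axis label
ax.set_title("Share of loan_amnt by grade")
plt.tight_layout()
```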
