Synopsis

The document outlines a mini project focused on analyzing a loans dataset to identify variables that indicate the likelihood of loan default. It includes sections on data description, cleaning, and problem-solving techniques using Python, along with a detailed data dictionary and various data manipulation tasks. The project aims to provide insights for identifying risky loan applicants to mitigate financial losses.

Uploaded by

Arjun Reddy

GROUP MINI PROJECT
PYTHON (ITP+NPV)
LOAN DATASET

Ajith Kumar
Augustine Roy
Rithesh Kumar Singh
Rohan Kuldhar
S D Prem Kumar

Python Mini Project

TABLE OF INDEX

No.  Topic
01   Introduction to Problem Statement
02   Data Description
03   Data Cleaning
04   Problem Solving
05   What Could Have Been Done Better
06   Takeaways
07   Conclusion

Introduction To Problem Statement


The loans dataset is analysed to identify the variables that indicate whether a
person is likely to default. These variables can then be used to identify risky
loan applicants and so avoid financial loss to the company.

Dataset Description
It contains the complete loan data for all loans issued during the period 2007
to 2011.

Data Dictionary

1. annual_inc - The self-reported annual income provided by the borrower during registration.

2. dti - A ratio calculated using the borrower's total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower's self-reported monthly income.

3. emp_length - Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years.

4. funded_amnt - The total amount committed to that loan at that point in time.

5. funded_amnt_inv - The total amount committed by investors for that loan at that point in time.

6. grade - LC assigned loan grade.

7. id - A unique LC assigned ID for the loan listing.

8. installment - The monthly payment owed by the borrower if the loan originates.

9. int_rate - Interest rate on the loan.

10. last_pymnt_amnt - Last total payment amount received.

11. last_pymnt_d - Last month payment was received.

12. loan_amnt - The listed amount of the loan applied for by the borrower. If at some point in time the credit department reduces the loan amount, it will be reflected in this value.

13. loan_status - Current status of the loan.

14. member_id - A unique LC assigned ID for the borrower member.

15. purpose - A category provided by the borrower for the loan request.

16. term - The number of payments on the loan. Values are in months and can be either 36 or 60.

17. total_acc - The total number of credit lines currently in the borrower's credit file.

18. total_pymnt - Payments received to date for total amount funded.

19. total_pymnt_inv - Payments received to date for portion of total amount funded by investors.

20. total_rec_int - Interest received to date.

Questions & Solutions


Before importing the data into Jupyter, import all the libraries needed to
work with it: NumPy, Pandas, Matplotlib and Seaborn, with the specified
aliases.
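
A minimal sketch of that import cell. The aliases np, pd, plt and sns are the conventional ones and are an assumption here, since the text does not spell them out.

```python
# Conventional aliases (assumed; the project text only says "mentioned aliases").
import numpy as np                 # numerical arrays
import pandas as pd                # data frames
import matplotlib.pyplot as plt    # plotting
import seaborn as sns              # statistical plots on top of Matplotlib
```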

1. Import Dataset & Understand it.

The pd.read_csv() function imports the loan dataset into the Jupyter
notebook. The data covers the loan applications, the applicants, the
amounts, the EMI and the loan status, described using 20 attributes.
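
The loading step could look roughly like the sketch below. The rows are made-up toy values read from an in-memory string so that the example runs on its own, and the file name "loan.csv" in the comment is an assumption, not the project's actual file.

```python
import io
import pandas as pd

# Toy stand-in for the real file (made-up rows). In the notebook this would be
# something like pd.read_csv("loan.csv"); that file name is an assumption.
csv_text = """id,loan_amnt,funded_amnt,term,int_rate,grade,loan_status
1,5000,5000, 36 months,10.65%,B,Fully Paid
2,2500,2500, 60 months,15.27%,C,Charged Off
3,12000,10000, 36 months,13.49%,C,Current
"""
Loan_dataset = pd.read_csv(io.StringIO(csv_text))

# The first rows and the column names give a quick feel for the data.
print(Loan_dataset.head())
print(Loan_dataset.columns.tolist())
```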

2. List down the number of rows and columns.

Using the len() function on the axes of the data frame, we print the
number of rows and columns with the print function. There are:
39,717 – number of loan applications over the span 2007-2011.
22 – total number of attributes holding information about the loan.
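
As a small sketch of the counting step, on a toy frame with made-up values standing in for Loan_dataset:

```python
import pandas as pd

# Toy frame standing in for Loan_dataset (made-up values).
Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000, 2500, 12000],
    "grade": ["B", "C", "C"],
})

n_rows = len(Loan_dataset.axes[0])   # axes[0] is the row index
n_cols = len(Loan_dataset.axes[1])   # axes[1] is the column index
print("Rows:", n_rows)
print("Columns:", n_cols)
```

Loan_dataset.shape returns the same two numbers as a (rows, columns) tuple.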

Data Cleaning
Data cleaning is the process of fixing or removing
incorrect, corrupted, incorrectly formatted, duplicate, or
incomplete data within a dataset. When combining
multiple data sources, there are many opportunities for
data to be duplicated or mislabeled. If data is incorrect,
outcomes and algorithms are unreliable, even though
they may look correct.

3. The 'int_rate' column is character type. With the help of a lambda function, convert it into float type.

To convert the datatype of the int_rate column, we first check which
datatype it has. The check shows that the 'int_rate' column is of object
type. So we use a lambda function like

Loan_dataset['int_rate'] = Loan_dataset['int_rate'].apply(lambda x: float(x.replace('%', '')))

This removes the % sign using the string method replace() and casts the
resulting value to float. Finally, to verify the type of the modified
'int_rate' column, we print the type.
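
A runnable toy version of this conversion, using made-up percentages. The lambda assumes every value is a string; a missing value (NaN) has no replace() method and would need handling first.

```python
import pandas as pd

# Toy column with made-up interest rates stored as text.
Loan_dataset = pd.DataFrame({"int_rate": ["10.65%", "15.27%", "13.49%"]})
print(Loan_dataset["int_rate"].dtype)   # dtype before the conversion

# Strip the percent sign and cast each value to float.
Loan_dataset["int_rate"] = Loan_dataset["int_rate"].apply(
    lambda x: float(x.replace("%", ""))
)
print(Loan_dataset["int_rate"].dtype)   # dtype after the conversion
```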

4. Check the datatype of each column.

Using the .info() method we get the datatype and the non-null count of
every column in a single call, as shown in the picture above.
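
A brief sketch of that check on a toy frame with made-up values. The buffer is only there so the summary text can be inspected; in a notebook, calling Loan_dataset.info() on its own prints the same table.

```python
import io
import pandas as pd

# Toy frame with one missing value (made-up data).
Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000, 2500, 12000],
    "int_rate": [10.65, 15.27, None],
    "grade": ["B", "C", "C"],
})

# info() lists each column with its non-null count and datatype.
buffer = io.StringIO()
Loan_dataset.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# dtypes gives just the column-to-datatype mapping.
print(Loan_dataset.dtypes)
```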

5. Cleaning the dataset - Remove the columns having complete NaN value in the entire dataset.

Removing null columns and rows helps us keep only the useful data, so
deleting them is an important practice.
Using the drop() function we drop the columns described above, then
check whether the columns were deleted using the .info() method.
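
One way this step could be written is sketched below. The column name empty_col and all values are hypothetical, and finding the all-NaN columns with isna().all() is an assumption about how the columns to drop were chosen.

```python
import numpy as np
import pandas as pd

Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000, 2500, 12000],
    "empty_col": [np.nan, np.nan, np.nan],   # hypothetical all-NaN column
    "int_rate": [10.65, np.nan, 13.49],      # partly missing, so it is kept
})

# Names of the columns in which every single value is NaN.
all_nan_cols = Loan_dataset.columns[Loan_dataset.isna().all()]

# Drop those columns and confirm with info().
Loan_dataset = Loan_dataset.drop(columns=all_nan_cols)
Loan_dataset.info()
```

Loan_dataset.dropna(axis=1, how="all") reaches the same result in a single call.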

6. Write the code to find the value counts of the 'loan_status' category column and filter only the 'fully paid' and 'charged off' categories.

Applying the group by function and then the .count() method gives the
number of loans in each category, which we print as the value counts.
There are:
5,627 – Charged Off loans
1,140 – Current loans
32,950 – Fully Paid loans
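
The counting step can be sketched as follows, on a toy column with made-up statuses:

```python
import pandas as pd

Loan_dataset = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid",
                    "Current", "Fully Paid"],
})

# Group by the status and count the rows in each group.
status_counts = Loan_dataset.groupby("loan_status")["loan_status"].count()
print(status_counts)

# value_counts() is a shorter route to the same numbers, sorted by frequency.
print(Loan_dataset["loan_status"].value_counts())
```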

Next, we pass a compound conditional statement that uses the or
operator (|):

Loan_dataset[(Loan_dataset['loan_status'] == 'Fully Paid') | (Loan_dataset['loan_status'] == 'Charged Off')]

This keeps only the Fully Paid and Charged Off loans. Of all the loans
applied for, 38,577 are Fully Paid or Charged Off.
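
A toy illustration of the filter with made-up rows. The isin() variant is an alternative shown for comparison, not something the project text uses.

```python
import pandas as pd

Loan_dataset = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Fully Paid"],
})

# Element-wise "or"; each comparison needs its own parentheses.
mask = (
    (Loan_dataset["loan_status"] == "Fully Paid")
    | (Loan_dataset["loan_status"] == "Charged Off")
)
closed_loans = Loan_dataset[mask]

# isin() expresses the same filter more compactly.
closed_loans_alt = Loan_dataset[
    Loan_dataset["loan_status"].isin(["Fully Paid", "Charged Off"])
]
print(len(closed_loans))
```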

7. Filter the 'Emp_Len' column to extract the numerical value from the string. Hint - Emp_len: < 1 year, 2 years, 3 years as 1, 2, 3 and so on.

To extract the numeric value from the 'emp_length' column, we import the
re module and write a pattern-searching function that finds the numbers
in any text and returns them. Then, using a lambda function, we call
find_number(x) on each value and extract the number.

import re

def find_number(text):
    # Collect every run of digits in the text, e.g. '10+ years' gives ['10'].
    num = re.findall(r'[0-9]+', str(text))
    # Join the matches into one string (empty if there were no digits).
    return " ".join(num)

# Apply the helper to each value of emp_length.
Loan_dataset['new_emp_length'] = Loan_dataset['emp_length'].apply(lambda x: find_number(x))
print("\nExtracting numbers from dataframe columns:")
print(Loan_dataset['new_emp_length'])
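
Note that find_number returns strings, and a missing value ends up as an empty string. If numeric values are wanted, a vectorised alternative might look like the sketch below; it uses made-up values and is not the approach taken in the project.

```python
import numpy as np
import pandas as pd

Loan_dataset = pd.DataFrame({
    "emp_length": ["< 1 year", "2 years", "10+ years", np.nan],
})

# str.extract pulls the first run of digits; rows without digits become missing.
digits = Loan_dataset["emp_length"].str.extract(r"(\d+)", expand=False)

# Convert the extracted text to numbers, keeping missing values as NaN.
Loan_dataset["new_emp_length"] = pd.to_numeric(digits)
print(Loan_dataset)
```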

8. Using the Lambda function, remove the month from the 'term' column such that '36 months', '60 months' appear as 36 and 60 respectively.

Using a lambda function with replace(), we remove the word 'months' from
every value in the term column and cast the result to an integer:

Loan_dataset['term'] = Loan_dataset['term'].apply(lambda x: int(x.replace('months', '')))
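
A quick toy check of this conversion. The sample values, including the leading space, are an assumption about how the raw column is formatted.

```python
import pandas as pd

# Made-up values; the leading space is an assumed quirk of the raw data.
Loan_dataset = pd.DataFrame({"term": [" 36 months", " 60 months"]})

# int() tolerates the spaces left over once "months" is removed.
Loan_dataset["term"] = Loan_dataset["term"].apply(
    lambda x: int(x.replace("months", ""))
)
print(Loan_dataset["term"].tolist())
```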

9. Create a new column as risky_loan_applicant by comparing loan_amnt and funded_amnt with the following criteria - If loan_amnt is less than or equal to funded_amnt set it as '0' else set it as '1'.
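
The project's own solution for this question is not reproduced in the text. A minimal sketch of one way to apply the stated criteria, using np.where on made-up amounts and storing the flag as the integers 0 and 1 (an assumption; the question quotes them as '0' and '1'):

```python
import numpy as np
import pandas as pd

Loan_dataset = pd.DataFrame({
    "loan_amnt":   [5000, 12000, 8000],
    "funded_amnt": [5000, 10000, 8000],
})

# 0 when loan_amnt <= funded_amnt, 1 otherwise.
Loan_dataset["risky_loan_applicant"] = np.where(
    Loan_dataset["loan_amnt"] <= Loan_dataset["funded_amnt"], 0, 1
)
print(Loan_dataset)
```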

10. Using the bar plot, visualize the loan_status column against the categorical columns grade, term and verification_status. Write the observation from each graph.
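
The plots themselves are not reproduced in the text. One possible sketch uses pd.crosstab with the pandas bar plot, on made-up rows; the choice of crosstab rather than a Seaborn plot is an assumption, and no observations are drawn here because the toy data carries no meaning.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the cleaned Loan_dataset.
Loan_dataset = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Fully Paid",
                    "Charged Off", "Fully Paid"],
    "grade": ["A", "C", "B", "A", "D", "B"],
    "term": [36, 60, 36, 36, 60, 60],
    "verification_status": ["Verified", "Not Verified", "Verified",
                            "Source Verified", "Verified", "Not Verified"],
})

categorical_cols = ["grade", "term", "verification_status"]
fig, axes = plt.subplots(1, len(categorical_cols), figsize=(15, 4))

for ax, col in zip(axes, categorical_cols):
    # Rows: categories of col; columns: loan_status; cells: number of loans.
    counts = pd.crosstab(Loan_dataset[col], Loan_dataset["loan_status"])
    counts.plot(kind="bar", ax=ax, title=f"loan_status by {col}")
    ax.set_ylabel("number of loans")

fig.tight_layout()
```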

11. Using a user-defined function, convert the 'emp_len' column into a categorical column as follows - If emp_len is less than or equal to 1 then recode as 'fresher'. If emp_len is greater than 1 and less than 3 then recode as 'junior'. If emp_len is greater than 3 and less than 7 then recode as 'senior'. If emp_len is greater than 7 then recode as 'expert'.
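
A sketch of such a function follows. The stated criteria leave the values 3 and 7 unassigned, so placing 3 in 'junior' and 7 in 'senior' is an assumption. The names emp_category and new_emp_length are illustrative, and the column is assumed to hold numbers already.

```python
import pandas as pd

def emp_category(years):
    """Map employment length in years to a label.

    Assumption: 3 counts as 'junior' and 7 as 'senior', because the stated
    criteria leave those two boundary values unassigned.
    """
    if pd.isna(years):
        return years          # leave missing values untouched
    if years <= 1:
        return "fresher"
    elif years <= 3:
        return "junior"
    elif years <= 7:
        return "senior"
    else:
        return "expert"

# Made-up numeric employment lengths.
Loan_dataset = pd.DataFrame({"new_emp_length": [1, 2, 3, 5, 7, 10]})
Loan_dataset["emp_category"] = Loan_dataset["new_emp_length"].apply(emp_category)
print(Loan_dataset)
```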

12. Find the sum of 'loan_amnt' for each grade and display the distribution of 'loan_amnt' using a pie plot.
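
The project's solution is not reproduced in the text. A minimal sketch on made-up amounts, assuming a groupby sum followed by the pandas pie plot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for Loan_dataset.
Loan_dataset = pd.DataFrame({
    "grade": ["A", "B", "B", "C", "A"],
    "loan_amnt": [5000, 2500, 7500, 12000, 3000],
})

# Total loan amount per grade.
loan_by_grade = Loan_dataset.groupby("grade")["loan_amnt"].sum()
print(loan_by_grade)

# Pie plot of each grade's share of the total, labelled with percentages.
fig, ax = plt.subplots(figsize=(5, 5))
loan_by_grade.plot(kind="pie", ax=ax, autopct="%1.1f%%")
ax.set_ylabel("")
ax.set_title("Share of total loan_amnt by grade")
```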
