Synopsis
PROJECT
PYTHON (ITP+NPV)
LOAN DATASET
Ajith Kumar
Augustine Roy
Rithesh Kumar Singh
Rohan Kuldhar
S D Prem Kumar
TABLE OF CONTENTS
02 Data Description
03 Data Cleaning
04 Problem Solving
06 Takeaways
07 Conclusion
Dataset Description
The dataset contains the complete loan data for all loans issued from
2007 to 2011.
Data Dictionary
Python Mini Project
12. loan_amnt - The listed amount of the loan applied for by the
borrower. If the credit department reduces the loan amount at some
point, the reduction is reflected in this value.
16. term - The number of payments on the loan. Values are in months
and can be either 36 or 60.
Applying the len() function to each axis of the data frame and printing
the results gives the number of rows and columns. We can see that
there are:
39,717 – Number of loan applications over the period 2007-2011.
22 – Total number of attributes describing each loan.
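The row and column count described above can be sketched as follows. This is a minimal illustration, not the project's actual notebook: the three-row frame is a hypothetical stand-in for the real loan data, and only the name Loan_dataset and the column names loan_amnt, term and int_rate come from the report.

```python
import pandas as pd

# Hypothetical stand-in for the loan data; the real frame would be
# loaded from the project's data file.
Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "term": [" 36 months", " 60 months", " 36 months"],
    "int_rate": ["10.65%", "15.27%", "15.96%"],
})

# DataFrame.axes is a list holding the row index and the column index.
n_rows = len(Loan_dataset.axes[0])
n_cols = len(Loan_dataset.axes[1])

print("Number of rows:", n_rows)
print("Number of columns:", n_cols)
```

The same two numbers are also available together as Loan_dataset.shape.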
Data Cleaning
Data cleaning is the process of fixing or removing
incorrect, corrupted, incorrectly formatted, duplicate, or
incomplete data within a dataset. When combining
multiple data sources, there are many opportunities for
data to be duplicated or mislabeled. If data is incorrect,
outcomes and algorithms are unreliable, even though
they may look correct.
# Strip the '%' sign from each interest rate and convert it to a float.
Loan_dataset['int_rate'] = Loan_dataset['int_rate'].apply(
    lambda x: float(x.replace('%', '')))
The .info() method reports each column's data type and non-null count
in a single call, as shown in the picture above.
Removing null columns and rows leaves only the useful data, which makes
it one of the most important cleaning steps.
Using the drop() function, we can drop the columns mentioned above and
then call .info() again to check whether they were deleted.
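One way the null-removal step could look is sketched below. The report does not list which columns were dropped, so the frame and the column name empty_col are hypothetical, and selecting columns with isnull().all() is an assumption about how the null columns were identified.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one column and one row contain only nulls.
Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000.0, 2500.0, 2400.0, np.nan],
    "grade": ["B", "C", "C", None],
    "empty_col": [np.nan] * 4,
})

# Names of the columns in which every value is null.
null_cols = Loan_dataset.columns[Loan_dataset.isnull().all()].tolist()

# Drop those columns, then drop any rows that are entirely null.
cleaned = Loan_dataset.drop(columns=null_cols)
cleaned = cleaned.dropna(how="all")

# Confirm the result: remaining columns, dtypes and non-null counts.
cleaned.info()
```

Assigning the result to a new name keeps the original frame intact, which makes it easy to compare the two .info() outputs.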
We can keep only the fully paid and charged-off loans. Of all the loan
applications, 38,577 are either fully paid or charged off.
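A minimal sketch of this filter follows. The column name loan_status and the status labels "Fully Paid", "Charged Off" and "Current" are assumptions, since the report does not show them; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample; column name and status labels are assumed.
Loan_dataset = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400, 10000],
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Fully Paid"],
})

# Keep only the loans whose outcome is already known.
closed_statuses = ["Fully Paid", "Charged Off"]
closed_loans = Loan_dataset[Loan_dataset["loan_status"].isin(closed_statuses)]

print("Fully paid and charged-off loans:", len(closed_loans))
```

Using isin() with a list keeps the condition readable and makes it easy to add or remove a status later.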
import re

def find_number(text):
    # Collect every run of digits in the text and join them with spaces.
    num = re.findall(r'[0-9]+', str(text))
    return " ".join(num)

# Store the numeric part of emp_length in a new column.
Loan_dataset['new_emp_length'] = Loan_dataset['emp_length'].apply(find_number)
print("\nExtracting numbers from dataframe columns:")
print(Loan_dataset['new_emp_length'])
# Remove the 'months' suffix from each term and convert it to an integer.
Loan_dataset['term'] = Loan_dataset['term'].apply(
    lambda x: int(x.replace('months', '')))
12. Find the sum of ‘loan_amnt’ for each grade and display the
distribution of ‘loan_amnt’ using a pie plot.
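One possible approach to this task is sketched below; it is not necessarily the solution used in the project. The column name grade is assumed from the task wording, the sample rows are hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; the column name 'grade' is assumed.
Loan_dataset = pd.DataFrame({
    "grade": ["A", "B", "B", "C", "A"],
    "loan_amnt": [5000, 2500, 2400, 10000, 3000],
})

# Total loan amount for each grade.
grade_totals = Loan_dataset.groupby("grade")["loan_amnt"].sum()
print(grade_totals)

# Pie plot: one wedge per grade, sized by that grade's share of the total.
fig, ax = plt.subplots()
ax.pie(grade_totals.values, labels=grade_totals.index.tolist(),
       autopct="%1.1f%%")
ax.set_title("Total loan_amnt by grade")
```

In an interactive session, calling plt.show() after the last line displays the chart.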