
PA v0.12

LendingClub is the world's largest peer-to-peer lending platform, headquartered in San Francisco. It was the first peer-to-peer lender to register securities offerings with the SEC and offer loan trading on a secondary market. The author built a predictive model using LendingClub loan data to identify borrowers likely to default based on historical data, in order to assess new potential customers' ability to repay loans. Random forest, decision tree, and neural network models were tested using Python on a dataset of over 300,000 rows and 27 columns, including categorical, numerical, and ordinal variables.

Uploaded by

Sai Pawan

LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California. It was the first peer-to-peer
lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on
a secondary market. LendingClub is the world's largest peer-to-peer lending platform.

My Goal
Given historical data on loans, including whether or not the borrower defaulted (charge-off), I have built
a model that predicts whether or not a borrower will pay back their loan. This way, when the company gets a
new potential customer in the future, we can assess whether or not they are likely to pay back the loan.

Model Used
1. Random Forest
2. Decision Tree
3. Neural Network
Language/Analytics Tools Used
1. Python – Jupyter Notebook

Modules used
1. Pandas
2. NumPy
3. Matplotlib
Data Set Overview: copy from
https://github.com/vishrut18/Data-Science-and-ML-Projects/blob/master/1.%20LendingClub%20Loan_Status%20Predictive%20model%20using%20Decision%20Tress%20and%20Random%20Forests.ipynb
In the form of table
27 columns
Mention it is categorical/Numerical/Ordinal in 1 column
2 files – 1st for data and 2nd for field description

Data set: Subset of All Lending Club loan data
https://www.kaggle.com/wordsforthewise/lending-club

Number of rows and columns: 303704 and 26


EXPLORATORY DATA ANALYSIS

OVERALL GOAL: Get an understanding of which variables are important, view summary statistics, and
visualize the data

As we can see, this is a strongly imbalanced problem. We have many more entries for people who
fully paid off their loans than for those who did not pay back.

Ratio: XX:YY
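
The class balance above can be checked with a few lines of pandas. This is a minimal sketch, not the notebook's own code: the DataFrame below is a tiny invented stand-in for the loan data, so the printed ratio says nothing about the real ratio.

```python
import pandas as pd

# Toy stand-in for the loan data; these rows are invented for illustration.
df = pd.DataFrame({
    "loan_status": [
        "Fully Paid", "Fully Paid", "Charged Off", "Fully Paid",
        "Fully Paid", "Charged Off", "Fully Paid", "Fully Paid",
    ],
})

# Absolute count and relative share of each class.
counts = df["loan_status"].value_counts()
shares = df["loan_status"].value_counts(normalize=True)

# Ratio of fully paid loans to charged-off loans.
ratio = counts["Fully Paid"] / counts["Charged Off"]

print(counts)
print(shares.round(3))
print(f"Fully Paid : Charged Off = {ratio:.2f} : 1")
```

On the real data, the same two `value_counts` calls would give the numbers needed to fill in the ratio.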

The peaks at (10,000, 15,000, 20,000, etc.) indicate standard loan amounts.
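
One way to confirm that the peaks sit on round amounts is to count the most frequent values of 'loan_amnt' directly. The sketch below uses invented amounts, and the 5,000 step is only an assumption taken from the peaks named above.

```python
import pandas as pd

# Invented loan amounts for illustration; not taken from the real data set.
df = pd.DataFrame({
    "loan_amnt": [10000, 10000, 15000, 7250, 20000,
                  10000, 15000, 12375, 20000, 3200],
})

# Most frequent loan amounts: the histogram peaks correspond to these values.
top_amounts = df["loan_amnt"].value_counts().head(3)

# Share of loans whose amount is an exact multiple of 5,000 (assumed step).
round_share = (df["loan_amnt"] % 5000 == 0).mean()

print(top_amounts)
print(f"Share of loans at multiples of 5,000: {round_share:.0%}")
```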
EXPLORATORY DATA ANALYSIS

Checking the correlation between the continuous feature variables

We can see that 'loan_amnt' has almost perfect correlation with the 'installment' feature. Let's
explore this feature further.
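
A correlation check like this can be sketched as follows. The values are invented so that 'installment' grows roughly in proportion to 'loan_amnt', and 'int_rate' is an assumed column name added only to give the matrix a third, unrelated feature.

```python
import pandas as pd

# Invented rows for illustration; 'int_rate' is an assumed column name.
df = pd.DataFrame({
    "loan_amnt":   [5000, 10000, 15000, 20000, 25000, 30000],
    "installment": [160.5, 318.2, 489.9, 640.1, 812.7, 955.3],
    "int_rate":    [13.5, 7.9, 11.2, 15.8, 9.4, 12.1],
})

# Pearson correlation between all continuous (numeric) columns.
corr = df.select_dtypes(include="number").corr()

print(corr.round(3))
print("loan_amnt vs installment:", round(corr.loc["loan_amnt", "installment"], 4))
```

A value close to 1 for a pair of features suggests that one of them carries little extra information for the model.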

EXPLORATORY DATA ANALYSIS

Checking the correlation between the continuous feature variables

Boxplot showing the relationship between the loan_status and the loan amount.

The loan status does not depend strongly on the loan amount, although
the 'Charged Off' status has a relatively higher loan amount, which
intuitively makes sense. We can also see this with the
summary statistics for the loan amount, grouped by the loan_status.
# Summary statistics for the loan amount, grouped by the loan_status.
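
The code for this step is missing from the slide. A minimal sketch with pandas could look like the following; the rows are invented, so the printed numbers do not reflect the real data set.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid",
                    "Charged Off", "Fully Paid", "Fully Paid"],
    "loan_amnt":   [8000, 18000, 12000, 15000, 10000, 14000],
})

# count, mean, std, min, quartiles and max of loan_amnt for each loan_status.
summary = df.groupby("loan_status")["loan_amnt"].describe()

print(summary)
```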

Let's explore the Grade and SubGrade columns that LendingClub attributes to the loans.

# Let's display a count plot per subgrade
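
The plotting code is not shown on the slide. One possible sketch, using only pandas and Matplotlib, is below. The column name 'sub_grade' is an assumption about how the SubGrade column is stored, and the rows are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows; 'sub_grade' is an assumed column name.
df = pd.DataFrame({
    "sub_grade": ["B2", "A1", "C3", "B2", "A1", "B2", "C3", "A4"],
})

# Number of loans per subgrade, ordered alphabetically (A1, A4, B2, ...).
subgrade_counts = df["sub_grade"].value_counts().sort_index()

# One bar per subgrade.
fig, ax = plt.subplots(figsize=(8, 4))
subgrade_counts.plot(kind="bar", ax=ax)
ax.set_xlabel("sub_grade")
ax.set_ylabel("Number of loans")
ax.set_title("Loan count per subgrade")
fig.tight_layout()
```

In a Jupyter notebook the backend line can be dropped and the figure renders inline.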


To get a correlation between the numeric features and loan_status, let's first create a new column 'loan_repaid' which
contains 1 if the status is 'Fully Paid' and 0 if it's 'Charged Off'
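
A minimal sketch of this encoding step, on invented rows:

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid"],
})

# 1 for repaid loans, 0 for charged-off loans.
df["loan_repaid"] = df["loan_status"].map({"Fully Paid": 1, "Charged Off": 0})

print(df)
```

Any status value missing from the mapping would become NaN, so this assumes the column holds only these two values.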

# Now let's create a bar plot showing these correlations
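
The plotting code is again missing from the slide. The sketch below shows one way to do it under stated assumptions: the rows are invented, and the 'loan_repaid' column is rebuilt here so that the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "loan_amnt":   [8000, 18000, 12000, 15000, 10000, 14000],
    "installment": [260.0, 590.0, 390.0, 500.0, 330.0, 450.0],
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid",
                    "Charged Off", "Fully Paid", "Fully Paid"],
})
df["loan_repaid"] = df["loan_status"].map({"Fully Paid": 1, "Charged Off": 0})

# Correlation of every numeric feature with the target, without the target itself.
corr_with_target = (
    df.select_dtypes(include="number")
      .corr()["loan_repaid"]
      .drop("loan_repaid")
      .sort_values()
)

# One bar per feature, sorted from most negative to most positive correlation.
fig, ax = plt.subplots(figsize=(8, 4))
corr_with_target.plot(kind="bar", ax=ax)
ax.set_ylabel("Correlation with loan_repaid")
fig.tight_layout()
```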
