
Assignment - 3 - Data Analytics

Uploaded by

Learners Hub
Copyright
© All Rights Reserved

Credit Score Model for Airtime Loans
USING MACHINE LEARNING TECHNIQUES
Group
Roll No       Name
21PGPEX-02    Abhishek Kumar
21PGPEX-19    Ketan Jain
21PGPEX-21    Kritika Sharma
21PGPEX-23    Maharaja E
Agenda
• Introduction
• Literature Review
• Methodology
• Results
• Conclusions
Introduction
AIRTIME LOAN – A NEW BUSINESS OPPORTUNITY FOR MOBILE NETWORK
OPERATORS:
• Airtime is becoming a basic commodity in developing countries.

• Insufficient airtime is a challenge for many customers.

• Opportunity to offer short-term airtime loans at 10%.

• The risk of default needs to be analysed.

• The risk extends to third-party loan providers and MNOs.


Default on Loan: Risk Mitigation

To mitigate this risk, credit scoring models are required to assess the capability of the
customer to pay a certain amount within the specified period.
Credit Score Models
( Estimated using a variety of historical personal and financial data obtained from customers. )

Advantages:
• Enables faster credit decisions.
• Reduces the cost of credit analysis.
• Monitors the portfolio of existing accounts.

Challenges:
• Large population of unbanked adults; data are not readily available.
• Need to search for alternative datasets in order to determine whether a customer can repay.
• Available data: customer's call and recharge history.
Airtime Lending Industry
COMZAFRICA (a micro-lending firm, Africa)

Airtime Credit Service (ACS) lets users access airtime on credit from wherever
they are, at any time of day or night.
Literature Review
Predictor factors in credit scoring model
Challenges
• Customer details are not available because of customer privacy
constraints.
• Appropriate data must be selected for the model to predict effectively;
cross validation of the data is needed.
• Building a model without customer details is a challenge.
• The study is limited because the only factors available are loan
details and customer behaviour.
• Earlier models did not consider multiple loans taken by a
customer, loan duration, or age on network.
Methodology
Feature selection in Model
Predictor factors considered for study:
• Loan amount
• Number of recharges for each
• Usage amount
• Activation date
• Date when loan was taken
• Date of loan payment
• Total amount used every month
Feature Construction
• Loan count (how many loans the customer has at any time).

• Loan duration (how long the customer took to repay the loan).

• Age on network (how long the customer has been with the MNO).

• Loan month (the month that the loan was taken).
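A minimal sketch of how the four constructed features could be derived from raw loan records. The record layout, the field names (`customer`, `activated`, `taken`, `repaid`) and the sample values are hypothetical, and "loan count" is read here as the number of the customer's loans that are open on the day the loan is taken, which is an assumption.

```python
from datetime import date

# Hypothetical loan records; field names and values are illustrative only.
loans = [
    {"customer": "A", "activated": date(2019, 1, 10),
     "taken": date(2020, 3, 1), "repaid": date(2020, 3, 8)},
    {"customer": "A", "activated": date(2019, 1, 10),
     "taken": date(2020, 3, 5), "repaid": date(2020, 3, 20)},
    {"customer": "B", "activated": date(2020, 2, 1),
     "taken": date(2020, 4, 15), "repaid": None},  # not repaid
]


def build_features(loan, all_loans):
    """Derive the four constructed features for one loan record."""
    # Loan count (assumed meaning): loans of the same customer that are
    # open on the day this loan is taken, including this loan itself.
    open_loans = [
        other for other in all_loans
        if other["customer"] == loan["customer"]
        and other["taken"] <= loan["taken"]
        and (other["repaid"] is None or other["repaid"] > loan["taken"])
    ]
    # Loan duration in days; None when the loan has not been repaid.
    duration = (loan["repaid"] - loan["taken"]).days if loan["repaid"] else None
    return {
        "loan_count": len(open_loans),
        "loan_duration_days": duration,
        # Age on network: days between activation and the loan date.
        "age_on_network_days": (loan["taken"] - loan["activated"]).days,
        "loan_month": loan["taken"].month,
    }


features = [build_features(loan, loans) for loan in loans]
for row in features:
    print(row)
```

Only dates already listed among the predictor factors (activation date, date the loan was taken, date of loan payment) are needed as inputs.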


Evaluation techniques in model
Machine learning models

• Logistic regression (LR) – Linear model


• Decision Tree (DT) – Non-linear model
• Random Forest (RF) – Non-linear model
Evaluation
• The cost of a default is about 10 times the benefit of a customer repaying.

• Accuracy is not a relevant performance metric for this model.

• It is most important to correctly predict when a customer will default.

• Specificity is the key performance metric for the model.

• Specificity = TN/(TN + FP), where TN counts defaults predicted as defaults
and FP counts defaults predicted as repaid.
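A short sketch of why accuracy misleads on imbalanced classes, with "repaid" treated as the positive class and "default" as the negative class. The 95/5 class split and the always-"repaid" model are invented purely for illustration.

```python
def confusion_counts(actual, predicted, positive="repaid"):
    """Return (TP, FP, TN, FN) with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def specificity(tn, fp):
    # Share of actual defaults that were predicted as defaults.
    return tn / (tn + fp) if (tn + fp) else 0.0


# Illustrative data: 95 repaid loans, 5 defaults, and a naive model
# that predicts "repaid" for every loan.
actual = ["repaid"] * 95 + ["default"] * 5
predicted = ["repaid"] * 100

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(accuracy(tp, fp, tn, fn))  # 0.95
print(specificity(tn, fp))       # 0.0
```

The naive model reaches 95% accuracy yet catches no defaults, which is the case the specificity metric is meant to expose.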
Cross validation considerations

• Out-of-sample evaluation is recommended: the test dataset must not be used when building
the model from the training data.
• The two prediction classes, repaid and default, are highly imbalanced (low percentage of defaults).

• To avoid bias in the model, the train and test datasets need equal (50-50) representation of
both prediction classes, repaid and default.
Cross validation scenarios
CV1
• Loans are split 70:30 at random, without considering any variable.
• Loans of the same customer can appear in both the train and test data.
• A future loan can be used to predict the default status of a past loan.
• Default records make up a very small share of the train and test data.
• This biases the model when predicting customer defaults (TN).
CV2
• Loans are split 70:30 at random by customer.
• The customer bias of CV1 is eliminated from the model.
• Does not address the time issue of loans (past versus future).
• Default records make up a very small share of the train and test data.
• This biases the model when predicting customer defaults (TN).
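The difference between CV1 and CV2 is the unit that gets shuffled: loans versus customers. A minimal sketch of a customer-based 70:30 split follows; the function name, record fields and synthetic data are hypothetical.

```python
import random


def split_by_customer(loans, train_fraction=0.7, seed=0):
    """Split loans so that all loans of a customer land on the same side."""
    customers = sorted({loan["customer"] for loan in loans})
    rng = random.Random(seed)
    rng.shuffle(customers)
    cut = round(len(customers) * train_fraction)
    train_customers = set(customers[:cut])
    train = [loan for loan in loans if loan["customer"] in train_customers]
    test = [loan for loan in loans if loan["customer"] not in train_customers]
    return train, test


# Synthetic example: 10 customers with 4 loans each.
loans = [{"customer": f"C{i % 10}", "loan_id": i} for i in range(40)]
train, test = split_by_customer(loans)

shared = {l["customer"] for l in train} & {l["customer"] for l in test}
print(len(train), len(test), shared)
```

Because customers are assigned as a whole, no customer appears on both sides; the time ordering and class imbalance issues remain unaddressed, as noted for CV2.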
CV3

• Loans are grouped by customer and only the latest loan status is used in the train/test data.
The ratio of default to non-default customers is kept at 50-50 to create a balanced
dataset, which is then split 70:30.
• No repeat customers and no time continuity problem.
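A sketch of a CV3-style split under stated assumptions: the 50-50 balance is obtained here by undersampling the majority class, which the slides do not specify, and the 70:30 split is applied per class so that both sides stay balanced. Field names and data are hypothetical.

```python
import random
from datetime import date


def balanced_latest_loan_split(loans, train_fraction=0.7, seed=0):
    """Keep each customer's latest loan, balance classes 50-50, split 70:30."""
    # Step 1: keep only the most recent loan of each customer.
    latest = {}
    for loan in loans:
        current = latest.get(loan["customer"])
        if current is None or loan["taken"] > current["taken"]:
            latest[loan["customer"]] = loan

    defaults = [l for l in latest.values() if l["defaulted"]]
    repaid = [l for l in latest.values() if not l["defaulted"]]

    # Step 2 (assumption): undersample the larger class to reach 50-50.
    rng = random.Random(seed)
    n = min(len(defaults), len(repaid))
    defaults = rng.sample(defaults, n)
    repaid = rng.sample(repaid, n)

    # Step 3: split each class 70:30 so train and test are both balanced.
    cut = round(n * train_fraction)
    train = defaults[:cut] + repaid[:cut]
    test = defaults[cut:] + repaid[cut:]
    return train, test


# Synthetic example: 20 customers, 2 loans each; 6 default on the latest loan.
loans = []
for i in range(20):
    loans.append({"customer": f"C{i}", "taken": date(2020, 1, 1), "defaulted": False})
    loans.append({"customer": f"C{i}", "taken": date(2020, 2, 1), "defaulted": i < 6})

train, test = balanced_latest_loan_split(loans)
print(len(train), len(test))
```

Each customer contributes a single record, so there are no repeat customers and no earlier loan is predicted from a later one.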
Results
Model CV1 & CV2
• High accuracy

• Low specificity

• Unable to predict the customers who will default.
Model CV3
• Accuracy is lower than for models CV1 and CV2.
• High specificity.
• DT and RF outperform LR because they capture non-linearity in the
data.
• Predictions for repaid loans are correct in 85% of cases and
incorrect in 15%.
• Predictions for defaulted loans are correct in 80% of cases
and incorrect in 20%.
Business Implications
Default rate as low as 0.01%: accept all loan requests.
When the default rate rises above 2%, the company can generate more
profit by using the model.
Without model: the company breaks even (zero profit) at a default
rate of 8%.
With model: the company stays profitable up to a default rate of
32% (the tolerance limit is increased).
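A rough sketch of the reasoning behind these thresholds, using a simplified profit model that is an assumption and not the one behind the figures above: each repaid loan earns 1 unit, each default costs 10 units (the factor of 10 from the evaluation slide), and the model approves 85% of repaying customers and 20% of defaulting customers (reading the CV3 results as per-class rates). All function and parameter names are hypothetical.

```python
def profit_per_request(default_rate, gain=1.0, loss=10.0,
                       approve_good=1.0, approve_bad=1.0):
    """Expected profit per loan request under the simplified model.

    approve_good: share of repaying customers who receive a loan.
    approve_bad:  share of defaulting customers who receive a loan.
    With both set to 1.0 every request is accepted (no model).
    """
    return ((1 - default_rate) * approve_good * gain
            - default_rate * approve_bad * loss)


def break_even_rate(gain=1.0, loss=10.0, approve_good=1.0, approve_bad=1.0):
    """Default rate at which expected profit is exactly zero."""
    g = approve_good * gain
    b = approve_bad * loss
    return g / (g + b)


def crossover_rate(gain=1.0, loss=10.0, approve_good=0.85, approve_bad=0.20):
    """Default rate above which using the model beats accepting everyone."""
    missed_gain = gain * (1 - approve_good)   # profit lost on rejected good loans
    avoided_loss = loss * (1 - approve_bad)   # loss avoided on rejected bad loans
    return missed_gain / (missed_gain + avoided_loss)


no_model = break_even_rate()
with_model = break_even_rate(approve_good=0.85, approve_bad=0.20)
switch = crossover_rate()

print(f"break-even without model: {no_model:.1%}")
print(f"break-even with model:    {with_model:.1%}")
print(f"model pays off above:     {switch:.1%}")
```

Under these assumptions the break-even default rates come out near 9% without the model and near 30% with it, and the model starts to pay off at a default rate just under 2%. These values are close to, but not the same as, the 8%, 32% and 2% quoted above, which come from the study's own profit calculation.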
Conclusions
• Obtaining customer details from the MNOs would improve performance.

• For a classification problem with imbalanced classes, specificity is a
better measure than accuracy.

• For credit scoring, correct handling of the time of loan disbursement and of customer
identity is crucial to avoid over-fitting and unrealistically high estimates of accuracy.

• Random forest was the best classifier, with an accuracy of 82.3%, which showed that
nonlinearity and an ensemble approach were superior.
Conclusions
• When the default rate is low, it is better to offer the loans to every customer.

• With increasing default rates, a point is eventually reached whereby the model will outperform
this simple approach of offering loans to everyone.

• The maximum tolerable default rate is increased by the optimal model to 32% compared with 8%
when the company does not use a model.

• The methodology and approach studied in this paper are also relevant for a wide range of
pay-as-you-go mobile products where credit is offered for basic services: electricity tokens,
smart water meters, smart cooking devices and solar energy.
Thank you
