Credit Score Model for Airtime Loans
Using Machine Learning Techniques
Group
Roll No Name
21PGPEX-02 Abhishek Kumar
21PGPEX-19 Ketan Jain
21PGPEX-21 Kritika Sharma
21PGPEX-23 Maharaja E
Agenda
• Introduction
• Literature Review
• Methodology
• Results
• Conclusions
Introduction
AIRTIME LOAN – A NEW BUSINESS OPPORTUNITY FOR MOBILE NETWORK
OPERATORS:
• Airtime is becoming a basic commodity in developing countries.
• Not having sufficient airtime is a challenge for many customers.
• Opportunity to offer short-term airtime loans at 10%.
• The risk of default needs to be analysed.
• The risk extends to third-party loan providers and MNOs.
Default on Loan: Risk Mitigation
To mitigate this risk, credit scoring models are required to assess the capability of the
customer to pay a certain amount within the specified period.
Credit Score Models
(Estimated using a variety of historical personal and financial data obtained from customers.)
Advantages:
• Enables faster credit decisions.
• Reduces the cost of credit analysis.
• Monitors the portfolio of existing accounts.
Challenges:
• Large population of unbanked adults; data are not readily available.
• Need to search for alternative datasets in order to determine whether a customer can repay.
• Available data: customer's calls and recharge history.
Airtime Lending Industry
COMZAFRICA (a micro-lending firm, Africa)
Airtime Credit Service (ACS) allows users to easily access airtime on a credit basis from wherever
they are at any time, day or night.
Literature Review
Predictor factors in credit scoring model
Challenges
• Customer details are not available due to customer privacy
constraints.
• Appropriate data must be selected for the model to predict effectively;
cross-validation of the data is needed.
• Building a model without customer details is a challenge.
• The study is limited because the only available factors relate to loan
details and customer behaviour.
• Earlier models did not consider multiple loans taken by a
customer, loan duration, or age on network.
Methodology
Feature selection in Model
Predictor factors considered for study:
• Loan amount
• Number of recharges for each
• Usage amount
• Activation date
• Date when loan was taken
• Date of loan payment
• Total amount used every month
Feature Construction
• Loan count (how many loans the customer has at any time).
• Loan duration (how long the customer took to repay the loan).
• Age on network (how long the customer has been with the MNO).
• Loan month (the month that the loan was taken).
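The constructed features above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the record layout (`customer_id`, `taken`, `repaid`), the function name `construct_features`, and the reading of "loan count" as the number of the customer's loans open on the day the loan is taken are all assumptions.

```python
from datetime import date

def construct_features(loan, all_loans, activation_date):
    """Derive the four constructed features for one loan record (hypothetical layout)."""
    taken, repaid = loan["taken"], loan["repaid"]
    # Loan count: loans of the same customer that are open on the day this loan
    # is taken, including this one (an assumed interpretation).
    loan_count = sum(
        1
        for other in all_loans
        if other["customer_id"] == loan["customer_id"]
        and other["taken"] <= taken
        and (other["repaid"] is None or other["repaid"] > taken)
    )
    return {
        "loan_count": loan_count,
        # Loan duration: days between taking and repaying the loan (None if unpaid).
        "loan_duration_days": (repaid - taken).days if repaid else None,
        # Age on network: days between activation and the loan date.
        "age_on_network_days": (taken - activation_date).days,
        # Loan month: calendar month in which the loan was taken.
        "loan_month": taken.month,
    }

# Illustrative records only, not data from the study.
loans = [
    {"customer_id": "A", "taken": date(2021, 1, 5), "repaid": date(2021, 1, 20)},
    {"customer_id": "A", "taken": date(2021, 1, 10), "repaid": date(2021, 2, 1)},
]
features = construct_features(loans[1], loans, activation_date=date(2020, 1, 10))
print(features)
```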
Evaluation techniques in model
Machine learning models
• Logistic regression (LR) – Linear model
• Decision Tree (DT) – Non-linear model
• Random Forest (RF) – Non-linear model
Evaluation
• The cost of a default is much greater than the benefit of a customer repaying, by a factor of 10.
• Accuracy is not a relevant performance metric for the model.
• It is most important to correctly predict when a customer defaults.
• Specificity is the key performance metric for the model.
• Specificity = TN/(TN + FP)
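A small sketch of why accuracy misleads on imbalanced classes while specificity does not. It treats "default" as the negative class, consistent with the TN usage on these slides. The function names and the 95/5 class split are illustrative assumptions, not figures from the study.

```python
def specificity(actual, predicted, negative="default"):
    """Specificity = TN / (TN + FP), with `negative` as the negative class."""
    pairs = list(zip(actual, predicted))
    tn = sum(1 for a, p in pairs if a == negative and p == negative)
    fp = sum(1 for a, p in pairs if a == negative and p != negative)
    return tn / (tn + fp) if tn + fp else float("nan")

def accuracy(actual, predicted):
    """Share of records whose predicted class matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical imbalanced sample: 95 repaid loans, 5 defaults.
actual = ["repaid"] * 95 + ["default"] * 5
# A naive classifier that always predicts "repaid".
predicted = ["repaid"] * 100

print(accuracy(actual, predicted))     # 0.95
print(specificity(actual, predicted))  # 0.0
```

The naive classifier looks strong on accuracy yet never identifies a single default, which is the failure that specificity exposes.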
Cross validation considerations
• Out-of-sample evaluation is recommended: the test dataset must not be used when building
the model from the train data.
• The two prediction classes, repaid and default, are highly imbalanced (low percentage of defaults).
• To avoid bias in the model, the train and test datasets need equal (50-50%) representation of
both prediction classes: repaid and default records.
Cross validation scenarios
CV1
• Loans are divided randomly in a ratio of 70:30 without considering any variable.
• Loans of the same customer can end up in both the train and test data.
• A future loan can be used to predict the default status of a past loan.
• Default records make up a very small share of the train and test data.
• This biases the model when predicting customer defaults (TN).
CV2
• Loans are divided randomly in a ratio of 70:30 by customer.
• The customer bias of CV1 is eliminated from the model.
• Does not address the time issue of loans (past versus future).
• Default records make up a very small share of the train and test data.
• This biases the model when predicting customer defaults (TN).
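The customer-based split of CV2 can be sketched as follows. This is an assumed minimal version: the function name `split_by_customer`, the `customer_id` field, and the fixed seed are hypothetical. The split is made over customers rather than loans, so no customer appears on both sides.

```python
import random

def split_by_customer(loans, train_fraction=0.7, seed=0):
    """Assign whole customers (not individual loans) to train or test."""
    customers = sorted({loan["customer_id"] for loan in loans})
    rng = random.Random(seed)
    rng.shuffle(customers)
    cut = round(train_fraction * len(customers))
    train_ids = set(customers[:cut])
    train = [loan for loan in loans if loan["customer_id"] in train_ids]
    test = [loan for loan in loans if loan["customer_id"] not in train_ids]
    return train, test

# Illustrative data: 10 customers with 3 loans each.
loans = [{"customer_id": c, "loan_no": n} for c in range(10) for n in range(3)]
train, test = split_by_customer(loans)
print(len(train), len(test))
```

As noted above, this removes the customer overlap of CV1 but still ignores loan dates and class imbalance.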
CV3
• Loans are grouped by customer and only each customer's latest loan status is used in the train / test data.
• The ratio of default to non-default customers is kept at 50-50% to create a balanced
dataset, which is then split 70:30.
• No repeat customers and no time-continuity problem.
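A sketch of the CV3 procedure under stated assumptions: the field names (`customer_id`, `taken`, `defaulted`) and the function name are hypothetical, and balancing is done by undersampling the larger class, which is one possible way to reach a 50-50 ratio and is not confirmed as the study's method.

```python
import random
from datetime import date

def balanced_latest_loan_split(loans, train_fraction=0.7, seed=0):
    """Keep each customer's latest loan, balance the classes, then split 70:30."""
    latest = {}
    for loan in loans:
        current = latest.get(loan["customer_id"])
        if current is None or loan["taken"] > current["taken"]:
            latest[loan["customer_id"]] = loan
    defaults = [loan for loan in latest.values() if loan["defaulted"]]
    repaid = [loan for loan in latest.values() if not loan["defaulted"]]
    rng = random.Random(seed)
    n = min(len(defaults), len(repaid))
    # Undersample the larger class so both classes have n records (assumption).
    balanced = rng.sample(defaults, n) + rng.sample(repaid, n)
    rng.shuffle(balanced)
    cut = round(train_fraction * len(balanced))
    return balanced[:cut], balanced[cut:]

# Illustrative data: 20 customers with 2 loans each; 4 default on their latest loan.
loans = []
for i in range(20):
    loans.append({"customer_id": i, "taken": date(2021, 1, 1), "defaulted": False})
    loans.append({"customer_id": i, "taken": date(2021, 3, 1), "defaulted": i < 4})
train, test = balanced_latest_loan_split(loans)
print(len(train), len(test))
```

The 50-50 ratio holds for the combined dataset before the split; a stratified split would be needed to guarantee it within each part.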
Results
Model CV1 & CV2
• High Accuracy
• Low Specificity
• Unable to predict the
customers who will default.
Model CV3
• Accuracy lower than Models CV1 and CV2.
• High Specificity.
• DT and RF outperform LR because they are non-linear
models.
• Predictions for repaid loans are correct in 85% of cases and
incorrect in 15%.
• Predictions for defaulted loans are correct in 80% of cases
and incorrect in 20%.
Business Implications
When the default rate is as low as 0.01%, accept all loan requests.
When the default rate rises above 2%, the company can generate more
profit by using the model.
Without the model: the company breaks even (zero profit) at a default
rate of 8%.
With the model: the company can remain profitable up to a default rate of
32% (the tolerance limit is increased).
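The trade-off can be illustrated with a simplified expected-profit sketch. It assumes a unit gain per repaid loan, a loss of 10 units per default (the factor of 10 from the Evaluation slide), and the 85% / 80% correct-prediction rates from the CV3 results. The function and parameter names are hypothetical, and this simplification is not expected to reproduce the 8% and 32% figures exactly.

```python
def profit_per_request(default_rate, gain=1.0, loss=10.0,
                       repaid_approved=1.0, default_rejected=0.0):
    """Expected profit per loan request.

    repaid_approved: share of would-be repayers the lender approves.
    default_rejected: share of would-be defaulters the lender rejects.
    The defaults (1.0, 0.0) describe lending to everyone, i.e. no model.
    """
    good = (1 - default_rate) * repaid_approved * gain
    bad = default_rate * (1 - default_rejected) * loss
    return good - bad

def break_even_default_rate(gain=1.0, loss=10.0,
                            repaid_approved=1.0, default_rejected=0.0):
    """Default rate at which expected profit per request is zero."""
    g = repaid_approved * gain
    b = (1 - default_rejected) * loss
    return g / (g + b)

# Assumed model quality, taken from the CV3 results above.
MODEL = {"repaid_approved": 0.85, "default_rejected": 0.80}

no_model_limit = break_even_default_rate()
model_limit = break_even_default_rate(**MODEL)
print(f"break-even without model: {no_model_limit:.1%}")
print(f"break-even with model:    {model_limit:.1%}")
```

Under these assumptions, lending to everyone is more profitable at very low default rates because the model turns away some good customers, while the model tolerates a noticeably higher default rate before profit reaches zero.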
Conclusions
• Obtaining customer details from the MNOs would improve performance.
• For a classification problem with an imbalanced number of categories, specificity is a
better measure.
• For credit scoring, correct handling of the time of loan disbursement and customer
identity are crucial to avoid over-fitting and unrealistically high estimates of accuracy.
• Random forest was the best classifier, with an accuracy of 82.3%, which showed that
nonlinearity and an ensemble approach were superior.
Conclusions
• When the default rate is low, it is better to offer the loans to every customer.
• With increasing default rates, a point is eventually reached whereby the model will outperform
this simple approach of offering loans to everyone.
• The maximum tolerable default rate is increased by the optimal model to 32% compared with 8%
when the company does not use a model.
• The methodology and approach studied in this paper are also relevant for a wide range of
pay-as-you-go mobile products where credit is offered for basic services: electricity tokens,
smart water meters, smart cooking devices and solar energy.
Thank you