Assignment - 3 - Data Analytics
Model for Airtime Loans
USING MACHINE LEARNING TECHNIQUES
Group
Roll No Name
21PGPEX-02 Abhishek Kumar
21PGPEX-23 Maharaja E
Agenda
• Introduction
• Literature Review
• Methodology
• Results
• Conclusions
Introduction
AIRTIME LOAN – A NEW BUSINESS OPPORTUNITY FOR MOBILE NETWORK
OPERATORS:
• Airtime is becoming a basic commodity in developing countries.
To mitigate the risk of customers defaulting on airtime loans, credit scoring models are required
to assess the capability of the customer to pay a certain amount within the specified period.
Credit Score Models
(Estimated using a variety of historical personal and financial data obtained from customers.)
Advantages:
• Enables faster credit decisions.
• Reduces the cost of credit analysis.
• Monitors the portfolio of existing accounts.
Challenges:
• Large population of unbanked adults; data are not readily available.
• Need to search for alternative datasets in order to determine whether a customer can repay.
• Available data: customer's calls and recharge history.
Airtime Lending Industry
COMZAFRICA (a micro-lending firm, Africa)
Airtime Credit Service (ACS) allows users to easily access airtime on a credit basis from wherever
they are, at any time, day or night.
Literature Review
Predictor factors in credit scoring model
• Loan duration (how long the customer took to repay the loan).
• Age on network (how long the customer has been with the MNO).
Challenges
• Customer details are not available due to customer privacy constraints.
• Appropriate data must be selected for the model to predict effectively;
cross-validation of the data is needed.
• Building a model without customer details is a challenge.
• Out-of-sample data is recommended for evaluating the model: the test dataset must not be
used when building the model from the train data.
• The two prediction classes, repaid and default, are highly imbalanced (defaults are a low percentage).
• To avoid bias in the model, the train and test datasets need equal (50-50%) representation of
both prediction classes: repaid and default records.
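The 50-50 balancing described above can be sketched as random under-sampling of the larger class. This is only one possible way to balance the data; the slides do not say which method was used. The field name `defaulted`, the function name and the toy data (5% defaults) are assumptions made for illustration.

```python
import random

def undersample_to_balance(records, label_key="defaulted", seed=0):
    """Return a shuffled 50-50 subset by down-sampling the larger class."""
    rng = random.Random(seed)
    defaults = [r for r in records if r[label_key]]
    repaid = [r for r in records if not r[label_key]]
    n = min(len(defaults), len(repaid))  # size of the smaller class
    balanced = rng.sample(defaults, n) + rng.sample(repaid, n)
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy data: 1,000 loans, every 20th one defaulted (5%).
loans = [{"loan_id": i, "defaulted": i % 20 == 0} for i in range(1000)]
balanced = undersample_to_balance(loans)
n_defaults = sum(r["defaulted"] for r in balanced)
print(len(balanced), n_defaults)  # 100 50
```

Under-sampling discards most repaid records, so in practice the balanced set is much smaller than the original data.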
Cross validation scenarios
CV1
• Loans are divided randomly in a 70:30 ratio without considering any variable.
• Loans of the same customer can appear in both the train and test data.
• A future loan can be used to predict the default status of a past loan.
• Default records make up a very small share of the train and test data.
• This biases the model's prediction of customer defaults (TN).
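A minimal sketch of a CV1-style split, assuming each loan is a dictionary with a hypothetical `customer_id` field. It shows the customer overlap between train and test that the bullets above warn about. The toy data (100 customers with 10 loans each) is invented for illustration.

```python
import random

def random_loan_split(loans, train_frac=0.7, seed=0):
    """Shuffle individual loans and cut them 70:30, ignoring customer and time."""
    rng = random.Random(seed)
    shuffled = loans[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy data: 100 customers, 10 loans each.
loans = [{"customer_id": c, "loan_no": k} for c in range(100) for k in range(10)]
train, test = random_loan_split(loans)

# Customers whose loans ended up on both sides of the split.
shared = {l["customer_id"] for l in train} & {l["customer_id"] for l in test}
print(len(train), len(test), len(shared) > 0)
```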
CV2
• Loans are divided randomly in a 70:30 ratio based on customer.
• The customer bias of CV1 is eliminated from the model.
• Does not address the time issue of loans (past versus future).
• Default records make up a very small share of the train and test data.
• This biases the model's prediction of customer defaults (TN).
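A CV2-style split can be sketched by shuffling customers instead of loans, so that all loans of one customer fall on the same side. The `customer_id` field, the function name and the toy data are assumptions for illustration, not the original implementation.

```python
import random

def customer_split(loans, train_frac=0.7, seed=0):
    """Assign whole customers to train or test, 70:30 by customer count."""
    rng = random.Random(seed)
    customers = sorted({l["customer_id"] for l in loans})
    rng.shuffle(customers)
    cut = int(round(train_frac * len(customers)))
    train_ids = set(customers[:cut])
    train = [l for l in loans if l["customer_id"] in train_ids]
    test = [l for l in loans if l["customer_id"] not in train_ids]
    return train, test

# Hypothetical toy data: 100 customers, 10 loans each.
loans = [{"customer_id": c, "loan_no": k} for c in range(100) for k in range(10)]
train, test = customer_split(loans)
shared = {l["customer_id"] for l in train} & {l["customer_id"] for l in test}
print(len(train), len(test), len(shared))  # 700 300 0
```

The 70:30 ratio here applies to customers. When customers hold different numbers of loans, the loan counts are only approximately 70:30.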
CV3
• Loans are grouped by customer and only the latest loan status is taken into the train / test data.
• The ratio of default to non-default customers is kept at 50-50% to create a balanced
dataset, which is then split 70:30.
• No repeat customers and no time-continuity problem.
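The CV3 steps can be sketched as follows. The field names (`customer_id`, `disbursed_on`, `defaulted`) and the toy data are hypothetical. Splitting each class 70:30 separately, so that both train and test stay 50-50, is a design choice of this sketch and is not stated in the slides.

```python
import random
from datetime import date

def cv3_split(loans, train_frac=0.7, seed=0):
    """Latest loan per customer, balanced 50-50, then split 70:30 per class."""
    rng = random.Random(seed)
    latest = {}
    for loan in loans:  # keep only the most recent loan of each customer
        cid = loan["customer_id"]
        if cid not in latest or loan["disbursed_on"] > latest[cid]["disbursed_on"]:
            latest[cid] = loan
    defaults = [l for l in latest.values() if l["defaulted"]]
    repaid = [l for l in latest.values() if not l["defaulted"]]
    n = min(len(defaults), len(repaid))  # size of the smaller class
    cut = int(round(train_frac * n))
    train, test = [], []
    for group in (defaults, repaid):
        sample = rng.sample(group, n)  # under-sample the larger class
        train += sample[:cut]
        test += sample[cut:]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

# Hypothetical toy data: 200 customers, 3 loans each; every 10th customer
# defaults on the latest loan only.
loans = [
    {"customer_id": c, "disbursed_on": date(2020, 1 + k, 1),
     "defaulted": k == 2 and c % 10 == 0}
    for c in range(200) for k in range(3)
]
train, test = cv3_split(loans)
print(len(train), len(test))  # 28 12
```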
Results
Model CV1 & CV2
• High Accuracy
• Low Specificity
• For credit scoring, correct handling of the time of loan disbursement and of customer
identity is crucial to avoid over-fitting and unrealistically high estimates of accuracy.
• Random forest was the best classifier, with an accuracy of 82.3%, which showed that
nonlinearity and an ensemble approach were superior.
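The combination of high accuracy and low specificity reported for CV1 and CV2 can be reproduced with a small worked example. It assumes "default" is the negative class, consistent with the "(TN)" label used for defaults in these slides. The 5% default rate and the always-"repaid" predictor are hypothetical.

```python
def accuracy_and_specificity(actual, predicted, negative="default"):
    """Accuracy = correct / total; specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tn = sum(a == negative and p == negative for a, p in pairs)
    fp = sum(a == negative and p != negative for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, specificity

# Hypothetical imbalanced data: 95 repaid loans, 5 defaults.
actual = ["repaid"] * 95 + ["default"] * 5
# A trivial predictor that labels every loan as repaid.
predicted = ["repaid"] * 100

acc, spec = accuracy_and_specificity(actual, predicted)
print(acc, spec)  # 0.95 0.0
```

The predictor never identifies a default, yet its accuracy is 95%. This is why accuracy alone is misleading on imbalanced data.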
Conclusions
• When the default rate is low, it is better to offer the loans to every customer.
• As default rates increase, a point is eventually reached where the model outperforms
this simple approach of offering loans to everyone.
• The optimal model raises the maximum tolerable default rate to 32%, compared with 8%
when the company does not use a model.
• The methodology and approach studied in this paper are also relevant for a wide range of
pay-as-you-go mobile products where credit is offered for basic services: electricity tokens,
smart water meters, smart cooking devices and solar energy.
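The idea of a maximum tolerable default rate can be illustrated with a deliberately simple profit model. It is an assumption of this sketch, not the model used in the paper: each repaid loan earns a fixed fee and each default loses the full principal. The fee and principal values are hypothetical and do not reproduce the 8% or 32% figures above.

```python
def profit_per_loan_lend_to_all(default_rate, fee, principal):
    """Expected profit per loan when every applicant is approved (toy model)."""
    return (1 - default_rate) * fee - default_rate * principal

def break_even_default_rate(fee, principal):
    """Default rate at which lending to everyone earns exactly zero.

    Solves (1 - d) * fee - d * principal = 0 for d.
    """
    return fee / (fee + principal)

# Hypothetical numbers: fee of 10 on a principal of 100.
fee, principal = 10.0, 100.0
d_star = break_even_default_rate(fee, principal)
low = profit_per_loan_lend_to_all(0.05, fee, principal)   # below break-even
high = profit_per_loan_lend_to_all(0.20, fee, principal)  # above break-even
print(round(d_star, 4), low, high)
```

Below the break-even rate, lending to everyone is profitable. Above it, a scoring model that screens out likely defaulters is needed to stay profitable.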
Thank you