Alternate Data

The document discusses alternative data analytics and describes modules for harnessing alternative data including a data mart, feature store, ML models, and use cases. It provides examples of alternative data sources like mobile, telecom, and social media data. It also describes how the feature store can be used to accelerate feature engineering for predictive modeling. Additionally, it provides examples of how natural language processing can be used to extract features from text data like SMS messages to generate customer insights and features for decisioning.

Alternative Data Analytics


Alternative Data Components
Modules for harnessing the power of Alternate Data

DataMart
• Mobile Device
• Telecom
• E-Commerce
• Utility and Payments (POS)
• Social Media
• E-Mail
• Insurance
• Others: Travel, Rent, Web, Tax, Government Records, Psychometrics etc.
• Bank Statement ***
• Alternative Lending Products Payment data
Leverage our AGGREGATOR DATAMART to accelerate data architecture and storage

Feature Store
• Seamless transformation of raw data to Features, to be used for predictive modelling
Leverage our FEATURE STORE to accelerate Feature Engineering for building predictive models and decision analytics

ML Models
• ML Algorithms
• Model Landscape
• Model Development
• Model Documentation
• Model Validation
• Model Deployment
• Independent Review
• Policy Framework
We use Advanced Machine Learning algorithms to build Explainable predictive models for Financial Institutions

Use Cases
• Customer Profiling and Segmentation
• Credit Scoring
• Income Estimation
• Pricing
• Propensity
Leverage our expertise for multiple use cases to get a 360 degree view of a customer relationship

*** Physical copies of bank statements have long been used for manual underwriting in consumer lending. However, the information typically does not flow into a credit scoring engine as a feature. In the digital lending paradigm, bank statements are being digitized and their information is being used for credit scoring.
Alternative Data Feature Store
Automated Feature Engineering

[Diagram: Raw Data -> Automated Feature Engineering Layer (Feature Primitives, Feature Synthesis, Feature Classification, Pattern Matching), with Expert Judgment -> Feature Store -> Predictive Model]

Raw data points are transformed into features using Feature Synthesis (applying a library of transformations to raw data) and Feature Mining using NLP (e.g. extraction of features from text data such as SMS and email), with an overlay of expert judgement.
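As a rough illustration of Feature Synthesis, the sketch below applies a small library of aggregation primitives to raw transaction rows to produce customer-level features. The field names (customer_id, amount, kind), the sample rows and the choice of primitives are assumptions made for this example; the deck does not describe the actual library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: one row per transaction (not taken from the deck).
raw_rows = [
    {"customer_id": "C1", "amount": 1200.0, "kind": "credit"},
    {"customer_id": "C1", "amount": 300.0, "kind": "debit"},
    {"customer_id": "C1", "amount": 150.0, "kind": "debit"},
    {"customer_id": "C2", "amount": 800.0, "kind": "credit"},
]

# Assumed library of aggregation primitives: name -> function over a list of amounts.
PRIMITIVES = {
    "sum": sum,
    "mean": mean,
    "max": max,
    "count": len,
}


def synthesize_features(rows):
    """Apply every primitive to every (customer, kind) group of amounts."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["customer_id"], row["kind"])].append(row["amount"])

    features = defaultdict(dict)
    for (customer_id, kind), amounts in groups.items():
        for name, func in PRIMITIVES.items():
            features[customer_id][f"{kind}_amount_{name}"] = func(amounts)
    return dict(features)


# One feature dictionary per customer, ready to be written to a feature store.
feature_store = synthesize_features(raw_rows)
```

In practice an expert would review the generated names and drop features that make no business sense, which is the "expert judgement" overlay the slide mentions.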
Illustrative Feature Mining from SMS Data using NLP
Automated Feature Mining

Stages: Data -> SMS Tagging -> Data Insights -> Feature Engg. -> Decisioning

Data
SMS1, SMS2, SMS3, SMS4, SMS5

SMS Tagging
• SMS classification to standard L1 and L2 categories
• L1 such as Savings, Current, Debit Card, Credit Card, E-Wallets etc.
• L2 such as Savings > Salary, Spend, Balance, Investment, Loan / EMI related, Account Info
Process
• NLP based classification (SMS embeddings using neural networks)

Data Insights
• Rules to extract information from each SMS such as ID, Amount, Transaction Type, Date etc.
Process
• Pattern matching based data extraction rules

Feature Engg.
• Roll-up of individual SMS level data at customer level (Id / pool) to generate features for model training, such as:
  • Monthly Income
  • Total Loans O/s
  • Total EMI
  • Expected Monthly Spend and Savings
  • Delinquency pattern
Process
• Feature engineering by data science team

Decisioning
Scoring Engine
Customer     Risk Score
Customer1    0.99
Customer2    0.80
Customer3    0.50
Customer4    0.25
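The pattern-matching extraction and customer-level roll-up on this slide can be sketched as follows. The SMS texts, the regular expressions and the keyword-based tagging are all assumptions for illustration; real SMS formats vary by sender, and the deck describes NLP-based classification (SMS embeddings), for which simple keyword matching is only a stand-in here.

```python
import re
from collections import defaultdict

# Hypothetical (customer_id, sms_text) pool; not taken from the deck.
sms_pool = [
    ("C1", "Your a/c XX1234 is credited with INR 50,000.00 on 01-03-2024. Salary"),
    ("C1", "INR 12,500.00 debited from a/c XX1234 on 05-03-2024 towards EMI"),
    ("C2", "Your a/c XX9876 is credited with INR 20,000.00 on 02-03-2024. Salary"),
]

# Assumed extraction rules for amount, date and account ID.
AMOUNT_RE = re.compile(r"INR\s*([\d,]+(?:\.\d+)?)")
DATE_RE = re.compile(r"\b(\d{2}-\d{2}-\d{4})\b")
ACCOUNT_RE = re.compile(r"a/c\s+(\w+)")


def extract(text):
    """Extract ID, amount, transaction type, date and a coarse tag from one SMS."""
    amount = AMOUNT_RE.search(text)
    date = DATE_RE.search(text)
    account = ACCOUNT_RE.search(text)
    lowered = text.lower()
    if "credited" in lowered:
        txn_type = "credit"
    elif "debited" in lowered:
        txn_type = "debit"
    else:
        txn_type = "unknown"
    # Keyword tagging stands in for the NLP classifier described on the slide.
    if re.search(r"\bsalary\b", lowered):
        tag = "salary"
    elif re.search(r"\bemi\b", lowered):
        tag = "emi"
    else:
        tag = "other"
    return {
        "account": account.group(1) if account else None,
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
        "date": date.group(1) if date else None,
        "txn_type": txn_type,
        "tag": tag,
    }


def roll_up(pool):
    """Aggregate SMS-level records into customer-level features."""
    features = defaultdict(lambda: {"monthly_income": 0.0, "total_emi": 0.0})
    for customer_id, text in pool:
        record = extract(text)
        if record["amount"] is None:
            continue
        if record["tag"] == "salary" and record["txn_type"] == "credit":
            features[customer_id]["monthly_income"] += record["amount"]
        elif record["tag"] == "emi" and record["txn_type"] == "debit":
            features[customer_id]["total_emi"] += record["amount"]
    return dict(features)


customer_features = roll_up(sms_pool)
```

The sample pool covers a single month, so the sums can be read as monthly figures; a fuller version would group by month before aggregating.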
Feature Mining: Bank Statement with Text Recognition and NLP
Aptivaa’s Bank Statement API supports English and Arabic bank statements

Digitization of the input statement
• Usage of Computer Vision and NLP algorithms for scanning & digitization
• Custom Neural Network Models for English and Arabic
• Support for both languages in the same sentence as well
• Easily trainable for specific font types and sizes
• Minimizes data errors through preset validation rules and users’ validation as well

Pattern Recognition
• Identification of the language available in the statement and translation to English
• Identification of Text Patterns/Classification Rules in a master table (e.g. transaction descriptions containing ‘Salary’/‘Payroll’ are of type Salary)

Transaction Classification and Analysis
• Transaction comparison as per Text Patterns/Classification Rules into standard transaction types
• Auto-summary generation using various customizable, user-defined metrics exposed on user interface providing full control of analysis to user*
• Pivoting by different transaction types and other dimensions (such as Time period, Debit/Credit etc.)
• Final reports analysis is available in both PDF as well as in smart HTML formats

Feature Generation and AutoScoring
• Key insights generated around Income pattern, Customer behavior and Credit Analysis, Psychographic Segmentation and further, metrics generated for Risk Scoring
• Feature generation (for adding to Application Scorecard and creating internal Feature Store)
• Auto Scoring (automated scorecard, provided historical performance data)

Outputs shown in the diagram: Income Estimation, Spend Analytics, Fixed Obligations, Peer classification, Customer Score
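A minimal sketch of the master-table classification and pivoting steps follows. Only the ‘Salary’/‘Payroll’ rule comes from the slide; the other rules, the statement rows and the function names are hypothetical, and this is not a representation of the vendor's API.

```python
import re
from collections import defaultdict

# Assumed master table: text pattern -> standard transaction type.
# Word boundaries avoid false matches such as "emi" inside "premium".
MASTER_RULES = [
    (re.compile(r"\b(salary|payroll)\b", re.IGNORECASE), "Salary"),
    (re.compile(r"\b(emi|loan)\b", re.IGNORECASE), "Loan / EMI"),
    (re.compile(r"\brent\b", re.IGNORECASE), "Rent"),
]


def classify(description):
    """Return the first matching standard transaction type, else 'Other'."""
    for pattern, txn_type in MASTER_RULES:
        if pattern.search(description):
            return txn_type
    return "Other"


# Hypothetical digitized statement rows: (period, description, direction, amount).
statement = [
    ("2024-03", "ACME CORP PAYROLL MAR", "Credit", 5000.0),
    ("2024-03", "HOME LOAN EMI 0042", "Debit", 1200.0),
    ("2024-03", "POS GROCERY STORE", "Debit", 150.0),
    ("2024-04", "ACME CORP SALARY APR", "Credit", 5000.0),
]


def pivot(rows):
    """Total amounts by (period, transaction type, Debit/Credit)."""
    totals = defaultdict(float)
    for period, description, direction, amount in rows:
        totals[(period, classify(description), direction)] += amount
    return dict(totals)


summary = pivot(statement)
```

Totals such as the Salary credits per period are the kind of input an income estimate could start from, and the Loan / EMI debits correspond to fixed obligations.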
Alternative Data Modelling
Explainable Machine Learning for superior predictive power with full model transparency

[Diagram: Feature Store (Feature 1 ... Feature N) -> ML Algorithms (XgBoost, Neural Net) -> Important Features (Feature 1 ... Feature M) -> Features Discretization (Bin 1 to Bin 4 per feature) -> Explainable ML -> Predictive Model]

Non-linear Machine Learning models are used for feature selection. The selected features are then discretized and transformed (e.g. with a WoE transformation) and passed as input to a linear algorithm or to XGBoost (with monotonic constraints) to build fully explainable predictive models.
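To make the WoE (Weight of Evidence) transformation concrete, the sketch below computes a WoE value per bin for one discretized feature. It assumes the common convention WoE = ln(share of goods in the bin / share of bads in the bin), with 1 marking a default; the deck does not state which convention is used, and the sample data is invented.

```python
import math


def woe_by_bin(bins, targets):
    """Compute Weight of Evidence per bin.

    bins:    bin label for each observation (e.g. "Bin 1")
    targets: 1 = default (bad), 0 = non-default (good)
    """
    counts = {}
    for label, y in zip(bins, targets):
        good_bad = counts.setdefault(label, [0, 0])
        good_bad[y] += 1  # index 0 counts goods, index 1 counts bads

    total_good = sum(c[0] for c in counts.values())
    total_bad = sum(c[1] for c in counts.values())

    woe = {}
    for label, (good, bad) in counts.items():
        if good == 0 or bad == 0:
            # A real pipeline would merge or smooth such bins; this sketch refuses.
            raise ValueError(f"{label} has no goods or no bads; merge bins first")
        woe[label] = math.log((good / total_good) / (bad / total_bad))
    return woe


# Invented sample: Bin 1 has 2 goods and 2 bads, Bin 2 has 6 goods and 2 bads.
bins = ["Bin 1"] * 4 + ["Bin 2"] * 8
targets = [0, 0, 1, 1] + [0, 0, 0, 0, 0, 0, 1, 1]
woe = woe_by_bin(bins, targets)
```

Each observation's bin label is then replaced by its WoE value before the feature is passed to the linear model.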
Alternative Data Model Landscape for different customer segments
Illustrative Model Landscape
Approach 1
Step 1: Alternate Data Model for all customers
Step 2: Alternate + Traditional Data for some segments
Some Segments (e.g. Medium Risk Customers) are rescored using a Combined Data Model (for Bureau Hit cases only)

Approach 2
Step 1: Alternate + Traditional Data Model for Bureau Hit Segment
Step 2: Alternate Data Model for No Hit Segment
Combined Model is used for Hit Segment and Standalone Alternate Data Model is used for No Hit Segment

The final approach is selected on the basis of product (ticket size, loan tenor), data cost (bureau pull, alternate data cost) and the marginal contribution of each data source to predictive power.
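The two approaches differ only in how an applicant is routed between models, which the following sketch tries to show. The model functions are stand-ins (plain callables returning a risk score), and the rescoring band for "medium risk" and the averaging used as a combined model are assumptions; none of these values come from the deck.

```python
def approach_1(applicant, alt_model, combined_model, rescore_band=(0.3, 0.7)):
    """Score everyone on alternate data; rescore medium-risk bureau-hit cases."""
    score = alt_model(applicant)
    low, high = rescore_band  # hypothetical definition of "medium risk"
    if applicant["bureau_hit"] and low <= score <= high:
        return combined_model(applicant), "combined"
    return score, "alternate"


def approach_2(applicant, alt_model, combined_model):
    """Combined model for bureau-hit applicants, alternate model for no-hit."""
    if applicant["bureau_hit"]:
        return combined_model(applicant), "combined"
    return alt_model(applicant), "alternate"


# Placeholder models: a real deployment would call trained scorecards instead.
def alt_model(applicant):
    return applicant["alt_score"]


def combined_model(applicant):
    return 0.5 * (applicant["alt_score"] + applicant["bureau_score"])


hit = {"bureau_hit": True, "alt_score": 0.5, "bureau_score": 0.25}
no_hit = {"bureau_hit": False, "alt_score": 0.5, "bureau_score": None}

result_1_hit = approach_1(hit, alt_model, combined_model)
result_1_no_hit = approach_1(no_hit, alt_model, combined_model)
result_2_hit = approach_2(hit, alt_model, combined_model)
result_2_no_hit = approach_2(no_hit, alt_model, combined_model)
```

Approach 1 only needs a bureau pull for the rescored segment, which is one way the data-cost consideration above can enter the choice.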
Combining Alternative Data with Traditional Data
Prevalent methodologies to combine alternative data with traditional data

Approaches to combine Alternative and Traditional Data

Inputs: Traditional Data Features and Alternative Data Features

1. Single Model trained on combined dataset, with features from both sources
2. Alternative Model Score added as a feature to traditional data for model training
3. Traditional Model Score added as a feature to alternative data for model training
4. Two independent models are trained, and a matrix of scores from both models is used for decisioning
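The last approach, a matrix of scores from two independent models, could look like the sketch below. It assumes both scores are probabilities of default, and the band cut-offs and the decisions in each cell are purely illustrative; the deck gives no actual matrix.

```python
from bisect import bisect_right

# Hypothetical band boundaries on probability of default:
# below 0.05 -> band 0 (low risk), 0.05 to below 0.15 -> band 1, otherwise band 2.
CUTOFFS = (0.05, 0.15)

# Rows: traditional-model band; columns: alternative-model band (illustrative only).
DECISION_MATRIX = [
    ["approve", "approve", "review"],
    ["approve", "review", "decline"],
    ["review", "decline", "decline"],
]


def band(score, cutoffs=CUTOFFS):
    """Map a score to a band index using the sorted cut-offs."""
    return bisect_right(cutoffs, score)


def decide(traditional_score, alternative_score):
    """Look up the decision for a pair of independent model scores."""
    return DECISION_MATRIX[band(traditional_score)][band(alternative_score)]


decision = decide(traditional_score=0.02, alternative_score=0.10)
```

Keeping the two models separate makes it visible when the sources disagree (for example low risk on one and high risk on the other), which a single combined score would hide.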
Illustrative Alternative Data Use Case
Credit Scoring using Telco Data

Data Category: Call Records, Location Data, User Info, Internet Usage, Top-Ups Data, VAS Data, Daily Balance, Postpaid Payment, SMS Data, Usage Duration, Mobile Wallet Txn, Device Info, Apps Data
Feature Category: Demographics, Income Related, Spend Related, Social Network, Employment
Flow: Data Category -> Feature Category -> ML Algorithms -> Scoring Engine


Illustrative Alternative Data Use Case
Credit Scoring using Device Data

Data Category: Call Records, Location Data, SMS Data, Contacts Info, Device Info, Apps Info
Feature Category: Demographics, Income Related, Spend Related, Fixed Obligation, Social Network, Assets
ML Algorithms: XgBoost
Flow: Data Category -> Feature Category -> ML Algorithms -> Scoring Engine


Business Benefit of Analytics
Improved ROA

Use of predictive models instead of heuristic/rule-based models can significantly improve profitability, business volume and ROA

1. For instance, for a default prediction model, an improvement of Gini coefficient from 40% to 50% would result in Lower Default Rate for same approval rate (reduction to 1.3% DR from 3.0% DR at same score cut-off for the ‘illustrative portfolio’) or Higher Approval Rate for same default rate (improvement in Approval Rate from 72.7% to 89.1% at ~3% DR for the ‘illustrative portfolio’).

2. This would result in either higher business volumes at same delinquency rates; or lower delinquency rates at same business volume. In either case, ROA would improve significantly.

Band = Score Cut-Off Band; DR = DR for Approved Cases; Approval = Approval Rate

                                Gini = 40%                Gini = 50%
Band  Applications  Defaults    DR      Approval  ROA     DR      Approval  ROA
1     10            8           5.7%    98.2%     0.1%    5.6%    98.2%     0.2%
2     20            6           4.8%    94.5%     0.6%    4.2%    94.5%     0.9%
3     30            5           4.1%    89.1%     1.0%    2.9%    89.1%     1.6%
4     40            4           3.6%    81.8%     1.2%    1.8%    81.8%     2.1%
5     50            4           3.0%    72.7%     1.5%    1.3%    72.7%     2.4%
6     60            3           2.6%    61.8%     1.7%    0.9%    61.8%     2.6%
7     70            3           2.2%    49.1%     1.9%    0.7%    49.1%     2.6%
8     80            2           2.1%    34.5%     1.9%    0.5%    34.5%     2.7%
9     90            2           2.0%    18.2%     2.0%    0.0%    18.2%     3.0%
10    100           2           0.0%    0.0%      0.0%    0.0%    0.0%      0.0%
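The approval rate and default rate (DR) figures for the Gini = 40% case appear to follow from the Applications and Defaults columns if every band up to and including the cut-off is rejected. That reading is an inference from the numbers, not something the slide states; the sketch below reproduces those two columns under that assumption. The ROA columns and the Gini = 50% default rates cannot be recomputed from the data shown.

```python
# Applications and defaults per score band, copied from the table.
applications = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
defaults = [8, 6, 5, 4, 4, 3, 3, 2, 2, 2]


def cutoff_metrics(applications, defaults):
    """Return (band, DR % for approved cases, approval rate %) per cut-off band.

    Assumes all bands up to and including the cut-off band are rejected.
    """
    total_apps = sum(applications)
    total_defaults = sum(defaults)
    rejected_apps = 0
    rejected_defaults = 0
    rows = []
    for band, (apps, bad) in enumerate(zip(applications, defaults), start=1):
        rejected_apps += apps
        rejected_defaults += bad
        approved = total_apps - rejected_apps
        approved_defaults = total_defaults - rejected_defaults
        approval_rate = approved / total_apps
        default_rate = approved_defaults / approved if approved else 0.0
        rows.append((band, round(100 * default_rate, 1), round(100 * approval_rate, 1)))
    return rows


for band, dr, approval in cutoff_metrics(applications, defaults):
    print(f"band {band:>2}: DR {dr:.1f}%  approval {approval:.1f}%")
```

At cut-off band 5, for example, 400 of 550 applications are approved (72.7%) and 12 of them default (3.0%), matching the table row.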
Challenges in using Alternative Data
Not all data is equal

1 Compliance with GDPR guidelines for expats
2 Data sparsity (incomplete datasets)
3 Data Integration challenges (e.g. customers will not have a common ID across data sources)
4 Unstructured formats (e.g. SMS data), not suitable for saving in RDBMS
5 Vendor Risk (e.g. financial strength of third-party data providers)
6 Data Quality and Veracity
7 Commercial Implications (Cost vs. Benefit)
8 Different predictive power for different data sources, so cannot be used without performance assessment
