Alternate Data

The document discusses alternative data analytics and describes modules for harnessing alternative data including a data mart, feature store, ML models, and use cases. It provides examples of alternative data sources like mobile, telecom, and social media data. It also describes how the feature store can be used to accelerate feature engineering for predictive modeling. Additionally, it provides examples of how natural language processing can be used to extract features from text data like SMS messages to generate customer insights and features for decisioning.

Uploaded by

Brijesh Kumar Giri


Alternative Data Analytics

Alternative Data Components
Modules for harnessing the power of Alternate Data

DataMart
• Mobile Device
• Telecom
• E-Commerce
• Utility and Payments (POS)
• Social Media
• E-Mail
• Insurance
• Others: Travel, Rent, Web, Tax, Government Records, Psychometrics etc.
• Bank Statement ***
• Alternative Lending Products Payment data
Leverage our AGGREGATOR DATAMART to accelerate data architecture and storage

Feature Store
• Seamless transformation of raw data to Features, to be used for predictive modelling
Leverage our FEATURE STORE to accelerate Feature Engineering for building predictive models and decision analytics

ML Models
• ML Algorithms
• Model Landscape
• Model Development
• Model Documentation
• Model Validation
• Model Deployment
• Independent Review
• Policy Framework
We use Advanced Machine Learning algorithms to build Explainable predictive models for Financial Institutions

Use Cases
• Customer Profiling and Segmentation
• Credit Scoring
• Income Estimation
• Pricing
• Propensity
Leverage our expertise for multiple use cases to get a 360 degree view of a customer relationship

*** Physical copies of bank statements have long been used for manual underwriting in consumer lending. However, this information typically does not flow as a feature into a credit scoring engine. In the Digital Lending paradigm, bank statements are being digitized and their information is being used for credit scoring.

Alternative Data Feature Store
Automated Feature Engineering

[Diagram: Raw Data -> Automated Feature Engineering Layer (Feature Primitives, Feature Synthesis, Feature Classification, Pattern Matching), with an Expert Judgment overlay -> Feature Store -> Predictive Model]

Raw data points are transformed to features using Feature Synthesis (applying a library of transformations to raw data) and Feature Mining using NLP (e.g. extraction of features from text data such as SMS and Email), with an overlay of expert judgement.
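
As a rough illustration of Feature Synthesis, the sketch below applies a small library of transformations (primitives) to raw transaction amounts grouped by customer. The record layout, the primitive names and the `synthesize_features` helper are assumptions made for this example; the deck does not describe how its Feature Store is implemented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data points: (customer_id, transaction_amount).
RAW_DATA = [
    ("C1", 120.0), ("C1", 80.0), ("C1", 400.0),
    ("C2", 50.0), ("C2", 70.0),
]

# A small library of transformations (feature primitives); names are illustrative.
PRIMITIVES = {
    "count": len,
    "sum": sum,
    "mean": mean,
    "max": max,
}


def synthesize_features(rows, primitives=PRIMITIVES):
    """Apply every primitive to each customer's raw values."""
    grouped = defaultdict(list)
    for customer_id, amount in rows:
        grouped[customer_id].append(amount)
    return {
        customer_id: {
            f"amount_{name}": fn(values) for name, fn in primitives.items()
        }
        for customer_id, values in grouped.items()
    }


features = synthesize_features(RAW_DATA)
print(features["C1"])
```

Adding a primitive to the dictionary adds one feature per customer without changing the roll-up logic, which is the main appeal of a transformation library.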

Illustrative Feature Mining from SMS Data using NLP
Automated Feature Mining

Data
• SMS1, SMS2, SMS3, SMS4, SMS5

SMS Tagging
• SMS classification to standard L1 and L2 categories
• L1 such as Savings, Current, Debit Card, Credit Card, E-Wallets etc.
• L2 such as Savings > Salary, Spend, Balance, Investment, Loan / EMI related, Account Info
Process
• NLP based classification (SMS embeddings using neural networks)

Data Insights
• Rules to extract information from each SMS such as ID, Amount, Transaction Type, Date etc.
Process
• Pattern matching based data extraction rules

Feature Engg.
• Roll-up of individual SMS level data at customer level (Id / pool) to generate features for model training, such as:
  • Monthly Income
  • Total Loans O/s
  • Total EMI
  • Expected Monthly Spend and Savings
  • Delinquency pattern
Process
• Feature engineering by data science team

Decisioning
Scoring Engine: Customer Risk Score
Customer1 0.99
Customer2 0.80
Customer3 0.50
Customer4 0.25
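
The Data Insights and Feature Engg. steps can be sketched as pattern-matching rules followed by a customer-level roll-up. Everything specific here is an assumption: the SMS wording, the regular expressions and the feature names are hypothetical, and simple keyword checks stand in for the NLP-based classification (SMS embeddings) that the slide describes.

```python
import re
from collections import defaultdict

# Hypothetical SMS samples for a single month: (customer_id, text).
SMS_LOG = [
    ("Customer1", "Your a/c XX1234 is credited with INR 50,000.00 on 01-03-2023 towards SALARY"),
    ("Customer1", "INR 12,500.00 debited from a/c XX1234 on 05-03-2023 for EMI"),
    ("Customer1", "INR 2,300.50 debited from a/c XX1234 on 09-03-2023 at POS"),
    ("Customer2", "Your a/c XX9876 is credited with INR 18,000.00 on 02-03-2023 towards SALARY"),
]

# Illustrative extraction rules; real bank SMS formats vary widely.
AMOUNT_RE = re.compile(r"INR\s*([\d,]+(?:\.\d+)?)")
TYPE_RE = re.compile(r"\b(credited|debited)\b", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{2}-\d{2}-\d{4})\b")


def extract(text):
    """Pull amount, transaction type and date out of one SMS, or return None."""
    amount = AMOUNT_RE.search(text)
    txn_type = TYPE_RE.search(text)
    date = DATE_RE.search(text)
    if not (amount and txn_type):
        return None
    upper = text.upper()
    return {
        "amount": float(amount.group(1).replace(",", "")),
        "type": txn_type.group(1).lower(),
        "date": date.group(1) if date else None,
        # Keyword tagging is a stand-in for the L2 classification step.
        "is_salary": "SALARY" in upper,
        "is_emi": "EMI" in upper,
    }


def roll_up(sms_log):
    """Aggregate SMS-level records into customer-level features."""
    features = defaultdict(
        lambda: {"monthly_income": 0.0, "total_emi": 0.0, "total_spend": 0.0}
    )
    for customer_id, text in sms_log:
        record = extract(text)
        if record is None:
            continue
        row = features[customer_id]
        if record["type"] == "credited" and record["is_salary"]:
            row["monthly_income"] += record["amount"]
        elif record["type"] == "debited" and record["is_emi"]:
            row["total_emi"] += record["amount"]
        elif record["type"] == "debited":
            row["total_spend"] += record["amount"]
    return dict(features)


customer_features = roll_up(SMS_LOG)
print(customer_features["Customer1"])
```

Because the sample covers one month, the summed salary credits serve as the monthly income; a longer history would need grouping by month before averaging.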

Feature Mining: Bank Statement with Text Recognition and NLP
Aptivaa’s Bank Statement API supports English and Arabic Bank Statements

Digitization of the input statement
• Usage of Computer Vision and NLP algorithms for scanning & digitization
• Custom Neural Network Models for English and Arabic
• Support for both languages in the same sentence as well
• Easily trainable for specific font types and sizes
• Minimizes data errors through present validation rules and users’ validation as well

Pattern Recognition
• Identification of the language available in the statement and translation to English
• Identification of Text Patterns/Classification Rules in a master table (e.g. transaction descriptions containing ‘Salary’/‘Payroll’ are of type Salary)

Transaction Classification and Analysis
• Transaction comparison as per Text Patterns/Classification Rules into standard transaction types
• Auto-summary generation using various customizable, user-defined metrics exposed on user interface providing full control of analysis to user*
• Pivoting by different transaction types and other dimensions (such as Time period, Debit/Credit etc.)
• Final reports analysis is available in both PDF as well as in smart HTML formats

Feature Generation and AutoScoring
• Key insights generated around Income pattern, Customer behavior and Psychographic Segmentation and further, metrics generated for Risk Scoring
• Feature generation (for adding to Application Scorecard and creating internal Feature Store)
• Auto Scoring (automated scorecard, provided historical performance data)

Outputs: Customer Score, Peer classification, Credit Analysis, Income Estimation, Spend Analytics, Fixed Obligations
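
A minimal sketch of the master-table idea follows: transaction descriptions are matched against text patterns, pivoted by type and direction, and used for a simple income estimate. Only the ‘Salary’/‘Payroll’ rule comes from the slide. The other patterns, the transaction layout and the function names are hypothetical, and this is not Aptivaa's API.

```python
import re
from collections import defaultdict

# Master table of text patterns -> transaction type.
# Only the salary/payroll rule is from the slide; the rest are hypothetical.
MASTER_PATTERNS = [
    ({"salary", "payroll"}, "Salary"),
    ({"emi", "loan"}, "Fixed Obligation"),
    ({"pos", "atm"}, "Spend"),
]

# Hypothetical digitized statement rows: (month, description, direction, amount).
TRANSACTIONS = [
    ("2023-01", "ACME PAYROLL JAN", "Credit", 4000.0),
    ("2023-01", "HOME LOAN EMI", "Debit", 1200.0),
    ("2023-01", "POS GROCERY STORE", "Debit", 300.0),
    ("2023-02", "ACME SALARY FEB", "Credit", 4200.0),
    ("2023-02", "ATM WITHDRAWAL", "Debit", 500.0),
    ("2023-02", "CASH DEPOSIT", "Credit", 150.0),
]


def classify(description):
    """Match whole words so that 'DEPOSIT' does not trigger the 'pos' rule."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    for keywords, txn_type in MASTER_PATTERNS:
        if tokens & keywords:
            return txn_type
    return "Other"


def pivot(transactions):
    """Total amounts by (month, transaction type, Debit/Credit)."""
    totals = defaultdict(float)
    for month, description, direction, amount in transactions:
        totals[(month, classify(description), direction)] += amount
    return dict(totals)


def estimate_monthly_income(transactions):
    """Average of monthly salary credits; a deliberately simple estimator."""
    salary_by_month = defaultdict(float)
    for month, description, direction, amount in transactions:
        if direction == "Credit" and classify(description) == "Salary":
            salary_by_month[month] += amount
    if not salary_by_month:
        return 0.0
    return sum(salary_by_month.values()) / len(salary_by_month)


summary = pivot(TRANSACTIONS)
income = estimate_monthly_income(TRANSACTIONS)
print(income)
```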

Alternative Data Modelling
Explainable Machine Learning for superior predictive power with full model transparency

[Diagram: Feature Store (Feature 1 ... Feature N) -> ML Algorithms (XGBoost, Neural Net) -> Important Features (Feature 1 ... Feature M) -> Features Discretization (Bin 1 to Bin 4 per feature) -> Explainable ML -> Predictive Model]

Non-linear Machine Learning models are used for feature selection. Discretized and transformed features (such as a WoE transformation) are passed as input to a Linear Algorithm or XGBoost (with Monotonic Constraints) to build fully explainable predictive models.
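
The discretization and WoE step can be sketched as below. This assumes one common convention, WoE = ln(share of goods / share of bads) per bin, and assumes every bin contains at least one good and one bad. The bin edges and the sample counts are hypothetical.

```python
import math
from bisect import bisect_right

# Hypothetical inner bin edges for one feature, giving four bins.
EDGES = [1000, 2000, 3000]

# Hypothetical sample: (representative value, number of goods, number of bads).
BIN_COUNTS = [(500, 10, 8), (1500, 20, 4), (2500, 30, 2), (3500, 40, 1)]

values, labels = [], []
for value, n_good, n_bad in BIN_COUNTS:
    values += [value] * (n_good + n_bad)
    labels += [0] * n_good + [1] * n_bad  # 1 = default (bad), 0 = good


def assign_bin(value, edges):
    """Return the 0-based bin index of value for sorted inner edges."""
    return bisect_right(edges, value)


def woe_by_bin(values, labels, edges):
    """Weight of Evidence per bin: ln(share of goods / share of bads)."""
    n_bins = len(edges) + 1
    goods = [0] * n_bins
    bads = [0] * n_bins
    for value, label in zip(values, labels):
        idx = assign_bin(value, edges)
        if label == 1:
            bads[idx] += 1
        else:
            goods[idx] += 1
    total_good, total_bad = sum(goods), sum(bads)
    # Assumes no empty cells; real data usually needs smoothing or bin merging.
    return [
        math.log((g / total_good) / (b / total_bad))
        for g, b in zip(goods, bads)
    ]


woe = woe_by_bin(values, labels, EDGES)
print([round(w, 3) for w in woe])
```

In this sample the WoE values rise from bin to bin, which is the kind of monotonic pattern that keeps a linear model on WoE features easy to explain.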

Alternative Data Model Landscape for different customer segments
Illustrative Model Landscape

Approach 1
Step 1: Alternate Data Model for all customers
Step 2: Alternate + Traditional Data for some segments
Some segments (e.g. Medium Risk Customers) are rescored using a Combined Data Model (for Bureau Hit cases only).

Approach 2
Step 1: Alternate + Traditional Data Model for Bureau Hit Segment
Step 2: Alternate Data Model for No Hit Segment
The Combined Model is used for the Hit Segment and the Standalone Alternate Data Model is used for the No Hit Segment.

The final approach is selected on the basis of product (ticket size, loan tenor), data cost (bureau pull, alternate data cost) and the marginal contribution of each data source to predictive power.
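
The two approaches amount to different routing rules, sketched below under stated assumptions. The models are passed in as plain callables, the stub models simply read precomputed scores, and the medium-risk band of 0.3 to 0.7 is a hypothetical threshold, not a value from the deck.

```python
def score_approach_1(customer, alt_model, combined_model, medium_band=(0.3, 0.7)):
    """Score everyone on alternate data; rescore medium-risk bureau-hit cases."""
    alt_score = alt_model(customer)
    low, high = medium_band  # hypothetical definition of "medium risk"
    if customer["bureau_hit"] and low <= alt_score < high:
        return combined_model(customer), "combined"
    return alt_score, "alternate"


def score_approach_2(customer, alt_model, combined_model):
    """Combined model for bureau-hit customers, alternate model for no-hit."""
    if customer["bureau_hit"]:
        return combined_model(customer), "combined"
    return alt_model(customer), "alternate"


# Stub models for illustration only: they read precomputed scores.
def alt_model(customer):
    return customer["alt_score"]


def combined_model(customer):
    return customer["combined_score"]


hit_medium = {"bureau_hit": True, "alt_score": 0.5, "combined_score": 0.42}
hit_low = {"bureau_hit": True, "alt_score": 0.1, "combined_score": 0.15}
no_hit = {"bureau_hit": False, "alt_score": 0.5, "combined_score": None}

for customer in (hit_medium, hit_low, no_hit):
    print(
        score_approach_1(customer, alt_model, combined_model),
        score_approach_2(customer, alt_model, combined_model),
    )
```

Approach 1 pulls bureau data only for the rescored segment, which is where the data-cost consideration enters the choice between the two.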

Combining Alternative Data with Traditional Data
Prevalent methodologies to combine alternative data with traditional data

Approaches to combine Alternative and Traditional Data (inputs: Traditional Data Features and Alternative Data Features)
• Single Model trained on combined dataset, with features from both sources
• Alternative Model Score added as a feature to traditional data for model training
• Traditional Model Score added as a feature to alternative data for model training
• Two independent models are trained, and a matrix of scores from both models is used for decisioning
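
The last approach, a matrix of scores from two independent models, can be sketched as a lookup over score bands. The scores are assumed here to be default probabilities (higher means riskier), and the band edges and the decisions in the matrix are hypothetical.

```python
from bisect import bisect_right

# Hypothetical band edges: scores are split into low / medium / high risk.
BAND_EDGES = [0.33, 0.66]

# Rows: traditional-model band; columns: alternative-model band.
# The decisions are illustrative, not a recommended policy.
DECISION_MATRIX = [
    ["approve", "approve", "review"],
    ["approve", "review", "decline"],
    ["review", "decline", "decline"],
]


def risk_band(score):
    """Map a default probability to band 0 (low), 1 (medium) or 2 (high)."""
    return bisect_right(BAND_EDGES, score)


def decide(traditional_score, alternative_score):
    """Look up the decision for a pair of independent model scores."""
    return DECISION_MATRIX[risk_band(traditional_score)][risk_band(alternative_score)]


print(decide(0.10, 0.10))
print(decide(0.10, 0.90))
```

Keeping the two scores separate makes disagreements between the models visible, which a single blended score would hide.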

Illustrative Alternative Data Use Case
Credit Scoring using Telco Data

Data Category: Call Records, Location Data, User Info, Internet Usage, Top-Ups Data, VAS Data, Daily Balance, Postpaid Payment, SMS Data, Mobile Wallet Txn, Device Info, Apps Data
Feature Category: Demographics, Income Related, Spend Related, Usage Duration, Social Network, Employment
Flow: Data Category -> Feature Category -> ML Algorithms -> Scoring Engine

Illustrative Alternative Data Use Case
Credit Scoring using Device Data

Data Category: Call Records, Location Data, SMS Data, Contacts Info, Device Info, Apps Info
Feature Category: Demographics, Income Related, Spend Related, Fixed Obligation, Social Network, Assets
Flow: Data Category -> Feature Category -> ML Algorithms (XGBoost) -> Scoring Engine


Business Benefit of Analytics
Improved ROA

Use of predictive models instead of heuristic/rule-based models can significantly improve profitability, business volume and ROA.

1. For instance, for a default prediction model, an improvement of the Gini coefficient from 40% to 50% would result in a Lower Default Rate for the same approval rate (reduction to 1.3% DR from 3.0% DR at the same score cut-off for the ‘illustrative portfolio’) or a Higher Approval Rate for the same default rate (improvement in Approval Rate from 72.7% to 89.1% at ~3% DR for the ‘illustrative portfolio’).
2. This would result in either higher business volumes at the same delinquency rates, or lower delinquency rates at the same business volume. In either case, ROA would improve significantly.

                                                 Gini = 40%                                    Gini = 50%
Score Cut-Off Band  Applications  Defaults  DR for Approved Cases  Approval Rate  ROA    DR for Approved Cases  Approval Rate  ROA
1                   10            8         5.7%                   98.2%          0.1%   5.6%                   98.2%          0.2%
2                   20            6         4.8%                   94.5%          0.6%   4.2%                   94.5%          0.9%
3                   30            5         4.1%                   89.1%          1.0%   2.9%                   89.1%          1.6%
4                   40            4         3.6%                   81.8%          1.2%   1.8%                   81.8%          2.1%
5                   50            4         3.0%                   72.7%          1.5%   1.3%                   72.7%          2.4%
6                   60            3         2.6%                   61.8%          1.7%   0.9%                   61.8%          2.6%
7                   70            3         2.2%                   49.1%          1.9%   0.7%                   49.1%          2.6%
8                   80            2         2.1%                   34.5%          1.9%   0.5%                   34.5%          2.7%
9                   90            2         2.0%                   18.2%          2.0%   0.0%                   18.2%          3.0%
10                  100           2         0.0%                   0.0%           0.0%   0.0%                   0.0%           0.0%
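
The Approval Rate and the Gini = 40% DR columns follow from the Applications and Defaults columns if every band up to and including the cut-off is rejected. The sketch below reproduces them on that reading; the function name is an assumption. The Gini = 50% columns depend on a different default distribution and the ROA columns on profitability inputs, neither of which is shown on the slide, so they are not derived here.

```python
# Applications and defaults per score band, taken from the table.
APPLICATIONS = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
DEFAULTS = [8, 6, 5, 4, 4, 3, 3, 2, 2, 2]


def cutoff_table(applications, defaults):
    """For each cut-off band, reject that band and all bands before it."""
    total_apps = sum(applications)
    total_defaults = sum(defaults)
    rows = []
    rejected_apps = 0
    rejected_defaults = 0
    for band, (n_apps, n_defaults) in enumerate(zip(applications, defaults), start=1):
        rejected_apps += n_apps
        rejected_defaults += n_defaults
        approved = total_apps - rejected_apps
        approved_defaults = total_defaults - rejected_defaults
        default_rate = approved_defaults / approved if approved else 0.0
        approval_rate = approved / total_apps
        rows.append((band, default_rate, approval_rate))
    return rows


rows = cutoff_table(APPLICATIONS, DEFAULTS)
for band, default_rate, approval_rate in rows:
    print(f"{band:>2}  DR={default_rate:.1%}  approval={approval_rate:.1%}")
```

At cut-off band 5, for example, 400 of 550 applications are approved with 12 defaults, giving the 72.7% approval rate and 3.0% default rate quoted in point 1.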

Challenges in using Alternative Data
Not all data is equal

1 Compliance with GDPR guidelines for expats
2 Data sparsity (incomplete datasets)
3 Data Integration challenges (e.g. customers will not have a common ID across data sources)
4 Unstructured formats (e.g. SMS data), not suitable for saving in RDBMS
5 Vendor Risk (e.g. financial strength of third-party data providers)
6 Data Quality and Veracity
7 Commercial Implications (Cost vs. Benefit)
8 Different predictive power for different data sources, so cannot be used without performance assessment
