
Predicting and Improving Invoice-to-Cash Collection Through
Machine Learning

by

Hu Peiguang

B.Eng. Hon., Engineering Science Programme
National University of Singapore, 2013

Submitted to the Department of Civil and Environmental Engineering and the
Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degrees of

Master of Science in Transportation

and

Master of Science in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2015

© 2015 Massachusetts Institute of Technology. All rights reserved.

Author: [Signature redacted]
Department of Civil and Environmental Engineering
Department of Electrical Engineering and Computer Science
May 20, 2015

Certified by: [Signature redacted]
David Simchi-Levi
Professor of Civil and Environmental Engineering
Thesis Supervisor

Certified by: [Signature redacted]
Asuman Ozdaglar
Professor of Electrical Engineering and Computer Science
Thesis Supervisor

Accepted by: [Signature redacted]
Heidi Nepf
Chair, Departmental Committee for Graduate Students

Accepted by: [Signature redacted]
Leslie A. Kolodziejski
Chair of the Committee on Graduate Students
Predicting and Improving Invoice-to-Cash Collection Through
Machine Learning
by
Hu Peiguang

Submitted to the Department of Civil and Environmental Engineering


and the Department of Electrical Engineering and Computer Science
on May 20, 2015, in partial fulfillment of the
requirements for the degrees of
Master of Science in Transportation
and
Master of Science in Electrical Engineering and Computer Science

Abstract
Delinquent invoice payments can be a source of financial instability if they are poorly managed. Research in supply chain finance shows that effective invoice collection is positively correlated with the overall financial performance of companies. In this thesis I address the problem of predicting delinquent invoice payments in advance by applying machine learning to historical invoice data. Specifically, the thesis demonstrates how supervised learning models can be used to detect the invoices that will have delayed payments, as well as the problematic customers, which enables customized collection actions by the firm. The model from this thesis can predict with high accuracy whether an invoice will be paid on time and can also estimate the magnitude of the delay. The thesis builds and trains its invoice delinquency prediction capability on real-world invoice data from a Fortune 500 company.

Thesis Supervisor: David Simchi-Levi


Title: Professor of Civil and Environmental Engineering

Thesis Supervisor: Asuman Ozdaglar


Title: Professor of Electrical Engineering and Computer Science
To my parents, Rao Yongjun and Hu Zhizheng
Acknowledgments
I would like to express my deepest gratitude to my thesis supervisors, Prof. David Simchi-Levi and Prof. Asuman Ozdaglar, for their invaluable patience, guidance and support over the last two years.

My plan to come to graduate school in this topic began many years back, when I was an undergraduate summer research student at Northwestern University, where I was fortunate to work with Prof. Daniel Abrams. I would like to thank him for encouraging me to pursue research of my interests, even though it could have been (and has been) a difficult and rewarding journey.

More personally, I am thankful to friends at MIT. Especially to Kim Yoo Joon: thank you for listening to all my agonies in life and suggesting solutions. I am lucky to have you as my classmate. To my roommate, Zhang Hongyi, for stimulating discussions that helped me form ideas for my thesis. To my lifelong friends, Huang Yijie and Luo Feng, thank you for keeping me up-to-date on what life outside of MIT is like.

Finally, to my parents, thank you for your love and support throughout this journey. None of this could have been possible without you, and for that I am dedicating this to you.
Contents

1 Introduction
  1.1 Motivation & Objectives
  1.2 Thesis Contribution
  1.3 Thesis Outline

2 Literature Review
  2.1 Supply Chain Finance (SCF)
  2.2 Account Receivable & Invoice Outcome
  2.3 Account Receivable Management
  2.4 Business Analytics & Machine Learning
      2.4.1 Credit Card Fraud Detection Model
      2.4.2 Consumer Credit Rating

3 Problem Formulation
  3.1 Invoice Outcome Definition
      3.1.1 Binary outcome case
      3.1.2 Multiple outcome case

4 Data and Pre-processing
  4.1 Data Description
  4.2 Preliminary Analysis
      4.2.1 Invoice delay
      4.2.2 Payment terms
      4.2.3 Invoice amount & delay
  4.3 Feature Construction
      4.3.1 Selection of information
      4.3.2 Two levels of features
      4.3.3 Extra information & unexpected features
      4.3.4 A full list of features

5 Analysis with Unsupervised Learning
  5.1 Principal Component Analysis (PCA)
  5.2 Clustering

6 Prediction with Supervised Learning
  6.1 Supervised Learning Algorithms
  6.2 Model Learning & Calibration
  6.3 Model Outputs
      6.3.1 Binary Outcome Case
      6.3.2 Multiple Outcome Case
  6.4 Results Analysis & Improvements
  6.5 Imbalanced Data & Solution
      6.5.1 Weighted Sampling
      6.5.2 Cost-sensitive Learning
  6.6 Model Robustness
  6.7 Why Random Forests Works

7 Conclusion
  7.1 Summary
  7.2 Future work
List of Figures

1-1 Historical productivity growth rate versus the potential gain from big data[4]
1-2 Typical order-to-cash (O2C) process
3-1 Two cases of invoice outcome definition
4-1 Segmentation of invoices by delays
4-2 Histogram of multiple invoice outcomes
4-3 Histogram of delayed days of delayed invoices
4-4 Histogram of payment terms
4-5 Invoice amount and delay or not
4-6 Invoice amount and delay level
4-7 Average amount of delayed invoice versus delayed days
4-8 Construction of an integrated database of invoice and customer statistics used in machine learning models
4-9 Histograms of customers' total number of invoices
4-10 Histograms of customers' average delay days
4-11 Histogram of delay ratio of customers
4-12 Histogram of amount delay ratio of customers
4-13 Distribution of binary indicators (features) of Month End and Half Month
5-1 Total variance versus the number of principal components
5-2 Principal component analysis of invoice features
5-3 How to pick the right number of clusters in unsupervised learning
5-4 Average distance to centroid (within-groups sum of squares, WSS) versus number of clusters
5-5 Clustering of invoices into two classes, plotted with first two discriminant components (DC)
5-6 Clustering of invoices into four classes, plotted with first two discriminant components (DC)
6-1 Supervised classification
6-2 K-fold cross validation
6-3 Choosing the best decision tree complexity (cp) with 10-fold cross validation
6-4 Decision tree demo on binary case
6-5 Training error versus number of trees grown in random Forests
6-6 Variable importance of random Forests in binary outcome case
6-7 Margin (prediction certainty) cumulative distribution of the AdaBoost algorithm on invoice binary outcome prediction
6-8 Choosing the best decision tree complexity (cp) with 10-fold cross validation
6-9 Decision tree algorithm demo on multi-outcome case
6-10 Visualization of decision tree in multiple outcome cases of invoice prediction
6-11 Variable importance of random Forests in multiple outcome case
6-12 Margin (prediction certainty) cumulative distribution of the AdaBoost algorithm on invoice multi-outcome prediction
6-13 Two types of errors in decision making
List of Tables

4.1 Information on a typical electronic invoice
4.2 Selected subset of invoice information
4.3 The full list of features of invoices
6.1 The prediction result of binary case with various machine learning algorithms
6.2 Confusion matrix of decision tree in binary outcome case
6.3 Confusion matrix of decision tree in binary outcome case
6.4 Confusion matrix of decision tree in binary outcome case
6.5 Confusion matrix of logistic regression in binary outcome case
6.6 Statistically significant invoice features in logistic regression of binary outcome case
6.7 Confusion matrix of SVM in binary outcome case
6.8 The prediction result of multiple outcome case with various machine learning algorithms
6.9 Confusion matrix of decision tree in multiple outcome case
6.10 Confusion matrix of random Forests in multiple outcome case
6.11 Confusion matrix of SVM in multiple outcome case
6.12 Confusion matrix of logistic regression in multiple outcome case
6.13 Statistically significant invoice features in logistic regression of multiple outcome case
6.14 Confusion matrix of SVM in multiple outcome case
6.15 Measuring the performance of Random Forests
6.16 Class prediction accuracy of Random Forests in Binary Outcome Case
6.17 Class prediction accuracy of Random Forests in Multiple Outcome Case
6.18 A comparison of prediction accuracy of Random Forests and Balanced Random Forests
6.19 Misclassification cost matrix C for cost-sensitive Random Forests
6.20 A comparison of prediction accuracy of Random Forests and cost-sensitive Random Forests
6.21 Model prediction accuracy on data from different months
Chapter 1

Introduction

We are entering the era of big data in the business analytics (BA) industry. For example, Walmart handles more than one million transactions per hour and has databases containing more than 2.5 petabytes (2.5 × 10^15 bytes) of information. And it is estimated that, by 2009, nearly all sectors in the US economy had on average at least 200 terabytes (2 × 10^14 bytes) of stored data per company with more than 1,000 employees[1].

Business analytics used to be the practice of iterative, methodical exploration of an organization's data with an emphasis on statistical analysis. It is used by companies with systematic data collection procedures to make data-driven decisions. However, over the last 15 years the business analytics field went through a great transition, as the amount of data available increased significantly[2]. Specifically, this deluge of data calls for automated methods of data analysis, and researchers looked to machines for help.

Therefore, it is not surprising that machine learning algorithms began to emerge as a dominant force in the business analytics industry. According to Murphy [3], machine learning is a set of methods that automatically detect patterns in data and then use the uncovered patterns to predict future data, or to perform decision making under uncertainty based on knowledge from data.

Machine learning empowers one's ability to transform data into actionable knowledge. McKinsey & Co, a consultancy, identified five broadly applicable ways to leverage big data that offer transformational potential[4]:

• Creating transparency of business information

• Enabling experimentation to discover needs, expose variability, and improve performance

• Segmenting populations to customize actions

• Replacing/supporting human decision making with automated algorithms

• Innovating new business models, products, and services

With these characteristics of business analytics, a general pattern has been found: customers, consumers, and citizens benefit from the economic surplus enabled by data - they are both direct and indirect beneficiaries of big-data-related innovation[5]. It is also worth noticing that some sectors are expected to enjoy bigger gains when powered by business analytics, as illustrated by Figure 1-1.

Because of its abundance of data, the computer and electronic products sector is no doubt the leading sector in terms of strong productivity growth and potential gains from the use of business analytics[4]. However, the finance and insurance sector also possesses huge potential, as long as it can overcome the barriers of data collection and cleaning. For example, insurance firms such as Prudential and AIG have worked on predictive models of health risks, letting people who apply for insurance avoid blood and urine testing, which the insurance companies have to pay for[1].

Figure 1-1: Historical productivity growth rate versus the potential gain from big data[4]

1.1 Motivation & Objectives

This project focuses on the finance sector, specifically aiming to manage accounts receivable (AR)¹ more effectively by predicting invoice payment outcomes at issuing time through data mining and machine learning algorithms.

¹Money owed by customers (individuals or corporations) to another entity in exchange for goods or services that have been delivered or used, but not yet paid for.

For modern companies, Order-to-Cash (O2C) normally refers to the business process for receiving and processing customer sales [6]. Although the number of steps may vary from firm to firm, depending on the firm's type and size, a typical O2C process can be illustrated by the work flow in Figure 1-2. In this project, we focus on the AR collection (invoice-to-cash) part of this process, i.e., the two highlighted steps of Figure 1-2.

Customer Order Management → Credit Management → Order Fulfillment → Customer Billing

Figure 1-2: Typical order-to-cash (O2C) process

The market volume of AR collection is huge. According to the statistics, the GDP

of the Canadian construction industry in 2012 was around $111 billion, nearly all of it produced in the form of invoices[7]. In fact, AR has been the backbone of modern corporate finance[8]; a typical retailing company processes thousands of invoices for payment monthly.

We are interested in improving AR collection through machine learning for three reasons. First of all, AR collection can easily be a source of financial difficulty for firms if not well managed. It is, therefore, of great interest to manage it more effectively. Also, most AR collection actions nowadays are still manual, generic and expensive [9]. For instance, collection seldom takes customer specifics into account, nor does it follow any prioritization strategy. Lastly and most importantly, commercial firms are now accumulating large amounts of data about their customers, which makes large-scale data-driven AR collection possible.

1.2 Thesis Contribution

The main contribution of this thesis is to demonstrate how to make accurate predictions on invoice payments based on historical data, which includes:

• A new approach for detecting and analyzing invoice payment delay right at issuing time, based on historical data

• A self-learning tool for identifying problematic customers

• A detailed comparison of various state-of-the-art machine learning algorithms applied to invoice payment prediction

• A comprehensive analysis of tailoring a machine learning algorithm, Random Forests, to the invoice prediction problem

• An intuitive framework for visualizing invoice payments and their customers
1.3 Thesis Outline

This thesis is structured in seven chapters, given in their chronological order. The remaining parts are organized as follows:

• Chapter 2 reviews the related literature in both the fields of supply chain finance and business analytics. It presents previous work on invoice aging analysis and account receivable management, as well as how machine learning has been applied in fields similar to invoice payment prediction.

• Chapter 3 formulates invoice payment prediction, a business case, as an engineering problem. It shows how predicting the invoice payment fits into supervised learning in artificial intelligence.

• Chapter 4 presents how the data is processed in this project, as well as preliminary statistical analysis, which provides an overview of the data.

• Chapter 5 is our analysis of the dataset with techniques of unsupervised learning.

• Chapter 6 builds and calibrates the invoice payment prediction model and compares the performance of various machine learning algorithms on this particular problem of invoice payment prediction.

• Chapter 7 presents the conclusion and gives some ideas for further work.

Chapter 2

Literature Review

Applying machine learning models to make predictions in business involves two parts: formulating a quantitative model for the business problem and tailoring the machine learning algorithms to the formulated model. Although little work has been done on predicting the outcomes of business invoices, there exists a large amount of business literature on supply chain finance (Section 2.1) and accounts receivable age analysis (Section 2.2). Also, I review application examples of classic machine learning, namely supervised classification, in Section 2.4.1 and Section 2.4.2; the algorithms themselves are presented in Chapter 6.

2.1 Supply Chain Finance (SCF)

Supply chain finance refers to a set of business and financing processes that connect various parties in a transaction, such as the buyer, seller and financing institution, to lower financing costs and thereby improve business efficiency[10].

One of the typical practices of supply chain finance (SCF) is to provide short-term credit that optimizes the cash flow of both sides of a transaction[11]. It usually involves the use of technology for automating transactions and tracking the invoice approval and settlement process from initiation to completion. The growing popularity of SCF has been largely driven by the increasing globalization and complexity of the supply chain, especially in industries such as automotive, manufacturing and the retail sector.

According to Hofmann[10], there are various types of SCF transactions, including buyers' accounts payable terms extensions and payables discounting. SCF solutions therefore differ from traditional supply chain programs to enhance working capital in two ways:

• SCF links transactions to value as it moves through the supply chain.

• SCF encourages collaboration between the buyer and seller, rather than the competition that often pits buyer against seller and vice versa.

In the example given by Pfohl[11], for any financial transaction, the buyer will usually try to delay payment as long as possible, while the seller wants to be paid soon. SCF works well here when the buyer has a better credit rating than the seller and can therefore obtain capital at a lower cost. The buyer can then leverage this financial advantage to negotiate better terms from the seller, such as an extension of payment terms, which enables the buyer to conserve cash or control its cash flow better. The seller benefits by accessing (sharing) cheaper capital, while having the option to sell its receivables to receive immediate payment.

2.2 Account Receivable & Invoice Outcome

Accounts receivable (or invoice collection) has long been regarded as one of the most essential parts of supply chain finance and of companies' financial stability [12][13][14].

There are many metrics used to measure the collection effectiveness of a firm[13]. One of the most basic metrics is the Collection Effectiveness Index (CEI), which is defined as:

$$\text{CEI} = \frac{\text{Beginning Receivables} + (\text{Credit Sales}/N) - \text{Ending Total Receivables}}{\text{Beginning Receivables} + (\text{Credit Sales}/N) - \text{Ending Current Receivables}} \times 100$$

where N is the number of months or days.

CEI mostly measures the number and ratio of invoices collected in a certain time. Average Days Delinquent (ADD) measures the average time from invoice due date to paid date, i.e., the average days invoices are overdue. A related metric, Days Sales Outstanding (DSO), expresses the average time in days that receivables are outstanding:

$$\text{DSO} = \frac{\text{Ending Total Receivables} \times \text{Number of Days in Period Analyzed}}{\text{Credit Sales for Period Analyzed}}$$

DSO helps to determine the reason for a change in receivables: is it due to a change in sales, or to another factor such as a change in selling terms? One can compare the days' sales outstanding with the company's credit terms as an indication of how efficiently the company manages its accounts receivable.

There also exist upgraded versions of the DSO[15], such as the Sales Weighted DSO (SWDSO), which is expressed as:

$$\text{SWDSO} = \left[ \frac{\$\text{ in Current Age Bucket}}{\text{Credit Sales of Current Period}} + \frac{\$\text{ in 1-30 Day Age Bucket}}{\text{Credit Sales of one month prior}} + \frac{\$\text{ in 31-60 Day Age Bucket}}{\text{Credit Sales of two months prior}} + \cdots \right] \times 30$$

SWDSO also measures the average time that receivables are outstanding. However, it is an improvement, as it attempts to smooth out the bias of credit sales and terms of sale[10]. It also gives guidance on how the industry segments the different aging periods of an invoice.
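As a concrete illustration, here is a minimal Python sketch computing CEI and DSO directly from the definitions above; the figures passed in are made up purely for demonstration.

```python
def cei(beginning, credit_sales, ending_total, ending_current, n):
    """Collection Effectiveness Index, in percent."""
    base = beginning + credit_sales / n
    return (base - ending_total) / (base - ending_current) * 100

def dso(ending_total, credit_sales, days_in_period):
    """Days Sales Outstanding, in days."""
    return ending_total * days_in_period / credit_sales

# Illustrative (made-up) figures:
print(cei(100_000, 50_000, 90_000, 30_000, n=1))   # 50.0 percent
print(dso(90_000, 50_000, days_in_period=30))       # 54.0 days
```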

Gunasekaran [12] argues that, among performance measurements and metrics in SCM, the effectiveness of delivery invoice methods and the number of faultless delivery notes invoiced are two of the top five most important measures, ranked by survey ratings. However, the commonly used metrics for invoice outcome measurement are still functions of the time taken to collect on invoices.

As Zeng points out[6], if one can predict this type of outcome (payment overdue time) for an invoice, he or she could use this information to drive the collection process so as to improve a desired collection metric. For example, if one can identify invoices that are likely to be paid late at the time of issuing, one can attempt to reduce the time to collection by taking actions like calling or sending a reminder to the customer. Furthermore, even after an invoice is past due, it is beneficial to know which invoices are likely to be paid sooner rather than later if no action is taken. One should always pay more attention to invoices with potentially long payment delays.
2.3 Account Receivable Management

A number of software companies offer solution packages for order-to-cash management, especially for accounts receivable. Examples from the big corporations are Oracle's e-Business Suite Special Edition Order Management and SAP's Order-to-Cash Management for Wholesale Distribution. There are also small and medium-sized companies offering accounts receivable management software for mid-size customers. Such software includes WebAR and NetSuite[16].

Oracle's solution provides information visibility and reporting capabilities. SAP's solution supports collections and customer relationship management. Both WebAR and NetSuite specialize in debt collection, offering a platform to process payments and manage accounts receivable[6]. To the best of our knowledge, none of this software incorporates analytics capabilities, especially the ability to predict the outcome of an invoice, although the vendors must have accumulated a large amount of invoice data.

2.4 Business Analytics & Machine Learning

Machine learning modeling approaches have not yet been applied much to invoice prediction and collection, except for the work of Zeng[6], which developed a decision-tree based model and tested it on several datasets.

However, machine learning algorithms, especially state-of-the-art ones, are now widely used in a number of other related fields, such as credit card transaction fraud detection[17], dynamic pricing in online retailing[18], credit risk management[19] and tax collection[20]. Among them, both credit card transaction fraud detection and credit risk modeling have attracted a number of researchers, and quite a few publications are available.
2.4.1 Credit Card Fraud Detection Model

Detecting credit card fraud has been a difficult and labor-intensive task without machine learning. The credit card fraud detection model has therefore become significant in both academia and industry. Although they were not called machine learning models in the first place, these models are mostly data-driven or statistics-based.

Ghosh and Reilly [21] used a neural network based model to learn and detect credit card account transactions of one card issuer. Compared with the traditional rule-based fraud detection mechanism, the model detected more fraudulent accounts with significantly fewer false positives.

Hansen, McDonald, Messier, and Bell [22] developed a predictive model for management fraud based on data from an accounting firm, which had accumulated data in its business. The approach included the logit model, which is similar to logistic regression. The models showed good predictive capability for both symmetric and asymmetric cost assumptions of fraud detection.

Hanagandi, Dhar and Buescher [23] built a fraud score model based on historical credit card transaction data. The work was based on a fraud/non-fraud classification methodology using a radial basis function network with a density based clustering approach, which gave a fair prediction result on out-of-sample tests.

Dorronsoro, Ginel, Sánchez and Cruz [24] developed an online learning model, which detected credit card transaction fraud based on a neural classifier. They also incorporated discriminant analysis in the classification model. Their system is now fully operational and currently handles more than 12 million operations per year with fine results.
Shen, Tong and Deng [17] tested different classification methods, i.e., decision tree, neural networks and logistic regression, for credit card transaction fraud detection. They further provided a framework to choose the best model among these three algorithms for credit card fraud risk management. They also showed that neural networks and logistic regression usually outperform the decision tree in their case and dataset.

2.4.2 Consumer Credit Rating

Another field of interest for business analytics applications is consumer credit risk modeling.

Consumer spending is one of the important drivers of macroeconomics and systemic risk in finance. Ever since the business began, decisions in the consumer lending business have largely relied on data and statistics[25]. These traditional models are usually used to generate a "score" for each customer and provide a baseline for the lending business.

Recently, there has been an increasing amount of research focusing on applying machine learning methods to predicting consumer default and delinquency behavior. As argued by Khandani et al.[19], machine learning models are perfectly suited to credit rating modeling because of "the large sample sizes of data and the complexity of possible relationships among consumer transactions and characteristics".

There was early work on applying algorithms like neural networks and support vector machines (SVM) to the problem of predicting corporate bankruptcies[26][27][28]. Atiya [26] proposed several novel features of corporate financial stability for its neural network model and outperformed the traditional scoring model. Shin et al. [28] demonstrated that the support vector machine (SVM) is also a good candidate algorithm, as it has good accuracy and generalization even when the training sample size is small. Finally, Min et al.[27] explained the importance of applying cross-validation in choosing the optimal parameters of the SVM and showed a good performance of SVM on their dataset.

Huang et al.[29] provided an excellent survey on how machine learning methods have been applied to assess consumer credit risks.

Kim [30] compared the neural network approach to bond rating with linear regression, discriminant analysis, logistic analysis, and a rule-based system, on a dataset from Standard and Poor's. It was found that neural networks achieved better prediction accuracy than the others on six rating categories.

Maher and Sen [31] also compared the performance of neural networks on bond-rating prediction with that of logistic regression. With data from Moody's and Standard and Poor's, the best performance came from the neural network, at around 70%.

Galindo and Tamayo [32] did a comparative analysis of various statistical and machine learning modeling methods on a mortgage loan classification problem. They found that CART decision-tree models gave the best prediction of default, with an average 91.67% hit rate for a training sample of 2,000 records. Other models they studied included K-Nearest Neighbor, neural networks and probit models.
Chapter 3

Problem Formulation

In this thesis, we ask two questions about a new business invoice, given instances of historical invoices and outcomes:

• Will the invoice payment be delayed or not?

• If it is delayed, how long will the delay be?

To answer these two questions, we are going to build a classification model that identifies to which of a set of outcome categories a new invoice belongs, on the basis of a training set of data containing instances whose outcomes are known.

We formulate the invoice outcome prediction task as a supervised learning problem: given instances of past invoices and their outcomes, build a model that can predict when a newly-issued invoice will be paid if no advance actions are taken.

This model shall also help us understand the characteristics of delayed invoices and problematic customers. In other words, it not only identifies payment delays but also evaluates the customers.
3.1 Invoice Outcome Definition

Usually in supervised learning, each instance, here an invoice, is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory outcome).

In the case of invoice analytics, the input object is all relevant information about an invoice except its payment result, while the desired output value is the payment result of the invoice, which we call the outcome of the invoice in this project.

Although there are various metrics to measure the outcome of invoice payment, like Days Sales Outstanding or the Collection Effectiveness Index, in our case invoices are labeled based on the actual amounts collected in a specified time period.

Also, as we are interested in more than whether the invoice is going to be delayed or not, we define the outcomes for two cases, as shown in Figure 3-1.

Figure 3-1: Two cases of invoice outcome definition

3.1.1 Binary outcome case

In this case, we only want to know whether a newly-issued invoice is going to be paid late or not. The problem therefore becomes a binary classification problem: we simply need to classify the given set of invoices into two groups, no-delay invoices and delay invoices.

An important point is that the two groups are not symmetric: rather than overall accuracy, the relative proportion of different types of errors is of interest. For example, a false positive (predicting a delay when there is none) is considered differently from a false negative (failing to detect a delay that actually occurs). We may also care about the balance of the dataset between the two groups, which shall be discussed in Section 4.2.1.

3.1.2 Multiple outcome case

The purpose of setting multiple (more than two) outcomes for one invoice, as shown in Figure 3-1, is to determine the delay level of the invoice. As shown in the figure, we set the outcome to be four classes:

1. No delay

2. Delay within 30 days

3. Delay between 30 and 90 days

4. Delay of more than 90 days

It is worth noting that these four classes are commonly used in the invoice collection business, where each class corresponds to a customized collection strategy [33]. The classes do not necessarily contain balanced numbers of invoices.
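A minimal Python sketch of this labeling rule (the function name and class labels are our own, not from the thesis):

```python
def outcome_class(delay_days: int) -> str:
    """Map an invoice's payment delay in days to one of the four classes."""
    if delay_days <= 0:
        return "no_delay"        # 1. paid on time
    elif delay_days <= 30:
        return "delay_1_30"      # 2. delay within 30 days
    elif delay_days <= 90:
        return "delay_31_90"     # 3. delay between 30 and 90 days
    else:
        return "delay_over_90"   # 4. delay of more than 90 days
```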

Chapter 4

Data and Pre-processing

The data for this project comes from a Fortune 500 firm which specializes in oilfield services. It operates in over 90 countries, providing the oil and gas industry with products and services for drilling, formation evaluation, completion, production and reservoir consulting. At the same time, it generates a large number of business invoices all over the world.

The author has been able to access the databases of the company's closed invoices for three consecutive months, September to November of 2014, with around 210,000 invoices in total.

The research was first conducted on the September data, as it was the only dataset available at the very beginning of this project. By the time the October and November data arrived, the author already had a machine learning model, which was then further tested and calibrated with the new data.

The author presents a detailed description and preliminary analysis of the September data in Section 4.1 and Section 4.2. Data from the other two months are very similar and are analyzed in the model robustness part (Section 6.6).
4.1 Data Description

An invoice is a commercial document issued by a seller, here the oilfield service firm, to a buyer, relating to a sale transaction and indicating the products, quantities, and agreed prices for the products or services the seller has provided to the buyer.

Name | Meaning
Customer Number |
Customer Name |
Document Number |
Reference | Reference number in the database
Profit Center |
Document Date | Invoice generating date
Posting Date | Invoice posting date
Document Currency | Amount of the invoice
Currency | Currency of the invoice amount
User Name |
Clearing Date | Invoice clearing date
Entry Date | Invoice closing date
Division |
Group |
Payment Term | The "buffer" time of payment after the invoice issuing
Credit Representative |

Table 4.1: Information on a typical electronic invoice

A typical electronic invoice issued by the company contains the information in Table 4.1. It reveals essential information about the deal between the buyer and the seller, as well as the invoice collection mechanism of the seller, like the profit center and the division.

It is important to notice that all the invoices given in the format of Table 4.1 have been closed. In other words, the seller has collected the full amount of each invoice and recorded that in the database. When we talk about the data of a given month (September, October or November 2014), it refers to the invoices closed in that month - each could have been issued at any time before or within the closing date.
Also, although there are many interpretations of "payment term", which usually refers to "discounts and allowances that are reductions to a basic price of goods or services", here it represents the number of days within which the buyer is expected to pay after the invoice is issued.
4.2 Preliminary Analysis

In this section, the author presents preliminary analysis along various dimensions of the invoice dataset, mostly with traditional statistical tools.

Figure 4-1: Segmentation of invoices by delays

Some statistical facts about the dataset of September invoices:

• There are 72,464 invoices in total in the database; 73% of them are paid late (delayed invoices; see Figure 4-1).

• There are 4,291 unique customers, which means an average of 17 invoices per customer in that month.

• For the multi-outcome segmentation based on delay days, the distribution of the different classes is shown in Figure 4-2.

4.2.1 Invoice delay

Non-delayed invoices are all alike; every delayed invoice is delayed in its own way.

Figure 4-2: Histogram of multiple invoice outcomes

The average delay of the delayed invoices is around 27 days. However, among the 73% of invoices that are delayed, the actual delay lengths vary widely, as shown in Figure 4-3.

The distribution of the delayed days is very similar to the famous power law distribution[34]. It shows that a large number of delayed invoices are delayed for only a short period, like 15 days. However, there also exists a long tail of the distribution, which represents the problematic invoices with very long delays.
This long tail shall be revisited when we discuss imbalanced data in Section 6.5.

Figure 4-3: Histogram of delayed days of delayed invoices

4.2.2 Payment terms

Another interesting piece of information on an invoice is its payment term. As mentioned before, the payment term in this database means the "buffer" time of payment after invoice issuing. It is not easy to know how the seller assigns the payment term for each invoice, as it may be part of the business negotiation. However, a glimpse of the payment term distribution in Figure 4-4 can help one better understand the invoice data.

It is found in the database that the set of possible payment terms is {30, 60, 10, 45, 90, 180, 35, 0, 120, 21, 50, 70, 75, 42, 20}. It is quite clear from Figure 4-4 that most of the invoices have standard payment terms: 30 or 60 days.

4.2.3 Invoice amount & delay

One naive hypothesis on the reason for invoice delay is the invoice amount: the higher the invoice amount, the more likely it is to be delayed. In other words, is it true that, for the purpose of financial stability, buyers will delay the payment of invoices with large amounts?

Figure 4-4: Histogram of payment terms

To answer this question, we first use box plots to analyze the trends. As mentioned in Section 3, there are two cases of outcomes for the invoice delay. The binary outcome case simply asks delayed or not and is plotted in Figure 4-5, while the multiple outcome case, which asks how long the delay would be, is plotted in Figure 4-6.

To further verify the intuition from Figure 4-5 and Figure 4-6, we then plot the average amount of the delayed invoices versus their delayed days in Figure 4-7. It is clear from this figure that there is no obvious correlation between invoice amount and invoice delay.

Figure 4-5: Invoice amount and delay or not

It also reminds us that it is hard to use a single variable to predict the delay of an invoice. We shall collect more information about the invoices and adopt more advanced models to understand and predict them.
Figure 4-6: Invoice amount and delay level

4.3 Feature Construction

In the field of statistical learning, a feature (of an instance) is an individual measurable property of a phenomenon being observed. Features are usually numeric, but categorical or structural features are also possible; for example, strings and graphs are used in syntactic pattern recognition. A related concept is the explanatory variable used in regression.

In the case of business invoices, the initial set of raw features is the information presented in the previous section. However, there are three problems with raw features:

• Some of the information can be redundant and too large to manage.

• Some categorical information has been stored in numerical formats.

• Customer information is not enough, but more can be extracted.

Figure 4-7: Average amount of delayed invoice versus delayed days

Therefore, as shown in Figure 4-8, a preliminary step in applying machine learning to invoice delay prediction is to select a subset of features and construct a new, reduced set of features that facilitates learning and, at the same time, improves the generalization and interpretability of the prediction result.

4.3.1 Selection of information

The first step of invoice data preprocessing is to select a subset of information from the database. This is equivalent to asking: given one invoice, what information on it might be relevant to its payment? The subset is shown in Table 4.2.

Basically, the subset in Table 4.2 keeps the amount, the owner and the dates of the invoice, as well as the handler. It contains almost all the information of one invoice, except the product or service, which unfortunately is not available due to data privacy.
Figure 4-8: Construction of an integrated database of invoice and customer statistics used in machine learning models

Name | Meaning
Customer Number |
Document Date | Invoice generating date
Posting Date | Invoice posting date
Document Currency | Amount of the invoice
Clearing Date | Invoice clearing date
Entry Date | Invoice closing date
Division |
Payment Term | The "buffer" time of payment after the invoice issuing
Credit Representative |

Table 4.2: Selected subset of invoice information

The payment term has also been kept because, as one shall see later, it is crucial for determining whether the invoice is delayed or not.

4.3.2 Two levels of features

One may realize that the subset in Table 4.2 provides limited information, especially on the customer that the invoice belongs to. However, to understand and predict the payment of invoices, one needs to know more about the characteristics of the one who pays the invoice. In this case, the payer is the customer, and which customer an invoice belongs to carries a large amount of information about what will happen with its payment.

That is why there should be two levels of features for one invoice: the invoice level and the customer level.

Invoice-level features refer to the amount, the payment term, the division and the various dates of the invoice. At the same time, the project aggregates the historical invoices of each customer and builds a profile accordingly. The customer profile then becomes the customer-level features of the invoice.

For one customer, its historical invoice data, even for only one month, can lead to a rich profile with various characteristics. Some of the elements of the customer profile include:

1. Number of paid invoices

2. Number of delayed invoices

3. Total amount of paid invoices

4. Total amount of delayed invoices

5. Delay ratio (ratio of 2 to 1)

6. Delay amount ratio (ratio of 4 to 3)

7. Average payment term

8. Average delayed days

9. ...

These are mostly statistical facts about one customer, and they serve as the customer-level features of the invoice in the machine learning application.
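As an illustration, here is a minimal sketch of this two-level feature construction in Python with pandas; the file name and the column names (customer_number, amount, delay_days, payment_term) are assumptions, since the company's actual schema is proprietary.

```python
import pandas as pd

# Assumed schema (hypothetical column and file names): one row per closed invoice.
invoices = pd.read_csv("september_invoices.csv")
invoices["delayed"] = invoices["delay_days"] > 0
invoices["delayed_amount"] = invoices["amount"].where(invoices["delayed"], 0.0)

# Customer-level profile aggregated from the invoice history.
profile = invoices.groupby("customer_number").agg(
    no_invoice=("amount", "size"),        # 1. number of paid invoices
    no_delay=("delayed", "sum"),          # 2. number of delayed invoices
    sum_invoice=("amount", "sum"),        # 3. total amount of paid invoices
    sum_delay=("delayed_amount", "sum"),  # 4. total amount of delayed invoices
    ave_buffer=("payment_term", "mean"),  # average payment term
    ave_invoice=("delay_days", "mean"),   # average delay over all invoices
).reset_index()
profile["ratio_no"] = profile["no_delay"] / profile["no_invoice"]     # delay ratio
profile["ratio_sum"] = profile["sum_delay"] / profile["sum_invoice"]  # delay amount ratio

# Join the customer-level features back onto each invoice (two levels of features).
features = invoices.merge(profile, on="customer_number")
```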

As shown in Figure 4-9 and Figure 4-10, the total number of invoices per customer follows something like a power law distribution: a large number of customers have a small number of invoices, but there is a long tail of customers with a huge number of invoices. The distribution of customers' average delay days is similar - most customers delay only for a short period of time, while a few delay for a really long time.

Figure 4-9: Histograms of customers' total number of invoices

Another interesting dimension of the customer is its delay ratio, which is the number of delayed invoices over the number of paid invoices, as shown in Figure 4-11. It reveals the customer's payment record: if a customer has a delay ratio near zero, it is a "good" customer - it pays every bill within the payment term. The pattern in Figure 4-11 tells what kind of customers the machine is facing: there are large numbers of good and bad customers, but very few in between. In other words, a customer picked at random from the database is very likely to have an extreme delay ratio.

Figure 4-10: Histograms of customers' average delay days

The pattern is even more obvious if we look at Figure 4-12, which plots the histogram of the amount delay ratio¹ of each customer.

Therefore, the invoice collection mechanism is dealing with extreme counterparties - they are either very well or very badly behaved. For applying machine learning to solve this problem, this is both a challenge and an opportunity.

4.3.3 Extra information & unexpected features

In the machine learning community, extracting or selecting features is regarded as a combination of art and science. The subset selection and two-level feature structure in the previous section did the "science" part. This section introduces the "art" part of feature construction for invoices.

¹Ratio of the customer's total amount of delayed invoices to its total amount of paid invoices.


Figure 4-11: Histogram of delay ratio of customers

As the machine is supposed to detect the pattern of invoice payments, extra information on the invoice could come from the business logic behind invoice payments and financial stability. One element of financial stability is stable cash flow, especially at the end of the month, when the firm pays salaries and other fees. Therefore, if an invoice is due at the month end, its chance of delay might increase. We define a binary variable $I_{ME}$ as follows:

$$I_{ME} = \begin{cases} 1 & \text{if the invoice is due within three days of the month end} \\ 0 & \text{otherwise} \end{cases}$$

It turns out that 28.79% of the invoices are due at the month end (see Figure 4-13). This is quite surprising considering the narrow range of "month end" - only three days!

Figure 4-12: Histogram of amount delay ratio of customers

Similarly, we also build a binary indicator $I_{HM}$ such that:

$$I_{HM} = \begin{cases} 1 & \text{if the invoice is due within the first half of the month} \\ 0 & \text{otherwise} \end{cases}$$

It turns out there are more invoices due in the second half of the month, as shown in Figure 4-13.
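A minimal sketch of these two calendar indicators, continuing the pandas example above; the due_date column name is an assumption.

```python
import pandas as pd

# Continuing the earlier sketch; "due_date" is an assumed column name.
due = pd.to_datetime(invoices["due_date"])

# I_ME: due within three days of the month end.
invoices["due_date_end"] = (due.dt.days_in_month - due.dt.day <= 3).astype(int)

# I_HM: due within the first half of the month.
invoices["middle_month"] = (due.dt.day <= 15).astype(int)
```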

4.3.4 A full list of features

The full list of features used in the machine learning algorithms is shown in Table 4.3; there are fourteen of them.


Figure 4-13: Distribution of binary indicators (features) of Month End and Half Month

It is worth noting that all the features are generated from one month of business invoice data.

Feature | Explanation
past-due | The amount of the invoice
due-date-end | Month end indicator
middle-month | Middle month indicator
no-invoice | Total number of invoices of the invoice owner
no-delay | Total number of delayed invoices of the invoice owner
sum-invoice | Total sum of invoice amounts of the invoice owner
sum-delay | Total sum of delayed invoice amounts of the invoice owner
ave-delay | Average delay of the delayed invoices of the invoice owner
ave-invoice | Average delay of all the invoices of the invoice owner
ratio-no | Ratio of no-delay to no-invoice
ratio-sum | Ratio of sum-delay to sum-invoice
ave-buffer | Average payment term of the invoice owner
div | Division of the invoice
sale-rep | Sales representative of the invoice

Table 4.3: The full list of features of invoices
Chapter 5

Analysis with Unsupervised Learning

Before applying supervised learning to the invoice classification, it is interesting to take a look at how much all these invoices with different outcomes actually differ from each other. The way to do that is through unsupervised learning, such as clustering and principal component analysis (PCA).

5.1 Principal Component Analysis (PCA)

We first apply an unsupervised learning method, principal component analysis, to understand the unlabeled data.

In our case, PCA uses an orthogonal transformation to convert the set of invoices with possibly correlated features into a set of values of linearly uncorrelated variables called principal components (PC). The number of principal components is less than or equal to the number of original variables. The transformation is defined in such a way that the first principal component has the largest possible variance (accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components[35].

Figure 5-1: Total variance versus the number of principal components

We plot the variance versus the number of PCs in Figure 5-1. It shows that we need at least five PCs to account for most of the variability in the data.

The first two PCs account for only 22.4% and 20.1% of the variance. With only these two PCs, one is unable to separate the two groups of invoices well, as shown in Figure 5-2.

Figure 5-2: Principal component analysis of invoice features
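A minimal sketch of this analysis with scikit-learn, assuming X is the matrix of the fourteen invoice features from Table 4.3:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the features, then compute all principal components.
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)

# Cumulative explained variance, the quantity plotted in Figure 5-1.
print(pca.explained_variance_ratio_.cumsum())

# Scores on the first two PCs, used for the scatter plot in Figure 5-2.
scores = pca.transform(X_std)[:, :2]
```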

5.2 Clustering

We also apply clustering to group a set of instances in such a way that instances in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

Usually, the similarity between two instances is represented by the feature vector distance. Therefore, to find the best number of clusters, we want the average intra-cluster distance to be small but the inter-cluster distance to be large. One intuitive way of finding the number of clusters, k, is shown in Figure 5-3:

• Try different k, looking at the change in the average distance to centroid as k increases.

• The average falls rapidly until the right k, then changes little.

Figure 5-3: How to pick the right number of clusters in unsupervised learning

In the invoice case, we are asking: if we ignore the known outcomes of the instances, how many different types of invoices are there? How well can we cluster them using the features? Is the optimal number of clusters the same as the number of classes we assigned in Section 3.1?

Figure 5-4: Average distance to centroid (within-groups sum of squares, WSS) versus number of clusters

We therefore plot the average distance to centroid versus the number of clusters in Figure 5-4. Ideally, we would see one "critical" point at k = 2 if the delayed and on-time invoices were two very different groups. However, the plot shows there is no obvious optimal number of clusters in the unsupervised learning here, as the slope of the curve does not change abruptly at any k.
We could certainly cluster the invoice data into two or four groups without knowing
the actual outcomes, which is visualized in Figure 5-5 and Figure 5-6.

Figure 5-5: Clustering of invoices into two classes, plotted with first two discriminant components (DC)

Figure 5-6: Clustering of invoices into four classes, plotted with first two discriminant components (DC)

Chapter 6

Prediction with Supervised Learning

The task we formulated in Section 3 is a typical supervised classification problem: given a set of data instances (invoices) expressed by a set of features and class labels (outputs), build a model that classifies a new invoice into two (or four) outcomes (Figure 6-1).

Figure 6-1: Supervised classification

For the given dataset, as shown in Figure 6-1, we divided it into two parts, a training set and a test set:

* Training Set: 80% of the data, used to train and calibrate the model.

* Test (Prediction) Set: 20% of the data, the out-of-sample part, used specifically to test the performance of the calibrated model.

In other words, we use the training data to teach the machine the different types of invoices, and then use the test data to simulate newly arriving data.
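As a sketch, the 80/20 split can be written as follows. X and y stand in for the invoice feature matrix and the outcome labels; the stratify option, which keeps the class mix similar in both parts, is an assumption added here rather than something the thesis states.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))    # placeholder invoice features
y = rng.integers(0, 2, size=1000)  # placeholder binary outcomes

# 80% training / 20% out-of-sample test, preserving the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)
print(len(X_train), len(X_test))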

6.1 Supervised Learning Algorithms

We applied the following classification algorithms to this dataset:

* Classification tree[36]

* Random Forests[37]

* Adaptive boosting (AdaBoost)[38]

* Logistic regression[39]

* Support vector machine (SVM)[40]

6.2 Model Learning & Calibration

Learning, here and in the other sections, was run using 10-fold cross validation. One round of 10-fold cross validation involves partitioning the training data into 10 complementary subsets, performing the analysis on 9 of the subsets (the training folds), and validating the analysis on the remaining subset (the testing fold). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.

We use the cross validation results to choose the right parameters for our machine learning model, before testing it on the test set.

Figure 6-2: K-fold cross validation

As a point of reference, we also report the accuracy of the majority-class predictor, i.e., a classifier that always predicts the class most represented in the training data. We refer to this as the Baseline.
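A sketch of this procedure: scikit-learn's DummyClassifier with strategy="most_frequent" is precisely such a majority-class predictor, and cross_val_score runs the 10-fold loop. The model choice and placeholder data are illustrative, not the thesis's exact setup.

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(800, 10))     # placeholder training features
y_train = rng.integers(0, 2, size=800)   # placeholder training labels

model = RandomForestClassifier(n_estimators=100, random_state=0)
cv_acc = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")
print("10-fold CV accuracy: %.3f +/- %.3f" % (cv_acc.mean(), cv_acc.std()))

# Baseline: always predict the class most represented in the training data.
baseline = DummyClassifier(strategy="most_frequent")
base_acc = cross_val_score(baseline, X_train, y_train, cv=10, scoring="accuracy")
print("Baseline accuracy: %.3f" % base_acc.mean())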

6.3 Model Outputs

We now turn to the prediction results of the various learning models.

6.3.1 Binary Outcome Case

The general results are shown in Table 6.1.

Model Out-of-Sample Prediction Accuracy
Decision Tree 0.861
Random Forests 0.892
AdaBoost 0.863
Logistic Regression 0.864
Support Vector Machine (SVM) 0.869
Table 6.1: The prediction result of binary case with various machine learning algorithms

Decision tree We start the supervised learning with the most intuitive method, the decision tree.

Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value[36]. The general procedure of applying a decision tree to supervised classification includes two steps:

1. Grow the decision tree

2. Prune the grown decision tree

In the first step, we build a tree-style decision model with different values of the complexity parameter (cp), which controls the size and number of levels of the decision tree. We then find the optimal cp by looking at Figure 6-3, which is generated by cross validation. It turns out cp = 0.016 is the best decision tree complexity parameter for this problem.

We then prune the grown tree with cp = 0.016. The pruned tree, which is also the one we used for prediction, is shown in Figure 6-4.
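The cp parameter here is rpart's complexity parameter. A rough analogue in Python is scikit-learn's cost-complexity pruning parameter ccp_alpha; the sketch below mirrors the grow-then-prune procedure under that (approximate) correspondence, with placeholder data.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(800, 10))
y_train = rng.integers(0, 2, size=800)

# Step 1: grow a full tree and enumerate the candidate pruning levels.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alphas = np.unique(np.clip(path.ccp_alphas, 0, None))

# Step 2: prune at each level and keep the alpha with the best CV accuracy.
best_alpha, best_score = None, -np.inf
for alpha in alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    score = cross_val_score(tree, X_train, y_train, cv=10).mean()
    if score > best_score:
        best_alpha, best_score = alpha, score
print("best alpha:", best_alpha, "CV accuracy:", round(best_score, 3))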

The detailed description of the decision process is:

1) root 60000 15685 1 (0.2614167 0.7385833)
  2) delay_ratio< 0.65379 20136 6701 0 (0.6672130 0.3327870)
    4) delay_ratio< 0.35322 11221 2126 0 (0.8105338 0.1894662) *
    5) delay_ratio>=0.35322 8915 4340 1 (0.4868200 0.5131800)
      10) delay_ratio< 0.50532 3911 1694 0 (0.5668627 0.4331373) *
      11) delay_ratio>=0.50532 5004 2123 1 (0.4242606 0.5757394) *
  3) delay_ratio>=0.65379 39864 2250 1 (0.0564419 0.9435581) *

Basically, the decision tree relies mostly on one feature of the invoice, the delay ratio (see Section 4.3.4).

Figure 6-3: Choose the best decision tree complexity (cp) with 10-fold cross validation

Figure 6-4: Decision tree demo on binary case

The prediction accuracy of the decision tree is 0.861, and the confusion matrix of the out-of-sample prediction is in Table 6.2.

Actual no-delay Actual delay


Predicted no-delay 1424 573
Predicted delay 463 5004
Table 6.2: Confusion matrix of decision tree in binary outcome case

The three most important features in the decision tree model are:

1. Delay ratio of the customer

2. Average delay days of the customer

3. Total number of invoices of the customer

Random Forests Random Forests is an ensemble learning method based on decision trees. The idea is to generate multiple decision trees and let them "vote" on the classification result[37]. To some degree, Random Forests corrects for decision trees' habit of overfitting.

One of the key parameters of the Random Forests algorithm is therefore the number of decision trees to grow. Figure 6-5 shows that the training error becomes very stable once more than 100 trees are grown for the binary case.
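A sketch of this tuning step: the out-of-bag (OOB) error below stands in for the training-error curve of Figure 6-5, and the data and tree counts are placeholders.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(800, 10))
y_train = rng.integers(0, 2, size=800)

# Watch the OOB error stabilize as more trees are grown.
for n_trees in (10, 50, 100, 200, 300):
    rf = RandomForestClassifier(
        n_estimators=n_trees, oob_score=True, random_state=0, n_jobs=-1
    ).fit(X_train, y_train)
    print(f"{n_trees:>3} trees: OOB error = {1 - rf.oob_score_:.3f}")

# Feature importance ranking (cf. Figure 6-6); indices map to feature names.
ranking = np.argsort(rf.feature_importances_)[::-1]
print("most important feature indices:", ranking[:3])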

Figure 6-5: Training error versus number of trees grown in Random Forests

We then apply the trained Random Forests model to the out-of-sample test data. The prediction accuracy is 0.892, and the confusion matrix of the out-of-sample prediction is in Table 6.3.

Actual no-delay Actual delay
Predicted no-delay 1587 410
Predicted delay 398 5069
Table 6.3: Confusion matrix of random Forests in binary outcome case

Random Forests also gives us the ranking of feature importance, as shown in Figure 6-6. It shows that the delay ratio and the average delay days of the customer the invoice belongs to are the two most important features.

Figure 6-6: Variable importance of random Forests in binary outcome case

AdaBoost AdaBoost is another ensemble learning method. It combines the outputs of various algorithms (so-called weak learners) into a weighted sum to give the prediction[41].

We then apply the AdaBoost method to the out-of-sample test data. The prediction accuracy is 0.863, and the confusion matrix of the out-of-sample prediction is in Table 6.4.

Actual no-delay Actual delay
Predicted no-delay 1588 617
Predicted delay 409 4850
Table 6.4: Confusion matrix of AdaBoost in binary outcome case

AdaBoost also gives the certainty of the prediction on each invoice, which is called the margin and is calculated as the difference between the support of the correct class and the maximum support of an incorrect class[42]. The cumulative distribution of the margins of the predictions on both test and training data can be found in Figure 6-7. It shows that AdaBoost is quite certain for around 50% of the invoice predictions.
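That margin is easy to compute from a classifier's class-support estimates. The sketch below treats predicted class probabilities as the supports, which is an assumption on our side rather than the thesis's exact margin computation, and the 0.5 certainty threshold is likewise assumed.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
proba = clf.predict_proba(X)                 # per-class supports
true_support = proba[np.arange(len(y)), y]
others = proba.copy()
others[np.arange(len(y)), y] = -np.inf       # mask out the correct class
margin = true_support - others.max(axis=1)   # in [-1, 1]

# Share of "quite certain" predictions (threshold assumed).
print("share with margin > 0.5:", np.mean(margin > 0.5))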

Logistic Regression For the binary outcome case, we can also use the classic binomial logistic regression method[43].

The prediction accuracy is 0.864, with the confusion matrix shown in Table 6.5.

Actual no-delay Actual delay


Predicted no-delay 1414 583
Predicted delay 430 5037
Table 6.5: Confusion matrix of logistic regression in binary outcome case

And the statistically significant features are shown in Table 6.6.

Figure 6-7: Margin (prediction certainty) cumulative distribution of the AdaBoost algorithm on invoice binary outcome prediction

Support Vector Machine (SVM) Another natural choice of supervised learning algorithm is the support vector machine (SVM), which turns the learning into an optimization problem[44].

The prediction accuracy of SVM is 0.869, with the confusion matrix shown in

Table 6.7.

Feature Explanation
due-date-end Month end indicator
middle-month Middle month indicator
no-invoice Total number of invoices of the invoice owner
no-delay Total number of delayed invoices of the invoice owner
ave-delay Average delay of the delayed invoices of the invoice owner
ratio-no Ratio of no-delay and no-invoice
Table 6.6: Statistically significant invoice features in logistic regression of binary outcome case

Actual no-delay Actual delay
Predicted no-delay 1398 599
Predicted delay 381 5086
Table 6.7: Confusion matrix of SVM in binary outcome case

6.3.2 Multiple Outcome Case

We now present the prediction results of the multiple (four) outcome case, using the same machine learning algorithms as above.

The general results are shown in Table 6.8.

Model Out-of-Sample Prediction Accuracy
Decision Tree 0.764
Random Forests 0.816
AdaBoost 0.770
Logistic Regression 0.755
Support Vector Machine (SVM) 0.773
Table 6.8: The prediction result of multiple outcome case with various machine learning algorithms

Decision tree Again, we grow the tree and prune it. However, the tree now needs to make decisions on four outcomes: no-delay, short delay (within 30 days), medium delay (30-90 days) and long delay (more than 90 days).

As shown in Figure 6-8, the optimal cp = 0.018. Using the optimal decision tree to predict the out-of-sample data, it turns out that the overall accuracy is 0.764.

How the decision tree makes classifications is shown in Figure 6-9.

The detailed confusion matrix is in Table 6.9.

Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1712 229 46 10
Predicted short delay 724 3378 286 8
Predicted medium delay 100 227 495 18
Predicted long delay 29 23 61 118
Table 6.9: Confusion matrix of decision tree in multiple outcome case

We can visualize how the decision tree works using the two most important features: the delay ratio and the average delay days. As shown in Figure 6-10, each dot in the figure is an invoice, and its color represents the outcome:

* Red: No-delay

* Green: Short delay

* Blue: Medium delay

* Violet: Long delay

Figure 6-8: Choose the best decision tree complexity (cp) with 10-fold cross validation

Each invoice is attached to one customer, whose delay ratio and average delay days over historical invoices are known. If we take these as two axes and plot, the segmentation of the invoices shows clearly. The black lines in Figure 6-10 are the segmentation thresholds generated by the machine, namely the decision tree algorithm.

Figure 6-9: Decision tree algorithm demo on multi-outcome case

72
200

150
-

E
factor(delay class)

- *No delay

1 1-30
31-90

- >90
a)
0)
.

a)

50-.

<~40,

0
-

0.00 0.25 0.50 0.75 1.0


Ratio of delayed invoice of the customer
Figure 6-10: Visualization of decision tree in multiple outcome cases of invoice prediction

Random Forests We then apply the trained Random Forests model to the out-of-sample test data. Again, it is found that the classification result becomes very stable when more than 100 decision trees are grown. The prediction accuracy is 0.816, and the confusion matrix of the out-of-sample prediction is in Table 6.10.

Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1625 350 17 5
Predicted short delay 367 3898 121 10
Predicted medium delay 64 305 449 22
Predicted long delay 18 44 49 120
Table 6.10: Confusion matrix of random Forests in multiple outcome case

Again, Random Forests gives the ranking of feature importance, as shown in Figure 6-11. It shows that the delay ratio and the average delay days of the customer the invoice belongs to are still the two most important features.

Figure 6-11: Variable importance of random Forests in multiple outcome case

AdaBoost For AdaBoost, the prediction accuracy is 0.770, with error details in Table 6.11. It does not work as well as Random Forests, although it is also an ensemble learning method.
Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1442 408 83 33
Predicted short delay 525 3824 374 36
Predicted medium delay 24 154 360 39
Predicted long delay 6 10 23 123
Table 6.11: Confusion matrix of AdaBoost in multiple outcome case

We look at the certainty of the prediction on each invoice again. The cumulative distribution of the margins of the predictions on both test and training data can be found in Figure 6-12. It shows that AdaBoost is quite certain for around 40% of the invoice predictions in the multi-outcome case.

Figure 6-12: Margin (prediction certainty) cumulative distribution of the AdaBoost algorithm on invoice multi-outcome prediction

Logistic Regression We can also apply multinomial logistic regression to the multiple outcome case, which has a similar mathematical structure to the binary case. The prediction accuracy is 0.755, with details in Table 6.12.

Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1383 577 9 28
Predicted short delay 340 3961 64 31
Predicted medium delay 61 551 204 24
Predicted long delay 13 54 73 91
Table 6.12: Confusion matrix of logistic regression in multiple outcome case

The statistically significant variables are shown in Table 6.13; they do not differ much from the binary outcome case.

Feature Explanation
due-date-end Month end indicator
middle-month Middle month indicator
no-invoice Total number of invoices of the invoice owner
ave-delay Average delay of the delayed invoices of the invoice owner
ratio-no Ratio of no-delay and no-invoice
Table 6.13: Statistically significant invoice features in logistic regression of multiple outcome case

Support Vector Machine (SVM) The last method is again SVM. However, SVMs are inherently two-class classifiers. The usual way of doing multi-class classification with SVM in practice has been to build a set of one-versus-one classifiers and to choose the class that is selected by the most classifiers. In other words, it builds a voting mechanism over the classifiers[45].
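This one-versus-one voting is what several SVM libraries implement internally. For instance, scikit-learn's SVC builds all K(K-1)/2 pairwise classifiers itself, as sketched below with placeholder data and parameters.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 4, size=1000)  # four outcome classes

# With 4 classes, SVC fits 4*3/2 = 6 pairwise classifiers and predicts by voting.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", decision_function_shape="ovo"))
clf.fit(X, y)
print("pairwise decision values per sample:", clf.decision_function(X[:1]).shape)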

The out-of-sample prediction accuracy of multi-class SVM is 0.773. And the con-

fusion matrix is in Table 6.14.

Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1521 464 5
Predicted short delay 425 3862 100 9
Predicted medium delay 76 471 275 48
Predicted long delay 17 54 48 112
Table 6.14: Confusion matrix of SVM in multiple outcome case

6.4 Results Analysis & Improvements

In this section, we analyze the results of our best performing algorithm, Random Forests.

In both the binary outcome case (Table 6.1) and the multi-outcome case (Table 6.8), we see quite consistent performance across the various machine learning algorithms, and Random Forests has been the best algorithm in both.

If we compare the baseline and the performance of Random Forests, as shown in Table 6.15, the general predictability is quite significant. However, we need to look at the confusion matrix of the decision making to understand the algorithm performance better.
Baseline Random Forests
Binary Outcome Case 0.731 0.892
Multiple Outcome Case 0.598 0.81
Table 6.15: Measuring the performance of Random Forests

One of the key messages of the confusion matrix concerns the Type 1 and Type 2 errors[46], as demonstrated in Figure 6-13.

True State of Nature: H0 is true | H1 is true
Accept H0: Correct decision (probability = 1 - α) | Type II error (probability = β)
Reject H0: Type I error (probability = α, the significance level) | Correct decision (probability = 1 - β, the power)

Figure 6-13: Two types of errors in decision making

In the binary outcome case of invoice prediction, a Type 1 error means the invoice is going to be paid on time but the machine predicts it will be delayed. A Type 2 error means the invoice payment is going to be delayed but the prediction says no-delay.

Obviously, for the invoice collector, the two types of errors carry different weights. Usually, a Type 2 error is much more "expensive" than a Type 1 error. Extending this idea to the multiple outcome case, the prediction accuracy on different outcome classes is weighted differently.

Therefore, we show the prediction accuracies for each class in both cases in Table 6.16 and Table 6.17. Table 6.16 basically tells us that, given an invoice that is going to be paid late, our algorithm can detect it at issuing time with around 93% accuracy.

Prediction Accuracy
No-delay 0.794
Delay 0.927
Table 6.16: Class prediction accuracy of Random Forests in Binary Outcome Case

Table 6.17 shows that the algorithm (Random Forests) is quite good at detecting delays, especially short delays. However, it has difficulty detecting medium or long delays with very high accuracy.

Prediction Accuracy
No-delay 0.814
Short delay 0.887
Medium delay 0.535
Long delay 0.519
Table 6.17: Class prediction accuracy of Random Forests in Multiple Outcome Case
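The per-class accuracies in Tables 6.16 and 6.17 are the column-normalized diagonals of the confusion matrices (the tables in this chapter put predictions in rows and actual outcomes in columns). A sketch of that computation, using the Random Forests counts from Table 6.3; the results come out close to the values in Table 6.16, with small differences presumably due to rounding in the reported tables.

import numpy as np

# Rows = predicted, columns = actual (counts from Table 6.3).
cm = np.array([
    [1587, 410],   # predicted no-delay
    [398, 5069],   # predicted delay
])

# Accuracy of an actual class = its diagonal count / its column sum.
per_class_acc = np.diag(cm) / cm.sum(axis=0)
for name, acc in zip(["no-delay", "delay"], per_class_acc):
    print(f"{name}: {acc:.3f}")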

We shall address this problem in the multiple outcome case in the next section.

6.5 Imbalanced Data & Solution

One may notice from the previous section that the prediction accuracy of our best algorithm, Random Forests, varies across classes.

The main reason for this accuracy difference is that the data is imbalanced: there are different numbers of invoices in different classes. In other words, there are more invoices with outcome A than with outcome B.

In this section, we address this problem in two ways: one is based on a sampling technique, the other on cost-sensitive learning[47].

6.5.1 Weighted Sampling

As mentioned by Breiman[48], Random Forests grows its trees on bootstrap samples of the training data. In an imbalanced training set, however, there is a high probability that a bootstrap sample contains few or even none of the minority class, resulting in a tree with poor predictive capability for the minority class.

The intuitive way to fix this problem is weighted sampling, also called the stratified bootstrap[49]: sample with replacement from within each class.

Therefore, we now sample each invoice outcome class with weights (frequencies) proportional to its class size. This is also called Balanced Random Forests. The result is shown in Table 6.18.

The result shows that stratified sampling can significantly improve the prediction accuracy of the minority class (long-delay invoices here), but at the cost of overall prediction accuracy.

Random Forests Balanced Random Forests
Overall 0.816 0.610
No-delay 0.814 0.758
Short delay 0.887 0.515
Medium delay 0.535 0.698
Long delay 0.519 0.8354
Table 6.18: A comparison of prediction accuracy of Random Forests and Balanced Random Forests
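scikit-learn does not expose the stratified bootstrap directly, but a close relative is to reweight classes within every bootstrap sample, which approximates the balancing effect. The sketch below uses that option with placeholder, deliberately imbalanced data; it is an approximation, not the thesis's exact procedure.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 10))
# Placeholder labels: class 3 ("long delay") is deliberately rare.
y_train = rng.choice(4, size=2000, p=[0.25, 0.55, 0.13, 0.07])

# class_weight="balanced_subsample" reweights classes inside each bootstrap
# sample, mimicking a stratified bootstrap's boost of the minority class.
rf_balanced = RandomForestClassifier(
    n_estimators=200, class_weight="balanced_subsample", random_state=0
).fit(X_train, y_train)

X_test = rng.normal(size=(500, 10))
y_test = rng.choice(4, size=500, p=[0.25, 0.55, 0.13, 0.07])
cm = confusion_matrix(y_test, rf_balanced.predict(X_test))
print(np.diag(cm) / np.maximum(cm.sum(axis=1), 1))  # per-class recall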

6.5.2 Cost-sensitive Learning

To address the different misclassification costs, we can also use instance re-weighting, a common approach in cost-sensitive learning.

Since Random Forests tends to be biased towards the majority class, which here is also the less important class[48], we can change the penalty function and place a heavier penalty on misclassifying the minority class (the more important or expensive one, like long-delay invoices).

Therefore we assign a weight to each class, with the minority class given a larger weight (i.e., a higher misclassification cost), as shown in Table 6.19.

Predicted no-delay Predicted short delay Predicted medium delay Predicted long delay
Actual no-delay 0 1 1 1
Actual short delay 2 0 1 1
Actual medium delay 3 2 0 1
Actual long delay 4 3 2 0
Table 6.19: Misclassification cost matrix C for cost-sensitive Random Forests

Table 6.19 is a cost matrix, C. It tells us that, given an invoice actually in Class i, the penalty of classifying it into Class j is Cij. For example, given an invoice with long delay, the penalty of classifying it into short delay is C42 = 3, which is larger than the penalty of classifying it into medium delay, C43 = 2.

The Random Forests algorithm in MATLAB, TreeBagger, can directly incorporate this cost matrix C into the learning. The resulting prediction accuracy is shown in Table 6.20.

Random Forests Cost-sensitive Random Forests
Overall 0.816 0.812
No-delay 0.814 0.778
Short delay 0.887 0.889
Medium delay 0.535 0.600
Long delay 0.519 0.619
Table 6.20: A comparison of prediction accuracy of Random Forests and cost-sensitive Random Forests

One can see from Table 6.20 that the overall accuracy of cost-sensitive Random Forests is quite consistent with the original, and we see improvements in the accuracies of the medium delay and long delay classes.
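Outside MATLAB, the same idea can be approximated with per-instance weights derived from the cost matrix. A common heuristic, sketched below, weights each training instance by the total cost of misclassifying its class; this is an approximation on our side, not TreeBagger's exact mechanism.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# The cost matrix C of Table 6.19 (rows = actual, columns = predicted).
C = np.array([
    [0, 1, 1, 1],   # actual no-delay
    [2, 0, 1, 1],   # actual short delay
    [3, 2, 0, 1],   # actual medium delay
    [4, 3, 2, 0],   # actual long delay
])
class_cost = C.sum(axis=1)  # total misclassification cost per class

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 10))
y_train = rng.choice(4, size=2000, p=[0.25, 0.55, 0.13, 0.07])

# Heavier weight on the costly (minority) classes during training.
sample_weight = class_cost[y_train]
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train, sample_weight=sample_weight)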

6.6 Model Robustness
We have also been able to test the performance of our Random Forests model on two additional months of new data.

Each of the two monthly datasets has a similar volume of invoices, around 70k, and exactly the same information for each invoice. Therefore, we are able to update our model parameters with the new training data and test the model again on the out-of-sample test data. The result is shown in Table 6.21.

Original Dataset New Month A New Month B
Delay Detection Accuracy 0.927 0.914 0.909
Multi-Outcome Overall 0.816 0.794 0.803
Table 6.21: Model prediction accuracy in data of different months

It shows that the Random Forests model has a relatively consistent prediction accuracy across datasets from different months.

6.7 Why Random Forests Works

So far, Random Forests has outperformed the other classifiers in the invoice prediction case. This agrees with the author's experience in machine learning: Random Forests is often the winner for many classification problems (usually slightly ahead of SVMs), and it is also fast and scalable.

In a 2014 paper, Fernández-Delgado et al.[50] evaluated 179 machine learning classifiers on 121 data sets from the UCI database. They found that the classifiers most likely to be the best are the Random Forests variants, which achieve 94.1% of the maximum accuracy and exceed 90% accuracy on 84.3% of the data sets. The second best, the SVM with Gaussian kernel, was close, achieving 92.3% of the maximum accuracy.

There is no definitive answer to why Random Forests works well in such a wide variety of cases. However, its good performance is believed to be strongly associated with several of its features[48]:

* Random Forests can handle thousands of input variables without variable deletion.

* It generates an internal unbiased estimate of the generalization error as the forest building progresses.

* Prototypes are computed that give information about the relation between the variables and the classification.

Chapter 7

Conclusion

7.1 Summary

This thesis discusses the analytics of business invoices and how to detect problematic payments and customers. Companies issue thousands of business invoices per month, but only a small proportion of them are paid on time. Machine learning is a natural fit to leverage the power of business analytics to understand the patterns of invoices and improve accounts receivable collection by predicting delays in advance.

In this thesis, I analyze the accounts receivable management (invoice collection) of a Fortune 500 company. The analysis showed that a large proportion of the invoices issued are not paid on time, and we observe invoices with very different payment delays.

This thesis proposes a supervised (machine) learning approach to solve this problem and presents the corresponding results on invoice payment prediction. It shows that machine learning algorithms can generate accurate predictions of invoice payment delays, based on the data of historical invoices.

I also build a set of aggregated features of business invoices which capture the characteristics of the invoice and the customer it belongs to. Although no single feature of the invoice can reveal its payment outcome, the aggregate information is powerful in helping us understand invoice outcomes. Having this set of features enhances the prediction accuracy significantly.

Random Forests has been the best predictor for the invoice payment delay problem so far. Moreover, the thesis demonstrates that by using cost-sensitive learning, we are able to improve prediction accuracy particularly for long-delay invoices, which are the minority in the data sets.

In addition, the thesis demonstrates the robustness of the machine learning model applied to this problem. It achieves consistent prediction accuracies across various invoice data sets.

In general, this thesis conducts comprehensive research on invoice payment outcome prediction, from data processing to prediction model building and calibration. It offers a framework for understanding business invoices and the customers they belong to, and provides actionable knowledge for industry practitioners to adopt.

7.2 Future work

Based on the framework built by this thesis, there are several directions in which one can go further on this topic.

Building a better training dataset

One way to improve the prediction accuracy of the machine learning model is to feed it a better training dataset.

As of now, the training dataset is built from one month of historical invoice data. This is a relatively short period considering the data a company may have accumulated. If one can access data with a longer history, he or she may be able to improve the prediction accuracy significantly and find seasonality patterns in the invoice payments.

Incorporating extra information about the customer, like its revenue and margin, is another way to improve the training set. It is always helpful to give the machine more information about the object, even if it may seem irrelevant at first.

Calibrating cost-sensitive learning

The thesis explores the idea of using cost-sensitive learning to predict invoice payments. However, the misclassification cost matrix, which is the core of this algorithm, is worth more careful study.

In this thesis, the misclassification cost matrix, as defined in Table 6.19, gives a qualitative approximation of the real cost. For example, Table 6.19 says the cost of misclassifying a long-delay invoice into the no-delay class is twice the cost of misclassifying it into the medium-delay class. This multiplier (2x) is not necessarily true and may be improved if one can incorporate more business sense. One could even define a dynamic cost matrix, if better prediction accuracy can be achieved.

Prioritizing invoice collection

Based on the predictability we achieved, we can further explore algorithms based on invoice grading to channel the work flow and maximize business value. That is, how to use the prediction results to optimize collection performance.

Ideally, we would take actions on all the "bad" invoices, as their payments would be late if we did nothing. However, there are always resource constraints that prevent taking action on all of them. Therefore, it is of great interest to develop an algorithm to prioritize the invoices. Of course, the implicit assumption here is that taking an action on an invoice will reduce the time of delinquency.

A natural approach would be to prioritize based on the prediction outcome, i.e., how bad the delay would be. However, it becomes more complicated if we set the objective to be maximizing revenue: we may need to weight the amount of the bill. Additionally, an action may have different delinquency-reduction effects on different clients. We surely need models to quantify the results of collection actions. They are all worth modeling if we want to increase the effectiveness of invoice collection.

Bibliography

[1] V. Mayer-Schönberger and K. Cukier, Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt, 2013.

[2] G. Piatetsky-Shapiro, "Data mining and knowledge discovery 1996 to 2005: over-
coming the hype and moving from university to business and analytics," Data
Mining and Knowledge Discovery, vol. 15, no. 1, pp. 99-105, 2007.

[3] K. P. Murphy, Machine learning: a probabilistic perspective. MIT press, 2012.

[4] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H.


Byers, "Big data: The next frontier for innovation, competition, and productivity,"
2011.

[5] A. McAfee, E. Brynjolfsson, T. H. Davenport, D. Patil, and D. Barton, "Big data,"


The management revolution. Harvard Bus Rev, vol. 90, no. 10, pp. 61-67, 2012.

[6] S. Zeng, P. Melville, C. A. Lang, I. Boier-Martin, and C. Murphy, "Using predictive analysis to improve invoice-to-cash collection," in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1043-1050, ACM, 2008.

[7] B. Younes, A. Bouferguene, M. Al-Hussein, and H. Yu, "Overdue invoice management: Markov chain approach," Journal of Construction Engineering and Management, vol. 141, no. 1, 2014.

[8] S. L. Mian and C. W. Smith, "Accounts receivable management policy: theory


and evidence," The Journal of Finance, vol. 47, no. 1, pp. 169-200, 1992.

[9] M. Shao, S. Zoldi, G. Cameron, R. Martin, R. Drossu, J. G. Zhang, and


D. Shoham, "Enhancing delinquent debt collection using statistical models of debt
historical information and account events," Mar. 13 2007. US Patent 7,191,150.

[10] E. Hofmann, "Supply chain finance: some conceptual insights," Beiträge zu Beschaffung und Logistik, pp. 203-214, 2005.

[11] H.-C. Pfohl and M. Gomm, "Supply chain finance: optimizing financial flows in
supply chains," Logistics research, vol. 1, no. 3-4, pp. 149-161, 2009.

[12] A. Gunasekaran, C. Patel, and E. Tirtiroglu, "Performance measures and metrics in a supply chain environment," International Journal of Operations & Production Management, vol. 21, no. 1/2, pp. 71-87, 2001.

[13] R. Bhagwat and M. K. Sharma, "Performance measurement of supply chain man-


agement: A balanced scorecard approach," Computers & Industrial Engineering,
vol. 53, no. 1, pp. 43-62, 2007.

[14] P. Kouvelis and W. Zhao, "Supply chain finance," The Handbook of Integrated
Risk Management in Global Supply Chains, pp. 247-288, 2011.

[15] "Measure and manage collection efficiency using dso."


http://www.abc-amega.com/articles/credit-management/
measure-and-manage-collection-ef f iciency-using-dso. Accessed: 2015-04-
30.

[16] "Top accounts receivable software products." http: //www. capterra. com/
accounts-receivable-software/. Accessed: 2015-04-30.

[17] A. Shen, R. Tong, and Y. Deng, "Application of classification models on credit card
fraud detection," in Service Systems and Service Management, 2007 International
Conference on, pp. 1-4, IEEE, 2007.

[18] V. F. Araman and R. Caldentey, "Dynamic pricing for nonperishable products


with demand learning," Operations research, vol. 57, no. 5, pp. 1169-1188, 2009.

[19] A. E. Khandani, A. J. Kim, and A. W. Lo, "Consumer credit-risk models via


machine-learning algorithms," Journal of Banking & Finance, vol. 34, no. 11,
pp. 2767-2787, 2010.

[20] J. Aizenman and Y. Jinjarak, "The collection efficiency of the value added tax:
Theory and international evidence," Journal of International Trade and Economic
Development, vol. 17, no. 3, pp. 391-410, 2008.

[21] S. Ghosh and D. L. Reilly, "Credit card fraud detection with a neural-network," in
System Sciences, 1994. Proceedings of the Twenty-Seventh Hawaii International
Conference on, vol. 3, pp. 621-630, IEEE, 1994.

[22] J. Hansen, J. B. McDonald, W. Messier Jr, and T. B. Bell, "A generalized


qualitative-response model and the analysis of management fraud," Management
Science, vol. 42, no. 7, pp. 1022-1032, 1996.

[23] V. Hanagandi, A. Dhar, and K. Buescher, "Density-based clustering and radial basis function modeling to generate credit card fraud scores," in Computational Intelligence for Financial Engineering, 1996. Proceedings of the IEEE/IAFE 1996 Conference on, pp. 247-251, IEEE, 1996.

[24] J. R. Dorronsoro, F. Ginel, C. Sánchez, and C. Cruz, "Neural fraud detection in credit card operations," Neural Networks, IEEE Transactions on, vol. 8, no. 4, pp. 827-834, 1997.

[25] D. J. Hand and W. E. Henley, "Statistical classification methods in consumer credit scoring: a review," Journal of the Royal Statistical Society: Series A (Statistics in Society), vol. 160, no. 3, pp. 523-541, 1997.

[26] A. F. Atiya, "Bankruptcy prediction for credit risk using neural networks: A
survey and new results," Neural Networks, IEEE Transactions on, vol. 12, no. 4,
pp. 929-935, 2001.

[27] J. H. Min and Y.-C. Lee, "Bankruptcy prediction using support vector machine
with optimal choice of kernel function parameters," Expert systems with applica-
tions, vol. 28, no. 4, pp. 603-614, 2005.

[28] K.-S. Shin, T. S. Lee, and H.-j. Kim, "An application of support vector machines
in bankruptcy prediction model," Expert Systems with Applications, vol. 28, no. 1,
pp. 127-135, 2005.

[29] Z. Huang, H. Chen, C.-J. Hsu, W.-H. Chen, and S. Wu, "Credit rating analysis
with support vector machines and neural networks: a market comparative study,"
Decision support systems, vol. 37, no. 4, pp. 543-558, 2004.

[30] J. W. Kim, H. R. Weistroffer, and R. T. Redmond, "Expert systems for bond rat-
ing: a comparative analysis of statistical, rule-based and neural network systems,"
Expert systems, vol. 10, no. 3, pp. 167-172, 1993.

[31] J. J. Maher and T. K. Sen, "Predicting bond ratings using neural networks: a
comparison with logistic regression," Intelligent Systems in Accounting, Finance
and Management, vol. 6, no. 1, pp. 59-72, 1997.

[32] J. Galindo and P. Tamayo, "Credit risk assessment using statistical and machine
learning: basic methodology and risk modeling applications," ComputationalEco-
nomics, vol. 15, no. 1-2, pp. 107-143, 2000.

[33] D. Bailey, B. Butler, T. Smith, T. Swift, J. Williamson, and W. Scherer, "Provid-


ian financial corporation: Collections strategy," in Systems Engineering Capstone
Conference, University of Virginia, 1999.

[34] L. A. Adamic and B. A. Huberman, "Power-law distribution of the world wide


web," Science, vol. 287, no. 5461, pp. 2115-2115, 2000.

[35] I. Jolliffe, Principal component analysis. Wiley Online Library, 2002.

[36] S. R. Safavian and D. Landgrebe, "A survey of decision tree classifier methodol-
ogy," 1990.

[37] A. Liaw and M. Wiener, "Classification and regression by randomforest," R news,
vol. 2, no. 3, pp. 18-22, 2002.

[38] Y. Freund, R. Schapire, and N. Abe, "A short introduction to boosting," Journal-Japanese Society For Artificial Intelligence, vol. 14, no. 771-780, p. 1612, 1999.

[39] D. W. Hosmer Jr and S. Lemeshow, Applied logistic regression. John Wiley & Sons, 2004.

[40] J. A. Suykens and J. Vandewalle, "Least squares support vector machine classi-
fiers," Neural processing letters, vol. 9, no. 3, pp. 293-300, 1999.

[41] C. M. Bishop et al., Pattern recognition and machine learning, vol. 4. Springer New York, 2006.

[42] G. Rätsch, T. Onoda, and K.-R. Müller, "Soft margins for adaboost," Machine learning, vol. 42, no. 3, pp. 287-320, 2001.

[43] D. R. Cox, "The regression analysis of binary sequences," Journal of the Royal
Statistical Society. Series B (Methodological), pp. 215-242, 1958.

[44] S. Tong and D. Koller, "Support vector machine active learning with applications
to text classification," The Journal of Machine Learning Research, vol. 2, pp. 45-
66, 2002.

[45] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, vol. 2. Springer, 2009.

[46] J. Rice, Mathematical statistics and data analysis. Cengage Learning, 2006.

[47] C. Chen, A. Liaw, and L. Breiman, "Using random forest to learn imbalanced
data," 2005.

[48] L. Breiman, "Random forests," Machine learning, vol. 45, no. 1, pp. 5-32, 2001.

[49] P. J. Bickel and D. A. Freedman, "Asymptotic normality and the bootstrap in


stratified sampling," The annals of statistics, pp. 470-482, 1984.

[50] M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, "Do we need hundreds of classifiers to solve real world classification problems?," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3133-3181, 2014.
