Machine Learning
by
Hu Peiguang
B.Eng. Hon., Engineering Science Programme
National University of Singapore, 2013
Submitted to the Department of Civil and Environmental Engineering and the
Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degrees of
Master of Science in Transportation
and
Master of Science in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
© 2015 Massachusetts Institute of Technology. All rights reserved.
Author .......................... Signature redacted
Department of Civil and Environmental Engineering
Department of Electrical Engineering and Computer Science
May 2015
Certified by .......................... Signature redacted
David Simchi-Levi
Professor of Civil and Environmental Engineering
Thesis Supervisor
Certified by .......................... Signature redacted
Asuman Ozdaglar
Professor of Electrical Engineering and Computer Science
Thesis Supervisor
Accepted by .......................... Signature redacted
Heidi Nepf
Chair, Departmental Committee for Graduate Students
Accepted by .......................... Signature redacted
Abstract
Delinquent invoice payments can be a source of financial instability if poorly
managed. Research in supply chain finance shows that effective invoice collection is
positively correlated with the overall financial performance of companies. In this thesis
I address the problem of predicting delinquent invoice payments in advance by applying
machine learning to historical invoice data. Specifically, this thesis demonstrates how
supervised learning models can be used to detect the invoices that will be paid late,
as well as the problematic customers, which enables customized collection actions by
the firm. The model in this thesis can predict with high accuracy whether an invoice
will be paid on time and also estimate the magnitude of the delay. This thesis builds
and trains its invoice delinquency prediction capability on real-world invoice data
from a Fortune 500 company.
Acknowledgments

First and foremost, I would like to thank my thesis supervisors, Prof. David Simchi-Levi and Prof. Asuman Ozdaglar, for their invaluable patience, guidance and support.
My plan to come to graduate school in this area began many years back when I
was fortunate enough to work with Prof. Daniel Abrams. I would like to thank him for
encouraging me to pursue research of my own interests, even though it could have been (and
More personally, I am thankful to my friends at MIT. Especially to Kim Yoo Joon: thank
you for listening to all my agonies in life and suggesting solutions. I am lucky to have
that helped me form ideas for my thesis. To my lifelong friends, Huang Yijie and
Luo Feng, thank you for keeping me up-to-date on what life outside of MIT is like.
Finally, to my parents, thank you for your love and support throughout this journey. None of this would have been possible without you, and for that I dedicate this thesis to you.
Contents

1 Introduction . . . 15
2 Literature Review . . . 21
3 Problem Formulation . . . 29
4.2.2 Payment terms . . . 38
4.2.3 Invoice amount & delay . . . 38
4.3 Feature Construction . . . 41
4.3.1 Selection of information . . . 42
4.3.2 Two levels of features . . . 43
4.3.3 Extra information & unexpected features . . . 46
4.3.4 A full list of features . . . 48
5 Analysis with Unsupervised Learning . . . 51
5.1 Principal Component Analysis (PCA) . . . 51
5.2 Clustering . . . 54
6 Prediction with Supervised Learning . . . 59
6.1 Supervised Learning Algorithms . . . 60
7 Conclusion . . . 85
7.1 Summary . . . 85
List of Figures

1-1 Historical productivity growth rate versus the potential gain from big data [4] . . . 17
1-2 Typical order-to-cash (O2C) process . . . 18
3-1 Two cases of invoice outcome definition . . . 30
4-13 Distribution of binary indicators (features) of Month End and Half Month . . . 49
5-3 How to pick the right number of clusters in unsupervised learning . . . 54
5-4 Average distance to centroid (within-groups sum of squares, also called
5-5 Clustering of invoices into two classes, plotted with first two discriminant components (DC) . . . 56
5-6 Clustering of invoices into four classes, plotted with first two discriminant components
6-3 Choose the best decision tree complexity (cp) with 10-fold cross validation . . . 63
6-8 Choose the best decision tree complexity (cp) with 10-fold cross validation . . . 71
... prediction . . . 73
List of Tables

6.1 The prediction result of binary case with various machine learning algorithms . . . 62
... outcome case . . . 69
6.8 The prediction result of multiple outcome case with various machine learning algorithms . . . 70
... outcome case . . . 76
6.15 Measuring the performance of Random Forests . . . 78
... Random Forests . . . 81
Chapter 1
Introduction
We are entering the era of big data in the business analytics (BA) industry. For
example, Walmart handles more than 1 million transactions per hour and maintains databases
containing more than 2.5 petabytes (2.5 × 10^15 bytes) of information. And it is estimated that,
by 2009, nearly all sectors in the US economy had on average at least 200 terabytes
(2 × 10^14 bytes) of stored data per company with more than 1,000 employees [1].
Over the last 15 years the business analytics field went through a great transition, as
the amount of available data increased significantly [2]. Specifically, this deluge of data
calls for automated methods of data analysis, for which researchers have looked to
machine learning.
Machine learning has since become a dominating force in the business analytics industry. According to Murphy [3], machine
learning is a set of methods that automatically detect patterns in data and then use
the uncovered patterns to predict future data, or to perform decision making under
uncertainty based on knowledge from data.
Machine learning empowers one's ability to transform data into actionable knowledge. McKinsey & Co, a consultancy, identified five broadly applicable ways to leverage
big data that offer transformational potential [4]:
formance
customers, consumers, and citizens benefit from the economic surplus enabled by data. It is
also worth noticing that some sectors are expected to enjoy bigger gains when empowered
by big data. Computer and electronic products is
the leading sector for its strong productivity growth and potential gain from the use of
business analytics [4]. However, the finance and insurance sector also possesses huge potential,
as long as it can overcome the barrier of data collection and cleaning. For example,
insurance firms such as Prudential and AIG have worked on predictive models of health
risks, letting people applying for insurance avoid blood and urine testing, which the
[Figure 1-1: Some sectors are positioned for greater gains from the use of big data. Historical productivity growth in the United States, 2000-08, plotted against potential gain; bubble sizes denote relative sizes of GDP. Sectors shown include computer and electronic products, information, administration and waste management, and manufacturing.]
This project focuses on the finance sector, specifically aiming to manage accounts receivable (AR)[1] more effectively by predicting invoice payment outcomes at the time of issuing.
For modern companies, Order-to-Cash (O2C) normally refers to the business
process for receiving and processing customer sales [6]. Although its number of steps
may vary from firm to firm, depending on the firm's type and size, a typical O2C process
can be illustrated as the workflow in Figure 1-2. In this project, we focus on the AR
collection (invoice-to-cash) part of this process, i.e. the two highlighted steps of Figure 1-2.
The market volume of AR collection is huge. According to the statistics, the GDP
[1] Money owed by customers (individuals or corporations) to another entity in exchange for goods or services that have been delivered or used, but not yet paid for.
[Figure 1-2: Typical O2C process: Customer Order Management → Credit Management → Order Fulfillment → Customer Billing → ...]
of the Canadian construction industry in 2012 was around $111 billion, and all of it
was produced in the form of invoices [7]. In fact, AR has been the backbone of
reasons. First of all, AR collection can easily become a source of financial difficulty for
firms if not well managed. It is, therefore, of great interest to manage it more effectively.
Also, most AR collection actions nowadays are still manual, generic and expensive [9]. For instance, collection seldom takes customer specifics into account, nor does
it follow any prioritizing strategy. Lastly and most importantly, commercial firms are now
accumulating large amounts of data about their customers, which makes large-scale
The main contribution of this thesis is that it demonstrates how to make accurate
* A new approach of detecting and analyzing invoice payment delay right at issuing
* A self-learning tool identifying problematic customers
* An intuitive framework visualizing invoice payments and their customers.
1.3 Thesis Outline

This thesis is structured as seven chapters, given in chronological order. The
* Chapter 2 reviews the related literature, in both the fields of supply chain finance and business analytics. It presents previous work on invoice aging analysis and accounts receivable management, as well as how machine learning has been
* Chapter 3 formulates the invoice payment prediction, a business case, into an engineering problem. It shows how predicting the invoice payment fits into the
* Chapter 4 presents how the data is processed in this project, as well as preliminary
* Chapter 5 is our analysis of the dataset with the techniques of unsupervised learning.
* We build and calibrate the invoice payment prediction model in Chapter 6, as
* Chapter 7 presents the conclusion and gives some ideas for further work.
Chapter 2
Literature Review
Applying machine learning models to make predictions in business involves two parts:
formulating a quantitative model for the business problem and tailoring the machine
learning algorithms to the formulated model. Although little work has been done on
predicting the outcomes of business invoices, there exists a large amount of literature in
business on supply chain finance (Section 2.1) and accounts receivable age analysis
(Section ??). Also, in the remaining parts of this section, I review the classic machine
Supply chain finance refers to a set of business and financing processes that connect
various parties in a transaction, like the buyer, seller and financing institution, to lower
One of the typical practices of supply chain finance (SCF) is to provide short-term
credit that optimizes the cash flow of both sides of a transaction [11]. It usually involves
the use of technology to automate transactions and track the invoice approval
and settlement process from initiation to completion. The growing popularity of SCF
has been largely driven by the increasing globalization and complexity of the supply
chain, especially in industries such as automotive, manufacturing and the retail sector.
buyers' Accounts Payable terms extension and payables discounting. Therefore, SCF
solutions differ from traditional supply chain programs to enhance working capital in
two ways:
* SCF links transactions to value as it moves through the supply chain.
* SCF encourages collaboration between the buyer and seller, rather than the competition that often pits buyer against seller and vice versa.
In the example given by Pfohl [11], for any financial transaction, usually the buyer
will try to delay payment as long as possible, while the seller wants to be paid as soon
as possible. SCF works well here when the buyer has a better credit rating than the seller and
can therefore raise capital at a lower cost. The buyer can then leverage this financial
advantage to negotiate better terms from the seller, such as an extension of payment
terms, which enables the buyer to conserve cash or control cash flow better. The
seller benefits by accessing (sharing) cheaper capital, while having the option to sell
Accounts Receivable (or invoice collection) has long been regarded as one of the most
essential parts of supply chain finance and companies' financial stability [12][13][14].
There are many metrics used to measure the collection effectiveness of a firm [13].
One of the most basic metrics is the Collection Effectiveness Index (CEI), which is
defined as:
CEI = [Beginning Receivables + (Credit Sales / N) - Ending Total Receivables] / [Beginning Receivables + (Credit Sales / N) - Ending Current Receivables] × 100
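As a sketch, the CEI formula above translates directly into code; the figures in the example are hypothetical, chosen only to illustrate the calculation:

```python
def collection_effectiveness_index(beginning_receivables, credit_sales, n_days,
                                   ending_total, ending_current):
    """Collection Effectiveness Index: collected amount relative to what
    was collectable over the period, expressed as a percentage."""
    collectable = beginning_receivables + credit_sales / n_days
    numerator = collectable - ending_total      # what was actually collected
    denominator = collectable - ending_current  # what could have been collected
    return numerator / denominator * 100

# Hypothetical monthly figures: most of the collectable amount was collected.
cei = collection_effectiveness_index(
    beginning_receivables=100_000, credit_sales=60_000, n_days=30,
    ending_total=40_000, ending_current=35_000)
print(round(cei, 1))  # → 92.5
```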
CEI mostly measures the number and ratio of invoice collections in a certain
period, while Average Days Delinquent (ADD) measures the average time from invoice
due date to paid date, i.e., the average number of days invoices are overdue. A related metric,
Days Sales Outstanding (DSO), expresses the average time in days that receivables are
outstanding:

DSO = (Ending Total Receivables / Credit Sales) × N

DSO helps to determine the reason for a change in receivables: is it due to a change
in sales, or to another factor such as a change in selling terms? One can compare
the days' sales outstanding with the company's credit terms as an indication of how
There also exist upgraded versions of DSO [15], such as the Sales Weighted
DSO (SWDSO). SWDSO also measures the average time that receivables are outstanding. However,
it is an improvement, as it attempts to smooth out the bias of credit sales and terms
of sale [10]. It also gives guidance on how the industry segments the different aging
periods of an invoice.
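A minimal sketch of DSO, assuming the standard definition consistent with the CEI notation above (ending receivables over credit sales, scaled by the number of days N in the period); the figures are made up:

```python
def days_sales_outstanding(ending_receivables, credit_sales, n_days):
    """Average number of days that receivables remain outstanding
    over a period of n_days."""
    return ending_receivables / credit_sales * n_days

# Hypothetical quarter: half the credit sales are still outstanding,
# so receivables are outstanding for about half the 60-day period.
print(days_sales_outstanding(50_000, 100_000, 60))  # → 30.0
```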
SCM, the effectiveness of delivery invoice methods and the number of faultless delivery
notes invoiced are two of the top 5 important measures, ranked by the survey ratings.
However, the commonly used metric in invoice outcome measurement is still func-
As Zeng points out [6], if one can predict this type of outcome (payment overdue
time) of an invoice, he or she can use this information to drive the collection process
so as to improve a desired collection metric. For example, if one can identify invoices
that are likely to be paid late at the time of issuing, one can attempt to reduce the
time to collection by taking actions like calling or sending a reminder to the customer.
Furthermore, even after an invoice is past due, it is beneficial to know which invoices
are likely to be paid sooner rather than later if no action is taken. One should always pay
2.3 Account Receivable Management
There are a number of software companies offering solution packages for order-to-cash
management, especially for accounts receivable. Examples from the big corporations
are Oracle's e-Business Suite Special Edition Order Management and SAP's Order-to-Cash Management for Wholesale Distribution. There are also small & medium size
Oracle's solution provides information visibility and reporting capabilities. SAP's
solution supports collections and customer relationship management. And both WebAR
and NetSuite specialize in debt collection, offering a platform to process payments
and manage accounts receivable [6]. To the best of our knowledge, none of these software
packages or solutions incorporates analytics capacity, especially the ability to predict
the outcome of an invoice, although they must have accumulated a large amount of invoice
data.
Machine learning modeling approaches haven't been applied much to invoice prediction and
collection yet, except for the work of Zeng [6], which developed a decision-tree
However, machine learning algorithms, especially state-of-the-art ones, are now
widely used in a number of other related fields, such as credit card transaction fraud
tax collection [20]. Among them, both credit card transaction fraud detection and credit
risk modeling have attracted a number of researchers, and there is a substantial literature
available.
2.4.1 Credit Card Fraud Detection Model
Detecting credit card fraud has been a difficult and labor-intensive task without machine
learning. Therefore, credit card fraud detection models have become significant,
in both academia and industry. Although they are not called machine learning models
Ghosh and Reilly [21] used a neural network based model to learn and detect fraudulent
credit card account transactions of one card issuer. Compared with the traditional rule-based
fraud detection mechanism, the model detected more fraud accounts with significantly
fewer false positives.
Hansen, McDonald, Messier, and Bell [22] developed a predictive model for management fraud based on data from an accounting firm, which accumulated data in its
business. The approach included the logit model, which is similar to logistic regression. The models showed good predictive capability for both symmetric
Hanagandi, Dhar and Buescher [23] built a fraud score model based on historical
credit card transaction data. The work was based on a fraud/non-fraud classification
methodology using a radial basis function network with a density based clustering approach.
Dorronsoro, Ginel, Sánchez and Cruz [24] developed an online learning model that
detected credit card transaction fraud based on a neural classifier. They also incorporated discriminant analysis in the classification model. Their system is now fully
operational and currently handles more than 12 million operations per year with fine
results.
Shen, Tong and Deng [17] tested different classification methods, i.e. decision trees,
neural networks and logistic regression, for credit card transaction fraud detection.
They further provided a framework to choose the best model among these three
algorithms for credit card fraud risk management. They also showed that neural
networks and logistic regression usually outperform decision trees on their case and
dataset.
Another field of interest for business analytics applications is consumer credit risk
modeling.
risk of finance. Ever since the business began, decisions in the consumer lending business have largely relied on data and statistics [25]. These traditional models are usually
used to generate a "score" for each customer and provide a baseline for the lending
business.
learning methods into predicting consumer default and delinquency behavior. Because, as argued by Khandani et al. [19], machine learning models are a perfect fit
for credit rating modeling because of "the large sample sizes of data and the
There was early work on applying algorithms like neural networks and support vector machines (SVM) to the problem of predicting corporate bankruptcies [26][27][28].
Atiya [26] proposed several novel corporate financial stability features for his
neural network model and outperformed the traditional scoring model. And Shin et
al. [28] demonstrated that the support vector machine (SVM) is also a good candidate
algorithm, as it has good accuracy and generalization even when the training sample size
is small. At last, Min et al. [27] explained the importance of applying cross-validation in
choosing the optimal parameters of the SVM and showed good performance of the SVM
on their dataset.
Kim [30] compared the neural network approach to bond rating with linear regression, discriminant analysis, logistic analysis, and a rule-based system, on a dataset
from Standard and Poor's. It was found that neural networks achieved better perfor-
Maher and Sen [31] also compared the performance of neural networks on bond-rating prediction with that of logistic regression. With data from Moody's and Standard and Poor's, the best performance came from the neural network and it was around
70%.
Galindo and Tamayo [32] did a comparative analysis of various statistical and machine learning methods and found that CART decision-tree models gave the best prediction of default, with an average
91.67% hit rate for a training sample of 2,000 records. Other models they studied
Chapter 3
Problem Formulation
In this thesis, we ask two questions about a new business invoice when given instances
To answer these two questions, we are going to build a classification model that
identifies to which of a set of outcome categories a new invoice belongs, on the basis
of its features. This is a classic supervised learning problem: given instances of past invoices and their outcomes, build a model that can predict
This model shall help us understand the characteristics of delayed invoices and
problematic customers. In other words, it does not only identify the payment delay,
3.1 Invoice Outcome Definition
Usually in supervised learning, each instance, which is the invoice here, is a pair consisting of an input object (typically a vector) and a desired output value (also called
In the case of invoice analytics, the input object is all relevant information except
the payment result of one invoice, while the desired output value is the payment result
Although there are various metrics to measure the outcome of invoice payment,
like Days Sales Outstanding or the Collection Effectiveness Index, in our case invoices are
Also, as we are interested in more than whether the invoice is going to be delayed or not,
we shall define the outcomes for two cases, as shown in Figure 3-1.
3.1.1 Binary outcome case

In this case, we only want to know whether a newly-issued invoice is going to be paid
late or not. The problem therefore becomes a binary classification problem: we simply need to classify the given set of invoices into two groups, no-delay invoices and
delay invoices.
Accordingly, an important point is that the two groups are not symmetric: rather
than overall accuracy, the relative proportion of different types of errors is of interest.
delay). And also, we may care about the balance of the dataset in terms of the two groups,
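The asymmetry between the two error types can be made concrete with a small sketch; the labels and toy predictions below are illustrative, not drawn from the thesis dataset:

```python
def error_breakdown(actual, predicted):
    """Split binary prediction errors into the two asymmetric kinds:
    a missed delay (predicted on-time, actually delayed) typically costs
    more than a false alarm (predicted delayed, actually on-time)."""
    missed_delays = sum(1 for a, p in zip(actual, predicted)
                        if a == "delay" and p == "no-delay")
    false_alarms = sum(1 for a, p in zip(actual, predicted)
                       if a == "no-delay" and p == "delay")
    return missed_delays, false_alarms

actual    = ["delay", "delay", "no-delay", "no-delay", "delay"]
predicted = ["delay", "no-delay", "no-delay", "delay", "delay"]
print(error_breakdown(actual, predicted))  # → (1, 1)
```

Reporting the two counts separately, rather than one accuracy number, is what lets a collections team weigh missed delays more heavily than false alarms.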
The purpose of setting multiple (more than two) outcomes for one invoice, as shown in Figure 3-1, is to determine the delay level of the invoice. As shown in the figure, we
1. No delay
It is worth noticing that these four classes are commonly used in the invoice collection business, where each class corresponds to a customized collection strategy [33].
And there is not necessarily a balanced number of invoices in each class.
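A hypothetical sketch of mapping an invoice's delay to the four outcome classes. The actual class boundaries come from Figure 3-1 and are not reproduced in the text, so the cutoffs below (0/30/90 days) are illustrative assumptions only:

```python
def delay_class(delay_days, cutoffs=(0, 30, 90)):
    """Map an invoice's delay in days to one of four outcome classes.
    The cutoff values are hypothetical; the thesis takes its class
    boundaries from Figure 3-1."""
    if delay_days <= cutoffs[0]:
        return 1  # no delay: paid within the payment term
    if delay_days <= cutoffs[1]:
        return 2  # short delay
    if delay_days <= cutoffs[2]:
        return 3  # medium delay
    return 4      # severe delay: the long tail of problematic invoices

print([delay_class(d) for d in (0, 10, 45, 200)])  # → [1, 2, 3, 4]
```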
Chapter 4
The data for this project comes from a Fortune 500 firm which specializes in oilfield
services. It operates in over 90 countries, providing the oil and gas industry with
products and services for drilling, formation evaluation, completion, production and
reservoir consulting. At the same time, it generates a large amount of business
The author has been able to access the company's databases of closed invoices
for three consecutive months, September to November of 2014, with around 210,000
invoices in total.
The research was first conducted on the September data, as it was the only
dataset available at the very beginning of this project. By the time the data of October and
November came, the author already had a machine learning model, which was further
The author presents a detailed description and preliminary analysis of the
September data in Section 4.1 and Section 4.2. The data of the other two months are very
4.1 Data Description
An invoice is a commercial document issued by a seller, which is the oilfield service firm
here, to a buyer, relating to a sale transaction and indicating the products, quantities,
and agreed prices for products or services the seller has provided to the buyer.
Name Meaning
Customer Number
Customer Name
Document Number
Reference Reference number in the database
Profit Center
Document Date Invoice generating date
Posting Date Invoice posting date
Document Currency Amount of the invoice
Currency Currency of the invoice amount
User Name
Clearing Date Invoice clearing date
Entry Date Invoice closing date
Division
Group
Payment Term The "buffer" time of payment after the invoice issuing
Credit Representative
Table 4.1: Information on a typical electronic invoice
A typical electronic invoice issued by the company contains the information in Table 4.1.
It reveals essential information about the deal between the buyer and the seller, as well
as the invoice collection mechanism of the seller, like the profit center and the division.
It is important to notice that all the invoices, given in the format of Table 4.1, have
been closed. In other words, the seller has collected the full amounts of the invoices
and recorded that in the database. And when we talk about the data of a particular
month (September, October or November 2014), it refers to the invoices closed in that
month - they could have been issued at any time before or within the close date.
Also, although there are many interpretations of "Payment Term", which usually
refers to "discounts and allowances [that] are reductions to a basic price of goods or
services", in this database it represents within how many days the buyer is expected to
pay after the invoice is issued.
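Under this reading of the fields in Table 4.1, the delay of a closed invoice can be sketched as the clearing date minus the due date, where the due date is the document (issue) date plus the payment term; the example invoice below is made up:

```python
from datetime import date, timedelta

def delay_in_days(document_date, payment_term_days, clearing_date):
    """Delay of a closed invoice in days; negative means paid early.
    Field names follow Table 4.1; the exact date semantics are the
    author's reading of the database, not a documented schema."""
    due_date = document_date + timedelta(days=payment_term_days)
    return (clearing_date - due_date).days

# A hypothetical invoice issued 2014-09-01 on 30-day terms, cleared 2014-10-15:
print(delay_in_days(date(2014, 9, 1), 30, date(2014, 10, 15)))  # → 14
```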
4.2 Preliminary Analysis

In this section, the author presents a preliminary analysis of various dimensions of the dataset.
[Figure: Delayed vs. non-delayed invoices]
* There are in total 72,464 invoices in the database; 73% of them were paid late.
* There are 4,291 unique customers, which means an average of 17 invoices per customer in that month.
Non-delayed invoices are all alike; every delayed invoice is delayed in its own way. The
average delay of the delayed invoices is around 27 days. However, out of the
73% delayed invoices, the actual delay lengths are very different, as shown in Figure 4-3.
The distribution of the delay lengths is very similar to the famous power law
distribution [34]. It shows that a large number of delayed invoices are delayed for only
a short period, say less than 15 days. However, there also exists a long tail of the
distribution, which represents the problematic invoices with very long delays.
This shall
the payment term in this database means the "buffer" time for payment after invoice
issuing. It is not easy to know how the seller assigns the payment term for each invoice,
as it may be part of the business negotiation. However, a glimpse of the payment term
distribution in Figure 4-4 can help one better understand the invoice data.
It is found in the database that the set of possible payment terms is {30, 60, 10, 45,
90, 180, 35, 0, 120, 21, 50, 70, 75, 42, 20}. It is quite clear from Figure 4-4 that
most of the invoices have standard payment terms: 30 or 60 days.
One naive hypothesis on the reason for invoice delay is the invoice amount: the higher the
invoice amount, the more likely the payment is delayed. In other words, is it true that, for
the purpose of financial stability, buyers will delay the payment of invoices with large
amount?
To answer this question, we first use box plots to analyze the trends. As mentioned in Section 3, there are two cases of outcomes for invoice delay. The binary
outcome case simply asks delay or not and is plotted in Figure 4-5, while the multiple outcome case, which asks how long the delay would be, is plotted in Figure 4-6.
To further verify the intuition from Figure 4-5 and Figure 4-6, we then plot the
average amount of the delayed invoices versus their delayed days in Figure 4-7. It is
clear that there is no obvious correlation between invoice amount and invoice delay in
this figure.
[Figure 4-5: Box plot of invoice amount for the two binary outcomes (no-delayed vs. delayed).]
It also reminds us that it is hard to use a single variable to predict the delay of an
invoice. We shall collect more information about the invoice and adopt more advanced
models to understand and predict the invoices.
[Figure 4-6: Box plot of invoice amount for the multiple-outcome classes.]
In the case of business invoices, the initial set of raw features is the information
presented in the previous section. However, there are three problems with raw features:
* Some of the information can be redundant and too large to manage.
* Some categorical information has been stored in numerical formats.
[Figure 4-7: Average amount of delayed invoices versus their delayed days.]
The first step of invoice data preprocessing is to select a subset of information from
the database. It is equivalent to asking: given one invoice, what information on it
might be relevant to its payment? The subset is shown in Table 4.2.
Basically, the subset in Table 4.2 keeps the amount, the owner and the dates of the
invoice, as well as the handler. It contains almost all the information of one invoice,
except the product or service, which unfortunately is not available due to data privacy.
Figure 4-8: Construction of an integrated database of invoice and customer statistics used in machine learning models
Name Meaning
Customer Number
Document Date Invoice generating date
Posting Date Invoice posting date
Document Currency Amount of the invoice
Clearing Date Invoice clearing date
Entry Date Invoice closing date
Division
Payment Term The "buffer" time of payment after the invoice issuing
Credit Representative
The payment term has also been kept, because it is crucial in seeing whether the invoice is
One may realize that the subset, as in Table 4.2, provides limited information, especially on the customer that the invoice belongs to. However, to understand and predict
the payment of invoices, one needs to know more about the characteristics of the one who
pays the invoice. In this case, the payer is the customer, and which customer the invoice
belongs to carries a large amount of information about what will happen with this invoice's
payment.
That is why there should be two levels of features for one invoice: the invoice level and
the customer level.
Invoice level features refer to the amount, the payment term, the division and the
various dates of the invoice. At the same time, the project aggregates the historical
invoices for each of the customers and builds a profile accordingly. The customer profile
For one customer, its historical invoice data, even only for one month, can lead to
a rich profile with various characteristics. Some of the elements of the customer profile
include:
9. ...
They are mostly statistical facts about one customer and customer level features of
As shown in Figure 4-9 and Figure 4-10, the total number of invoices per customer
is similar to a power law distribution: a large number of customers have a small
number of invoices, but there is a long tail of customers with a huge number of invoices.
The distribution of customers' delayed days is similar - most customers only delay for
a short period of time, while a few delay for a really long time.
Another interesting dimension of the customer is its delay ratio, which is the number of delayed invoices over the number of paid invoices, as shown in Figure 4-11. It
reveals the customer's payment record: if one has a delay ratio near zero, this customer
is a "good" customer - it pays every bill within the payment term. And the pattern
in Figure 4-11 tells what kind of customers the machine is facing: there are large numbers of good and bad customers, but very few in between. In other words, if one randomly
picks a customer out of the database, it very likely has an extreme delay ratio.
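The two-level feature idea can be sketched as a simple aggregation from invoice records to customer profiles; the record layout and feature names below are illustrative, loosely mirroring the profile elements (invoice counts, delay counts, delay ratio) described in the text:

```python
from collections import defaultdict

def customer_profiles(invoices):
    """Aggregate invoice-level records into customer-level features.
    Each record is a hypothetical (customer_id, amount, delay_days) tuple;
    feature names echo Table 4.3 but the exact schema is assumed."""
    groups = defaultdict(list)
    for customer, amount, delay in invoices:
        groups[customer].append((amount, delay))
    profiles = {}
    for customer, rows in groups.items():
        n = len(rows)
        n_delayed = sum(1 for _, d in rows if d > 0)
        profiles[customer] = {
            "no_invoice": n,                       # total invoices of this customer
            "no_delay": n_delayed,                 # total delayed invoices
            "delay_ratio": n_delayed / n,          # delayed over paid invoices
            "avg_amount": sum(a for a, _ in rows) / n,
        }
    return profiles

demo = [("C1", 100.0, 0), ("C1", 200.0, 12), ("C2", 50.0, 0)]
print(customer_profiles(demo)["C1"]["delay_ratio"])  # → 0.5
```

Each invoice then carries both its own fields and its owner's profile as model inputs.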
The pattern is even more obvious if we look at Figure 4-12, which plots the histogram
- they are either very well or very badly behaved. Applying machine learning to such data is a combination of art and science. The subset selection and double-level feature structure in the previous section did the "science" part; this section introduces the "art" part.
¹ Ratio of the customer's total amount of delayed invoices to its total amount of paid invoices.
As the machine is supposed to detect patterns in the invoice payments, extra information on the invoice could come from the business logic behind invoice payments and financial stability. One element of financial stability is stable cash flow, especially at the end of the month, when the firm pays salaries and other fees. Therefore, if an invoice is due at the month end, its chance of delay may increase.
We define a binary variable IME that equals 1 if the invoice is due in the last three days of its month, and 0 otherwise.
28.79% of the invoices are due at the month end (see Figure 4-13). It is quite surprising considering the narrow range of "month end": only three days!
[Figure: histogram over the delay amount ratio of invoice]
It turns out there are more invoices due in the second half of the month, as shown in Figure 4-13.
The full list of the features used in the machine learning algorithms is shown in Table 4.3; there are fourteen of them.
Figure 4-13: Distribution of binary indicators (features) of Month End and Half Month
It is worth noting that all the features are generated from one month of business invoice data.
Feature Explanation
past-due The amount of the invoice
due-date-end Month end indicator
middle-month Middle month indicator
no-invoice Total number of invoices of the invoice owner
no-delay Total number of delayed invoices of the invoice owner
sum-invoice Total sum of invoice amounts of the invoice owner
sum-delay Total sum of delayed invoice amounts of the invoice owner
ave-delay Average delay of the delayed invoices of the invoice owner
ave-invoice Average delay of all the invoices of the invoice owner
ratio-no Ratio of no-delay to no-invoice
ratio-sum Ratio of sum-delay to sum-invoice
ave-buffer Average payment term of the invoice owner
div Division of the invoice
sale-rep Sales representative of the invoice
Table 4.3: The full list of features of invoices
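As a concrete illustration of how the customer-level features of Table 4.3 can be aggregated from raw invoices, here is a minimal Python sketch. The record keys `customer`, `amount`, and `delay_days` are hypothetical stand-ins for the actual data fields:

```python
from collections import defaultdict

def build_customer_profiles(invoices):
    """Aggregate invoice-level records into customer-level features.

    Each invoice is a dict with hypothetical keys:
    customer, amount, delay_days (0 if paid on time).
    """
    groups = defaultdict(list)
    for inv in invoices:
        groups[inv["customer"]].append(inv)

    profiles = {}
    for customer, invs in groups.items():
        delayed = [i for i in invs if i["delay_days"] > 0]
        no_invoice = len(invs)            # total number of invoices
        no_delay = len(delayed)           # number of delayed invoices
        sum_invoice = sum(i["amount"] for i in invs)
        sum_delay = sum(i["amount"] for i in delayed)
        profiles[customer] = {
            "no_invoice": no_invoice,
            "no_delay": no_delay,
            "sum_invoice": sum_invoice,
            "sum_delay": sum_delay,
            # average delay over delayed invoices only (ave-delay)
            "ave_delay": (sum(i["delay_days"] for i in delayed) / no_delay)
                         if no_delay else 0.0,
            # average delay over all invoices (ave-invoice)
            "ave_invoice": sum(i["delay_days"] for i in invs) / no_invoice,
            "ratio_no": no_delay / no_invoice,    # delay ratio by count
            "ratio_sum": (sum_delay / sum_invoice) if sum_invoice else 0.0,
        }
    return profiles
```

A customer with two invoices, one of them ten days late, would get `ratio_no = 0.5` and `ave_delay = 10.0` under this scheme.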
Chapter 5
Before applying supervised learning to the invoice classification, it is interesting to look at how much invoices with different outcomes actually differ from each other. One way to do that is through unsupervised learning.
5.1 Principal Component Analysis
We first apply principal component analysis (PCA). In our case, PCA uses an orthogonal transformation to convert the set of feature variables into a set of linearly uncorrelated principal components, whose number is less than or equal to the number of original variables. The transformation is defined in such a way that the first principal component has the largest possible variance (accounts for as much of the variability in the data as possible), and each succeeding
component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
We plot the variance versus the number of principal components in Figure 5-1. The first two PCs account for only 22.4% and 20.1% of the variance, and with only these two PCs one is unable to separate the two groups of invoices well, as shown in Figure 5-2.
52
~+- No-delay + Delay
15
ca
10
C)J
CL -w asmms
cAi,
0 5 10 15 20
PC1 (22.4% explained var.)
Figure 5-2: Principal component analysis of invoice features
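The PCA step can be reproduced in a few lines; a sketch with scikit-learn, using synthetic data as a stand-in for the 14 invoice features (the thesis analysis itself was done in R):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))                            # 200 invoices, 14 features
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)   # a correlated pair

pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_   # variance explained per component
# np.cumsum(ratios) gives the scree curve of Figure 5-1; the first two
# columns of pca.transform(X) give a scatter plot like Figure 5-2
```

Because feature 1 is nearly a multiple of feature 0, the first component absorbs more than its uniform share of the variance, which is exactly the structure a scree plot reveals.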
5.2 Clustering
We also apply clustering, which groups a set of instances in such a way that instances in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups.
Usually, the similarity between two instances is represented by the distance between their feature vectors. Therefore, to find the best number of clusters, we want the average intra-cluster distance to be small but the inter-cluster distance to be large. One intuitive heuristic is:
" Try different k, looking at the change in the average distance to centroid, as k
increases.
Figure 5-3: How to pick the right number of clusters in unsupervised learning
In the invoice case, we are asking: if we ignore the known outcomes of the instances, how many different types of invoices are there? How well can we cluster them using the features? And is the optimal number of clusters the same as the number of outcome classes?
Figure 5-4: Average distance to centroid (Within groups sum of squares, also called WSS) versus number of clusters
We could certainly cluster the invoice data into two or four groups without knowing
the actual outcomes, which is visualized in Figure 5-5 and Figure 5-6.
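The elbow heuristic of Figures 5-3 and 5-4 can be sketched as follows; the two well-separated synthetic blobs below are stand-ins for the real invoice feature vectors:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# two synthetic groups in a 5-dimensional feature space
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)),
               rng.normal(6.0, 1.0, size=(100, 5))])

wss = []  # within-groups sum of squares, one value per number of clusters
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(km.inertia_)
# the curve drops sharply up to the true number of groups (k = 2 here)
# and flattens afterwards -- the "elbow" of Figure 5-3
```

Plotting `wss` against `k` reproduces the WSS-versus-clusters curve of Figure 5-4; the elbow sits where adding another cluster stops buying a large WSS reduction.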
55
LO 2 2
2
222
1 2 2 2 2 2
S1 11
1 2 22
0 5 10 15 20
dc 1
Figure 5-5: Clustering of invoices into two classes, plotted with first two discriminant components (DC)
56
44
o - 4
fl)
I
0 10 20 30 40
dc 1
Figure 5-6: Clustering of invoices into four classes, plotted with first two discriminant components (DC)
Chapter 6
The supervised learning task is: given a set of data instances (invoices) expressed by a set of features and class labels (outputs), build a model that classifies a new invoice into one of two (or four) outcomes.
For the given dataset, as shown in Figure 6-1, we divide it into two parts, a training set and a test set:
* Training Set: 80% of the data, used to train and calibrate the model.
* Test (Prediction) Set: 20% of the data, the out-of-sample part, which is used to evaluate the prediction performance.
In other words, we use the training data to teach the machine the different types of invoices, and then use the test data to simulate newly arriving data.
Learning, here and in other sections, was run using 10-fold cross validation. One round of 10-fold cross validation involves partitioning the training data into 10 complementary subsets, performing the analysis on 9 subsets (the training folds), and validating the analysis on the remaining subset (the testing fold). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.
We use the cross validation results to choose the right parameters for our machine learning models.
Figure 6-2: K-fold cross validation
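The split-plus-cross-validation workflow can be sketched with scikit-learn; the data below is synthetic (the thesis models were fit in R), with the labels standing in for delay/no-delay outcomes:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 14))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in for delay / no-delay labels

# 80% training set, 20% out-of-sample test set, as in the text
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 10-fold cross validation on the training set only, used to pick
# model parameters before the test set is ever touched
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X_tr, y_tr, cv=10)
```

Keeping the test set out of the cross-validation loop is what makes the final accuracy an honest out-of-sample estimate.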
A natural benchmark is the baseline predictor, i.e., a classifier that always predicts the class most represented in the training set.
6.3 Model Outputs
Decision tree We start the supervised learning with the most intuitive method: the decision tree.
Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value[36]. The general procedure of applying a decision tree to supervised classification includes two steps.
In the first step, we build tree-style decision models with different values of the complexity parameter (cp), which controls the size and depth of the decision tree. We then find the optimal cp by looking at Figure 6-3, which is generated by cross validation. It turns out cp = 0.016 is the best decision tree complexity parameter for this problem.
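The thesis tunes rpart's complexity parameter cp by cross validation; in scikit-learn the cost-complexity pruning parameter `ccp_alpha` plays the analogous role. A sketch on synthetic data with one dominant split (mimicking the delay-ratio split):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5))
y = (X[:, 0] > 0.35).astype(int)   # one dominant split, like the delay ratio

# candidate pruning strengths come from the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {"ccp_alpha": list(path.ccp_alphas)}, cv=10)
search.fit(X, y)   # 10-fold cross validation picks the pruning strength
best_alpha = search.best_params_["ccp_alpha"]
```

Fitting a final tree with `ccp_alpha=best_alpha` corresponds to pruning the grown rpart tree at the cross-validated cp.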
We then prune the grown tree with cp = 0.016; the pruned tree is the one used for prediction.
The detailed description of the decision process is:
*
5) delay-ratio>=0.35322 8915 4340 1 (0.4868200 0.5131800)
*
Basically, the decision tree mostly uses one feature of the invoice: the delay ratio of its customer.
Figure 6-3: Choose the best decision tree complexity (cp) with 10-fold cross validation
[Figure 6-4: the pruned decision tree, splitting mainly on the customer delay ratio (delay_ra)]
The prediction accuracy of the decision tree is 0.861. The three most important features in the decision tree model are:
3. Number of total invoices of the customer
Random Forests Random Forests grows an ensemble of decision trees; therefore, one of the key parameters of the algorithm is the number of decision trees to grow. Figure 6-5 shows that the training error becomes very stable once more than 100 trees are grown in the binary case.
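The behavior in Figure 6-5 can be sketched with scikit-learn's out-of-bag (OOB) error, the internal error estimate Random Forests provides; the data here is synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

oob_error = {}
for n in (25, 50, 100, 200):
    rf = RandomForestClassifier(n_estimators=n, oob_score=True,
                                random_state=0).fit(X, y)
    oob_error[n] = 1.0 - rf.oob_score_
# the error curve settles once enough trees are grown, echoing the
# "stable beyond 100 trees" observation in the text
```

In practice one plots `oob_error` against the tree count and stops adding trees once the curve flattens.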
Figure 6-5: Training error versus the number of trees grown in Random Forests
We then apply the trained Random Forests model to the out-of-sample test data. The prediction accuracy is 0.892, and the confusion matrix of the out-of-sample prediction is in Table 6.3.
The ranking of feature importance is shown in Figure 6-6. It shows that the delay ratio and average delay days of the customer the invoice belongs to are the most important features.
[Figure 6-6: Feature importance ranking of the Random Forests model in the binary outcome case]
AdaBoost AdaBoost is another ensemble method, which combines a set of weak classifiers into a strong prediction[41].
We then apply the AdaBoost method to the out-of-sample test data. The prediction accuracy is 0.863, and the confusion matrix of the out-of-sample prediction is in Table 6.4.
AdaBoost also gives a certainty for the prediction on each invoice, called the margin, calculated as the difference between the support of the correct class and the maximum support of the incorrect classes. The cumulative distribution of the margins of predictions on both test and training data can be found in Figure 6-7. It shows that AdaBoost is quite certain for around 50% of the invoice predictions.
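The margin can be computed from the per-class support, here approximated by predicted class probabilities; a sketch with synthetic data in place of the invoice features:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 6))
y = (X[:, 0] > 0).astype(int)

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
proba = clf.predict_proba(X)                   # per-class support
support_true = proba[np.arange(len(y)), y]     # support of the correct class
masked = proba.copy()
masked[np.arange(len(y)), y] = -np.inf         # blank out the correct class
margin = support_true - masked.max(axis=1)     # in [-1, 1]; > 0 means correct

# sorting the margins yields the cumulative distribution of Figure 6-7
cdf_x = np.sort(margin)
cdf_y = np.arange(1, len(margin) + 1) / len(margin)
```

Plotting `cdf_x` against `cdf_y` for the train and test sets side by side gives a margin distribution graph like Figure 6-7.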
Logistic Regression For the binary outcome case, we can also use classic binary logistic regression. The prediction accuracy is 0.864, with the confusion matrix shown in Table 6.5.
[Figure 6-7: Margin (prediction certainty) cumulative distribution of the AdaBoost algorithm on test and training data]
Support Vector Machine (SVM) The last algorithm is the support vector machine (SVM), which turns the learning into an optimization problem[44].
The prediction accuracy of SVM is 0.869, with the confusion matrix shown in
Table 6.7.
Feature Explanation
due-date-end Month end indicator
middle-month Middle month indicator
no-invoice Total number of invoice of the invoice owner
no-delay Total number of delayed invoice of the invoice owner
ave-delay Average delay of the delayed invoices of the invoice owner
ratio-no Ratio of no-delay and no-invoice
Table 6.6: Statistically significant invoice features in logistic regression of binary outcome case
6.3.2 Multiple Outcome Case
We then present the prediction results of the multiple (four) outcome case, with the same machine learning methods.
Decision tree Again, we grow the tree and prune it. However, the tree now needs to decide among four outcomes: no-delay, short delay (within 30 days), medium delay (30-90 days) and long delay (more than 90 days).
As shown in Figure 6-8, the optimal cp = 0.018. We then use the optimal decision tree to predict the out-of-sample data; the overall accuracy is 0.764.
Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1712 229 46 10
Predicted short delay 724 3378 286 8
Predicted medium delay 100 227 495 18
Predicted long delay 29 23 61 118
Table 6.9: Confusion matrix of decision tree in multiple outcome case
We can visualize how the decision tree works in the space of the two most important features: delay ratio and average delay days. As shown in Figure 6-10, each dot in the figure is an invoice, colored by its outcome class:
* Red: No-delay
Figure 6-8: Choose the best decision tree complexity (cp) with 10-fold cross validation
Each invoice is attached to one customer, whose delay ratio and average delay days of historical invoices are known. If we take them as the two axes and plot, the segmentation of invoices shows clearly. The black lines in Figure 6-10 are the segmentation boundaries of the decision tree.
[Pruned four-class decision tree: delay_ra < 0.66 predicts no-delay; otherwise ave_delay < 30 predicts short delay, ave_delay < 94 predicts medium delay, and ave_delay >= 94 predicts long delay]
[Figure 6-10: invoices plotted by customer delay ratio and average delay days, colored by delay class (No delay, 1-30, 31-90, >90 days)]
Random Forests We then apply the trained Random Forests model to the out-of-sample test data. Again, the classification result becomes very stable when growing more than 100 decision trees. The prediction accuracy is 0.816, and the confusion matrix of the out-of-sample prediction is in Table 6.10.
Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1625 350 17 5
Predicted short delay 367 3898 121 10
Predicted medium delay 64 305 449 22
Predicted long delay 18 44 49 120
Table 6.10: Confusion matrix of random Forests in multiple outcome case
Again, we rank the feature importance, as shown in Figure 6-11. The delay ratio and average delay days of the customer the invoice belongs to are again the most important features.
[Figure 6-11: Feature importance ranking of the Random Forests model in the multiple outcome case]
AdaBoost For AdaBoost, the prediction accuracy is 0.770, with error details in Table 6.11. It does not work as well as Random Forests, although it is also a kind of ensemble learning method.
Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1442 408 83 33
Predicted short delay 525 3824 374 36
Predicted medium delay 24 154 360 39
Predicted long delay 6 10 23 123
Table 6.11: Confusion matrix of AdaBoost in multiple outcome case
We look at the certainty of the prediction on each invoice again. The cumulative distribution of margins of predictions on both test and training data can be found in Figure 6-12. It shows that AdaBoost is quite certain for around 40% of the invoice predictions in the multi-outcome case.
Figure 6-12: Margin (prediction certainty) cumulative distribution of the AdaBoost algorithm on invoice
multi-outcome prediction
Logistic Regression We can also apply multinomial logistic regression to the multiple outcome case, which has a similar mathematical structure to the binary case. The prediction accuracy is 0.755, with details in Table 6.12.
Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1383 577 9 28
Predicted short delay 340 3961 64 31
Predicted medium delay 61 551 204 24
Predicted long delay 13 54 73 91
Table 6.12: Confusion matrix of logistic regression in multiple outcome case
The statistically significant variables are shown in Table 6.13; they do not differ much from the binary outcome case (Table 6.6).
Feature Explanation
due-date-end Month end indicator
middle-month Middle month indicator
no-invoice Total number of invoice of the invoice owner
ave-delay Average delay of the delayed invoices of the invoice owner
ratio-no Ratio of no-delay and no-invoice
Table 6.13: Statistically significant invoice features in logistic regression of multiple outcome case
Support Vector Machine (SVM) The last method is again SVM. However, SVMs are inherently two-class classifiers. The usual way of doing multi-class classification with SVMs in practice has been to build a set of one-versus-one classifiers and to choose the class selected by the most classifiers. In other words, it builds k(k - 1)/2 = 6 binary classifiers for our four-class case.
The out-of-sample prediction accuracy of multi-class SVM is 0.773, and the confusion matrix is in Table 6.14.
Actual no-delay Actual short delay Actual medium delay Actual long delay
Predicted no-delay 1521 464 5
Predicted short delay 425 3862 100 9
Predicted medium delay 76 471 275 48
Predicted long delay 17 54 48 112
Table 6.14: Confusion matrix of SVM in multiple outcome case
6.4 Results Analysis & Improvements
In this section, we analyze the results of our best performing algorithm, Random Forests. In both the binary outcome case (Table 6.1) and the multi-outcome case (Table 6.8), we see that Random Forests performs substantially better than the baseline.
Baseline Random Forests
Binary Outcome Case 0.731 0.892
Multiple Outcome Case 0.598 0.81
Table 6.15: Measuring the performance of Random Forests
One of the key messages of the confusion matrix concerns Type 1 and Type 2 errors.
In the binary outcome case of invoice prediction, a Type 1 error means that, given the invoice is going to be paid on time, the machine predicts it will be delayed. A Type 2 error means that, given the invoice payment is going to be delayed, the prediction says no-delay.
Obviously, for the invoice collector, the two types of errors carry different weights. Usually, a Type 2 error is much more "expensive" than a Type 1 error. Extending the idea to the multiple outcome case, the prediction accuracy on different outcome classes matters differently.
Therefore, we show the prediction accuracy of each class in both cases in Table 6.16 and Table 6.17. Table 6.16 basically tells us that, given an invoice is going to be paid late, our algorithm can detect it at issuing time with around 93% accuracy.
Prediction Accuracy
No-delay 0.794
Delay 0.927
Table 6.16: Class prediction accuracy of Random Forests in Binary Outcome Case
Table 6.17 shows that the algorithm (Random Forests) is quite good at detecting delays, especially short delays. However, it has difficulty detecting the medium and long delays.
Prediction Accuracy
No-delay 0.814
Short delay 0.887
Medium delay 0.535
Long delay 0.519
Table 6.17: Class prediction accuracy of Random Forests in Multiple Outcome Case
We shall address this problem in multiple outcome case in the next section.
6.5 Imbalanced Data & Solution
One may notice from the previous section that the prediction accuracy of our best model varies considerably across outcome classes. The main reason for this accuracy difference is that the data is imbalanced: there are different numbers of invoices in the different classes, with far more no-delay and short-delay invoices than medium- and long-delay ones.
In this section, we try to address this problem in two ways: one based on stratified sampling and one based on cost-sensitive re-weighting.
As mentioned by Breiman[48], Random Forests grows its trees with bootstrap samples of the training data. However, with an imbalanced training set, there is a high probability that a bootstrap sample contains few or even none of the minority class, resulting in a tree with poor predictive performance on that class.
The intuitive way to fix this problem is weighted sampling, also called the stratified bootstrap[49]: that is, sample with replacement from within each class.
Therefore, we now sample each invoice outcome class so that the classes are equally represented in each bootstrap sample. This is also called Balanced Random Forests. The result is shown in Table 6.18.
The result shows that, by stratified sampling, one can significantly improve the prediction accuracy of the minority class (long-delay invoices here). However, it comes at the cost of overall prediction accuracy.
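scikit-learn does not ship a literal Balanced Random Forest, but `class_weight="balanced_subsample"` reweights each bootstrap sample to a similar effect; a sketch on synthetic imbalanced data (class sizes and separations are made up):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
# 95% majority class, 5% minority class (the "long delay" stand-in)
X = np.vstack([rng.normal(0.0, 1.0, size=(950, 4)),
               rng.normal(1.5, 1.0, size=(50, 4))])
y = np.array([0] * 950 + [1] * 50)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

plain = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
balanced = RandomForestClassifier(class_weight="balanced_subsample",
                                  random_state=0).fit(X_tr, y_tr)

def minority_recall(model):
    """Fraction of true minority-class test instances the model catches."""
    return float((model.predict(X_te[y_te == 1]) == 1).mean())
# balancing typically raises minority recall, at some cost to overall accuracy
```

Comparing `minority_recall(plain)` with `minority_recall(balanced)` mirrors the long-delay column of Table 6.18.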
Random Forests Balanced Random Forests
Overall 0.816 0.610
No-delay 0.814 0.758
Short delay 0.887 0.515
Medium delay 0.535 0.698
Long delay 0.519 0.835
Table 6.18: A comparison of prediction accuracy of Random Forests and Balanced Random Forests
To address the different misclassification costs, we can also use instance re-weighting,
which is a common approach in cost-sensitive learning.
Since Random Forests tends to be biased towards the majority class, which here is also the less important class[48], we can change the penalty function and place a heavier penalty on misclassifying the minority (and more important, or more expensive, like long delay) classes.
Therefore we assign a weight to each class, with the minority classes given larger weights.
Predicted no-delay Predicted short delay Predicted medium delay Predicted long delay
Actual no-delay 0 1 1 1
Actual short delay 2 0 1 1
Actual medium delay 3 2 0 1
Actual long delay 4 3 2 0
Table 6.19: Misclassification cost matrix C for cost-sensitive Random Forests
Table 6.19 is a cost matrix C. It tells us that, given an invoice actually in Class i, the penalty of classifying it into Class j is C_ij. For example, given an invoice with a long delay, the penalty of classifying it into short delay is C_42 = 3, which is larger than the penalty C_43 = 2 of classifying it into medium delay. We then apply this cost matrix C to the learning. The resulting prediction accuracy is shown in Table 6.20.
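One simple way to feed a cost matrix like Table 6.19 into a learner is to convert each row into a class weight; this is a heuristic sketch, not necessarily the exact scheme used in the thesis:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# C[i][j]: cost of predicting class j when the true class is i
# (classes 0..3 = no-delay, short, medium, long delay), as in Table 6.19
C = np.array([[0, 1, 1, 1],
              [2, 0, 1, 1],
              [3, 2, 0, 1],
              [4, 3, 2, 0]])

# heuristic: weight each class by the total cost of misclassifying it,
# so errors on long-delay invoices are penalized most heavily
class_weight = {i: int(C[i].sum()) for i in range(4)}

rng = np.random.default_rng(7)
X = rng.normal(size=(800, 5))           # synthetic stand-in features
y = rng.integers(0, 4, size=800)
y = np.where(X[:, 0] > 1.0, 3, y)       # tie class 3 to a feature

clf = RandomForestClassifier(class_weight=class_weight,
                             random_state=0).fit(X, y)
```

The long-delay class ends up with weight 9 against the no-delay class's 3, which biases the forest's splits toward catching the expensive errors.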
One can see from Table 6.20 that the overall accuracy of cost-sensitive Random Forests is quite consistent with the original one, and we see an improvement on the minority classes.
6.6 Model Robustness
We have also tested the performance of our Random Forests model on two additional months of data.
Each of the two monthly datasets has a similar volume of invoices, around 70k, and exactly the same information for each invoice. Therefore, we are able to update our model parameters with the new training data and test the model again on the out-of-sample test data. The result is shown in Table 6.21.
It shows that the Random Forests model has a relatively consistent prediction accuracy across the different months.
6.7 Why Random Forests Works
So far, the Random Forests algorithm has outperformed the other classifiers in the invoice prediction case. This agrees with the author's experience in machine learning: Random Forests is often the winner for lots of classification problems, usually slightly ahead of the alternatives.
In a 2014 paper, Fernandez-Delgado, Cernadas & Barro [50] evaluated 179 machine learning classifiers on 121 data sets from the UCI database. They found that the classifiers most likely to be the best are the Random Forests versions, which achieve 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. The second best, the SVM with Gaussian kernel, was close, achieving 92.3% of the maximum accuracy.
There is no definite answer to why Random Forests works so well in such a wide variety of cases. However, its good performance is believed to be strongly associated with several of its features[48]:
" Random Forests can handle thousands of input variables without variable dele-
tion.
" It generates an internal unbiased estimate of the generalization error as the forest
building progresses.
" Prototypes are computed that give information about the relation between the
Chapter 7
Conclusion
7.1 Summary
This thesis discusses the analytics of business invoices and how to detect problematic payments and customers. Companies issue thousands of business invoices per month, but only a small proportion of them are paid on time. Machine learning is a natural fit to leverage the power of business analytics to understand the pattern of invoice payments. The thesis analyzed the real invoice data of a Fortune 500 company, which showed that a large proportion of the invoices issued are not paid on time, and that invoices exhibit very different payment delays.
This thesis proposes a supervised (machine) learning approach to solve this problem and presents the corresponding results in invoice payment prediction. It shows that machine learning algorithms can generate accurate predictions of invoice payment delays.
I also built a set of aggregated features of business invoices which capture the characteristics of the invoice and the customer it belongs to. Although no single feature of the invoice can reveal its payment outcome, the aggregate information is powerful in helping us understand the invoice outcomes. This set of features is the foundation of the prediction models.
The Random Forests algorithm has been the best predictor in the invoice payment delay problem so far. Beyond that, the thesis demonstrates that by using cost-sensitive learning, we are able to improve the prediction accuracy particularly for long-delay invoices.
In addition, the thesis demonstrates the robustness of the machine learning model applied to this problem: it achieves consistent prediction accuracies across invoice datasets from different months.
In general, this thesis presents comprehensive research on invoice payment outcome prediction, from data processing to prediction model building and calibration. It offers a framework to understand business invoices and the customers they belong to, and provides actionable knowledge for industry practitioners to adopt.
Based on the framework built by this thesis, there are several directions for future work.
One way to improve the prediction accuracy of the machine learning model is to enrich the training data.
As of now, the training dataset is built from one month of historical invoice data. That is a relatively short period considering the data a company may have accumulated. If one can access data with a longer history, he or she may be able to improve the prediction accuracy significantly and find seasonality patterns in the invoice payments.
Incorporating extra information about the customer, like its revenue and margin, is another way to improve the training set. It is always helpful to give the machine more information about the object, even if it may not seem relevant at first.
The thesis explores the idea of using cost-sensitive learning to predict invoice payments. However, the misclassification cost matrix, which is the core of this algorithm, deserves more careful design. In this thesis, the misclassification cost matrix, as defined in Table 6.19, gives only a qualitative approximation of the real cost. For example, Table 6.19 says the cost of misclassifying a long-delay invoice into the no-delay class is twice the cost of misclassifying it into the medium-delay class. This multiplier (2x) is not necessarily true and may be improved if one can incorporate more business sense. One could even define a dynamic invoice grading to channel the collection workflow and maximize business value.
Ideally, we would take actions on all the "bad" invoices, as their payments would be late if we did nothing. However, there are always resource constraints that prevent acting on all of them, so we need to prioritize the invoices. Of course, the implicit assumption here is that taking an action on an invoice reduces its delinquency.
A natural approach would be to prioritize based on the prediction outcome, i.e., how bad the delay would be. However, it becomes more complicated if we set the objective to be maximizing the revenue: we may need to weight the amount of the bill. Additionally, an action may have a different delinquency reduction effect on different clients. We surely need models to quantify the result of collection actions. They are all worth modeling in future work.
Bibliography
[1] V. Mayer-Schönberger and K. Cukier, Big data: A revolution that will transform
how we live, work, and think. Houghton Mifflin Harcourt, 2013.
[2] G. Piatetsky-Shapiro, "Data mining and knowledge discovery 1996 to 2005: over-
coming the hype and moving from university to business and analytics," Data
Mining and Knowledge Discovery, vol. 15, no. 1, pp. 99-105, 2007.
[11] H.-C. Pfohl and M. Gomm, "Supply chain finance: optimizing financial flows in
supply chains," Logistics research, vol. 1, no. 3-4, pp. 149-161, 2009.
[12] A. Gunasekaran, C. Patel, and E. Tirtiroglu, "Performance measures and metrics
in a supply chain environment," International Journal of Operations & Production
Management, vol. 21, no. 1/2, pp. 71-87, 2001.
[14] P. Kouvelis and W. Zhao, "Supply chain finance," The Handbook of Integrated
Risk Management in Global Supply Chains, pp. 247-288, 2011.
[16] "Top accounts receivable software products." http: //www. capterra. com/
accounts-receivable-software/. Accessed: 2015-04-30.
[17] A. Shen, R. Tong, and Y. Deng, "Application of classification models on credit card
fraud detection," in Service Systems and Service Management, 2007 International
Conference on, pp. 1-4, IEEE, 2007.
[20] J. Aizenman and Y. Jinjarak, "The collection efficiency of the value added tax:
Theory and international evidence," Journal of International Trade and Economic
Development, vol. 17, no. 3, pp. 391-410, 2008.
[21] S. Ghosh and D. L. Reilly, "Credit card fraud detection with a neural-network," in
System Sciences, 1994. Proceedings of the Twenty-Seventh Hawaii International
Conference on, vol. 3, pp. 621-630, IEEE, 1994.
[24] J. R. Dorronsoro, F. Ginel, C. Sánchez, and C. Cruz, "Neural fraud detection in
credit card operations," Neural Networks, IEEE Transactions on, vol. 8, no. 4,
pp. 827-834, 1997.
[26] A. F. Atiya, "Bankruptcy prediction for credit risk using neural networks: A
survey and new results," Neural Networks, IEEE Transactions on, vol. 12, no. 4,
pp. 929-935, 2001.
[27] J. H. Min and Y.-C. Lee, "Bankruptcy prediction using support vector machine
with optimal choice of kernel function parameters," Expert systems with applica-
tions, vol. 28, no. 4, pp. 603-614, 2005.
[28] K.-S. Shin, T. S. Lee, and H.-j. Kim, "An application of support vector machines
in bankruptcy prediction model," Expert Systems with Applications, vol. 28, no. 1,
pp. 127-135, 2005.
[29] Z. Huang, H. Chen, C.-J. Hsu, W.-H. Chen, and S. Wu, "Credit rating analysis
with support vector machines and neural networks: a market comparative study,"
Decision support systems, vol. 37, no. 4, pp. 543-558, 2004.
[30] J. W. Kim, H. R. Weistroffer, and R. T. Redmond, "Expert systems for bond rat-
ing: a comparative analysis of statistical, rule-based and neural network systems,"
Expert systems, vol. 10, no. 3, pp. 167-172, 1993.
[31] J. J. Maher and T. K. Sen, "Predicting bond ratings using neural networks: a
comparison with logistic regression," Intelligent Systems in Accounting, Finance
and Management, vol. 6, no. 1, pp. 59-72, 1997.
[32] J. Galindo and P. Tamayo, "Credit risk assessment using statistical and machine
learning: basic methodology and risk modeling applications," Computational Economics, vol. 15, no. 1-2, pp. 107-143, 2000.
[36] S. R. Safavian and D. Landgrebe, "A survey of decision tree classifier methodol-
ogy," 1990.
[37] A. Liaw and M. Wiener, "Classification and regression by randomforest," R news,
vol. 2, no. 3, pp. 18-22, 2002.
[38] Y. Freund, R. Schapire, and N. Abe, "A short introduction to boosting," Journal-
Japanese Society For Artificial Intelligence, vol. 14, no. 771-780, p. 1612, 1999.
&
Sons, 2004.
[40] J. A. Suykens and J. Vandewalle, "Least squares support vector machine classi-
fiers," Neural processing letters, vol. 9, no. 3, pp. 293-300, 1999.
[41] C. M. Bishop et al., Pattern recognition and machine learning, vol. 4. springer
New York, 2006.
[42] G. Rätsch, T. Onoda, and K.-R. Müller, "Soft margins for AdaBoost," Machine
learning, vol. 42, no. 3, pp. 287-320, 2001.
[43] D. R. Cox, "The regression analysis of binary sequences," Journal of the Royal
Statistical Society. Series B (Methodological), pp. 215-242, 1958.
[44] S. Tong and D. Koller, "Support vector machine active learning with applications
to text classification," The Journal of Machine Learning Research, vol. 2, pp. 45-
66, 2002.
[46] J. Rice, Mathematical statistics and data analysis. Cengage Learning, 2006.
[47] C. Chen, A. Liaw, and L. Breiman, "Using random forest to learn imbalanced
data," 2005.
[48] L. Breiman, "Random forests," Machine learning, vol. 45, no. 1, pp. 5-32, 2001.