Bolton and Hand. Statistical Science, Vol. 17, No. 3 (Aug., 2002), pp. 235-249. Published by: Institute of Mathematical Statistics. Stable URL: http://www.jstor.org/stable/3182781
Statistical Fraud Detection: A Review

Richard J. Bolton and David J. Hand
Abstract. Fraud is increasing dramatically with the expansion of modern technology and the global superhighways of communication, resulting in the loss of billions of dollars worldwide each year. Although prevention technologies are the best way to reduce fraud, fraudsters are adaptive and, given time, will usually find ways to circumvent such measures. Methodologies for the detection of fraud are essential if we are to catch fraudsters once fraud prevention has failed. Statistics and machine learning provide effective technologies for fraud detection and have been applied successfully to detect activities such as money laundering, e-commerce credit card fraud, telecommunications fraud and computer intrusion, to name but a few. We describe the tools available for statistical fraud detection and the areas in which fraud detection technologies are most used.

Key words and phrases: Fraud detection, fraud prevention, statistics, machine learning, money laundering, computer intrusion, e-commerce, credit cards, telecommunications.

Richard J. Bolton is Research Associate in the Statistics Section of the Department of Mathematics at Imperial College. David J. Hand is Professor of Statistics in the Department of Mathematics at Imperial College, London SW7 2BZ, United Kingdom (e-mail: [email protected]; [email protected]).

1. INTRODUCTION

The Concise Oxford Dictionary defines fraud as "criminal deception; the use of false representations to gain an unjust advantage." Fraud is as old as humanity itself and can take an unlimited variety of different forms. However, in recent years, the development of new technologies (which have made it easier for us to communicate and helped increase our spending power) has also provided yet further ways in which criminals may commit fraud. Traditional forms of fraudulent behavior such as money laundering have become easier to perpetrate and have been joined by new kinds of fraud such as mobile telecommunications fraud and computer intrusion.

We begin by distinguishing between fraud prevention and fraud detection. Fraud prevention describes measures to stop fraud from occurring in the first place. These include elaborate designs, fluorescent fibers, multitone drawings, watermarks, laminated metal strips and holographs on banknotes, personal identification numbers for bank cards, Internet security systems for credit card transactions, Subscriber Identity Module (SIM) cards for mobile phones, and passwords on computer systems and telephone bank accounts. Of course, none of these methods is perfect and, in general, a compromise has to be struck between expense and inconvenience (e.g., to a customer) on the one hand, and effectiveness on the other.

In contrast, fraud detection involves identifying fraud as quickly as possible once it has been perpetrated. Fraud detection comes into play once fraud prevention has failed. In practice, of course, fraud detection must be used continuously, as one will typically be unaware that fraud prevention has failed. We can try to prevent credit card fraud by guarding our cards assiduously, but if the card's details are nevertheless stolen, then we need to be able to detect, as soon as possible, that fraud is being perpetrated.

Fraud detection is a continuously evolving discipline. Whenever it becomes known that one detection method is in place, criminals will adapt their strategies and try others. Of course, new criminals are also constantly entering the field. Many of them will not be aware of the fraud detection methods which have been successful in the past and will adopt strategies which lead to identifiable frauds. This means that the earlier detection tools need to be applied as well as the latest developments.
R. J. BOLTON AND D. J. HAND
The development of new fraud detection methods is made more difficult by the fact that the exchange of ideas in fraud detection is severely limited. It does not make sense to describe fraud detection techniques in great detail in the public domain, as this gives criminals the information that they require to evade detection. Data sets are not made available and results are often censored, making them difficult to assess (e.g., Leonard, 1993).

Many fraud detection problems involve huge data sets that are constantly evolving. For example, the credit card company Barclaycard carries approximately 350 million transactions a year in the United Kingdom alone (Hand, Blunt, Kelly and Adams, 2000); The Royal Bank of Scotland, which has the largest credit card merchant acquiring business in Europe, carries over a billion transactions a year; and AT&T carries around 275 million calls each weekday (Cortes and Pregibon, 1998). Processing these data sets in a search for fraudulent transactions or calls requires more than mere novelty of statistical model, and also needs fast and efficient algorithms: data mining techniques are relevant. These numbers also indicate the potential value of fraud detection: if 0.1% of 100 million transactions are fraudulent, each losing the company just £10, then overall the company loses £1 million.

Statistical tools for fraud detection are many and varied, since data from different applications can be diverse in both size and type, but there are common themes. Such tools are essentially based on comparing the observed data with expected values, but expected values can be derived in various ways, depending on the context. They may be single numerical summaries of some aspect of behavior, they are often simple graphical summaries in which an anomaly is readily apparent, but they are also often more complex (multivariate) behavior profiles. Such behavior profiles may be based on past behavior of the system being studied (e.g., the way a bank account has been previously used) or be extrapolated from other similar systems. Things are often further complicated by the fact that, in some domains (e.g., trading on the stock market), a given actor may behave in a fraudulent manner some of the time and not at other times.

Statistical fraud detection methods may be supervised or unsupervised. In supervised methods, samples of both fraudulent and nonfraudulent records are used to construct models which allow one to assign new observations into one of the two classes. Of course, this requires one to be confident about the true classes of the original data used to build the models. It also requires that one has examples of both classes. Furthermore, it can only be used to detect frauds of a type which have previously occurred.

In contrast, unsupervised methods simply seek those accounts, customers and so forth which are most dissimilar from the norm. These can then be examined more closely. Outliers are a basic form of nonstandard observation. Tools used for checking data quality can be used, but the detection of accidental errors is a rather different problem from the detection of deliberately falsified data or data which accurately describe a fraudulent pattern.

This leads us to note the fundamental point that we can seldom be certain, by statistical analysis alone, that a fraud has been perpetrated. Rather, the analysis should be regarded as alerting us to the fact that an observation is anomalous, or more likely to be fraudulent than others, so that it can then be investigated in more detail. One can think of the objective of the statistical analysis as being to return a suspicion score (where we will regard a higher score as more suspicious than a lower one). The higher the score is, the more unusual is the observation or the more like previously fraudulent values it is. The fact that there are many different ways in which fraud can be perpetrated and many different scenarios in which it can occur means that there are many different ways to compute suspicion scores. Suspicion scores can be computed for each record in the database (for each customer with a bank account or credit card, for each owner of a mobile phone, for each desktop computer and so on), and these can be updated as time progresses.
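To make the suspicion-score idea concrete, here is a minimal sketch of our own (not a method from the literature reviewed here): each new observation for an account is scored by its standardized distance from that account's own history, so that higher scores are more anomalous. The spending figures are invented for illustration.

```python
from statistics import mean, stdev

def suspicion_score(history, current):
    """Score a new observation by its standardized distance (z-score)
    from the account's own past behavior; higher scores are more
    anomalous and merit earlier investigation."""
    m, s = mean(history), stdev(history)
    if s == 0:
        return 0.0 if current == m else float("inf")
    return abs(current - m) / s

# Daily spending history for one hypothetical cardholder:
history = [42.0, 38.5, 45.0, 40.2, 39.8, 44.1, 41.3]
print(suspicion_score(history, 43.0))   # ordinary day: low score
print(suspicion_score(history, 400.0))  # sudden large spend: very high score
```

Ranking accounts by such scores and investigating only the highest-scoring few is exactly the triage described in the text.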
These scores can then be rank ordered and investigative attention can be focused on those with the highest scores or on those which exhibit a sudden increase. Here issues of cost enter: given that it is too expensive to undertake a detailed investigation of all records, one concentrates investigation on those thought most likely to be fraudulent.

One of the difficulties with fraud detection is that typically there are many legitimate records for each fraudulent one. A detection method which correctly identifies 99% of the legitimate records as legitimate and 99% of the fraudulent records as fraudulent might be regarded as a highly effective system. However, if only 1 in 1000 records is fraudulent, then, on average, in every 100 that the system flags as fraudulent, only about 9 will in fact be so. In particular, this means that to identify those 9 requires detailed examination of all 100, at possibly considerable cost. This leads us to
a more general point: fraud can be reduced to as low a level as one likes, but only by virtue of a corresponding level of effort and cost. In practice, some compromise has to be reached, often a commercial compromise, between the cost of detecting a fraud and the savings to be made by detecting it. Sometimes the issues are complicated by, for example, the adverse publicity accompanying fraud detection. At a business level, revealing that a bank is a significant target for fraud, even if much has been detected, does little to inspire confidence, and at a personal level, taking action which implies to an innocent customer that they may be suspected of fraud is obviously detrimental to good customer relations.

The body of this paper is structured according to different areas of fraud detection. Clearly we cannot hope to cover all areas in which statistical methods can be applied. Instead, we have selected a few areas where such methods are used and where there is a body of expertise and of literature describing them. However, before looking at the details of different application areas, Section 2 provides a brief overview of some tools for fraud detection.

2. FRAUD DETECTION TOOLS

As we mentioned above, fraud detection can be supervised or unsupervised. Supervised methods use a database of known fraudulent/legitimate cases from which to construct a model which yields a suspicion score for new cases. Traditional statistical classification methods (Hand, 1981; McLachlan, 1992), such as linear discriminant analysis and logistic discrimination, have proved to be effective tools for many applications, but more powerful tools (Ripley, 1996; Hand, 1997; Webb, 1999), especially neural networks, have also been extensively applied. Rule-based methods are supervised learning algorithms that produce classifiers using rules of the form If {certain conditions}, Then {a consequent}. Examples of such algorithms include BAYES (Clark and Niblett, 1989), FOIL (Quinlan, 1990) and RIPPER (Cohen, 1995).
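Classifiers of the If {certain conditions}, Then {a consequent} form can be sketched as follows. The rules, field names and thresholds below are invented purely for illustration; real systems such as RIPPER or FOIL induce their rules from labeled data.

```python
# Each rule pairs a condition on a transaction with a consequent label.
# These example rules are hypothetical, not taken from any cited system.
rules = [
    (lambda t: t["amount"] > 5000 and t["country"] != t["home_country"],
     "suspicious"),
    (lambda t: t["merchant_type"] == "jewelry" and t["card_age_days"] < 7,
     "suspicious"),
]

def classify(transaction, default="legitimate"):
    """Fire the first matching rule; fall back to the default class."""
    for condition, consequent in rules:
        if condition(transaction):
            return consequent
    return default

t = {"amount": 8000, "country": "XY", "home_country": "UK",
     "merchant_type": "grocery", "card_age_days": 400}
print(classify(t))  # the first rule fires: "suspicious"
```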
Tree-based algorithms such as CART (Breiman, Friedman, Olshen and Stone, 1984) and C4.5 (Quinlan, 1993) produce classifiers of a similar form. Combinations of some or all of these algorithms can be created using meta-learning algorithms to improve prediction in fraud detection (e.g., Chan, Fan, Prodromidis and Stolfo, 1999).

Major considerations when building a supervised tool for fraud detection include those of uneven class sizes and different costs of different types of misclassification. We must also take into consideration the costs of investigating observations and the benefits of identifying fraud. Moreover, often class membership is uncertain. For example, credit transactions may be labeled incorrectly: a fraudulent transaction may remain unobserved and thus be labeled legitimate (and the extent of this may remain unknown) or a legitimate transaction may be misreported as fraudulent. Some work has addressed misclassification of training samples (e.g., Lachenbruch, 1966, 1974; Chhikara and McKeon, 1984), but not in the context of fraud detection as far as we are aware. Issues such as these were discussed by Chan and Stolfo (1998) and Provost and Fawcett (2001).

Link analysis relates known fraudsters to other individuals using record linkage and social network methods (Wasserman and Faust, 1994). For example, in telecommunications networks, security investigators have found that fraudsters seldom work in isolation from each other. Also, after an account has been disconnected for fraud, the fraudster will often call the same numbers from another account (Cortes, Pregibon and Volinsky, 2001). Telephone calls from an account can thus be linked to fraudulent accounts to indicate intrusion. A similar approach has been taken in money laundering (Goldberg and Senator, 1995, 1998; Senator et al., 1995).

Unsupervised methods are used when there are no prior sets of legitimate and fraudulent observations. Techniques employed here are usually a combination of profiling and outlier detection methods. We model a baseline distribution that represents normal behavior and then attempt to detect observations that show the greatest departure from this norm. There are similarities to author identification in text analysis. Digit analysis using Benford's law is an example of such a method. Benford's law (Hill, 1995) says that the distribution of the first significant digits of numbers drawn from a wide variety of random distributions will have (asymptotically) a certain form.
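The "certain form" is P(first digit = d) = log10(1 + 1/d) for d = 1, ..., 9, so a leading 1 occurs about 30% of the time and a leading 9 under 5%. An illustrative sketch of our own (not taken from the cited work) compares observed first-digit frequencies with this distribution via a chi-square statistic:

```python
import math
from collections import Counter

# Benford's law: P(first digit = d) = log10(1 + 1/d), d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading significant digit of a positive number."""
    return int(("%e" % x)[0])  # scientific notation starts with the digit

def benford_chi_square(values):
    """Chi-square statistic comparing the observed first-digit counts of
    `values` with those expected under Benford's law; large values flag
    data that depart from the law (e.g., possibly fabricated figures)."""
    digits = Counter(first_digit(v) for v in values if v > 0)
    n = sum(digits.values())
    return sum((digits.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in BENFORD.items())

# A geometric series follows the law closely; clustered amounts do not.
natural = [round(1.07 ** k) for k in range(1, 300)]
fabricated = [4500 + k for k in range(300)]  # every value leads with 4
print(benford_chi_square(natural) < benford_chi_square(fabricated))  # True
```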
Until recently, this law was regarded as merely a mathematical curiosity with no apparent useful application. However, Nigrini and Mittermaier (1997) and Nigrini (1999) showed that Benford's law can be used to detect fraud in accounting data. The premise behind fraud detection using tools such as Benford's law is that fabricating data which conform to Benford's law is difficult.

Fraudsters adapt to new prevention and detection measures, so fraud detection needs to be adaptive and evolve over time. However, legitimate account users may gradually change their behavior over a longer period of time and it is important to avoid spurious
alarms. Models can be updated at fixed time points or continuously over time; see, for example, Burge and Shawe-Taylor (1997), Fawcett and Provost (1997a), Cortes, Pregibon and Volinsky (2001) and Senator (2000).

Although the basic statistical models for fraud detection can be categorized as supervised or unsupervised, the application areas of fraud detection cannot be described so conveniently. Their diversity is reflected in their particular operational characteristics and the variety and quantity of data available, both features that drive the choice of a suitable fraud detection tool.

3. CREDIT CARD FRAUD

The extent of credit card fraud is difficult to quantify, partly because companies are often loath to release fraud figures in case they frighten the spending public and partly because the figures change (probably grow) over time. Various estimates have been given. For example, Leonard (1993) suggested the cost of Visa/Mastercard fraud in Canada in 1989, 1990 and 1991 was $19, 29 and 46 million (Canadian), respectively. Ghosh and Reilly (1994) suggested a figure of $850 million (U.S.) per year for all types of credit card fraud in the United States, and Aleskerov, Freisleben and Rao (1997) cited estimates of $700 million in the United States each year for Visa/Mastercard and $10 billion worldwide in 1996. Microsoft's Expedia set aside $6 million for credit card fraud in 1999 (Patient, 2000). Total losses through credit card fraud in the United Kingdom have been growing rapidly over the last 4 years [1997, £122 million; 1998, £135 million; 1999, £188 million; 2000, £293 million. Source: Association for Payment Clearing Services, London (APACS)] and recently APACS reported £373.7 million losses in the 12 months ending August 2001. Jenkins (2000) says "for every £100 you spend on a card in the UK, 13p is lost to fraudsters." Matters are complicated by issues of exactly what one includes in the fraud figures. For example, bankruptcy fraud arises when the cardholder makes purchases for which he/she has no intention of paying and then files for personal bankruptcy, leaving the bank to cover the losses. Since these are generally regarded as charge-off losses, they often are not included in fraud figures. However, they can be substantial: Ghosh and Reilly (1994) cited one estimate of $2.65 billion for bankruptcy fraud in 1992.

It is in a company and card issuer's interests to prevent fraud or, failing this, to detect fraud as soon as possible. Otherwise consumer trust in both the card
and the company decreases and revenue is lost, in addition to the direct losses made through fraudulent sales. Because of the potential for loss of sales due to loss of confidence, in general, the merchants assume responsibility for fraud losses, even when the vendor has obtained authorization from the card issuer.

Credit card fraud may be perpetrated in various ways (a description of the credit card industry and how it functions is given in Blunt and Hand, 2000), including simple theft, application fraud and counterfeit cards. In all of these, the fraudster uses a physical card, but physical possession is not essential to perpetrate credit card fraud: one of the major fraud areas is "cardholder-not-present" fraud, where only the card's details are given (e.g., over the phone).

Use of a stolen card is perhaps the most straightforward type of credit card fraud. In this case, the fraudster typically spends as much as possible in as short a space of time as possible, before the theft is detected and the card is stopped; hence, detecting the theft early can prevent large losses.

Application fraud arises when individuals obtain new credit cards from issuing companies using false personal information. Traditional credit scorecards (Hand and Henley, 1997) are used to detect customers who are likely to default, and the reasons for this may include fraud. Such scorecards are based on the details given on the application forms and perhaps also on other details such as bureau information. Statistical models which monitor behavior over time can be used to detect cards which have been obtained from a fraudulent application (e.g., a first-time cardholder who runs out and rapidly makes many purchases should arouse suspicion). With application fraud, however, urgency is not as important to the fraudster and it might not be until accounts are sent out or repayment dates begin to pass that fraud is suspected.
Cardholder-not-present fraud occurs when the transaction is made remotely, so that only the card's details are needed, and a manual signature and card imprint are not required at the time of purchase. Such transactions include telephone sales and on-line transactions, and this type of fraud accounts for a high proportion of such losses. To undertake such fraud it is necessary to obtain the details of the card without the cardholder's knowledge. This is done in various ways, including "skimming," where employees illegally copy the magnetic strip on a credit card by swiping it through a small handheld card reader; "shoulder surfers," who enter card details into a mobile phone while standing behind a purchaser in a queue; and people posing as credit
card company employees taking details of credit card transactions from companies over the phone. Counterfeit cards, currently the largest source of credit card fraud in the United Kingdom (source: APACS), can also be created using this information. Transactions made by fraudsters using counterfeit cards and making cardholder-not-present purchases can be detected through methods which seek changes in transaction patterns, as well as checking for particular patterns which are known to be indicative of counterfeiting.

Credit card databases contain information on each transaction. This information includes such things as merchant code, account number, type of credit card, type of purchase, client name, size of transaction and date of transaction. Some of these data are numerical (e.g., transaction size) and others are nominal categorical (e.g., merchant code, which can have hundreds of thousands of categories) or symbolic. The mixed data types have led to the application of a wide variety of statistical, machine learning and data mining tools.

Suspicion scores to detect whether an account has been compromised can be based on models of individual customers' previous usage patterns, standard expected usage patterns, particular patterns which are known to be often associated with fraud, and on supervised models. A simple example of the patterns exhibited by individual customers is given in Figure 16 of Hand and Blunt (2001), which shows how the slopes of cumulative credit card spending over time are remarkably linear. Sudden jumps in these curves or sudden changes of slope (transaction or expenditure rate suddenly exceeding some threshold) merit investigation. Likewise, some customers practice "jam jarring," restricting particular cards to particular types of purchases (e.g., using a given card for petrol purchases only and a different one for supermarket purchases), so that usage of a card to make an unusual type of purchase can trigger an alarm for such customers.
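The slope-change idea can be sketched by monitoring the expenditure rate in a trailing window against the account's long-run rate. This is an illustrative construction of our own; the window length, threshold and spending figures are invented, not drawn from the cited systems.

```python
def rate_alarms(daily_spend, window=7, threshold=2.0):
    """Flag days on which the average spend over the trailing window
    exceeds `threshold` times the average over all preceding days --
    a crude proxy for a sudden change of slope in the cumulative
    spending curve. Window and threshold are illustrative choices."""
    alarms = []
    for i in range(window, len(daily_spend)):
        baseline = sum(daily_spend[:i]) / i
        recent = sum(daily_spend[i - window:i]) / window
        if baseline > 0 and recent > threshold * baseline:
            alarms.append(i)
    return alarms

spend = [20.0] * 30 + [200.0] * 5  # steady use, then a sudden burst
print(rate_alarms(spend))          # flags the days following the burst
```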
At a more general level, suspicion scores can also be based on expected overall usage profiles. For example, first-time credit card users are typically initially fairly tentative in their usage, whereas those transferring loans from another card are generally not so reticent. Finally, examples of overall transaction patterns known to be intrinsically suspicious are the sudden purchase of many small electrical items or jewelry (goods which permit easy black market resale) and the immediate use of a new card in a wide range of different locations.

We commented above that, for obvious reasons, there is a dearth of published literature on fraud detection. Much of that which has been published appears in the methodological data analytic literature, where the aim is to illustrate new data analytic tools by applying them to the detection of fraud, rather than to describe methods of fraud detection per se. Furthermore, since anomaly detection methods are very context dependent, much of the published literature in the area concentrates on supervised classification methods. In particular, rule-based systems and neural networks have attracted interest. Researchers who have used neural networks for credit card fraud detection include Ghosh and Reilly (1994), Aleskerov et al. (1997), Dorronsoro, Ginel, Sanchez and Cruz (1997) and Brause, Langsdorf and Hepp (1999), mainly in the context of supervised classification. HNC Software has developed Falcon, a software package that relies heavily on neural network technology to detect credit card fraud.

Supervised methods, using samples from the fraudulent/nonfraudulent classes as the basis to construct classification rules to detect future cases of fraud, suffer from the problem of unbalanced class sizes mentioned above: the legitimate transactions generally far outnumber the fraudulent ones. Brause, Langsdorf and Hepp (1999) said that, in their database of credit card transactions, "the probability of fraud is very low (0.2%) and has been lowered in a preprocessing step by a conventional fraud detecting system down to 0.1%." Hassibi (2000) remarked that "out of some 12 billion transactions made annually, approximately 10 million - or one out of every 1200 transactions - turn out to be fraudulent. Also, 0.04% (4 out of every 10,000) of all monthly active accounts are fraudulent." It follows from this sort of figure that simple misclassification rate cannot be used as a performance measure: with a bad rate of 0.1%, simply classifying every transaction as legitimate will yield an error rate of only 0.001.
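The base-rate arithmetic behind these figures, including the 1-in-1000 example given in the Introduction, can be made explicit. This is a small worked sketch of standard Bayes-rule arithmetic, not a method from the cited papers:

```python
def flagged_precision(prevalence, sensitivity, specificity):
    """Fraction of flagged records that are genuinely fraudulent
    (positive predictive value), computed from the fraud base rate and
    the detector's per-class accuracy."""
    true_pos = prevalence * sensitivity              # fraud, flagged
    false_pos = (1 - prevalence) * (1 - specificity)  # legitimate, flagged
    return true_pos / (true_pos + false_pos)

# A detector 99% accurate on each class, with 1 fraud per 1000 records,
# is right only about 9 times in every 100 alarms it raises:
print(flagged_precision(0.001, 0.99, 0.99))  # roughly 0.09
```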
Instead, one must either minimize an appropriate cost-weighted loss or fix some parameter (such as the number of cases one can afford to investigate in detail) and then try to maximize the number of fraudulent cases detected subject to the constraints. Stolfo et al. (1997a, b) outlined a meta-classifier system for detecting credit card fraud that is based on the idea of using different local fraud detection tools within each different corporate environment and merging the results to yield a more accurate global tool. This work was elaborated in Chan and Stolfo (1998), Chan, Fan, Prodromidis and Stolfo (1999) and Stolfo et al. (1999), who described a more realistic cost model to accompany the different classification outcomes. Wheeler and Aitken (2000) also explored the combination of multiple classification rules.
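One deliberately simple form of merging local detectors, a weighted average of their suspicion scores, can be sketched as follows. The detectors, field names and weights are hypothetical; the cited meta-classifier work uses considerably more elaborate meta-learning schemes.

```python
def combined_score(transaction, detectors, weights=None):
    """Merge the suspicion scores of several local detectors into one
    global score by weighted averaging -- a simple stand-in for the
    meta-classifier schemes cited above."""
    weights = weights or [1.0] * len(detectors)
    total = sum(w * d(transaction) for d, w in zip(detectors, weights))
    return total / sum(weights)

# Two toy local detectors, each returning a score in [0, 1]:
amount_detector = lambda t: min(t["amount"] / 10000, 1.0)
velocity_detector = lambda t: min(t["txns_last_hour"] / 10, 1.0)

t = {"amount": 9000, "txns_last_hour": 8}
print(combined_score(t, [amount_detector, velocity_detector]))  # 0.85
```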
4. MONEY LAUNDERING
Money laundering is the process of obscuring the source, ownership or use of funds, usually cash, that are the profits of illicit activity. The size of the problem is indicated in a 1995 U.S. Office of Technology Assessment (OTA) report (U.S. Congress, 1995): "Federal agencies estimate that as much as $300 billion is laundered annually, worldwide. From $40 billion to $80 billion of this may be drug profits made in the United States." Prevention is attempted by means of legal constraints and requirements, the burden of which is gradually increasing, and there has been much debate recently about the use of encryption. However, no prevention strategy is foolproof and detection is essential. In particular, the September 11th terrorist attacks on New York City and the Pentagon have focused attention on the detection of money laundering in an attempt to starve terrorist networks of funds.

Wire transfers provide a natural domain for laundering: according to the OTA report, each day in 1995 about half a million wire transfers, valued at more than $2 trillion (U.S.), were carried out using the Fedwire and CHIPS systems, along with almost a quarter of a million transfers using the SWIFT system. It is estimated that around 0.05-0.1% of these transactions involved laundering. Sophisticated statistical and other on-line data analytic procedures are needed to detect such laundering activity. Since it is now becoming a legal requirement to show that all reasonable means have been used to detect fraud, we may expect to see even greater application of such tools.
Wire transfers contain items such as date of transfer, identity of sender, routing number of originating bank, identity of recipient, routing number of recipient bank and amount transferred. Sometimes those fields not needed for transfer are left blank, free text fields may be completed in different ways and, worse still but inevitable, sometimes the data have errors. Automatic error detection (and correction) software has been developed, based on semantic and syntactic constraints on possible content, but, of course, this can never be a complete solution. Matters are also complicated by the fact that banks do not share their data. Of course, banks are not the only bodies that transfer money electronically, and other businesses have been established precisely for this purpose [the OTA report (U.S. Congress, 1995) estimates the number of such businesses as 200,000].

The detection of money laundering presents difficulties not encountered in areas such as, for example, the credit card industry. Whereas credit card fraud comes to light fairly early on, in money laundering it may be years before individual transfers or accounts are definitively and legally identified as part of a laundering process. While, in principle (assuming records have been kept), one could go back and trace the relevant transactions, in practice not all of them would be identified, so detracting from their use in supervised detection methods. Furthermore, there is typically less extensive information available for the account holders in investment banks than there is in retail banking operations. Developing more detailed customer record systems might be a good way forward.

As with other areas of fraud, money laundering detection works hand in hand with prevention. In 1970, for example, in the United States the Bank Secrecy Act required that banks report all currency transactions of over $10,000 to the authorities. However, also as in other areas of fraud, the perpetrators adapt their modus operandi to match the changing tactics of the authorities. So, following the requirement of banks to report currency transactions of over $10,000, the obvious strategy was developed to divide larger sums into multiple amounts of less than $10,000 and deposit them in different banks (a practice termed smurfing or structuring). In the United States, this is now illegal, but the way the money launderers adapt to the prevailing detection methods can lead one to the pessimistic perspective that only the incompetent money launderers are detected. This, clearly, also limits the value of supervised detection methods: the patterns detected will be those patterns which were characteristic of fraud in the past, but which may no longer be so.
Other strategies used by money launderers which limit the value of supervised methods include switching between wire and physical cash movements, the creation of shell businesses, false invoicing and, of course, the fact that a single transfer, in itself, is unlikely to appear to be a laundering transaction. Furthermore, because of the large sums involved, money launderers are highly professional and often have contacts in the banks who can feed back details of the detection strategies being applied.

The number of currency transactions over $10,000 in value increased dramatically after the mid-1980s, to the extent that the number of reports filed is huge (over 10 million in 1994, with total worth of around $500 billion), and this in itself can cause difficulties. In an attempt to cope with this, the Financial Crimes Enforcement Network (FinCEN) of the U.S. Department of the Treasury processes all such reports using
the FinCEN artificial intelligence system (FAIS) described below. More generally, banks are also required to report any suspicious transactions, and about 0.5% of currency transaction reports are so flagged.

Money laundering involves three steps:

1. Placement: the introduction of the cash into the banking system or legitimate business (e.g., transferring the banknotes obtained from retail drugs transactions into a cashier's cheque). One way to do this is to pay vastly inflated amounts for goods imported across international frontiers. Pak and Zdanowicz (1994) described statistical analysis of trade databases to detect anomalies in government trade data such as charging $1694 a gram for imports of the drug erythromycin compared with $0.08 a gram for exports.
2. Layering: carrying out multiple transactions through multiple accounts with different owners at different financial institutions in the legitimate financial system.
3. Integration: merging the funds with money obtained from legitimate activities.

Detection strategies can be targeted at various levels. In general (and in common with some other areas in which fraud is perpetrated), it is very difficult or impossible to characterize an individual transaction as fraudulent. Rather, transaction patterns must be identified as fraudulent or suspicious. A single deposit of just under $10,000 is not suspicious, but multiple such deposits are; a large sum being deposited is not suspicious, but a large sum being deposited and instantly withdrawn is. In fact, one can distinguish several levels of (potential) analysis: the individual transaction level, the account level, the business level (and, indeed, individuals may have multiple accounts) and the "ring" of businesses level.
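The structuring pattern, several individually unremarkable deposits whose short-window total passes the reporting limit, can be screened for directly. This is an illustrative sketch under invented data; the $10,000 limit is the reporting threshold discussed above, while the window length and records are hypothetical.

```python
from collections import defaultdict

def structuring_alerts(deposits, window_days=1, limit=10000.0):
    """Flag accounts whose deposits within a short window sum past the
    reporting limit even though each individual deposit stays below it.
    `deposits` is a list of (account, day, amount) records."""
    window_totals = defaultdict(float)
    suspicious = set()
    for account, day, amount in deposits:
        if amount >= limit:
            continue  # reportable on its own; not structuring
        bucket = day // window_days
        window_totals[(account, bucket)] += amount
        if window_totals[(account, bucket)] > limit:
            suspicious.add(account)
    return suspicious

deposits = [
    ("A", 0, 9500.0), ("A", 0, 9800.0),  # two sub-limit deposits, same day
    ("B", 0, 2000.0), ("B", 5, 3000.0),  # ordinary activity
]
print(structuring_alerts(deposits))  # {'A'}
```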
Analyses can be targeted at particular levels, but more complex approaches can examine several levels simultaneously. (There is an analogy here with speech recognition systems: simple systems focused at the individual phoneme and word levels are not as effective as those which try to recognize these elements in the higher level context of the way words are put together when used.) In general, link analysis, which identifies groups of participants involved in transactions, plays a key role in most money laundering detection strategies. Senator et al. (1995) said "Money laundering typically involves a multitude of transactions, perhaps by distinct individuals, into multiple accounts with different owners at different banks and other financial institutions. Detection of large-scale
money laundering schemes requires the ability to reconstruct these patterns of transactions by linking potentially related transactions and then to distinguish the legitimate sets of transactions from the illegitimate ones. This technique of finding relationships between elements of information, called link analysis, is the primary analytic technique used in law enforcement intelligence (Andrews and Peterson, 1990)." An obvious and simplistic illustration is the fact that a transaction with a known criminal may rouse suspicion. More subtle methods are based on recognition of the sort of businesses with which money laundering operations transact. Of course, these are all supervised methods and are subject to the weakness that those responsible may evolve their strategies. Similar tools are used to detect telecom fraud, as outlined in the following section. Rule-based systems have been developed, often with the rules based on experience ("flag transactions from countries X and Y"; "flag accounts showing a large deposit followed immediately by a similar sized withdrawal"). Structuring can be detected by computing the cumulative sum of amounts entering an account over a short window, such as a day. Other methods have been developed based on straightforward descriptive statistics, such as rate of transactions and proportion of transactions which are suspicious. The use of the Benford distribution is an extension of this idea. Although one may not usually be interested in detecting changes in an account's behavior, methods such as peer group analysis (Bolton and Hand, 2001) and break detection (Goldberg and Senator, 1997) can be applied to detect money laundering. One of the most elaborate money laundering detection systems is the U.S. Financial Crimes Enforcement Network AI system (FAIS) described in Senator et al. (1995) and Goldberg and Senator (1998).
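Binary rule outputs of the kind just described ("near-threshold deposits", "large deposit followed by immediate withdrawal") must somehow be combined into a single suspicion score; one generic way is naive-Bayes log-odds updating. This is only a sketch in the spirit of such systems, not the actual FAIS method, and the indicator names, likelihoods and prior are invented:

```python
import math

def suspicion_score(indicators, likelihoods, prior=0.01):
    """P(illicit | indicator values), assuming conditional independence.
    `likelihoods` maps name -> (P(present | illicit), P(present | legitimate))."""
    log_odds = math.log(prior / (1 - prior))
    for name, present in indicators.items():
        p_illicit, p_legit = likelihoods[name]
        if present:
            log_odds += math.log(p_illicit / p_legit)
        else:
            log_odds += math.log((1 - p_illicit) / (1 - p_legit))
    return 1 / (1 + math.exp(-log_odds))

# Invented likelihoods for two binary rule indicators.
likelihoods = {
    "near_threshold_deposits": (0.60, 0.05),
    "deposit_then_withdrawal": (0.50, 0.10),
}
both = suspicion_score({"near_threshold_deposits": True,
                        "deposit_then_withdrawal": True}, likelihoods)
neither = suspicion_score({"near_threshold_deposits": False,
                           "deposit_then_withdrawal": False}, likelihoods)
# `both` comes out far larger than `neither`, so that account is ranked higher.
```

Accounts can then be ranked by score and the most suspicious passed to investigators.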
FAIS allows users to follow trails of linked transactions. It is built around a "blackboard" architecture, in which program modules can read and write to a central database that contains details of transactions, subjects and accounts. A key component of the system is its suspicion score. This is a rule-based system based on an earlier system developed by the U.S. Customs Service in the mid-1980s. The system computes suspicion scores for various different types of transaction and activity. Simple Bayesian updating is used to combine evidence that suggests that a transaction or activity is illicit to yield an overall suspicion score. Senator et al. (1995) included a brief but interesting discussion of an investigation of whether case-based reasoning (cf. nearest
neighbor methods) and classification tree techniques could usefully be added to the system. The American National Association of Securities Dealers, Inc., uses an advanced detection system (ADS; Kirkland et al., 1998; Senator, 2000) to flag "patterns or practices of regulatory concern." ADS uses a rule pattern matcher and a time-sequence pattern matcher, and (like FAIS) places great emphasis on visualization tools. Also as with FAIS, data mining techniques are used to identify new patterns of potential interest. A different approach to detecting similar fraudulent behavior is taken by SearchSpace Ltd. (www.searchspace.com), which has developed a system for the London Stock Exchange called MonITARS (monitoring insider trading and regulatory surveillance) that combines genetic algorithms, fuzzy logic and neural network technology to detect insider dealing and market manipulation. Chartier and Spillane (2000) also described an application of neural networks to detect money laundering.

5. TELECOMMUNICATIONS FRAUD

The telecommunications industry has expanded dramatically in the last few years with the development of affordable mobile phone technology. With the increasing number of mobile phone users, global mobile phone fraud is also set to rise. Various estimates have been presented for the cost of this fraud. For example, Cox, Eick, Wills and Brachman (1997) gave a figure of $1 billion a year. Telecom and Network Security Review [4(5) April 1997] gave a figure of between 4 and 6% of U.S. telecom revenue lost due to fraud. Cahill, Lambert, Pinheiro and Sun (2002) suggested that international figures are worse, with "several new service providers reporting losses over 20%." Moreau et al. (1996) gave a value of "several million ECUs per year." Presumably this refers to within the European Union and, given the size of the other estimates, we wonder if this should be billions.
According to a recent report (Neural Technologies, 2000), "the industry already reports a loss of £13 billion each year due to fraud." Mobile Europe (2000) gave a figure of $13 billion (U.S.). The latter article also claimed that it is estimated that fraudsters can steal up to 5% of some operators' revenues, and that some expect telecom fraud as a whole to reach $28 billion per year within 3 years. Despite the variety in these figures, it is clear that they are all very large. Apart from the fact that they are simply estimates, and hence subject to expected inaccuracies and variability based on the information
used to derive them, there are other reasons for the differences. One is the distinction between hard and soft currency. Hard currency is real money, paid by someone other than the perpetrator for the service the perpetrator has stolen. Hynninen (2000) gave the example of the sum one mobile phone operator will pay another for the use of their network. Soft currency is the value of the service the perpetrator has stolen. At least part of this is only a loss if one assumes that the thief would have used the same service even if he or she had had to pay for it. Another reason for the differences derives from the fact that such estimates may be used for different purposes. Hynninen (2000) gave the examples of operators giving estimates on the high side, hoping for more stringent antifraud legislation, and operators giving estimates on the low side to encourage customer confidence. We need to distinguish between fraud aimed at the service provider and fraud enabled by the service provider. An example of the former is the resale of stolen call time and an example of the latter is interfering with telephone banking instructions. (It is the possibility of the latter sort of fraud which makes the public wary of using their credit cards over the Internet.) We can also distinguish between revenue fraud and nonrevenue fraud. The aim of the former is to make money for the perpetrator, while the aim of the latter is simply to obtain a service free of charge (or, as with computer hackers, e.g., the simple challenge represented by the system). There are many different types of telecom fraud (see, e.g., Shawe-Taylor et al., 2000) and these can occur at various levels. The two most prevalent types are subscription fraud and superimposed or "surfing" fraud. Subscription fraud occurs when the fraudster obtains a subscription to a service, often with false identity details, with no intention of paying.
This is thus at the level of a phone number: all transactions from this number will be fraudulent. Superimposed fraud is the use of a service without having the necessary authority and is usually detected by the appearance of phantom calls on a bill. There are several ways to carry out superimposed fraud, including mobile phone cloning and obtaining calling card authorization details. Superimposed fraud will generally occur at the level of individual calls: the fraudulent calls will be mixed in with the legitimate ones. Subscription fraud will generally be detected at some point through the billing process, although the aim is to detect it well before that, since large costs can quickly be run up. Superimposed fraud can remain undetected for a long time. The distinction
between these two types of fraud follows a similar distinction in credit card fraud. Other types of telecom fraud include "ghosting" (technology that tricks the network so as to obtain free calls) and insider fraud, where telecom company employees sell information to criminals that can be exploited for fraudulent gain. This, of course, is a universal cause of fraud, whatever the domain. "Tumbling" is a type of superimposed fraud in which rolling fake serial numbers are used on cloned handsets, so that successive calls are attributed to different legitimate phones. The chance of detection by spotting unusual patterns is small and the illicit phone will operate until all of the assumed identities have been spotted. The term "spoofing" is sometimes used to describe users pretending to be someone else. Telecommunications networks generate vast quantities of data, sometimes on the order of several gigabytes per day, so that data mining techniques are of particular importance. The 1998 database of AT&T, for example, contained 350 million profiles and processed 275 million call records per day (Cortes and Pregibon, 1998). As with other fraud domains, apart from some domain specific tools, methods for detection hinge around outlier detection and supervised classification, either using rule-based methods or based on comparing statistically derived suspicion scores with some threshold. At a low level, simple rule-based detection systems use rules such as the apparent use of the same phone in two very distant geographical locations in quick succession, calls which appear to overlap in time, and very high value and very long calls. At a higher level, statistical summaries of call distributions (often called profiles or signatures at the user level) are compared with thresholds determined either by experts or by application of supervised learning methods to known fraud/nonfraud cases. Murad and Pinkas (1999) and Rosset et al.
(1999) distinguished between profiling at the levels of individual calls, daily call patterns and overall call patterns, and described what are effectively outlier detection methods for detecting anomalous behavior. A particularly interesting description of profiling methods was given by Cortes and Pregibon (1998). Cortes, Fisher, Pregibon and Rogers (2000) described the Hancock language for writing programs for processing profiles, basing the signatures on such quantities as average call duration, longest call duration, number of calls to particular regions in the last day and so on. Profiling and classification techniques
also were described by Fawcett and Provost (1997a, b, 1999) and Moreau, Verrelst and Vandewalle (1997). Some work (see, e.g., Fawcett and Provost, 1997a) has focused on detecting changes in behavior. A general complication is that signatures and thresholds may need to depend on time of day, type of account and so on, and that they will probably need to be updated over time. Cahill et al. (2002) suggested excluding the very suspicious scores in this updating process, although more work is needed in this area. Once again, neural networks have been widely used. The main fraud detection software of the Fraud Solutions Unit of Nortel Networks (Nortel, 2000) uses a combination of profiling and neural networks. Likewise, ASPeCT (Moreau et al., 1996; Shawe-Taylor et al., 2000), a project of the European Commission, Vodaphone, other European telecom companies and academics, developed a combined rule-based profiling and neural network approach. Taniguchi, Haft, Hollmen and Tresp (1998) described neural networks, mixture models and Bayesian networks in telecom fraud detection based on call records stored for billing. Link analysis, with links updated over time, establishes the "communities of interest" (Cortes, Pregibon and Volinsky, 2001) that can indicate networks of fraudsters. These methods are based on the observation that fraudsters seldom change their calling habits, but are often closely linked to other fraudsters. Using similar patterns of transactions to infer the presence of a particular fraudster is in the spirit of phenomenal data mining (McCarthy, 2000). Visualization methods (Cox et al., 1997), developed for mining very large data sets, have also been developed for use in telecom fraud detection. Here human pattern recognition skills interact with graphical computer display of quantities of calls between different subscribers in various geographical locations. A possible future scenario would be to code into software the patterns which humans detect.
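The suggestion of Cahill et al. of updating signatures while excluding the very suspicious observations might look like the following exponentially weighted sketch. This is an illustration only, not their actual algorithm: the single-feature signature, decay rate and suspicion rule are all assumptions.

```python
def update_signature(mean_duration, new_duration, decay=0.05,
                     suspicion_factor=5.0):
    """Exponentially weighted update of a user's mean call duration.
    Calls more than `suspicion_factor` times the current mean are flagged
    and excluded from the update, so they do not pollute the profile.
    Returns (updated_mean, flagged)."""
    if new_duration > suspicion_factor * mean_duration:
        return mean_duration, True
    return (1 - decay) * mean_duration + decay * new_duration, False

mean = 3.0                                      # minutes; current signature
mean, flagged = update_signature(mean, 4.0)     # ordinary call: absorbed slowly
mean2, flagged2 = update_signature(mean, 60.0)  # extreme call: flagged, mean unchanged
```

The small decay rate lets the signature track gradual drift in legitimate behavior while the exclusion rule keeps a cloned handset's calls from being "learned" as normal.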
The telecom market will become even more complicated over time, with more opportunity for fraud. At present the extent of fraud is measured by considering factors such as call lengths and tariffs. The third generation of mobile phone technology will also need to take into account such things as the content of the calls (because of the packet switching technology used, equally long data transmissions may contain very different numbers of data packets) and the priority of the call.
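The low-level rules mentioned earlier (the same phone apparently used in two distant places in quick succession, or calls that overlap in time) reduce to simple checks over sorted call records. A toy sketch, with the record layout, units and the plausible-speed limit as assumptions:

```python
def overlapping_calls(calls):
    """Flag consecutive calls on the same account that overlap in time.
    Each call is (account, start_h, end_h, location)."""
    calls = sorted(calls, key=lambda c: (c[0], c[1]))
    return [(a, b) for a, b in zip(calls, calls[1:])
            if a[0] == b[0] and b[1] < a[2]]

def impossible_travel(calls, max_speed_kmh=900.0):
    """Flag consecutive same-account calls whose implied travel speed
    between locations (km coordinates) exceeds max_speed_kmh."""
    calls = sorted(calls, key=lambda c: (c[0], c[1]))
    flagged = []
    for a, b in zip(calls, calls[1:]):
        if a[0] != b[0]:
            continue
        hours = max(b[1] - a[2], 1e-6)   # gap between end of a and start of b
        (x1, y1), (x2, y2) = a[3], b[3]
        km = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        if km / hours > max_speed_kmh:
            flagged.append((a, b))
    return flagged

calls = [
    ("u1", 0.0, 0.5, (0.0, 0.0)),
    ("u1", 0.6, 0.8, (5000.0, 0.0)),  # 5000 km in six minutes
]
hits = impossible_travel(calls)  # the pair above is flagged
```

In production such rules run over streaming call detail records and feed their outputs into the suspicion-scoring layer rather than triggering investigations directly.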
6. COMPUTER INTRUSION
On Thursday, September 21, 2000, a 16-year-old boy was jailed for hacking into both the Pentagon and NASA computer systems. Between the 14th and 25th of October 2000, Microsoft security tracked the illegal activity of a hacker on the Microsoft Corporate Network. These examples illustrate that even exceptionally well protected domains can have their computer security compromised. Computer intrusion fraud is big business and computer intrusion detection is a hugely intensive area of research. Hackers can find passwords, read and change files, alter source code, read e-mails and so on. Denning (1997) listed eight kinds of computer intrusion. If the hackers can be prevented from penetrating the computer system, or can be detected early enough, then such crime can be virtually eliminated. However, as with all fraud when the prizes are high, the attacks are adaptive, and once one kind of intrusion has been recognized the hacker will try a different route. Because of its importance, a great deal of effort has been put into developing intrusion detection methods, and there are several commercial products available, including the Cisco secure intrusion detection system (CSIDS, 1999) and the next-generation intrusion detection expert system (NIDES; Anderson, Frivold and Valdes, 1995). Since the only record of a hacker's activities is the sequence of commands that is used when compromising the system, analysts of computer intrusion data predominantly use sequence analysis techniques. As with other fraud situations, both supervised and unsupervised methods are used. In the context of intrusion detection, supervised methods are sometimes called misuse detection, while the unsupervised methods used are generally methods of anomaly detection, based on profiles of usage patterns for each legitimate user. Supervised methods have the problem described in other contexts, that they can, of course, only work on intrusion patterns which have already occurred (or partial matches to these).
Lee and Stolfo (1998) applied classification techniques to data from a user or program that has been identified as either normal or abnormal. Lippmann et al. (2000) concluded that emphasis should be placed on developing methods for detecting new patterns of intrusion rather than old patterns, but Kumar and Spafford (1994) remarked that "a majority of break-ins ... are the result of a small number of known attacks, as evidenced by reports from response teams (e.g., CERT). Automating detection of these attacks should therefore result in the detection of a significant
number of break-in attempts." Shieh and Gligor (1991, 1997) described a pattern-matching method and argued that it is more effective than statistical methods at detecting known types of intrusion, but is unable to detect novel kinds of intrusion patterns, which could be detected by statistical methods. Since intrusion represents behavior and the aim is to distinguish between intrusion behavior and usual behavior in sequences, Markov models have naturally been applied (e.g., Ju and Vardi, 2001). Qu et al. (1998) also used probabilities of events to define the profile. Forrest, Hofmeyr, Somayaji and Longstaff (1996) described a method based on how natural immune systems distinguish between self and alien patterns. As with telecom data, both individual user patterns and overall network behavior change over time, so that a detection system must be able to adapt to changes, but not adapt so rapidly that it also accepts intrusions as legitimate changes. Lane and Brodley (1998) and Kosoresow and Hofmeyr (1997) also used similarity of sequences that can be interpreted in a probabilistic framework. Inevitably, neural networks have been used: Ryan, Lin and Miikkulainen (1997) performed profiling by training a neural network on the process data and also referenced other neural approaches. In one of the more careful studies in the area, Schonlau et al. (2001) described a comparative study of six statistical approaches for detecting impersonation of other users (masquerading), where they took real usage data from 50 users and planted contaminating data from other users to serve as the masquerade targets to be detected. A nice overview of statistical issues in computer intrusion detection was given by Marchette (2001), and the October 2000 edition of Computer Networks [34(4)] is a special issue on (relatively) recent advances in intrusion detection systems, including several examples of new approaches to computer intrusion detection.
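The Markov-model idea can be illustrated with a first-order sketch, far simpler than the hybrid high-order model of Ju and Vardi (2001); the training commands and the smoothing constant below are invented for the example:

```python
import math
from collections import defaultdict

def train_transitions(sequences, alpha=0.5):
    """Estimate smoothed first-order transition probabilities between
    commands from a user's historical sessions."""
    counts = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1

    def prob(a, b):
        total = sum(counts[a].values()) + alpha * len(vocab)
        return (counts[a][b] + alpha) / total

    return prob

def anomaly_score(seq, prob):
    """Average negative log-likelihood per transition; higher = stranger."""
    nll = [-math.log(prob(a, b)) for a, b in zip(seq, seq[1:])]
    return sum(nll) / len(nll)

# Invented "normal" history for one user.
normal = [["ls", "cd", "ls", "vi", "ls", "cd"]] * 20
prob = train_transitions(normal)
usual = anomaly_score(["ls", "cd", "ls"], prob)
odd = anomaly_score(["passwd", "su", "ftp"], prob)
# `odd` scores well above `usual`, so the second session would be queried.
```

Sequences scoring well above a user's historical range would be passed to an analyst; higher-order or hybrid models capture longer command idioms at the cost of more data.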
7. MEDICAL AND SCIENTIFIC FRAUD
Medical fraud can occur at various levels. It can occur in clinical trials (see, e.g., Buyse et al., 1999). It can also occur in a more commercial context: for example, prescription fraud, submitting claims for patients who are dead or who do not exist, and upcoding, where a doctor performs a medical procedure but charges the insurer for one that is more expensive, or perhaps does not even perform one at all. Allen (2000) gave an example of bills submitted for more than 24 hours in a working day. He, Wang, Graco and Hawkins (1997)
and He, Graco and Yao (1999) described the use of neural networks, genetic algorithms and nearest neighbor methods to classify the practice profiles of general practitioners in Australia into classes from normal to abnormal. Medical fraud is often linked to insurance fraud: Terry Allen, a statistician with the Utah Bureau of Medicaid Fraud, estimated that up to 10% of the $800 million annual claims may be stolen (Allen, 2000). Major and Riedinger (1992) created a knowledge/statistical-based system to detect healthcare fraud by comparing observations with those with which they should be most similar (e.g., having similar geodemographics). Brockett, Xia and Derrig (1998) used neural networks to classify fraudulent and nonfraudulent claims for automobile bodily injury in healthcare insurance claims. Glasgow (1997) gave a short discussion of risk and fraud in the insurance industry. A glossary of several of the different types of medical fraud is available at http://www.motherjones.com/mother_jones/MA95/davis2.html. Of course, medicine is not the only scientific area where data have sometimes been fabricated, falsified or carefully selected to support a pet theory. Problems of fraud in science are attracting increased attention, but they have always been with us: errant scientists have been known to massage figures from experiments to push through development of a product or reach a magical significance level for a publication. Dmitriy Yuryev described such a case on his webpages at http://www.orc.ru/~yur77/statfr.htm. Moreover, there are many classical cases in which the data have been suspected of being massaged (including the work of Galileo, Newton, Babbage, Kepler, Mendel, Millikan and Burt). Press and Tanur (2001) presented a fascinating discussion of the role of subjectivity in the scientific process, illustrating with many examples. The borderline between subconscious selection of data and out-and-out distortion is a fine one.

8. CONCLUSIONS

The areas we have outlined are perhaps those in which statistical and other data analytic tools have made the most impact on fraud detection. This is typically because there are large quantities of information, and this information is numerical or can easily be converted into the numerical in the form of counts and proportions. However, other areas, not mentioned above, have also used statistical tools for fraud detection. Irregularities in financial statements can be used to detect
accounting and management fraud in contexts broader than those of money laundering. Digit analysis tools have found favor in accountancy (e.g., Nigrini and Mittermaier, 1997; Nigrini, 1999). Statistical sampling methods are important in financial audit, and screening tools are applied to decide which tax returns merit detailed investigation. We mentioned insurance fraud in the context of medicine, but it clearly occurs more widely. Artis, Ayuso and Guillén (1999) described an approach to modelling fraud behavior in car insurance, and Fanning, Cogger and Srivastava (1995) and Green and Choi (1997) examined neural network classification methods for detecting management fraud. Statistical tools for fraud detection have also been applied to sporting events. For example, Robinson and Tawn (1995), Smith (1997) and Barao and Tawn (1999) examined the results of running events to see if some exceptional times were out of line with what might be expected. Plagiarism is also a type of fraud. We briefly referred to the use of statistical tools for author verification, and such methods can be applied here. However, statistical tools can also be applied more widely. For example, with the evolution of the Internet it is extremely easy for students to plagiarize articles and pass them off as their own in school or university coursework. The website http://www.plagiarism.org describes a system that can take a manuscript and compare it against their "substantial database" of articles from the Web. A statistical measure of the originality of the manuscript is returned. As we commented in the Introduction, fraud detection is a post hoc strategy, being applied after fraud prevention has failed. Statistical tools are also applied in some fraud prevention methods.
For example, so-called biometric methods of fraud detection are slowly becoming more widespread. These include computerized fingerprint and retinal identification, and also face recognition (although this has received most publicity in the context of recognizing football hooligans). In many of the applications we have discussed, speed of processing is of the essence. This is particularly the case in transaction processing, especially with telecom and intrusion data, where vast numbers of records are processed every day, but it also applies in the credit card, banking and retail sectors. A key issue in all of this work is how effective the statistical tools are in detecting fraud, and a fundamental problem is that one typically does not know how many fraudulent cases slip through the net. In applications such as banking fraud and telecom fraud, where
speed of detection matters, measures such as average time to detection after fraud starts (in minutes, numbers of transactions, etc.) should also be reported. Measures of this aspect interact with measures of final detection rate: in many situations an account, telephone and so forth will have to be used for several fraudulent transactions before it is detected as fraudulent, so that several false negative classifications will necessarily be made. An appropriate overall strategy is to use a graded system of investigation. Accounts with very high suspicion scores merit immediate and intensive (and expensive) investigation, while those with large but less dramatic scores merit closer (but not expensive) observation. Once again, it is a matter of choosing a suitable compromise. Finally, it is worth repeating the conclusions reached by Schonlau et al. (2001), in the context of statistical tools for computer intrusion detection: "statistical methods can detect intrusions, even in difficult circumstances," but also "many challenges and opportunities for statistics and statisticians remain." We believe this positive conclusion holds more generally. Fraud detection is an important area, one in many ways ideal for the application of statistical and data analytic tools, and one where statisticians can make a very substantial and important contribution.

ACKNOWLEDGMENT

The work of Richard Bolton was supported by a ROPA award from the Engineering and Physical Sciences Research Council of the United Kingdom.

REFERENCES

ALESKEROV, E., FREISLEBEN, B. and RAO, B. (1997). CARDWATCH: A neural network based database mining system for credit card fraud detection. In Computational Intelligence for Financial Engineering. Proceedings of the IEEE/IAFE 220-226. IEEE, Piscataway, NJ.
ALLEN, T. (2000). A day in the life of a Medicaid fraud statistician. Stats 29 20-22.
ANDERSON, D., FRIVOLD, T. and VALDES, A. (1995). Next-generation intrusion detection expert system (NIDES): A summary. Technical Report SRI-CSL-95-07, Computer Science Laboratory, SRI International, Menlo Park, CA.
ANDREWS, P. P. and PETERSON, M. B., eds. (1990). Criminal Intelligence Analysis. Palmer Enterprises, Loomis, CA.
ARTIS, M., AYUSO, M. and GUILLÉN, M. (1999). Modelling different types of automobile insurance fraud behaviour in the Spanish market. Insurance Mathematics and Economics 24 67-81.
BARAO, M. I. and TAWN, J. A. (1999). Extremal analysis of short series with outliers: Sea-levels and athletics records. Appl. Statist. 48 469-487.
BLUNT, G. and HAND, D. J. (2000). The UK credit card market. Technical report, Dept. Mathematics, Imperial College, London.
BOLTON, R. J. and HAND, D. J. (2001). Unsupervised profiling methods for fraud detection. In Conference on Credit Scoring and Credit Control 7, Edinburgh, UK, 5-7 Sept.
BRAUSE, R., LANGSDORF, T. and HEPP, M. (1999). Neural data mining for credit card fraud detection. In Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence 103-106. IEEE Computer Society Press, Silver Spring, MD.
BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R. A. and STONE, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
BROCKETT, P. L., XIA, X. and DERRIG, R. A. (1998). Using Kohonen's self-organising feature map to uncover automobile bodily injury claims fraud. The Journal of Risk and Insurance 65 245-274.
BURGE, P. and SHAWE-TAYLOR, J. (1997). Detecting cellular fraud using adaptive prototypes. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 9-13. AAAI Press, Menlo Park, CA.
BUYSE, M., GEORGE, S. L., EVANS, S., GELLER, N. L., RANSTAM, J., SCHERRER, B., LESAFFRE, E., MURRAY, G., EDLER, L., HUTTON, J., COLTON, T., LACHENBRUCH, P. and VERMA, B. L. (1999). The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Statistics in Medicine 18 3435-3451.
CAHILL, M. H., LAMBERT, D., PINHEIRO, J. C. and SUN, D. X. (2002). Detecting fraud in the real world. In Handbook of Massive Datasets (J. Abello, P. M. Pardalos and M. G. C. Resende, eds.). Kluwer, Dordrecht.
CHAN, P. K., FAN, W., PRODROMIDIS, A. L. and STOLFO, S. J. (1999). Distributed data mining in credit card fraud detection. IEEE Intelligent Systems 14(6) 67-74.
CHAN, P. and STOLFO, S. (1998). Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining 164-168. AAAI Press, Menlo Park, CA.
CHARTIER, B. and SPILLANE, T. (2000). Money laundering detection with a neural network. In Business Applications of Neural Networks (P. J. G. Lisboa, A. Vellido and B. Edisbury, eds.) 159-172. World Scientific, Singapore.
CHHIKARA, R. S. and MCKEON, J. (1984). Linear discriminant analysis with misallocation in training samples. J. Amer. Statist. Assoc. 79 899-906.
CLARK, P. and NIBLETT, T. (1989). The CN2 induction algorithm. Machine Learning 3 261-285.
COHEN, W. (1995). Fast effective rule induction. In Proceedings of the 12th International Conference on Machine Learning 115-123. Morgan Kaufmann, Palo Alto, CA.
CORTES, C., FISHER, K., PREGIBON, D. and ROGERS, A. (2000). Hancock: A language for extracting signatures from data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 9-17. ACM Press, New York.
CORTES, C. and PREGIBON, D. (1998). Giga-mining. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining 174-178. AAAI Press, Menlo Park, CA.
CORTES, C., PREGIBON, D. and VOLINSKY, C. (2001). Communities of interest. Lecture Notes in Comput. Sci. 2189 105-114.
COX, K. C., EICK, S. G. and WILLS, G. J. (1997). Visual data mining: Recognizing telephone calling fraud. Data Mining and Knowledge Discovery 1 225-231.
CSIDS (1999). Cisco secure intrusion detection system technical overview. Available at http://www.wheelgroup.com/warp/public/cc/cisco/mkt/security/nranger/tech/ntran_tc.htm.
DENNING, D. E. (1997). Cyberspace attacks and countermeasures. In Internet Besieged (D. E. Denning and P. J. Denning, eds.) 29-55. ACM Press, New York.
DORRONSORO, J. R., GINEL, F., SANCHEZ, C. and CRUZ, C. S. (1997). Neural fraud detection in credit card operations. IEEE Transactions on Neural Networks 8 827-834.
FANNING, K., COGGER, K. O. and SRIVASTAVA, R. (1995). Detection of management fraud: A neural network approach. International Journal of Intelligent Systems in Accounting, Finance and Management 4 113-126.
FAWCETT, T. and PROVOST, F. (1997a). Adaptive fraud detection. Data Mining and Knowledge Discovery 1 291-316.
FAWCETT, T. and PROVOST, F. (1997b). Combining data mining and machine learning for effective fraud detection. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 14-19. AAAI Press, Menlo Park, CA.
FAWCETT, T. and PROVOST, F. (1999). Activity monitoring: Noticing interesting changes in behavior. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 53-62. ACM Press, New York.
FORREST, S., HOFMEYR, S., SOMAYAJI, A. and LONGSTAFF, T. (1996). A sense of self for UNIX processes. In Proceedings of the 1996 IEEE Symposium on Security and Privacy 120-128. IEEE Computer Society Press, Silver Spring, MD.
GHOSH, S. and REILLY, D. L. (1994). Credit card fraud detection with a neural network. In Proceedings of the 27th Hawaii International Conference on System Sciences (J. F. Nunamaker and R. H. Sprague, eds.) 3 621-630. IEEE Computer Society Press, Los Alamitos, CA.
GLASGOW, B. (1997). Risk and fraud in the insurance industry. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 20-21. AAAI Press, Menlo Park, CA.
GOLDBERG, H. and SENATOR, T. E. (1995). Restructuring databases for knowledge discovery by consolidation and link formation. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining 136-141. AAAI Press, Menlo Park, CA.
GOLDBERG, H. and SENATOR, T. E. (1997). Break detection systems. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 22-28. AAAI Press, Menlo Park, CA.
GOLDBERG, H. G. and SENATOR, T. E. (1998). The FinCEN AI system: Finding financial crimes in a large database of cash transactions. In Agent Technology: Foundations, Applications, and Markets (N. Jennings and M. Wooldridge, eds.) 283-302. Springer, Berlin.
GREEN, B. P. and CHOI, J. H. (1997). Assessing the risk of management fraud through neural network technology. Auditing 16 14-28.
HAND, D. J. (1981). Discrimination and Classification. Wiley, Chichester.
HAND, D. J. (1997). Construction and Assessment of Classification Rules. Wiley, Chichester.
HAND, D. J. and BLUNT, G. (2001). Prospecting for gems in credit card data. IMA Journal of Management Mathematics 12 173-200.
HAND, D. J., BLUNT, G., KELLY, M. G. and ADAMS, N. M. (2000). Data mining for fun and profit (with discussion). Statist. Sci. 15 111-131.
HAND, D. J. and HENLEY, W. E. (1997). Statistical classification methods in consumer credit scoring: A review. J. Roy. Statist. Soc. Ser. A 160 523-541.
HASSIBI, K. (2000). Detecting payment card fraud with neural networks. In Business Applications of Neural Networks (P. J. G. Lisboa, A. Vellido and B. Edisbury, eds.). World Scientific, Singapore.
HE, H., GRACO, W. and YAO, X. (1999). Application of genetic algorithm and k-nearest neighbour method in medical fraud detection. Lecture Notes in Comput. Sci. 1585 74-81. Springer, Berlin.
HE, H. X., WANG, J. C., GRACO, W. and HAWKINS, S. (1997). Application of neural networks to detection of medical fraud. Expert Systems with Applications 13 329-336.
HILL, T. P. (1995). A statistical derivation of the significant-digit law. Statist. Sci. 10 354-363.
HYNNINEN, J. (2000). Experiences in mobile phone fraud. Seminar on Network Security. Report Tik-110.501, Helsinki Univ. Technology.
JENKINS, P. (2000). Getting smart with fraudsters. Financial Times, September 23.
JENSEN, D. (1997). Prospective assessment of AI technologies for fraud detection: A case study. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 34-38. AAAI Press, Menlo Park, CA.
JU, W.-H. and VARDI, Y. (2001). A hybrid high-order Markov chain model for computer intrusion detection. J. Comput. Graph. Statist. 10 277-295.
KIRKLAND, J. D., SENATOR, T. E., HAYDEN, J. J., DYBALA, T., GOLDBERG, H. G. and SHYR, P. (1998). The NASD Regulation advanced detection system (ADS). In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98) and of the 10th Conference on Innovative Applications of Artificial Intelligence (IAAI-98) 1055-1062. AAAI Press, Menlo Park, CA.
KOSORESOW, A. P. and HOFMEYR, S. A. (1997). Intrusion detection via system call traces. IEEE Software 14(5) 35-42.
KUMAR, S. and SPAFFORD, E. H. (1994). A pattern matching model for misuse intrusion detection. In Proceedings of the 17th National Computer Security Conference 11-21.
LACHENBRUCH, P. A. (1966). Discriminant analysis when the initial samples are misclassified. II: Non-random misclassification models. Technometrics 16 419-424.
LANE, T. and BRODLEY, C. E. (1998). Temporal sequence learning and data reduction for anomaly detection. In Proceedings of the 5th ACM Conference on Computer and Communications Security (CCS-98) 150-158. ACM Press, New York.
LEE, W. and STOLFO, S. (1998). Data mining approaches for intrusion detection. In Proceedings of the 7th USENIX Security Symposium, San Antonio, TX 79-93. USENIX Association, Berkeley, CA.
LEONARD, K. J. (1993). Detecting credit card fraud using expert systems. Computers and Industrial Engineering 25 103-106.
PATIENT, S. (2000). Reducing online credit card fraud. Web Developer's Journal. Available at http://www.webdevelopersjournal.com/articles/card_fraud.html.
PRESS, S. J. and TANUR, J. M. (2001). The Subjectivity of Scientists and the Bayesian Approach. Wiley, New York.
GRAF, I., HAINES, J., LIPPMANN, R., FRIED, D., KENDALL, K., MCCLUNG, D., WEBER, D., WEBSTER, S., WYSCHOGROD, D., CUNNINGHAM, R. and ZISSMAN, M. (2000). Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion-detection evaluation. Unpublished manuscript, MIT Lincoln Laboratory. MAJOR, J. A. and RIEDINGER, D. R. (1992). EFD: A hybrid knowledge/statistical-based system for the detection of fraud.
ings of the Sixth InternationalConferenceon NetworkProtocols 62-70. IEEE Computer Society Press, Los Alamitos, CA. QUINLAN, J. R. (1990). Learning logical definitions from relations. Machine Learning 5 239-266.
InternationalJournalof IntelligentSystems7 687-703. D. MARCHETTE, J. (2001). ComputerIntrusion Detection and Network Monitoring:A Statistical Viewpoint.Springer,New
York. MCCARTHY, J. (2000). Phenomenal data mining. Comm. ACM 43
ceedings of the FifthACMSIGKDDInternationalConference on Knowledge Discovery and Data Mining 409-413. ACM
Press, New York. RYAN, J., LIN, M. and MIIKKULAINEN, R. (1997). Intrusion detection with neural networks. In AAAI Workshop on AI
75-79.
MCLACHLAN, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York. MOBILE EUROPE (2000). New IP world, new dangers. Mobile Europe, March. MOREAU, Y., PRENEEL, B., BURGE, P., SHAWE-TAYLOR,J., STOERMANN, C. and COOKE, C. (1996). Novel techniques for fraud detection in mobile communications. In ACTS Mobile
Summit,Grenada.
MOREAU, Y., VERRELST, H. and VANDEWALLE,J. (1997). Detection of mobile phone fraud using supervised neural networks: A first prototype. In Proceedings of 7th International
Statist.Sci. 16 58-74. Detecting masquerades. T. SENATOR, E. (2000). Ongoing managementand application of discovered knowledge in a large regulatoryorganization: A case study of the use and impact of NASD regulation's advanced detection system (ADS). In Proceedings of the SixthACM SIGKDDInternationalConferenceon Knowledge Discovery and Data Mining 44-53. ACM Press, New York.
SENATOR, T. E., GOLDBERG, H. G., WOOTON, J., COTTINI, M. A., UMAR KHAN, A. F., KLINGER, C. D., LLA-
Conferenceon ArtificialNeural Networks(ICANN'97) 10651070. Springer, Berlin. MURAD, U. and PINKAS, G. (1999). Unsupervised profiling for identifying superimposed fraud. Principles of Data Mining
and KnowledgeDiscovery. LectureNotes in ArtificialIntelligence 1704 251-261. Springer, Berlin. NEURAL TECHNOLOGIES(2000). Reducing telecoms fraud and chur. Report, Neural Technologies, Ltd., Petersfield, U.K. NIGRINI, M. J. (1999). I've got your number. Journal of Accountancy May 79-83. NIGRINI, M. J. and MITTERMAIER, L. J. (1997). The use of Benford's law as an aid in analytical procedures. Auditing: A
STATISTICAL FRAUD DETECTION SMITH,R. L. (1997). Comment on "Statistics for exceptional athletics records,"by M. E. Robinson and J. A. Tawn. Appl. Statist.46 123-128.
STOLFO, S. J., FAN, D. W., LEE, W., PRODROMIDIS,A. L. and
249
CHAN,P. K. (1997a). Creditcard frauddetection using metaon learning: Issues and initial results. In AAAI Workshop AI Approachesto FraudDetection and Risk Management83-90. AAAI Press, Menlo Park,CA.
STOLFO, S., FAN, W., LEE, W., PRODROMIDIS, A. L. and
CHAN,P. (1999). Cost-basedmodeling for fraudand intrusion detection:Resultsfromthe JAMProject.In Proceedingsof the DARPAInformationSurvivabilityConferenceand Exposition 2 130-144. IEEE ComputerPress, New York.
STOLFO, S. J., PRODROMIDIS,A. L., TSELEPIS, S., LEE, W.,
FAN, D. W. and CHAN,P. K. (1997b). JAM:Java agents for meta-learningover distributeddatabases. In AAAI Workshop on AI Approachesto Fraud Detection and Risk Management 91-98. AAAI Press, Menlo Park,CA.
(1998). Fraud detection in communication networks using neural and probabilisticmethods. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'98)2 1241-1244. IEEEComputer Society Press, Silver Spring,MD. U.S. CONGRESS (1995). Informationtechnologies for the control of money laundering.Office of Technology Assessment, Report OTA-ITC-630,U.S. GovernmentPrinting Office, Washington, DC. S. K. WASSERMAN, and FAUST, (1994). Social NetworkAnalysis: Methodsand Applications.CambridgeUniv. Press. WEBB, A. R. (1999). Statistical Pattern Recognition. Arnold, London.
WHEELER, R. and AITKEN, S. (2000). Multiple algorithms
Comment
Foster Provost
Foster Provost is Associate Professor, Leonard N. Stern School of Business, New York University, New York, New York 10012 (e-mail: [email protected]).

The state of research on fraud detection recalls John Godfrey Saxe's 19th-century poem "The Blind Men and the Elephant" (Felleman, 1936, page 521). Based on a Hindu fable, each blind man experiences only a part of the elephant, which shapes his opinion of its nature: the leg makes it seem like a tree, the tail a rope, the trunk a snake and so on. In fact, "... though each was partly in the right ... all were in the wrong." Saxe's poem was a criticism of theological debates, and I do not intend such a harsh criticism of research on fraud detection. However, because the problem is so complex, each research project takes a particular angle of attack, which often obscures the view of other parts of the problem. So, some researchers see the problem as one of classification, others of temporal pattern discovery; to some it is a problem perfect for a hidden Markov model and so on.

So why is fraud detection not simply classification, or a member of some other already well-understood problem class? Bolton and Hand outline several characteristics of fraud detection problems that differentiate them [as did Tom Fawcett and I in our review of the problems and techniques of fraud detection (Fawcett and Provost, 2002)]. Consider fraud detection as a classification problem. Fraud detection certainly must be "cost-sensitive": rather than minimizing error rate, some other loss function must be minimized. In addition, usually the marginal class distribution is skewed strongly toward one class (legitimate behavior). Therefore, modeling for fraud detection is at least a difficult problem of estimating class membership probability, rather than simple classification. However, this still is an unsatisfying attempt to transform the true problem into one for which we have existing tools (practical and conceptual).
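To make the cost-sensitivity point concrete, here is a minimal sketch (my illustration, not from the discussion; the cost values and function names are hypothetical) of how asymmetric misclassification costs move the decision threshold applied to an estimated class membership probability:

```python
# Illustrative sketch: with a skewed class distribution and asymmetric
# costs, minimizing expected cost rather than error rate changes the
# threshold applied to the estimated probability P(fraud | x).

def min_cost_threshold(c_fp, c_fn):
    """Expected-cost-minimizing threshold on P(fraud | x), where a
    false alarm costs c_fp and a missed fraud costs c_fn.
    Flag when p * c_fn > (1 - p) * c_fp, i.e. p > c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)

def classify(p_fraud, c_fp, c_fn):
    """Label a case by comparing its estimated fraud probability
    with the cost-based threshold."""
    return "fraud" if p_fraud >= min_cost_threshold(c_fp, c_fn) else "legitimate"

# When missing a fraud costs 100 times more than a false alarm,
# even a 5% estimated fraud probability triggers an alert:
print(min_cost_threshold(1.0, 100.0))   # ~0.0099, far below 0.5
print(classify(0.05, 1.0, 100.0))       # fraud
print(classify(0.005, 1.0, 100.0))      # legitimate
```

The threshold sits far below the 0.5 used in plain error-rate minimization, which is why accurate class membership probability estimates matter far more here than raw classification accuracy.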
The objective function for fraud detection systems actually is much more complicated. For example, the value of detection is a function of time: immediate detection is much more valuable than delayed detection. Unfortunately, evidence builds up over time, so detection is easier the longer it is delayed. In cases of self-revealing fraud, detection eventually is trivial (e.g., a defrauded customer calls to complain about fraudulent transactions on his or her bill). In most research on modeling for fraud detection, a subproblem is extracted (e.g., classifying transactions or accounts as fraudulent) and techniques are compared for solving this subproblem, without moving on to compare the techniques on the greater problem of detecting fraud. Each particular subproblem naturally will abstract away those parts that are